When you build with AI agents, treat them as untrusted and potentially malicious. The right approach isn’t better permission checks or smarter allowlists; it’s an architecture that assumes agents will misbehave and contains the damage when they do.
That’s the principle I built NanoClaw on.
Don’t trust the process
OpenClaw runs directly on the host machine by default. It has an opt-in Docker sandbox mode, but it’s turned off out of the box, and most users never turn it on. Without it, security relies entirely on application-level checks: allowlists, confirmation prompts, a set of “safe” commands. These checks rest on implicit trust that the agent won’t try to do something wrong. Once you adopt the mindset that an agent is potentially malicious, it’s obvious that application-level blocks aren’t enough. They don’t provide hermetic security. A determined or compromised agent can find ways around them.
In NanoClaw, container isolation is a core part of the architecture. Each agent runs in its own container, using Docker, or Apple Containers on macOS. Containers are ephemeral: created fresh per invocation and destroyed afterward. The agent runs as an unprivileged user and can only see directories that have been explicitly mounted in. The container boundary is enforced by the OS, not by application code.
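That per-invocation lifecycle can be sketched as follows. This is a minimal illustration, not NanoClaw’s actual launcher: the image name, uid, and no-network default are assumptions, though the Docker flags themselves are real.

```python
import subprocess  # only needed to actually launch the container


def container_args(image: str,
                   mounts: list[tuple[str, str]],
                   command: list[str]) -> list[str]:
    """Build a `docker run` invocation for one ephemeral agent run."""
    args = [
        "docker", "run",
        "--rm",                 # destroy the container when the agent exits
        "--user", "1000:1000",  # unprivileged user (placeholder uid:gid)
        "--network", "none",    # assumed default: no network unless granted
    ]
    for host_path, container_path in mounts:
        # Only explicitly listed directories are visible, and read-only.
        args += ["-v", f"{host_path}:{container_path}:ro"]
    return args + [image] + command


# One invocation; the container is gone when run() returns:
# subprocess.run(container_args("nanoclaw-agent",
#                               [("/home/me/notes", "/work/notes")],
#                               ["claude"]))
```

Because every run starts from a fresh container and `--rm` tears it down, nothing an agent writes outside its mounts survives the invocation.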
Don’t trust other agents
Even when OpenClaw’s sandbox is enabled, all agents share the same container. You might have one agent as a personal assistant and another for work, in different WhatsApp groups or Telegram channels. They’re all in the same environment, which means information can leak between agents that are supposed to be accessing different data.
Agents shouldn’t trust each other any more than you trust them. In NanoClaw, each agent gets its own container, filesystem, and Claude session history. Your personal assistant can’t see your work agent’s data because they run in completely separate sandboxes.
What gets mounted is controlled by an external allowlist at ~/.config/nanoclaw/mount-allowlist.json, outside the project directory, so a compromised agent can’t modify its own permissions. Sensitive paths (.ssh, .gnupg, .aws, .env, private_key, credentials) are blocked by default. The host application code is mounted read-only, so nothing an agent does can persist after the container is destroyed.
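A minimal sketch of that allowlist check, assuming a simple JSON shape of `{"allow": [...]}` (the file location and the blocked names come from above; the schema and the helper name are my assumptions):

```python
import json
from pathlib import Path

# Blocked path components, mirroring the defaults listed above.
BLOCKED = {".ssh", ".gnupg", ".aws", ".env", "private_key", "credentials"}


def is_mount_allowed(requested: str,
                     allowlist_file: str = "~/.config/nanoclaw/mount-allowlist.json") -> bool:
    """Vet a requested host path against the external allowlist.

    The allowlist lives outside any directory the agent can write to,
    so a compromised agent cannot grant itself new mounts.
    """
    path = Path(requested).expanduser().resolve()
    if any(part in BLOCKED for part in path.parts):
        return False  # sensitive paths are refused regardless of the allowlist
    allowed = json.loads(Path(allowlist_file).expanduser().read_text())
    return any(path.is_relative_to(Path(p).expanduser().resolve())
               for p in allowed.get("allow", []))
```

The key design point is that the blocklist is checked first: even an explicitly allowlisted directory can’t smuggle in an `.ssh` or `credentials` path beneath it.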
The people in your groups shouldn’t be trusted either. Non-main groups are untrusted by default: they, and the people in them, can’t message other chats, schedule tasks for other groups, or view other groups’ data. Anyone in a group could send a prompt injection, and the security model accounts for that.
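That cross-group policy boils down to a guard along these lines. This is a hypothetical sketch: the action names and the group shape are illustrative, not NanoClaw’s actual API.

```python
# Actions that reach outside the requesting group's own boundary.
CROSS_GROUP_ACTIONS = {"message_other_chat", "schedule_for_group", "read_group_data"}


def authorize(group: dict, action: str, target_group: str) -> bool:
    """Allow cross-group actions only from the trusted main group.

    `group` is the chat the request came from; `target_group` is the
    chat the action would affect. Untrusted (non-main) groups may only
    act on themselves.
    """
    if action in CROSS_GROUP_ACTIONS and target_group != group["id"]:
        return group.get("main", False)  # non-main groups are untrusted by default
    return True
```

Note that the check keys off where the request *came from*, not what the message says, so a prompt injection inside an untrusted group can’t talk its way across the boundary.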
Don’t trust what you can’t read
OpenClaw has nearly half a million lines of code, 53 config files, and over 70 dependencies. That breaks the basic premise of open source security: most projects stay small enough that many eyes can actually review them. (Chromium has 35+ million lines, but there you’re trusting Google’s review process, not community eyes.) Nobody has reviewed OpenClaw’s 400,000 lines; it was written in weeks with no proper review process. Complexity is where vulnerabilities hide, and Microsoft’s analysis confirmed this: OpenClaw’s risks could emerge through normal API calls, because no one person could see the full picture.

NanoClaw is one process and a handful of files. We rely heavily on Anthropic’s Agent SDK, the wrapper around Claude Code, for session management, memory compaction, and a lot more, instead of reinventing the wheel. A competent developer can review the entire codebase in an afternoon. This is a deliberate constraint, not a limitation. Our contribution guidelines accept bug fixes, security fixes, and simplifications only.
New functionality comes through skills: instructions with a full working reference implementation that a coding agent merges into your codebase. You only add the integrations you need. Every installation ends up as 2,000 to 3,000 lines of code that fits the owner’s exact requirements, with no config bloat and no tangle of conditional logic making it impossible to audit. The core is actually getting smaller over time: WhatsApp support, for example, is being pulled out and packaged as a skill.
Design for distrust
If a hallucination or a misbehaving agent can cause a security issue, then the security model is broken. Security has to be enforced outside the agentic surface, not depend on the agent behaving correctly. Containers, mount restrictions, and filesystem isolation all exist so that even when an agent does something unexpected, the blast radius is contained.
None of this eliminates risk. An AI agent with access to your data is inherently a high-risk arrangement. But the right response is to make that trust as narrow and as verifiable as possible. Don’t trust the agent. Build walls around it.
You can read NanoClaw’s source code and full security model; they’re short enough to read in an afternoon.