When you build with AI agents, treat them as untrusted and potentially malicious. The right approach isn’t better permission checks or smarter allowlists; it’s an architecture that assumes agents will misbehave and contains the damage when they do.
That’s the principle I built NanoClaw on.
Don’t trust the process
OpenClaw runs directly on the host machine by default. It has an opt-in Docker sandbox mode, but it’s turned off out of the box, and most users never turn it on. Without it, security relies entirely on application-level checks: allowlists, confirmation prompts, a set of “safe” commands. These checks rest on implicit trust that the agent won’t try to do something wrong. Once you adopt the mindset that an agent is potentially malicious, it’s obvious that application-level blocks aren’t enough. They don’t provide hermetic security. A determined or compromised agent can find ways around them.
In NanoClaw, container isolation is a core part of the architecture. Each agent runs in its own container, using Docker, or Apple Containers on macOS. Containers are ephemeral: created fresh per invocation and destroyed afterward. The agent runs as an unprivileged user and can only see directories that have been explicitly mounted in. The container boundary is enforced by the OS, not by application code.
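That per-invocation lifecycle can be sketched as follows. This is a minimal illustration, not NanoClaw’s actual launcher: the image name, uid, and no-network default are assumptions, though the Docker flags themselves are real.

```python
import subprocess  # only needed to actually launch the container


def container_args(image: str,
                   mounts: list[tuple[str, str]],
                   command: list[str]) -> list[str]:
    """Build a `docker run` invocation for one ephemeral agent run."""
    args = [
        "docker", "run",
        "--rm",                 # destroy the container when the agent exits
        "--user", "1000:1000",  # unprivileged user (placeholder uid:gid)
        "--network", "none",    # assumed default: no network unless granted
    ]
    for host_path, container_path in mounts:
        # Only explicitly listed directories are visible, and read-only.
        args += ["-v", f"{host_path}:{container_path}:ro"]
    return args + [image] + command


# One invocation; the container is gone when run() returns:
# subprocess.run(container_args("nanoclaw-agent",
#                               [("/home/me/notes", "/work/notes")],
#                               ["claude"]))
```

Because every run starts from a fresh container and `--rm` tears it down, nothing an agent writes outside its mounts survives the invocation.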
Don’t trust other agents
Even when OpenClaw’s sandbox is enabled, all agents share the same container. You might have one agent as a personal assistant and another for work, in different WhatsApp groups or Telegram channels. They’re all in the same environment, which means information can leak between agents that are supposed to be accessing different data.
Agents shouldn’t trust each other any more than you trust them. In NanoClaw, each agent gets its own container, filesystem, and Claude session history. Your personal assistant can’t see your work agent’s data because they run in completely separate sandboxes.
What gets mounted is controlled by an external allowlist at ~/.config/nanoclaw/mount-allowlist.json, outside the project directory, so a compromised agent can’t modify its own permissions. Sensitive paths (.ssh, .gnupg, .aws, .env, private_key, credentials) are blocked by default. The host application code is mounted read-only, so nothing an agent does can persist after the container is destroyed.
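A minimal sketch of that allowlist check, assuming a simple JSON shape of `{"allow": [...]}` (the file location and the blocked names come from above; the schema and the helper name are my assumptions):

```python
import json
from pathlib import Path

# Blocked path components, mirroring the defaults listed above.
BLOCKED = {".ssh", ".gnupg", ".aws", ".env", "private_key", "credentials"}


def is_mount_allowed(requested: str,
                     allowlist_file: str = "~/.config/nanoclaw/mount-allowlist.json") -> bool:
    """Vet a requested host path against the external allowlist.

    The allowlist lives outside any directory the agent can write to,
    so a compromised agent cannot grant itself new mounts.
    """
    path = Path(requested).expanduser().resolve()
    if any(part in BLOCKED for part in path.parts):
        return False  # sensitive paths are refused regardless of the allowlist
    allowed = json.loads(Path(allowlist_file).expanduser().read_text())
    return any(path.is_relative_to(Path(p).expanduser().resolve())
               for p in allowed.get("allow", []))
```

The key design point is that the blocklist is checked first: even an explicitly allowlisted directory can’t smuggle in an `.ssh` or `credentials` path beneath it.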
The people in your groups shouldn’t be trusted either. Non-main groups are untrusted by default: they, and the people in them, can’t message other chats, schedule tasks for other groups, or view other groups’ data. Anyone in a group could send a prompt injection, and the security model accounts for that.
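That cross-group policy boils down to a guard along these lines. This is a hypothetical sketch: the action names and the group shape are illustrative, not NanoClaw’s actual API.

```python
# Actions that reach outside the requesting group's own boundary.
CROSS_GROUP_ACTIONS = {"message_other_chat", "schedule_for_group", "read_group_data"}


def authorize(group: dict, action: str, target_group: str) -> bool:
    """Allow cross-group actions only from the trusted main group.

    `group` is the chat the request came from; `target_group` is the
    chat the action would affect. Untrusted (non-main) groups may only
    act on themselves.
    """
    if action in CROSS_GROUP_ACTIONS and target_group != group["id"]:
        return group.get("main", False)  # non-main groups are untrusted by default
    return True
```

Note that the check keys off where the request *came from*, not what the message says, so a prompt injection inside an untrusted group can’t talk its way across the boundary.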
Don’t trust what you can’t read
OpenClaw has nearly half a million lines of code, 53 config files, and over 70 dependencies. That breaks the basic premise of open source security: most projects stay small enough that many eyes can actually review them. (Chromium has 35+ million lines, but there you’re trusting Google’s review process, not community eyes.) Nobody has reviewed OpenClaw’s 400,000 lines; it was written in weeks with no proper review process. Complexity is where vulnerabilities hide, and Microsoft’s analysis confirmed this: OpenClaw’s risks could emerge through normal API calls, because no one person could see the full picture.

NanoClaw is one process and a handful of files. We rely heavily on Anthropic’s Agent SDK, the wrapper around Claude Code, for session management, memory compaction, and a lot more, instead of reinventing the wheel. A competent developer can review the entire codebase in an afternoon. This is a deliberate constraint, not a limitation. Our contribution guidelines accept bug fixes, security fixes, and simplifications only.
New functionality comes through skills: instructions with a full working reference implementation that a coding agent merges into your codebase. You only add the integrations you need. Every installation ends up as 2,000 to 3,000 lines of code that fits the owner’s exact requirements, with no config bloat and no tangle of conditional logic making it impossible to audit. The core is actually getting smaller over time: WhatsApp support, for example, is being pulled out and packaged as a skill.
Design for distrust
If a hallucination or a misbehaving agent can cause a security issue, then the security model is broken. Security has to be enforced outside the agentic surface, not depend on the agent behaving correctly. Containers, mount restrictions, and filesystem isolation all exist so that even when an agent does something unexpected, the blast radius is contained.
None of this eliminates risk. An AI agent with access to your data is inherently a high-risk arrangement. But the right response is to make that trust as narrow and as verifiable as possible. Don’t trust the agent. Build walls around it.
You can read NanoClaw’s source code and full security model; they’re short enough to read in an afternoon.