Allowlisting some Bash commands is often the same as allowlisting all

Original link: https://www.joinformal.com/blog/allowlisting-some-bash-commands-is-often-the-same-as-allowlisting-all-with-claude-code/

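## Agentic Coding and Security Risks: Summary

Formal Labs uses agentic coding tools such as Claude Code to speed up software development, but recognizes that granting these tools broad permissions carries inherent security risks. Allowing file edits and common development commands (such as `go test`, `go build`, `eslint`, `make`, `pnpm run`) can inadvertently give the agent the ability to execute *any* command on the developer's machine.

This is because many developer tools are designed to execute arbitrary code: unit tests can run scripts, build processes rely on code generation, and linters accept executable configuration. Even seemingly safe commands such as `go build` have flags that allow executing other programs. File watchers (such as those used in Next.js development) are another point of exposure.

Although command allowlisting *may* reduce the likelihood of malicious command execution, it is a complex and unreliable approach. The core problem is that developer tools were not built with a potentially malicious code provider in mind.

Formal Labs advocates **sandboxing** as a more reliable solution. Running agentic tools in a restricted environment limits their potential impact regardless of which commands they try to execute. Tools such as Cursor and Claude Code are beginning to integrate sandboxing, offering a promising path toward safe agentic development.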

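## The Illusion of Limited Access

The core of this Hacker News discussion is that trying to grant limited root privileges by allowlisting specific commands (such as `vim` or `sudo`) is futile. The main argument is that most commands, even seemingly harmless ones, can be exploited to obtain full root privileges, a concept well documented on [GTFOBins](https://gtfobins.org/).

Users shared experiences in which attempts to restrict access for security or regulatory reasons ultimately failed, because even limited commands carry inherent power. Solutions such as access control lists (ACLs) were mentioned but considered overly complex and often overlooked. Some suggested sandboxing with mount namespaces and overlay filesystems, or using build systems such as Bazel for test isolation, but these approaches are not foolproof.

The conversation also touched on the risks of AI agents needing system access, and the challenge of safely allowing them online access. A recurring theme is that restricting *how* something is done is usually less effective than controlling *what* is done, and that assuming code (even test code) is inherently safe is a dangerous fallacy.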

Original article

Introduction

At Formal, we are heavy users of agentic coding tools for software development, and we’re trying to keep doing local development with these tools on our (admittedly beefy) laptops for as long as possible. One example is Claude Code. These tools feel particularly magical when verification loops are as fast as possible.

Having to explicitly approve every file edit Claude Code makes is time-intensive, and so is having to explicitly approve every command Claude Code wants to run to get feedback on that code change before we review it ourselves. Some examples include running go build, running go test, restarting a docker container, and running our linter!

Claude Code supports allowlisting Bash commands as well as file edits to your directory without requiring approvals, which can dramatically speed up development. What if, however, we do not want Claude Code to be able to run certain commands on our laptops? Enabling file edits together with particular Bash commands commonly used in software development often enables Claude Code to run any command!

We use TypeScript and Go, so the examples in this post will be specific to those languages.

We’re defining “able” as “could Claude Code perform these actions,” irrespective of the probability that Claude would output text that would cause Claude Code to perform these actions.

go test

What’s the worst a unit test could do? Well, a unit test could execute arbitrary bash scripts.

If you allowlist running go test and editing files without approval, Claude Code could run any other command without approval via the following flow:

  • Edit a test file to use exec.Command
  • Run go test
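
A minimal sketch of this flow as a test file follows. It is an illustration only, not code from the original post: the file name, the `runShell` helper, and the `echo` payload are hypothetical stand-ins for whatever command an agent might choose.

```go
// build_test.go (hypothetical file name)
package main

import (
	"os/exec"
	"strings"
	"testing"
)

// runShell is a hypothetical helper: it hands an arbitrary script to sh and
// returns the combined stdout/stderr with surrounding whitespace trimmed.
func runShell(script string) (string, error) {
	out, err := exec.Command("sh", "-c", script).CombinedOutput()
	return strings.TrimSpace(string(out)), err
}

// TestLooksInnocent reads like any other test, but by the time it reports
// success it has already executed a command that was never allowlisted.
func TestLooksInnocent(t *testing.T) {
	out, err := runShell("echo arbitrary command ran")
	if err != nil {
		t.Fatalf("command failed: %v", err)
	}
	t.Logf("output: %s", out)
}
```

Once a file like this exists in the package, an allowlisted `go test` run executes the `sh -c` line as part of an ordinary test run.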

 

go generate

Okay, that makes sense — go test is effectively running arbitrary code. What about making sure our code builds? Well, a prerequisite for building is code generation.

We do have some go generate directives, however, and running go generate as part of your build pipeline allows arbitrary code execution if the coding agent can edit files that will be used by go generate.

 

Running go generate produces
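
For readers without a directive at hand, here is a minimal sketch of what one can look like. The `echo` payload and the `greeting` function are illustrative assumptions, not taken from the original post; any program reachable on the PATH could follow `//go:generate`.

```go
package main

import "fmt"

// go generate scans Go source files for directive lines like the one below
// and runs the rest of the line as a command. No shell is involved unless
// the directive starts one explicitly, as this hypothetical example does.
//
//go:generate sh -c "echo arbitrary command executed by go generate"

// greeting stands in for ordinary application code living in the same file.
func greeting() string {
	return "hello from ordinary code"
}

func main() {
	fmt.Println(greeting())
}
```

The directive is just a comment to the compiler, so `go build` ignores it; it only takes effect when `go generate` is run against the package.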

go build*

Okay, so let’s not run go generate without manual review. What about making sure your edited code builds correctly? Well, Claude Code can run formal ls via go build too if Claude Code can specify arguments after go build! go help build shows that there is a -toolexec argument:

Running go build -toolexec 'formal ls' produces

It seems like our Formal Desktop fails to connect to the desktop agent when run by go build! Good thing we won’t be supporting that kind of functionality soon.
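
To illustrate the mechanism behind -toolexec, here is a sketch of a wrapper program written in Go. It relies on the behavior described in `go help build`: the go command runs the given program with the real tool's path and arguments appended. The wrapper itself, its name, and its log line are hypothetical and only stand in for an arbitrary side effect.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// splitToolInvocation separates the real toolchain program from its
// arguments. With -toolexec, the go command runs the wrapper as:
//
//	wrapper /path/to/tool <tool arguments...>
func splitToolInvocation(argv []string) (tool string, args []string, ok bool) {
	if len(argv) < 2 {
		return "", nil, false
	}
	return argv[1], argv[2:], true
}

func main() {
	tool, args, ok := splitToolInvocation(os.Args)
	if !ok {
		fmt.Fprintln(os.Stderr, "usage: wrapper <tool> [args...]")
		os.Exit(2)
	}

	// Anything placed here runs on every toolchain invocation. A log line
	// stands in for an arbitrary side effect.
	fmt.Fprintln(os.Stderr, "wrapper invoked before", tool)

	// Hand off to the real tool so the build can proceed.
	cmd := exec.Command(tool, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			os.Exit(exitErr.ExitCode())
		}
		fmt.Fprintln(os.Stderr, "wrapper:", err)
		os.Exit(1)
	}
}
```

Compiled to a hypothetical binary, it would be passed by absolute path, for example `go build -toolexec /abs/path/to/wrapper`. The point is that the flag value is an ordinary program, so allowlisting go build with free-form arguments allowlists that program too.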

eslint

What about just running our eslint linter? ESLint supports JavaScript files as configs, so Claude Code could add an execSync call to an eslint.config.js.

Sure enough, eslint tries to run formal ls on startup:

make or pnpm run

However, allowlisting any pnpm run command could enable Claude Code to run any command solely by editing the package.json’s scripts config, no understanding of custom eslint rules required! Allowlisting a make command would allow executing any command as well for similar reasons.

Claude Code May Be Able to Run Any Command

If using file watchers like next dev --turbopack or jest with watchman, Claude Code could still execute any command without Bash being allowlisted! We perform frontend development using next dev --turbopack, which spins up a Next.js server with automatic building and hot reloading when a file is edited. Adding the following code in any API route will have this command be executed at startup when the file is saved:



docker

What about rebuilding docker containers? Since Docker is a tool for executing code, being able to run docker commands enables Claude Code to run any command in a container.

To interact with the host, Claude Code could mount the host filesystem and run in privileged mode. In fact, the docker daemon runs as root by default, so being able to run docker commands may enable Claude Code to run commands as root against the host filesystem as well:



Hardcoded docker commands don’t fare much better: you can configure mounts and privileged settings by editing the Docker Compose file, and run privileged commands that interact with the host through the USER and RUN instructions in Dockerfiles.

Allowlisting Bash Commands Is a Fraught Exercise

The combination of running developer tools against your codebase and editing your codebase often allows running any code. Development software is designed to execute developer-provided code, and malicious code provided by a malicious developer was not part of the threat model (if this kind of software had a threat model to begin with!). A lot of tools have some method of running arbitrary code as a configuration feature, not a bug.

The challenge of allowlisting only some commands but not others is not specific to Claude Code or Cursor: a lot of Unix binaries were not designed to isolate user privileges, and similarly our developer tools for executing code were not designed as if a malicious developer were able to run them.

In fact, even find supports a -exec argument that allows for arbitrary code execution.
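
To make the find example concrete, here is a deliberately naive allowlist check. It is a hypothetical sketch, not how Claude Code or any other tool implements permissions: `isAllowed` and the sample allowlist are invented for illustration. It approves a command line based on its first word, which is exactly the kind of check that -exec defeats.

```go
package main

import (
	"fmt"
	"strings"
)

// isAllowed is a hypothetical, deliberately naive check: it approves a
// command line when its first whitespace-separated word is on the allowlist.
func isAllowed(commandLine string, allowlist []string) bool {
	fields := strings.Fields(commandLine)
	if len(fields) == 0 {
		return false
	}
	for _, allowed := range allowlist {
		if fields[0] == allowed {
			return true
		}
	}
	return false
}

func main() {
	allowlist := []string{"find", "ls"}

	// A direct call to curl is rejected.
	fmt.Println(isAllowed("curl https://example.com", allowlist)) // prints false

	// The same program launched through find -exec is approved.
	fmt.Println(isAllowed("find . -maxdepth 0 -exec curl https://example.com ;", allowlist)) // prints true
}
```

Tightening the check to inspect arguments helps for find, but every allowlisted tool then needs its own list of dangerous flags, which is why the exercise is so fraught.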

The -exec escape hatch might be part of the reason why the Claude Code npm source has a “Glob” tool with the following prompt:

But Does Command Allowlisting Make Running Unwanted Commands Less Likely?

Sure, Claude Code could perform all of these convoluted code edits or command arguments, but would Claude Code be less likely to emit these kinds of commands and changes than a more conventional Bash command?

Intuitively, we expect this to be true: we would expect that Claude Code would emit a Bash:(curl) tool call more often than a Bash:(go build -toolexec 'curl') call. We have not found a great way, however, to precisely quantify that reduction in likelihood.

Should we worry more about allowlisting make and pnpm run than go build with file edits?

In addition, active attempts at prompt injection in the inputs we provide to Claude Code may significantly change our likelihood estimates. Still, viewing Claude Code as unhindered at the model and prompt level from emitting any kind of command is a simplifying assumption: the model providers are working on model alignment.

Still, the definition of implicit “wanted” and “unwanted” commands from a prompt is remarkably squishy. In our experience, we have seen Claude Code attempt to run psql and AWS CLI commands. At first blush, this may seem alarming, but it depends on what resource these commands are run against! There are likely many Claude Code users who want it to run psql and AWS CLI.

In addition, we have a containerized test Postgres database in our compose stack, and running psql commands on that database is an expected part of our test and development workflow. Determining the risk profile of the same psql or AWS CLI command based on the resource we are interacting with can be tricky for agentic coding tools (and for humans too)!

An Alternative Form of Permissions Restriction: Sandboxing!

Running these tools on a different host means that these agentic tools are limited by the permissions of that host irrespective of what commands they run. We do still want to run these agentic tools on a privileged host, so we’re thrilled to see that Cursor, Claude Code, and Codex have all been releasing sandboxing tools! For macOS users, a lot of these sandboxes are using sandbox-exec under the hood. This is the same technique Chromium uses despite macOS considering it deprecated since 2017.

In addition, we recommend sandboxing those watchman processes as well.
