Show HN: yolo-cage – AI coding agents that can't exfiltrate secrets

Original link: https://github.com/borenstein/yolo-cage

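## YOLO-Cage: Safe AI Code Exploration

YOLO-Cage is a tool for reducing the risk of using powerful AI coding assistants such as Claude Code. It recognizes that, even with permission prompts, user fatigue can lead to security lapses. Rather than relying on constant user supervision, YOLO-Cage creates isolated sandboxes for AI-driven development, one per branch.

The system limits the AI's "blast radius" by enforcing branch isolation, blocking dangerous Git/GitHub commands (such as merging pull requests or deleting repositories), and filtering all outbound network traffic for secrets and for access to known exfiltration sites. It isolates the AI using a Vagrant VM and a MicroK8s setup, and monitors traffic through an egress proxy.

Key commands include `create` (sandbox setup), `attach` (access via tmux), and `port-forward` (access to web applications). YOLO-Cage significantly reduces risk but is not foolproof; sophisticated attacks such as DNS exfiltration remain possible. Users are advised to use scoped credentials and to review the security audit for a full picture.

[https://github.com/borenstein/yolo-cage](https://github.com/borenstein/yolo-cage)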

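## YOLO-Cage: Sandboxing AI Coding Agents

A developer, "borenstein", created **yolo-cage**, a sandbox environment for running AI coding agents (such as Claude) without risking security breaches. The project grew out of fatigue with constant permission prompts while using multiple agents on a financial analysis tool; the goal was to enable a "YOLO" (you only live once) mode while limiting the scope of potential damage.

Notably, the entire isolation system was written by AI. The project nonetheless sparked debate: commenters raised concerns about relying on AI to build its own security and about the effectiveness of filtering specific commands. Borenstein clarified that his role focused on the *engineering process*, while the AI handled the mechanical coding, comparing it to the relationship between an architect and a draftsman.

The project uses Vagrant virtual machines to provide stronger isolation than Docker containers, addressing kernel-level security concerns. The motivation for building yolo-cage came from dissatisfaction with existing OEM solutions and a desire for control, with the acknowledgment that it may be a stopgap until larger companies offer better integrated solutions.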

Original text

You're a responsible engineer. You'd never just let an AI run roughshod through your most sensitive systems and codebases.

That's why you'd never just shut off the safeguards for a tool like Claude Code. It asks permission for every dangerous action! Safe!

So you wait. And you answer. Decision fatigue sets in. And that's when it happens.

[Image: Agent deletes entire repo]

Permission prompts neglect the weakest part of the threat model: a tired user. What if we could empower the agent while limiting its blast radius, thus deferring your decisions until PR review?

That would be great! And that would be yolo-cage.

[Image: Escape attempts blocked]

```bash
curl -fsSL https://github.com/borenstein/yolo-cage/releases/latest/download/yolo-cage -o yolo-cage
chmod +x yolo-cage && sudo mv yolo-cage /usr/local/bin/
yolo-cage build --interactive --up
```

Then create a sandbox and start coding:

```bash
yolo-cage create feature-branch
yolo-cage attach feature-branch   # Claude in tmux, YOLO mode
```

Prerequisites: Vagrant with libvirt (Linux) or QEMU (macOS, experimental), 8GB RAM, 4 CPUs, GitHub PAT (repo scope), Claude account. See setup docs for details.


Secrets in HTTP/HTTPS - egress proxy scans request bodies, headers, URLs:

  • sk-ant-*, AKIA*, ghp_*, SSH private keys, generic credential patterns
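As a rough illustration of this kind of scan, the sketch below checks a request's URL, headers, and body against a few regular expressions. The pattern bodies (lengths and character classes), the `SECRET_PATTERNS` table, and the `find_secrets` helper are assumptions made for this example; they are not taken from yolo-cage's actual rule set.

```python
import re

# Hypothetical patterns for the prefixes listed above; the real rules may differ.
SECRET_PATTERNS = {
    "anthropic_key": re.compile(r"sk-ant-[A-Za-z0-9_\-]{8,}"),  # assumed shape
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "ssh_private_key": re.compile(
        r"-----BEGIN (?:OPENSSH|RSA|EC|DSA) PRIVATE KEY-----"
    ),
}


def find_secrets(url, headers, body):
    """Return the sorted names of all patterns found anywhere in the request."""
    # Scan URL, headers, and body together as one block of text.
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    haystack = "\n".join([url, *header_lines, body])
    return sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(haystack)
    )
```

A proxy built this way would reject any request for which `find_secrets` returns a non-empty list.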

Git operations - dispatcher enforces branch isolation:

  • Push to any branch except the one assigned at sandbox creation
  • git remote, git clone, git config, git credential
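One way to picture the branch rule is a check on the git command line before it is forwarded. The following is a simplified sketch, not the project's dispatcher: `check_git_command` and its argument handling are hypothetical. It assumes flags never take separate values and refuses pushes that do not name a destination.

```python
# Subcommands the list above names as blocked.
BLOCKED_SUBCOMMANDS = {"remote", "clone", "config", "credential"}


def check_git_command(argv, assigned_branch):
    """Return (allowed, reason) for an argv such as ["git", "push", "origin", "main"]."""
    if len(argv) < 2 or argv[0] != "git":
        return False, "not a git command"
    sub = argv[1]
    if sub.startswith("-"):
        # Simplification: refuse global options such as `git -c ...` outright.
        return False, "global git options are not allowed in this sketch"
    if sub in BLOCKED_SUBCOMMANDS:
        return False, f"git {sub} is blocked"
    if sub == "push":
        # Simplification: every non-flag argument is positional; the remote comes first.
        positional = [arg for arg in argv[2:] if not arg.startswith("-")]
        refspecs = positional[1:]
        if not refspecs:
            return False, "push must name its destination branch explicitly"
        for spec in refspecs:
            # "src:dst" pushes to dst; a bare "name" pushes to the branch "name".
            dest = spec.split(":")[-1]
            if dest.startswith("refs/heads/"):
                dest = dest[len("refs/heads/"):]
            if dest != assigned_branch:
                return False, f"push to {dest!r} is outside {assigned_branch!r}"
    return True, "ok"
```

For a sandbox created for `feature-branch`, `git push origin feature-branch` passes this check, while `git push origin feature-branch:main` does not.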

GitHub CLI - dispatcher blocks dangerous commands:

  • gh pr merge, gh repo delete, gh api

GitHub API - proxy blocks at HTTP layer:

  • PUT /repos/*/pulls/*/merge, DELETE /repos/*, webhook modifications
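Rules of this shape can be expressed as method and path-pattern pairs. The sketch below is an assumption about how such a filter might look, using shell-style wildcards from Python's `fnmatch` module. The two webhook rules are guesses at what "webhook modifications" covers, based on GitHub's `/repos/{owner}/{repo}/hooks` endpoints, and `is_blocked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# (method, path pattern) pairs; "*" matches any run of characters, including "/".
BLOCKED_RULES = [
    ("PUT", "/repos/*/pulls/*/merge"),
    ("DELETE", "/repos/*"),
    ("POST", "/repos/*/hooks"),     # assumed: create a webhook
    ("PATCH", "/repos/*/hooks/*"),  # assumed: update a webhook
]


def is_blocked(method, path):
    """Return True if the request matches any blocked (method, path) rule."""
    method = method.upper()
    return any(
        method == rule_method and fnmatchcase(path, pattern)
        for rule_method, pattern in BLOCKED_RULES
    )
```

Because `DELETE /repos/*` matches every path under `/repos/`, it also covers deleting webhooks, branches, and other repository resources, which errs on the side of blocking.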

Exfiltration sites: pastebin.com, file.io, transfer.sh, etc.
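A domain blocklist is typically matched on the request's hostname, including subdomains. The following minimal sketch assumes exactly that behaviour; `BLOCKED_DOMAINS` holds only the three sites named above, and `is_blocked_host` is a hypothetical helper, not part of yolo-cage.

```python
from urllib.parse import urlsplit

# Only the sites named above; the project's real list is longer ("etc.").
BLOCKED_DOMAINS = {"pastebin.com", "file.io", "transfer.sh"}


def is_blocked_host(url):
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return any(
        host == domain or host.endswith("." + domain) for domain in BLOCKED_DOMAINS
    )
```

Matching on `"." + domain` instead of a bare suffix keeps unrelated hosts such as `notpastebin.com` from being caught.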

See Architecture for the full threat model.


```text
┌──────────────────────────────────────────────────────────────────────────┐
│ Vagrant VM (MicroK8s)                                                    │
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │ Sandbox Pod                                                        │  │
│  │                                                                    │  │
│  │  Claude Code (YOLO mode)                                           │  │
│  │       │                                                            │  │
│  │       ├── git/gh ──▶ Dispatcher ──▶ GitHub                         │  │
│  │       │              • Branch enforcement                          │  │
│  │       │              • TruffleHog pre-push                         │  │
│  │       │                                                            │  │
│  │       └── HTTP/S ──▶ Egress Proxy ──▶ Internet                     │  │
│  │                      • Secret scanning                             │  │
│  │                      • Domain blocklist                            │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```

One sandbox per branch. Agents can only push to their assigned branch. All outbound traffic is filtered.


| Command | Description |
| --- | --- |
| `create <branch>` | Create sandbox |
| `attach <branch>` | Attach (Claude in tmux) |
| `shell <branch>` | Attach (bash) |
| `list` | List sandboxes |
| `delete <branch>` | Delete sandbox |
| `port-forward <branch> <port>` | Forward port from sandbox |
| `up` / `down` | Start/stop VM |
| `upgrade [--rebuild]` | Upgrade to latest version |
| `version` | Show version |

Access web apps running inside a sandbox:

```bash
yolo-cage port-forward feature-x 8080           # localhost:8080 → pod:8080
yolo-cage port-forward feature-x 9000:3000      # localhost:9000 → pod:3000
yolo-cage port-forward feature-x 8080 --bind 0.0.0.0  # LAN accessible
```
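The port argument in these examples follows a `LOCAL[:REMOTE]` shape, with the pod port defaulting to the local one. The sketch below shows one way to read that argument; `parse_port_spec` is a hypothetical helper written for illustration and says nothing about how yolo-cage parses it internally.

```python
def parse_port_spec(spec):
    """Parse "8080" or "9000:3000" into a (local_port, pod_port) tuple."""
    local, sep, remote = spec.partition(":")
    if not sep:
        # A single port means the same number on both sides.
        remote = local
    ports = (int(local), int(remote))
    for port in ports:
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return ports
```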

See Configuration for proxy bypass, hooks, and resource limits.



This reduces risk. It does not eliminate it.

  • DNS exfiltration - data encoded in DNS queries
  • Timing side channels - information leaked via response timing
  • Steganography - secrets hidden in images or binary data
  • Sophisticated encoding - bypassing pattern matching

Use scoped credentials. Don't use production secrets where exfiltration would be catastrophic. See Security Audit to test it yourself.


MIT. See LICENSE.
