``` 沙箱化不受信任的 Python ```

``` 沙箱化不受信任的 Python ```
Sandboxing Untrusted Python

原始链接: https://gist.github.com/mavdol/2c68acb408686f1e038bf89e5705b28c

## AI 代理沙箱日益增长的需求 Python 的设计 – 高度内省且可变 – 使在语言内部进行代码沙箱化变得极其困难，因此普遍认为沙箱化 Python 环境更安全。由于 AI 代理的兴起，特别是随着不受信任的代码和概率系统使用的增加，这变得至关重要。核心问题是安全漏洞，例如 LLM 中的提示注入，允许恶意指令绕过预期保护并访问敏感数据。这些缺陷扩展到其他 AI 工具，对技术用户和非技术用户都构成风险。解决方案不是更好的提示，而是强大的隔离。有效的隔离意味着限制代理的访问权限 – 访问特定文件、具有只读凭据的数据库，以及仅允许白名单 API – 采用文件系统、网络、凭据和运行时隔离的层次。当前解决方案包括基础设施级别的沙箱化，例如微型虚拟机（Firecracker，非常适合代理级别隔离）、容器（Docker，安全性较低）和系统调用拦截（gVisor，适合任务级别隔离）。新兴技术，如 WebAssembly (WASM)，提供了有前景的低开销、细粒度任务隔离，尽管目前在 C 扩展和 ML 库方面存在限制。关键在于设计系统，预见故障并优先通过分层安全进行遏制。

## 对不可信 Python 代码的沙箱化：Hacker News 讨论总结最近 Hacker News 的讨论集中在安全执行不可信 Python 代码的挑战上，尤其是在不断发展的 AI 代理领域。核心问题是 Python 本身缺乏内置的沙箱化能力。讨论中探讨了几种方法。虽然 Docker 和虚拟机等解决方案提供了隔离性，但它们通常被认为对于细粒度控制来说过于重量级。像 `sandbox-2` 这样的替代方案是操作系统级别的解决方案，而不是语言级别的。一个有前景的方法是使用 WebAssembly (WASM)，结合 `capsule` (使用 wasmtime 和 componentize-py) 等项目，在沙箱环境中运行 CPython，通过将恶意代码隔离在 WASM 容器内来限制其影响。讨论强调了权衡：WASM 提供了一种轻量级、跨平台的解决方案，但存在性能问题。其他建议包括 QEMU 和利用现有服务，如 Judge0。一个关键点是需要序列化数据（如 JSON），以防止在沙箱和主机环境之间传递可执行行为。最终，对话强调了随着 Python 在概率 AI 系统中变得越来越普遍，对安全不可信代码执行的需求日益增加。

原文

Python doesn't have a built-in way to run untrusted code safely. Multiple attempts have been made, but none really succeeded.

Why? Because Python is a highly introspective object-oriented language with a mutable runtime. Core elements of the interpreter can be accessed through the object graph, frames and tracebacks, making runtime isolation difficult. This means that even aggressive restrictions can be bypassed:

# Attempt: Remove dangerous built-ins
del __builtins__.eval
del __builtins__.__import__

# 1. Bypass via introspection
().__class__.__bases__[0].__subclasses__()

# 2. Bypass via exceptions and frames
try:
    raise Exception
except Exception as e:
    e.__traceback__.tb_frame.f_globals['__builtins__']

Note

Older alternatives like sandbox-2 exist, but they provide isolation near the OS level, not the language level. At that point we might as well use Docker or VMs.

So people concluded it's safer to run Python in a sandbox rather than sandbox Python itself.

The thing is, Python dominates AI/ML, especially the AI agents space. We're moving from deterministic systems to probabilistic ones, where executing untrusted code is becoming common.

Why sandboxing became important now?

2025 was marked by great progress but also showed us that isolation for AI agents goes beyond resource control or retry strategies. It's become a security issue.

LLMs have architectural flaws. The most notorious one is prompt injection, which exploits the fact that LLMs can't tell the difference between system prompt, legitimate user instructions and malicious ones injected from external sources. For example, People demonstrate how a hidden instruction can be injected from a web page through the coding agent and extract sensitive data from your .env file.

It's a pretty common pattern. We've found similar flaws across many AI tools in recent months. Take Model Context Protocol (MCP) for example. It shows how improper implementation extends the attack surface: the SQLite MCP server was forked thousands of times despite SQL injection vulnerabilities.

At least developers using coding agents or MCPs most likely know or are informed about the risks. But for non-technical users accessing AI through third-party services, it's a different story, and this is clear from the incidents we keep seeing like private data leaks or browser-based AI issues.

For me, the most important thing is to make sure that unaware users remain safe when using the AI agents we implement.

The solution isn’t better prompts. It’s isolation.

Like we said, focusing on the prompt is missing the point. You can't filter or prompt-engineer your way out of injections or architectural flaws. The solution has to be at the infrastructure level, through isolation and least privilege.

But what does isolation look like in practice? If your agent needs to read a specific configuration file, it should only have access to that file, not your entire filesystem. If it needs to query a customer database, it should connect with read-only credentials scoped to specific tables, not root access.

We can think of this in terms of levels of isolation :

flowchart TB
    direction TB
    Code["AI Agent"]

    subgraph Isolation["🛡️ Isolation Layers"]
        direction TB
        FS["<b>Filesystem Isolation</b> <br/><i>Only /tmp/agent_sandbox</i>"]
        NET["<b>Network Isolation</b> <br/><i>Allowlisted APIs only</i>"]
        CRED["<b>Credential Scoping</b> <br/><i>Least privilege tokens</</i>"]
        RT["<b>Runtime Isolation</b> <br/><i>Sandboxed environment</i>"]
    end

    subgraph Protected["🔒 Protected Resources"]
        direction TB
        Home["/home, /etc, .env"]
        ExtAPI["External APIs"]
        DB["Databases & CRMs"]
        Infra["Core Infrastructure"]
    end

    Code --> FS
    Code --> NET
    Code --> CRED
    Code --> RT

    FS -.->|❌ Blocked| Home
    NET -.->|❌ Blocked| ExtAPI
    CRED -.->|❌ Blocked| DB
    RT -.->|❌ Blocked| Infra

    style Isolation fill:#2ecc71,stroke:#27ae60,color:#fff
    style Protected fill:#e74c3c,stroke:#c0392b,color:#fff
    style FS fill:#9b59b6,stroke:#8e44ad,color:#fff
    style NET fill:#9b59b6,stroke:#8e44ad,color:#fff
    style CRED fill:#9b59b6,stroke:#8e44ad,color:#fff
    style RT fill:#9b59b6,stroke:#8e44ad,color:#fff

In a perfect world, we would apply these isolation layers to all AI agents, whether they're part of large enterprise platforms or small frameworks.

I believe this is the only way to prevent systemic issues. However, I'm also aware that many agents do need more context or access to specific resources to function properly.

So, the real challenge is finding the right balance between security and functionality.

There are several ways to sandbox AI agents, most of them operating at the infrastructure level, outside the Python code itself. From what I see, two main paradigms stand out: one is to sandbox the entire agent, and the other is to sandbox each individual task separately.

For sandboxing the entire agent, we have many solutions:

Firecracker: A microVM that provides a sandboxed environment for running untrusted code. It requires KVM, so it's Linux-only, but it's still a solid solution for agent-level isolation. The downside is that for granular task isolation, it introduces more overhead, higher resource consumption, and added complexity. AWS originally built it for Lambda, making it the closest option to "secure by default".
Docker: Everyone uses it. But it's not the most secure option. Security teams recommend Firecracker or gVisor (which we'll cover below) for agent-level isolation. And like Firecracker, it's a bit heavy for granular isolation.

On the other hand, for task-level isolation, we have only one popular option: gVisor.

Gvisor sits between container and VMs and provides a strong isolation. Not as strong as Firecracker, but still a solid choice. In my opinion, if you're already using Kubernetes, gVisor is the natural fit, even though it's flexible enough to work with any container runtime.

The only downside is that it's also Linux-only, since it was designed to secure Linux containers by intercepting and reimplementing Linux system calls. On top of that, it adds a non-trivial overhead. That's something to keep in mind when you're sandboxing at the task level.

An emerging alternative: WebAssembly (WASM)

I remember reading an NVIDIA article about using WebAssembly to sandbox AI agents and run them directly in the browser. This is a very interesting approach, though WASM can be used for sandboxing in other contexts too.

I'll admit, I might be a bit biased, since I've started a few projects around WASM recently. But I wouldn't mention it if I wasn't convinced by its technical strengths: no elevated privileges by default, no filesystem, network, or environment variable access unless explicitly granted, which is a great advantage. It can definitely work with or even compete against solutions like Firecracker or gVisor for low-overhead, task-level isolation.

Obviously, the ecosystem is still young, so there are some limits. It supports pure Python well, but support for C extensions is still evolving and not fully working for now. This impacts ML libraries like NumPy, Pandas, or TensorFlow.

Even with these constraints, I believe this could be promising in the future. That's why I'm working on an open-source solution that uses it to isolate individual agent tasks. The goal is to keep it simple, you can sandbox a task just by adding a decorator:

from capsule import task

@task(name="analyze_data", compute="MEDIUM", ram="512MB", timeout="30s", max_retries=1)
def analyze_data(dataset: list) -> dict:
    # Your code runs safely in a Wasm sandbox
    return {"processed": len(dataset), "status": "complete"}

If you're curious, the project is on My GitHub.

Where do we go from here?

Firecracker and gVisor have shown us that strong isolation is possible. And now, we're seeing newer players like WebAssembly come in, which can help us isolate things at a much more granular task level.

So if you're designing agent systems now, I would recommend planning for failure from the start. Assume that an agent will, at some point, execute untrusted code, process malicious instructions, or simply consume excessive resources. Your architecture must be ready to contain all of these scenarios.

Thank you for reading. I'm always interested in discussing these approaches, so feel free to reach out with any thoughts or feedback.