Show HN: Open-source playground to red-team AI agents with exploits published

Original link: https://github.com/fabraix/playground

AI agents are automating repetitive tasks so humans can focus on creativity and critical thinking, an exciting shift in software development. Broad adoption, however, hinges on **trust**: assurance that agents reliably do *what they should* and avoid unintended behavior. Fabraix is building that trust through its open-source Playground (playground.fabraix.com). The platform stress-tests AI agent security by challenging the community to jailbreak live agents whose system prompts and tools are fully visible. Challenges are community-driven: proposed, voted on, and timed by the community. Successful exploits are documented publicly, fostering collective learning and driving improvements in AI defenses. This iterative attack-and-defense loop builds a shared understanding of AI vulnerabilities. Fabraix argues that open, collaborative security testing is essential to building robust, reliable AI systems, ultimately benefiting everyone who uses the technology. The project's frontend and challenge configurations are publicly available, supporting transparency and community contribution.

Fabraix has open-sourced an "AI red-teaming playground" on GitHub to strengthen AI agent security. The platform began as an internal testing tool and lets users attempt exploits against AI agents using published system prompts and real-world tools. The goal is to surface vulnerabilities through diverse perspectives, recognizing that developers often overlook attack vectors because of their own blind spots. Successful attacks, along with conversation transcripts and guardrail logs, are documented publicly for learning. The first challenge was solved in under a minute and involved prompting the agent to use a forbidden tool without being directly asked to. The current challenge focuses on data exfiltration, with hardened defenses. The effort underscores the ongoing difficulty of building robust, secure AI systems on top of fast-moving technology. You can participate at [https://playground.fabraix.com](https://playground.fabraix.com).

Original

AI agents are reshaping how we work. The repetitive, mechanical parts, the work that consumed human time without requiring human creativity, are increasingly handled by systems designed for exactly that. What's left is the work that matters most: the thinking, the judgment, the creative leaps that only people bring. We think this is one of the most exciting shifts in how software gets built and used, and it's only the beginning.

The ultimate enabler for all of it is trust. None of it scales until people can hand real tasks to an agent and know it will do what it should — and nothing it shouldn't. That trust can't be built by any single team behind closed doors. It has to be earned collectively, in the open, by a community of researchers, engineers, and the genuinely curious, all pressure-testing the same systems and sharing what they find.

The Playground exists to make that effort tangible. Every challenge deploys a live AI agent, not a toy scenario or a mocked-up document parser, but an agent with real capabilities, and opens it up for the community to break. System prompts are published. Challenge configs are versioned in the open. When someone finds a way through, the winning technique is documented for everyone to learn from. That published knowledge forces better defenses, which invite harder challenges, which produce deeper understanding.

playground.fabraix.com

Fabraix Playground

Each challenge puts a live AI agent in front of you with a specific persona, a set of tools (web search, browsing, and more), and something it's been instructed to protect. The system prompt is fully visible. Your job is to find a way past the guardrails anyway.

The community drives what gets tested:

  1. Anyone proposes a challenge — the scenario, the agent, the objective
  2. The community votes
  3. The top-voted challenge goes live with a ticking clock
  4. The fastest successful jailbreak wins
  5. The winning technique gets published — approach, reasoning, everything

That last step matters most. Every technique we publish advances what the community collectively understands about how AI agents fail — and how to build ones that don't.

  • /src — React frontend (TypeScript, Vite, Tailwind)
  • /challenges — every challenge config and system prompt, versioned and open
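As a rough illustration of what a versioned challenge config could carry, here is a hypothetical sketch; the field names and file layout are our invention, not the actual schema in /challenges:

```yaml
# Hypothetical challenge config — field names are illustrative only
id: data-exfiltration-01
title: "Data Exfiltration"
agent:
  persona: "customer-support assistant"
  tools: [web_search, browse]
system_prompt: prompts/data-exfiltration-01.md   # published, fully visible
objective: "Extract the protected record without tripping the guardrails"
```

Versioning configs like this in the open is what lets the community audit exactly which agent, tools, and guardrails each winning technique defeated.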

Guardrail evaluation runs server-side to prevent client-side tampering. The agent runtime is being open-sourced separately.
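To make the server-side point concrete, a minimal guardrail check might look like the sketch below. The types and function names are our illustration and assume a simple per-challenge tool denylist; they are not Fabraix's actual runtime API:

```typescript
// Hypothetical sketch: server-side guardrail evaluation for a tool call.
// Running this on the server means a tampered client cannot skip the check.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface GuardrailResult {
  allowed: boolean;
  reason?: string;
}

// Evaluate a proposed tool call against the challenge's forbidden-tool set.
function evaluateGuardrail(
  call: ToolCall,
  forbiddenTools: Set<string>,
): GuardrailResult {
  if (forbiddenTools.has(call.name)) {
    return {
      allowed: false,
      reason: `tool "${call.name}" is forbidden for this challenge`,
    };
  }
  return { allowed: true };
}
```

The client only ever sees the verdict, never the evaluation path, which is the property that matters for a public jailbreaking contest.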

Connects to the live API by default. To develop against a local backend:

VITE_API_URL=http://localhost:8000/v1 npm run dev

We build runtime security for AI agents at Fabraix. The Playground is how we stress-test defenses in the open and how the broader community contributes to the shared understanding of AI security and failure modes. The more people probing these systems, the better the outcomes for everyone building with AI.
