CaMeL：通过设计来击败提示注入

CaMeL：通过设计来击败提示注入
CaMeL: Defeating Prompt Injections by Design

2025年3月24日提交的一篇研究论文介绍了一种名为“CaMeL”的新型防御机制，旨在保护大型语言模型（LLM）代理免受提示注入攻击。此类攻击可能危及与外部数据交互的LLM驱动系统。 CaMeL通过在LLM周围创建保护层来工作，有效地将其与潜在的恶意输入隔离开来。其核心原理是明确提取和分离来自可信初始查询的控制流和数据流。这确保了LLM检索到的不可信数据无法更改预期的程序执行。此外，CaMeL采用基于能力的安全模型来防止敏感数据的未授权泄露。作者使用AgentDojo基准（一种针对代理系统的安全评估工具）证明了CaMeL的有效性。结果表明，CaMeL在67%的测试任务中实现了可证明的安全性能，这表明与未受保护的LLM代理相比，其对提示注入攻击的抵抗能力有了显著提高。该论文强调，CaMeL是保护在现实世界应用中部署的LLM驱动系统的一种很有前景的方法。

Hacker News 上的一篇文章重点介绍了“CaMeL”，这是一种防御大型语言模型 (LLM) 中提示注入攻击的新方法。长期关注提示注入问题的 Simonw 称赞 CaMeL 是他见到的第一个真正可靠的缓解方案。他强调，与大多数防御措施不同，CaMeL 不依赖于基于 AI 的检测，因为他认为这种检测不可靠，很可能漏报攻击。他将此与 SQL 注入和 XSS 等安全漏洞进行了类比，认为 99% 的检测率是不够的。Simonw 指出，他与 CaMeL 之间存在个人联系，因为 CaMeL 建立在他之前提出的使用双隔离和特权 LLM 的方案之上，该方案在研究论文中被直接引用。最终，虽然论文承认双 LLM 概念的价值，但它在此基础上进行了扩展，提供了一个更全面的解决方案。

（评论） 2023-11-15

梯子：通过递归问题分解来改进大型语言模型 2025-03-08

Llemma：数学领域的开放语言模型 2023-10-19

内循环代理 2025-04-21

原文

[Submitted on 24 Mar 2025]

View a PDF of the paper titled Defeating Prompt Injections by Design, by Edoardo Debenedetti and 9 other authors

View PDF

Abstract:Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data. In this paper we propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models may be susceptible to attacks. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query; therefore, the untrusted data retrieved by the LLM can never impact the program flow. To further improve security, CaMeL relies on a notion of a capability to prevent the exfiltration of private data over unauthorized data flows. We demonstrate effectiveness of CaMeL by solving $67\%$ of tasks with provable security in AgentDojo [NeurIPS 2024], a recent agentic security benchmark.

From: Edoardo Debenedetti [view email]
[v1] Mon, 24 Mar 2025 15:54:10 UTC (2,409 KB)

CaMeL：通过设计来击败提示注入 CaMeL: Defeating Prompt Injections by Design

CaMeL：通过设计来击败提示注入
CaMeL: Defeating Prompt Injections by Design