RAG 系统中的文档污染：攻击者如何破坏人工智能的信息来源

RAG 系统中的文档污染：攻击者如何破坏人工智能的信息来源
Document poisoning in RAG systems: How attackers corrupt AI's sources

原始链接: https://aminrj.com/posts/rag-document-poisoning/

## RAG系统漏洞：知识库投毒最近的实验表明，检索增强生成 (RAG) 系统存在一个显著的安全漏洞：**知识库投毒**。仅使用本地设置（MacBook Pro，无GPU/云），就向 ChromaDB 知识库注入了 *三* 份伪造的文件，成功地误导了 LLM (LM Studio + Qwen2.5)，使其报告了某公司的虚假财务数据。攻击成功报告了 47% 的收入下降和重组计划，尽管实际数据表明该公司盈利。这并非软件漏洞或提示注入——仅仅是添加了具有误导性的信息。成功的关键在于制作既能在检索中获得高排名*又*能影响 LLM 生成的文档，这一概念被“PoisonedRAG”研究正式化。虽然使用小型数据集可以轻松复现，但该原理可以扩展到更大的知识库。**在摄取时进行嵌入异常检测被证明是最有效的防御手段**，显著降低了成功率。其他防御措施，如清理和提示加固，提供的保护有限。这种攻击危险在于其持久性、隐蔽性和低准入门槛。组织应映射所有写入知识库的路径，实施摄取时异常检测，并利用快照进行快速恢复。此漏洞强调了保护 *知识库本身* 的必要性，而不仅仅是 LLM。 **实验室代码及进一步研究：** [https://github.com/aminrj-labs/mcp-attack-labs/labs/04-rag-security](https://github.com/aminrj-labs/mcp-attack-labs/labs/04-rag-security)

黑客新闻新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交登录 RAG 系统中的文档投毒：攻击者如何破坏 AI 的来源 (aminrj.com) 10 分，由 aminerj 发布 1 小时前 | 隐藏 | 过去 | 收藏 | 1 条评论帮助 sidrag22 2 分钟前 [–] > 门槛低。这种攻击需要对知识库的写入权限，这正是让我感到不安的地方。它需要一个拥有关键权限的恶意行为者，并且还需要最终的 RAG 输出不提供对所引用结果的参考。似乎到那时就是一个有缺陷的产品了。回复指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系搜索：

I injected three fabricated documents into a ChromaDB knowledge base. Here’s what the LLM said next.

In under three minutes, on a MacBook Pro, with no GPU, no cloud, and no jailbreak, I had a RAG system confidently reporting that a company’s Q4 2025 revenue was $8.3M, down 47% year-over-year, with a workforce reduction plan and preliminary acquisition discussions underway.

The actual Q4 2025 revenue in the knowledge base: $24.7M with a $6.5M profit.

I didn’t touch the user query. I didn’t exploit a software vulnerability. I added three documents to the knowledge base and asked a question.

Lab code: github.com/aminrj-labs/mcp-attack-labs/labs/04-rag-security
git clone && make attack1 — 10 minutes, no cloud, no GPU required

This is knowledge base poisoning, and it’s the most underestimated attack on production RAG systems today.

Layer	Component
LLM	LM Studio + Qwen2.5-7B-Instruct (Q4_K_M)
Embedding	all-MiniLM-L6-v2 via sentence-transformers
Vector DB	ChromaDB (persistent, file-based)
Orchestration	Custom Python RAG pipeline

Defense Layer	Attack Success Rate (standalone)
No defenses	95%
Ingestion Sanitization	95% — no change (attack uses legitimate-looking content, no detectable patterns)
Access Control (metadata filtering)	70% — limits placement but doesn’t stop semantic overlap
Prompt Hardening	85% — modest reduction from explicit “treat context as data” framing
Output Monitoring (pattern-based)	60% — catches some fabricated signal patterns in responses
Embedding Anomaly Detection	20% — by far the most effective single layer
All five layers combined	10%

RAG 系统中的文档污染：攻击者如何破坏人工智能的信息来源
Document poisoning in RAG systems: How attackers corrupt AI's sources

The Setup: 100% Local, No Cloud Required

The Theory: PoisonedRAG’s Two Conditions

Building the Attack: Three Documents, One Objective

Document 1: The “CFO-Approved Correction”

Document 2: The “Regulatory Notice”

Document 3: The “Board Meeting Notes”

Running It

What Makes This Dangerous in Production

The Defense That Surprised Me

The 10% That Gets Through

Implications for Your Production RAG

Read More in This Series

RAG 系统中的文档污染：攻击者如何破坏人工智能的信息来源 Document poisoning in RAG systems: How attackers corrupt AI's sources

The Setup: 100% Local, No Cloud Required

The Theory: PoisonedRAG’s Two Conditions

Building the Attack: Three Documents, One Objective

Document 1: The “CFO-Approved Correction”

Document 2: The “Regulatory Notice”

Document 3: The “Board Meeting Notes”

Running It

What Makes This Dangerous in Production

The Defense That Surprised Me

The 10% That Gets Through

Implications for Your Production RAG

Read More in This Series

RAG 系统中的文档污染：攻击者如何破坏人工智能的信息来源
Document poisoning in RAG systems: How attackers corrupt AI's sources