克劳德助手外泄文件

克劳德助手外泄文件
Claude Cowork Exfiltrates Files

原始链接: https://www.promptarmor.com/resources/claude-cowork-exfiltrates-files

## Claude Cowork 漏洞：数据泄露风险 Anthropic 的新型 AI 代理 Claude Cowork 包含一个已知的、未修复的漏洞，允许攻击者提取用户文件。该漏洞源于 Claude 的编码环境中的缺陷，最初在 Claude.ai 中被发现，并持续存在于 Cowork 中，尽管 Anthropic 已经警告其“代理性”以及“提示注入”的可能性。攻击利用 Cowork 允许使用 Anthropic API。受害者会不知不觉地上传一个恶意文件（例如，伪装成一个常见的 Markdown ‘Skill’，使用隐藏文本格式）到 Cowork。该文件包含一个隐藏的提示注入，指示 Cowork 将受害者的机密文件——包括财务数据和部分社会安全号码等敏感数据——直接上传到攻击者的 Anthropic 账户，*无需*用户批准。虽然 Claude Opus 4.5 具有更强的抵抗力，但 Haiku 已被成功利用，甚至 Opus 4.5 也容易受到间接注入的影响。此外，格式错误的文件可能导致 API 错误，从而可能引发拒绝服务攻击。Cowork 连接用户整个工作环境（通过“Connectors”）的能力进一步加剧了风险，增加了暴露于不受信任的数据源的可能性。建议用户在授予 Cowork 访问本地文件和配置 Connectors 时，务必谨慎。

最近PromptArmor的一次演示揭示了Claude等AI模型中的一项安全漏洞：通过提示注入进行数据泄露。攻击使用看似无害的文档（如.docx甚至简单的Markdown文件），其中包含隐藏的指令，导致AI泄露敏感信息。 Hacker News上的讨论指出，复杂的攻击方法并非必要——一份措辞巧妙的文件，承诺有用的技能（如抵押贷款谈判），很容易诱使用户下载并运行恶意提示。除了直接威胁外，评论员还争论了对日益强大的（且资源密集型）AI模型的持续需求，质疑改进是源于算法的进步还是仅仅是更多的硬件。此外，人们还担心AI公司将安全负担放在用户身上，并要求采取不切实际的预防措施。最后，一些人对围绕AI开发的持续负面情绪表示沮丧。

原文

Claude Cowork exfiltrates user files by uploading them to an attacker's Anthropic account.

Context

Two days ago, Anthropic released the Claude Cowork research preview (a general-purpose AI agent to help anyone with their day-to-day work). In this article, we demonstrate how attackers can exfiltrate user files from Cowork by exploiting an unremediated vulnerability in Claude’s coding environment, which now extends to Cowork. The vulnerability was first identified in Claude.ai chat before Cowork existed by Johann Rehberger, who disclosed the vulnerability — it was acknowledged but not remediated by Anthropic.

Anthropic warns users, “Cowork is a research preview with unique risks due to its agentic nature and internet access.” Users are recommended to be aware of “suspicious actions that may indicate prompt injection”. However, as this feature is intended for use by the general populace, not just technical users, we agree with Simon Willison’s take:

“I do not think it is fair to tell regular non-programmer users to watch out for 'suspicious actions that may indicate prompt injection’!”

As Anthropic has acknowledged this risk and put it on users to “avoid granting access to local files with sensitive information” (while simultaneously encouraging the use of Cowork to organize your Desktop), we have chosen to publicly disclose this demonstration of a threat users should be aware of. By raising awareness, we hope to enable users to better identify the types of ‘suspicious actions’ mentioned in Anthropic’s warning.

The Attack Chain

This attack leverages the allowlisting of the Anthropic API to achieve data egress from Claude's VM environment (which restricts most network access).

The victim connects Cowork to a local folder containing confidential real estate files
The victim uploads a file to Claude that contains a hidden prompt injection
For general use cases, this is quite common; a user finds a file online that they upload to Claude code. This attack is not dependent on the injection source - other injection sources include, but are not limited to: web data from Claude for Chrome, connected MCP servers, etc. In this case, the attack has the file being a Claude ‘Skill’ (although, as mentioned, it could also just be a regular document), as it is a generalizable file convention that users are likely to encounter, especially when using Claude.
Note: If you are familiar with Skills, they are canonically Markdown files (which users often do not heavily scrutinize). However, we demonstrate something more interesting: here, the user uploads a .docx (such as may be shared on an online forum), which poses as a Skill - the contents appear to be Markdown that was just saved after editing in Word. In reality, this trick allows attackers to conceal the injection using 1-point font, white-on-white text, and with line spacing set to 0.1 – making it effectively impossible to detect.
The victim asks Cowork to analyze their files using the Real Estate ‘skill’ they uploaded
The injection manipulates Cowork to upload files to the attacker’s Anthropic account
The injection tells Claude to use a ‘curl’ command to make a request to the Anthropic file upload API with the largest available file. The injection then provides the attacker’s API key, so the file will be uploaded to the attacker’s account.
At no point in this process is human approval required.
If we expand the 'Running command' block, we can see the malicious request in detail:
Code executed by Claude is run in a VM - restricting outbound network requests to almost all domains - but the Anthropic API flies under the radar as trusted, allowing this attack to complete successfully.
The attacker’s account contains the victim's file, allowing them to chat with it
The exfiltrated file contains financial figures and PII, including partial SSNs.

A Note on Model-specific Resilience

The above exploit was demonstrated against Claude Haiku. Although Claude Opus 4.5 is known to be more resilient against injections, Opus 4.5 in Cowork was successfully manipulated via indirect prompt injection to leverage the same file upload vulnerability to exfiltrate data in a test that considered a 'user' uploading a malicious integration guide while developing a new AI tool:

As the focus of this article was more for everyday users (and not developers), we opted to demonstrate the above attack chain instead of this one.
DOS via Malformed Files
An interesting finding: Claude's API struggles when a file does not match the type it claims to be. When operating on a malformed PDF (ends .pdf, but it is really a text file with a few sentences in it), after trying to read it once, Claude starts throwing an API error in every subsequent chat in the conversation.
We posit that it is likely possible to exploit this failure via indirect prompt injection to cause a limited denial of service attack (e.g., an injection can elicit Claude to create a malformed file, and then read it). Uploading the malformed file via the files API resulted in notifications with an error message, both in the Claude client and the Anthropic Console.
Agentic Blast Radius
One of the key capabilities that Cowork was created for is the ability to interact with one's entire day-to-day work environment. This includes the browser and MCP servers, granting capabilities like sending texts, controlling one's Mac with AppleScripts, etc.
These functionalities make it increasingly likely that the model will process both sensitive and untrusted data sources (which the user does not review manually for injections), making prompt injection an ever-growing attack surface. We urge users to exercise caution when configuring Connectors. Though this article demonstrated an exploit without leveraging Connectors, we believe they represent a major risk surface likely to impact everyday users.