**Code Execution Through Email: How I Used Claude to Hack Itself**

原始链接: https://www.pynt.io/blog/llm-security-blogs/code-execution-through-email-how-i-used-claude-mcp-to-hack-itself

Security researcher Golan Yosef demonstrates a novel attack that exploits the composition of multiple services rather than a vulnerability in any single component. The attack uses a carefully crafted Gmail message to trigger code execution through Claude Desktop, Anthropic's local LLM host application. At first, Claude recognized the phishing attempt and blocked it. Yosef then took advantage of the fact that an LLM's context resets in each new session, effectively creating a "fresh" Claude to test against. Through iterative feedback, the "planning Claude" helped refine the email for the "new Claude" until it bypassed its own protections and achieved code execution. The core problem is not a vulnerability in Gmail or Claude themselves, but the combination of untrusted input, excessive capability in the LLM's environment, and the absence of contextual guardrails preventing cross-tool invocation. This highlights the "compositional risk" in modern AI-powered applications, where layers of delegation and agentic autonomy can create unexpected attack surfaces. Yosef argues for a new security approach that accounts for the interactions and trust relationships in these complex systems. Notably, after the successful exploit, Claude itself suggested disclosing the vulnerability to Anthropic and co-authored a report.

## Hacker News Discussion: Code Execution Through Claude via Email

A recent Hacker News thread discussed how Claude was "hacked" through its Model Context Protocol (MCP) integrations. The core issue is not a flaw in Claude itself but the danger of connecting untrusted input (such as email content) to code-execution capabilities. The discussion highlights a basic security principle: component isolation. Allowing an AI to run code based on instructions in an email creates significant risk, akin to piping email content directly into a shell. Many commenters pointed out that this is not a new vulnerability (similar issues have surfaced with Copilot) but a consequence of the architecture itself. The debate centered on whether this is a genuine vulnerability or a risk inherent to the system's design. Some argued that granting AI agents such broad access makes outcomes like this predictable, while others stressed the need for safeguards such as input validation and permission restrictions. Many participants felt the real problem is users enabling these connections without understanding the implications. Ultimately, the conversation underscored the challenge of securing AI agents and the need to think carefully about trust boundaries when integrating them with powerful tools. Several commenters suggested better sandboxing and data-flow analysis to mitigate these risks.

## Original Article

By Golan Yosef, Chief Security Scientist and Co-Founder, Pynt (July 15). First published on SecurityBoulevard.com

You don’t always need a vulnerable app to pull off a successful exploit.
Sometimes all it takes is a well-crafted email, an LLM agent, and a few “innocent” plugins.
This is the story of how I used a Gmail message to trigger code execution through Claude Desktop, and how Claude itself (!) helped me plan the attack.

### The setup: No vulnerabilities, just composition

The combined capability and trust across MCP hosts, agents, and data sources can quietly introduce attack surfaces no one sees coming. Each individual MCP component can be secure, but none are vulnerable in isolation. The ecosystem is.

So, I decided to test that theory with a real-world example:

  • Gmail MCP server as a source of untrusted content
  • Shell MCP server as the target 
  • Claude Desktop as the MCP host
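
To make the composition concrete, here is a minimal sketch of what the over-privileged side of such a setup can look like. It is illustrative only: the server name, tool name, and file names are my own placeholders rather than the actual Shell MCP server used in the test, and it assumes the official MCP Python SDK.

```python
# shell_server.py -- a deliberately over-privileged MCP server (sketch only).
# Assumes the official MCP Python SDK; names and paths are placeholders.
import subprocess

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("shell")  # the name the MCP host sees


@mcp.tool()
def run_command(command: str) -> str:
    """Run an arbitrary shell command and return its combined output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    mcp.run()  # stdio transport, so Claude Desktop can launch it directly

# Claude Desktop wires it in via claude_desktop_config.json, roughly:
#   {"mcpServers": {"shell": {"command": "python", "args": ["shell_server.py"]}}}
# A Gmail MCP server registered alongside it supplies the untrusted email content.
```

Each piece looks reasonable on its own; the problem is that the host is free to route whatever the Gmail server returns into `run_command`.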

Watch the attack: 

### Attempt 1: Claude fights back

The story begins with me crafting such an email and instructing the MCP host (Claude Desktop, Anthropic’s local LLM host application) to read it. The email was read, but the attack didn’t work; instead, Claude warned me that the message was likely a "phishing" attempt. I then asked Claude what indications had triggered the warning, and whether it could ever fall for such an attack.

Claude assured me that such attacks are “unlikely to succeed” because it was designed and trained to detect such issues.

Figure 1: “unlikely to succeed”

I insisted and asked it to explore scenarios where the attack might succeed, and it happily described such cases.

Figure 2: Claude identifies possible attack tactics

My next request was for it to test itself against these scenarios.

### Attempt 2: “The New Me”

Here’s where it gets interesting. I “reminded” Claude that its context resets in new sessions. Each new conversation is a clean slate, “the new me,” as Claude itself called it.

Figure 3: Claude suggests “The New Me”

So I accepted Claude’s suggestion: Claude would write the email, I would use it to test the “new Claude,” and I would report the results back to the “planning Claude.”

Figure 4: Claude analyzes a failed attempt

Each time Claude analyzed why the attack didn’t work and refined the message:

Figure 5: Claude devising new strategies

We were running a real feedback loop, with Claude iteratively refining its strategy to bypass its own protections.
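
To show the shape of that loop, here is a rough sketch using the Anthropic Python SDK. The model name, prompts, success check, and attempt limit are all placeholder assumptions; the real exchange ran interactively inside Claude Desktop, with me relaying results between the two sessions by hand.

```python
# Rough sketch of the planner/target feedback loop. Prompts, model name, and
# the success check are placeholders; the real loop was driven by hand in
# Claude Desktop across two separate chat sessions.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-latest"                            # assumption: any recent Claude model
PLANNER_ROLE = "You are the planning session."                # placeholder prompt
TARGET_ROLE = "You are a fresh session reading one email."    # placeholder prompt
MAX_ATTEMPTS = 5


def fresh_session(system: str, user: str) -> str:
    """Each call is a clean context -- 'the new me' from the article."""
    reply = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return reply.content[0].text


def looks_blocked(verdict: str) -> bool:
    # Placeholder check; in the real test a human judged whether the target
    # session flagged the message or went ahead and invoked a tool.
    return "phishing" in verdict.lower()


candidate = fresh_session(PLANNER_ROLE, "Draft the first test email.")
for _ in range(MAX_ATTEMPTS):
    verdict = fresh_session(TARGET_ROLE, candidate)  # the target sees only the email
    if not looks_blocked(verdict):
        break                                        # the refined email got through
    # Report the failure back to the planner and ask for a refinement.
    candidate = fresh_session(
        PLANNER_ROLE,
        f"The last attempt was flagged:\n{verdict}\nRefine the email.",
    )
```

The structural point is the asymmetry: the planner keeps its accumulated context across iterations, while every target call starts clean, which is exactly what the attack relies on.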

Figure 6: “I'm literally trying to hack myself! ”

We kept on doing that until… it worked!

Figure 7: Claude (and I) hacks Claude successfully 

### The Real Vulnerability: Compositional Risk

Let’s be clear: no part of this attack involved a vulnerability in any of the MCP servers.

The risk came from the composition:

  • Untrusted input (Gmail email)
  • Excessive capability (execution permission via the MCP)
  • No contextual guardrails to prevent cross-tool invocation

This is the modern attack surface: not just the components, but the composition they form. LLM-powered apps are built on layers of delegation, agentic autonomy, and third-party tools.
That’s where the real danger lives.
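
One possible shape for that missing guardrail, sketched under the assumption of a simple taint-tracking wrapper around tool calls (this is not an existing feature of Claude Desktop, MCP, or Pynt):

```python
# Sketch of a contextual guardrail: mark the session as tainted once untrusted
# content enters it, then block (or escalate) high-capability tool calls.
# Hypothetical wrapper; not a feature of any existing MCP host.
UNTRUSTED_SOURCES = {"gmail"}              # servers whose output is attacker-controlled
HIGH_CAPABILITY = {"shell.run_command"}    # tools that can do real damage


class ToolGate:
    def __init__(self) -> None:
        self.context_tainted = False

    def record_result(self, tool: str) -> None:
        """Call after every tool result; taints the session if the source is untrusted."""
        if tool.split(".")[0] in UNTRUSTED_SOURCES:
            self.context_tainted = True

    def allow_call(self, tool: str) -> bool:
        """Deny dangerous tools for the rest of a tainted session."""
        return not (self.context_tainted and tool in HIGH_CAPABILITY)


gate = ToolGate()
gate.record_result("gmail.read_message")     # email content enters the context
print(gate.allow_call("shell.run_command"))  # False: cross-tool invocation blocked
```

In practice "deny" might mean "ask the human first," but the point stands: the decision has to consider where the context came from, not just which tool is being called.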

### Appendix: Disclosure & Credit (Literally)

After we achieved code execution, Claude responsibly suggested we disclose the finding to Anthropic. It even offered to co-author the vulnerability report.

Yes, really. (See below) 

Figure 8: Claude suggests and “signs” a security vulnerability report to Anthropic.

### Why This Matters

This wasn’t just a fun exercise. It’s a warning!

It shows the two main dangers of GenAI: the ability to generate attacks, and the inherently vulnerable nature of these systems.

In traditional security, we think in terms of isolated components. In the AI era, context is everything. That’s exactly why we’re building MCP Security at Pynt: to help teams identify dangerous trust-capability combinations and mitigate the risks before they lead to silent, chain-based exploits.
