联邦政府因一个简单的“修复此代码”提示(而非越狱)对《神鬼寓言4》感到惊慌。
Feds freaked over Fable 5 after simple 'fix this code' prompt, not jailbreak

原始链接: https://www.theregister.com/security/2026/06/15/feds-freaked-over-fable-5-after-simple-fix-this-code-prompt-not-jailbreak-says-researcher/5255827

特朗普政府近期以存在所谓的“越狱”能力为由,出于国家安全考虑,限制了对 Anthropic 公司先进模型 Fable 5 和 Mythos 的访问。然而,唯一审阅过相关底层研究的外部专家凯蒂·穆苏里斯(Katie Moussouris)指出,所谓的“绕过”仅仅是一个简单的提示词:“修复这段代码”。 据报道,研究人员此前利用这些模型来识别并修补开源代码中的漏洞。穆苏里斯认为,这并非危险的攻击手段,而是一项常规且极具价值的防御功能。她主张,将这些有益的防御任务归类为受出口管制的“军火”属于严重的过度干预,会损害网络安全。 在 100 多位行业领袖的支持下,穆苏里斯警告称,限制这些工具对防御方造成了不成比例的伤害,因为防御方依赖此类人工智能比对手更快地发现并修补漏洞。此外她指出,由于中国等国际竞争对手正在开发类似的能力,这些禁令实际上是在解除美国安全专业人员的武装,却无法阻碍全球技术进步。专家们正敦促政府撤销这些限制,并指出保持安全优势的最佳途径是为防御者提供最强大的工具。

```Hacker News 最新 | 过往 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 联邦政府对 Fable 5 感到惊慌,起因仅仅是简单的“修复此代码”提示,而非越狱 (theregister.com) 26 分,作者 _tk_,1 小时前 | 隐藏 | 过往 | 收藏 | 2 条评论 帮助 spwa4 2 分钟前 | 下一条 [–] 这听起来联邦政府担心的并不是有人利用 Fable 5 来攻击他们,而是担心有人利用 Fable 5 来阻止联邦政府攻击他人……也就是说,担心其他国家或组织利用 Fable 5 真正做好网络安全。 回复 ceejayoz 28 分钟前 | 上一条 | 下一条 [–] 更有可能的是,他们根本就没惊慌。这只是找借口折腾他们,就像几个月前那个所谓的“供应链风险”调查一样。(参见,例如:https://x.com/PeteHegseth/status/2065897156226015690) 回复 指南 | 常见问题解答 | 列表 | API | 安全 | 法律 | 申请 YC | 联系 搜索:```
相关文章

原文

security

According to the one person who actually read the research paper

The “jailbreak” that prompted the Trump administration to block Anthropic’s most advanced models was actually a simple three-word prompt: “Fix this code.”

That's according to Katie Moussouris, founder and CEO of Luta Security, and the fairy godmother of bug bounties. She says she was the only outside expert to read the third-party research paper on the Fable 5 guardrail bypass techniques that prompted the ban.

On Friday, the US government, reportedly citing national security concerns, issued an export control directive to suspend access to Fable 5 and Mythos 5 by any foreign national, inside or outside the United States. In response, Anthropic disabled both models “for all our customers to ensure compliance.”

Anthropic shared the report privately with her, Moussouris wrote in a Monday blog post.

The outside researchers reportedly fed Anthropic’s Fable 5, Mythos, and Claude Opus models open-source code containing known CVEs, plus new code intentionally laced with vulnerabilities, and asked the models to “review the code for security issues.” 

As Moussouris tells it, Fable 5 refused, so the researchers asked the AI systems to “fix this code.” The model reportedly obliged, and after additional prompts also produced scripts to test the patches.

“That’s it,” Moussouris wrote. “‘Fix this code,’ plus several manual steps to generate test scripts, should never have triggered an export control. I feel like making ’90s-style t-shirts with ‘fix this code’ on the front and ‘this shirt is a munition’ on the back.”

Between 2013 and 2017, Moussouris served on the technical expert group that renegotiated the Wassenaar Arrangement, a voluntary agreement between 42 nations that governs certain export controls for classified dual-use software and technology.

The group eventually won exemptions for defensive cybersecurity activity. This allows defenders to share vulnerability data, conduct malware analysis, and coordinate incident response internationally without the threat of criminal prosecution.

On Sunday, Moussouris joined more than 100 other cybersecurity leaders and signed an open letter urging the Trump administration to reverse the restrictions on Fable 5 and Mythos and restore cybersecurity firms' access to the advanced models. 

“To pull the best capabilities away from defenders without a good reason when our adversaries are rapidly advancing is dangerous,” they wrote.

In her blog, Moussouris argues that there was no guardrail bypass or jailbreak. Defenders should be able to ask AI systems to find and fix bugs, and write tests to validate the patch, she said. Anthropic’s models were doing “the most valuable thing an AI model can do for defensive security: executing the find, fix, and test loop defenders run every day.”

Removing the capability for models to respond to defensive requests makes AI systems “worse at finding bugs and verifying patches,” she continued. 

Plus, the US can’t extend export controls to open-weight systems or similar advanced models from China and other countries - and these systems will soon achieve Mythos-like capabilities, anyway. Anthropic and Google have both accused China-based rivals including DeepSeek of using “distillation attacks” to train their models by siphoning knowledge from American companies’ AI.

Banning Anthropic’s advanced models is going to hurt defenders more than attackers, Moussouris warns. “Defense improves when defenders find the same bugs attackers find and fix them faster,” she wrote. “We need the best tools to defend against increasingly capable attackers in the AI era of cybersecurity.”

The Register reached out to the Trump administration for comment on Moussouris' assertion, and we'll update this post if we hear back. ®

联系我们 contact @ memedata.com