Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

Original link: https://techcrunch.com/2026/06/10/cybersecurity-researchers-arent-happy-about-the-guardrails-on-anthropics-fable/

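Anthropic recently released Fable, a public, limited version of its specialized cybersecurity model Mythos. Its guardrails are meant to prevent the model from being used to create malware or biological weapons, but cybersecurity professionals have widely criticized them as overly aggressive and haphazard. Researchers say Fable often triggers its safety blocks on benign tasks, such as a code review or reading a security-related blog post, and treats anything that mentions cybersecurity or software engineering best practices as a potential violation. When a guardrail is triggered, the system falls back to Claude Opus 4.8. Experts such as Matt Suiche consider overly cautious restrictions an understandable starting point for a new technology, but others remain frustrated that the model cannot distinguish malicious intent from legitimate security work. To get broader access, professionals must apply to Anthropic's Cyber Verification Program. This reflects a wider industry trend, also seen at OpenAI, of vetting users to reduce the risk that advanced AI is misused in sensitive domains.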

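Cybersecurity researchers have voiced strong dissatisfaction with the restrictive guardrails on Anthropic's Fable model. Critics argue that the measures hinder legitimate security work, often shutting out professionals while failing to stop malicious attackers, who can easily bypass the filters through prompt engineering. The discussion centered on several core issues:

* **Deceptive behavior:** Users say that when the model suspects a prohibited prompt, it quietly swaps in a less capable version, which undermines user trust.
* **Loss of usefulness:** Many researchers find the tool increasingly less useful for vulnerability analysis, and note that competitors such as DeepSeek currently offer more practical, unrestricted support for security research.
* **Hidden agenda:** Some speculate that the guardrails may be a pretext for data collection, and that flagged prompts may be used for model training.
* **Priorities:** Observers argue that Anthropic is sacrificing the productivity of genuine security experts to maintain an overly cautious brand image, out of concern about potential misuse in areas such as biothreats.

Ultimately, the consensus among commenters is that overly strict censorship is pushing the cybersecurity community away from Anthropic's models and toward more open alternatives.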
Original article

Anthropic released its latest model Fable on Tuesday, billing it as a public and limited version of its powerful and much-hyped cybersecurity model Mythos.

But not everyone is happy with the restrictions, and a number of cybersecurity researchers and professionals have aired complaints online. 

“[Fable] rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post,” said Valentina “Chompie” Palmiotti, a well-known security researcher who works at IBM X-Force. 

When a prompt triggers its guardrails, Fable pauses the chat and says that its “safety measures flagged this message for cybersecurity or biology topics.”

The guardrails were put in place to limit the risk that Fable could be used to develop malware or compromise software — a long-standing concern within Anthropic. The restrictions on biology come from a similar concern around developing biological weapons.

When the AI giant released Mythos in April, it restricted the model to a limited number of companies and organizations in what it called Project Glasswing, an effort to deploy the model to secure critical software and infrastructure. Last week, Anthropic expanded access to Mythos to hundreds of organizations in 15 countries. 

But despite the good intentions, many cybersecurity experts are still put off by the haphazard nature of the restrictions. Matt Suiche, a cybersecurity veteran, told TechCrunch that “if you ask it to write secure code, it assumes it is cybersecurity related work instead of software engineering best practices, and you get downgraded.” Fable is programmed to fall back to Claude Opus 4.8 if it hits a guardrail. “It seems to be keyword based, so anything in the lexical field of ‘cybersecurity’ triggers the guardrails,” Suiche said.

Contact Us

Do you have more information about how hackers are using AI? Or how cybersecurity companies are using AI? We’d love to hear from you. From a non-work device and network, you can contact Lorenzo Franceschi-Bicchierai securely on Signal at +1 917 257 1382, or via Telegram and Keybase @lorenzofb, or email.

“But it is understandable as we are still in the early days and they are still adapting their guardrails. I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies,” said Suiche, who is a member of the technical staff at Tolmo, an AI cybersecurity startup. “It’s better to catch more people than not enough when you do such a release and to relax the guardrails over time.”

Another researcher griped on X that “even asking for a code review” triggers Fable’s guardrails. 
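
To illustrate why a keyword-based filter of the kind Suiche suspects would misfire on benign requests, here is a toy sketch. It is not Anthropic's implementation, which has not been described publicly; the keyword list, the `route` function, and the use of model names as plain string labels are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical keyword list; nothing here reflects Anthropic's actual system.
FLAGGED_TERMS = {"secure", "security", "cybersecurity", "exploit", "vulnerability", "malware"}

# Model names used purely as string labels for this illustration.
PRIMARY_MODEL = "fable"
FALLBACK_MODEL = "claude-opus-4.8"


def route(prompt: str) -> str:
    """Return the label of the model that would handle the prompt.

    The check is purely lexical: any flagged word triggers the fallback,
    regardless of what the user is actually trying to do.
    """
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return FALLBACK_MODEL if words & FLAGGED_TERMS else PRIMARY_MODEL


if __name__ == "__main__":
    prompts = (
        "Write secure code for a login form",          # benign, contains "secure"
        "Summarize this blog post on malware trends",  # benign reading task
        "Refactor this function to use a dict",        # no flagged words
    )
    for prompt in prompts:
        print(f"{route(prompt)}: {prompt}")
```

In this sketch the first two prompts are downgraded even though neither asks for anything harmful, because matching on words alone carries no information about intent.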

Anthropic did not immediately respond to a request for comment.

Apart from guardrails inside its models, Anthropic requires cybersecurity professionals to apply to the Cyber Verification Program. If they get approved, the applicants have fewer limitations on using Claude for cybersecurity work. OpenAI has a similar program called Trusted Access for Cyber.
