Claude Fable 5

Original link: https://www.anthropic.com/news/claude-fable-5-mythos-5

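Anthropic has released **Claude Fable 5**, the most capable model it has made generally available. The model leads in software engineering, scientific research, vision, and knowledge work, and excels at long, complex tasks. Because models this capable pose potential risks in areas such as cybersecurity and biology, the company has deployed conservatively tuned safety classifiers. When a query triggers these safeguards, it is automatically redirected to Claude Opus 4.8, Anthropic's next-most-capable model. The safeguards trigger, on average, in less than 5% of sessions, and Anthropic plans to keep refining them to reduce false positives. In addition, through Project Glasswing, Anthropic is offering **Claude Mythos 5**, a version with safeguards lifted in some areas, to a select group of cyberdefenders and infrastructure providers, to support security and biomedical research. Both models are now offered through the Claude API at $10 per million input tokens and $50 per million output tokens, less than half the price of Claude Mythos Preview. To strengthen safety monitoring, Anthropic has also introduced a mandatory 30-day data retention policy for these models, with retained data used only for safety purposes.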

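Anthropic has released a powerful new model, **Claude Fable 5**, along with **Mythos 5**, a version with relaxed cybersecurity safeguards intended for government-backed projects. Key points from the announcement and the Hacker News discussion that followed:

* **Time-limited access:** Fable 5 is free to subscribers until June 22. After that, access will consume usage credits, which has raised concerns that Anthropic is moving from flat-rate subscriptions toward usage-based billing.
* **Performance and safety:** Fable 5 shows marked gains on agentic coding benchmarks. However, its strict, "conservative" safety filters frequently hand queries off to Opus 4.8, and users are frustrated that legitimate coding tasks are often flagged.
* **Data policy:** Mythos-class models now require a 30-day data retention policy for safety auditing, which has raised privacy and compliance concerns among enterprise users.
* **Market sentiment:** Community opinion is polarized. Some praise the performance leap as a milestone for autonomous coding, but many see the aggressive marketing, the "freemium" trial model, and potentially IPO-driven hype as "user-hostile", or as a sign that frontier AI is becoming a luxury service reserved for large enterprises.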

Original text

Today we’re launching Claude Fable 5: a Mythos-class¹ model that we’ve made safe for general use.

Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas. The longer and more complex the task, the larger Fable 5’s lead over our other models.

Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.

For a small group of cyberdefenders and infrastructure providers, we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas.² Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US government, as an upgrade to Claude Mythos Preview. It has the strongest cybersecurity capabilities of any model in the world. Soon, we intend to expand access to Mythos 5 through a broader trusted access program.

The capabilities of models like Fable 5 and Mythos 5 have the potential to do profound good for the world. We’ve seen the beginnings of this in Project Glasswing, where the models have helped cyber defenders secure critically important software. We’ve also seen it in life sciences research, where the models are positing novel hypotheses and speeding up the development of new therapeutics.

Fable 5 and Mythos 5 are being offered at $10 per million input tokens and $50 per million output tokens—less than half the price of Claude Mythos Preview. Today’s joint launch is another step towards our goal of bringing advanced AI capabilities to as many users as possible, as quickly and as safely as we can.
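To make the quoted rates concrete, here is a minimal sketch of the arithmetic in Python. The two prices come from the paragraph above. The function name, the constant names, and the example workload are illustrative assumptions, and the sketch ignores any discounts or billing details that the announcement does not describe.

```python
# Prices quoted in the announcement, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD for a given number of input and output tokens.

    Hypothetical helper for illustration only; not part of any official SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 30,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 30_000):.2f}")  # prints $3.50
```

At these rates an output token costs five times as much as an input token, so in this example the 30,000 output tokens account for $1.50 of the $3.50 total.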

Evaluating Claude Fable 5 and Claude Mythos 5

The table below compares the capabilities of Fable 5 and Mythos 5 to other leading models.

Fable 5 and Mythos 5 can work autonomously for longer than any previous Claude models. Below we discuss how these skills apply to software engineering, and cover the model’s improved capabilities in knowledge work, vision, memory, and life sciences research.

Software engineering. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand. Fable 5 is also more token-efficient than past Claude models: on Cognition’s FrontierCode evaluation, which tests whether models can pass difficult coding tasks while meeting the standards of high-quality production codebases, Fable 5 scores highest among frontier models, even at medium effort.

Knowledge work. Fable 5 shows strong performance on complex analytical tasks. On Hebbia’s Finance Benchmark for senior-level reasoning, Fable 5 has the highest score of any model, with substantial gains in document-based reasoning, chart and table interpretation, and problem solving. IMC noted that Fable 5 aced their trading-analysis evaluations nearly across the board, including factual lookup, conceptual reasoning, root-cause analysis, and expected-value analysis.

Vision. Fable 5 is the new state-of-the-art model for tasks involving vision. It can extract precise numbers from detailed scientific figures and can perform complex vision-based tasks like rebuilding a web app’s source code from screenshots alone. It also needs less scaffolding: for example, previous Claude models struggled to play Pokémon FireRed even with harnesses that gave them additional helpful tools, but Fable 5 beat FireRed with a minimal, vision-only harness.