Claude Opus 4.7

Original link: https://www.anthropic.com/news/claude-opus-4-7

## Claude Opus 4.7 Now Available: A Major Upgrade

Anthropic has released Claude Opus 4.7, a significant improvement over its predecessor, particularly on complex software engineering tasks. Users report greater confidence when handing over challenging coding work; the model shows rigor, consistency, and improved instruction following.

Key enhancements include substantially better vision (supporting higher-resolution images) and more creative, higher-quality output on professional tasks such as building interfaces and documents. Although less broadly capable than the *Mythos Preview* model, Opus 4.7 outperforms Opus 4.6 across many benchmarks.

To prioritize safety, Opus 4.7 ships with built-in cybersecurity safeguards, and Anthropic has launched a *Cyber Verification Program* for security professionals. The model is now available across all Claude products and major cloud platforms (Amazon Bedrock, Google Cloud, Microsoft Foundry) at the same pricing as Opus 4.6 ($5 per million input tokens, $25 per million output tokens).

Early testers highlight improved reasoning, efficiency, and self-correction, which speed up development and raise output quality. One key change is the model's literal interpretation of instructions, which may require adjusting existing prompts.

## Claude Opus 4.7: Discussion Summary

Anthropic has released Claude Opus 4.7 with an updated tokenizer that improves text processing but increases token usage by 10–35%. While some users reported recent performance regressions in Claude 4.6, possibly due to compute constraints, the new release aims to address these issues.

Discussion centers on the tradeoff between efficiency and capability, with some noting possible regressions in specific areas such as long-context tasks. A key change is the stricter default effort level, which requires users to adjust their prompts. Concerns were raised about the new safety measures restricting cybersecurity applications, echoing earlier criticism of OpenAI's approach.

Many users also compare Claude with Codex, with some switching over perceived instability and quality issues in Claude. The release has sparked debate over the value of incremental updates versus major leaps in model performance, and over the impact of compute constraints on AI development. Overall, the community is cautiously optimistic but wary of potential limitations and the ongoing cycle of model releases and adjustments.

Original article

Our latest model, Claude Opus 4.7, is now generally available.

Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work—the kind that previously needed close supervision—to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.

The model also has substantially better vision: it can see images in greater resolution. It’s more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs. And—although it is less broadly capable than our most powerful model, Claude Mythos Preview—it shows better results than Opus 4.6 across a range of benchmarks:

Last week we announced Project Glasswing, highlighting the risks—and benefits—of AI models for cybersecurity. We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models.

Security professionals who wish to use Opus 4.7 for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) are invited to join our new Cyber Verification Program.

Opus 4.7 is available today across all Claude products and our API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6: $5 per million input tokens and $25 per million output tokens. Developers can use claude-opus-4-7 via the Claude API.
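At the list prices above, per-request cost is simple arithmetic. A minimal sketch (the helper name and example token counts are ours, not from the announcement):

```python
# Estimate the cost of a Claude Opus 4.7 request from token counts,
# using the list pricing from the announcement:
# $5 per million input tokens, $25 per million output tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at Opus 4.7 list pricing."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK

# Example: a 40k-token context producing a 4k-token answer.
print(round(request_cost_usd(40_000, 4_000), 4))  # 0.3
```

Note that output tokens cost 5x as much as input tokens, so long agentic runs with heavy thinking dominate the bill.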

Testing Claude Opus 4.7

Claude Opus 4.7 has garnered strong feedback from our early-access testers:

Below are some highlights and notes from our early testing of Opus 4.7:

  • Instruction following. Opus 4.7 is substantially better at following instructions. Interestingly, this means that prompts written for earlier models can sometimes now produce unexpected results: where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally. Users should re-tune their prompts and harnesses accordingly.
  • Improved multimodal support. Opus 4.7 has better vision for high-resolution images: it can accept images up to 2,576 pixels on the long edge (~3.75 megapixels), more than three times as many pixels as prior Claude models. This opens up a wealth of multimodal uses that depend on fine visual detail: computer-use agents reading dense screenshots, data extraction from complex diagrams, and work that needs pixel-perfect references.1
  • Real-world work. As well as its state-of-the-art score on the Finance Agent evaluation (see table above), our internal testing showed Opus 4.7 to be a more effective finance analyst than Opus 4.6, producing rigorous analyses and models, more professional presentations, and tighter integration across tasks. Opus 4.7 is also state-of-the-art on GDPval-AA, a third-party evaluation of economically valuable knowledge work across finance, legal, and other domains.
  • Memory. Opus 4.7 is better at using file system-based memory. It remembers important notes across long, multi-session work, and uses them to move on to new tasks that, as a result, need less up-front context.
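For the higher-resolution vision support above, callers sending large screenshots will want to downscale to the stated long-edge cap before upload. A minimal sketch, assuming only the 2,576-pixel limit from the announcement (the helper itself is ours):

```python
# Scale image dimensions down, preserving aspect ratio, so the long edge
# fits Opus 4.7's stated cap of 2,576 pixels (limit from the announcement).
MAX_LONG_EDGE = 2576

def fit_to_long_edge(width: int, height: int, cap: int = MAX_LONG_EDGE) -> tuple[int, int]:
    """Return (width, height) with max(width, height) <= cap.
    Images already within the cap are returned unchanged."""
    long_edge = max(width, height)
    if long_edge <= cap:
        return width, height
    scale = cap / long_edge
    return round(width * scale), round(height * scale)

# Example: a 4K (3840x2160) screenshot is reduced to 2576x1449.
print(fit_to_long_edge(3840, 2160))  # (2576, 1449)
```

The actual resampling would be done by an image library; the point here is just that a 4K capture exceeds the cap and must shrink by ~33% before upload.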

The charts below display more evaluation results from our pre-release testing, across a range of different domains:

Safety and alignment

Overall, Opus 4.7 shows a similar safety profile to Opus 4.6: our evaluations show low rates of concerning behavior such as deception, sycophancy, and cooperation with misuse. On some measures, such as honesty and resistance to malicious “prompt injection” attacks, Opus 4.7 is an improvement on Opus 4.6; in others (such as its tendency to give overly detailed harm-reduction advice on controlled substances), Opus 4.7 is modestly weaker. Our alignment assessment concluded that the model is “largely well-aligned and trustworthy, though not fully ideal in its behavior”. Note that Mythos Preview remains the best-aligned model we’ve trained according to our evaluations. Our safety evaluations are discussed in full in the Claude Opus 4.7 System Card.

Also launching today

In addition to Claude Opus 4.7 itself, we’re launching the following updates:

  • More effort control: Opus 4.7 introduces a new xhigh (“extra high”) effort level between high and max, giving users finer control over the tradeoff between reasoning and latency on hard problems. In Claude Code, we’ve raised the default effort level to xhigh for all plans. When testing Opus 4.7 for coding and agentic use cases, we recommend starting with high or xhigh effort.
  • On the Claude Platform (API): as well as support for higher-resolution images, we’re also launching task budgets in public beta, giving developers a way to guide Claude’s token spend so it can prioritize work across longer runs.
  • In Claude Code: The new /ultrareview slash command produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch. We’re giving Pro and Max Claude Code users three free ultrareviews to try it out. In addition, we’ve extended auto mode to Max users. Auto mode is a new permissions option where Claude makes decisions on your behalf, meaning that you can run longer tasks with fewer interruptions—and with less risk than if you had chosen to skip all permissions.
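The announcement names the effort levels (high, xhigh, max) but not the request shape that selects them. Purely as an illustration, a request payload might look like the following; the top-level `effort` field is our assumption, not a documented parameter, so check the API reference before relying on it:

```json
{
  "model": "claude-opus-4-7",
  "effort": "xhigh",
  "max_tokens": 8192,
  "messages": [
    {"role": "user", "content": "Refactor the retry logic in worker.py"}
  ]
}
```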

Migrating from Opus 4.6 to Opus 4.7

Opus 4.7 is a direct upgrade to Opus 4.6, but two changes are worth planning for because they affect token usage. First, Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type. Second, Opus 4.7 thinks more at higher effort levels, particularly on later turns in agentic settings. This improves its reliability on hard problems, but it does mean it produces more output tokens.

Users can control token usage in various ways: by using the effort parameter, adjusting their task budgets, or prompting the model to be more concise. In our own testing, the net effect is favorable—token usage across all effort levels is improved on an internal coding evaluation, as shown below—but we recommend measuring the difference on real traffic. We’ve written a migration guide that provides further advice on upgrading from Opus 4.6 to Opus 4.7.
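When budgeting for the migration, the 1.0–1.35x tokenizer range above can be turned into a quick projection. A minimal planning sketch (the factor range is from the migration notes; the helper is ours, and real traffic should be measured rather than estimated):

```python
# Rough planning helper for the Opus 4.6 -> 4.7 migration: the updated
# tokenizer can map the same input to roughly 1.0-1.35x as many tokens.
TOKENIZER_FACTOR_RANGE = (1.0, 1.35)

def projected_input_tokens(opus_46_tokens: int) -> tuple[int, int]:
    """Return the (low, high) projected Opus 4.7 input-token count
    for a workload that used `opus_46_tokens` input tokens on Opus 4.6."""
    lo, hi = TOKENIZER_FACTOR_RANGE
    return round(opus_46_tokens * lo), round(opus_46_tokens * hi)

# Example: a workload averaging 100k input tokens per run on Opus 4.6.
print(projected_input_tokens(100_000))  # (100000, 135000)
```

Since pricing is unchanged per token, the worst case for input cost is the same 1.35x factor; output-token growth from extra thinking at higher effort levels is workload-dependent and best measured directly.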
