Anthropic Drops Flagship Safety Pledge

Original link: https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/

## Anthropic Revises Its Core Safety Commitment

Anthropic, a leading AI company previously known for its strong commitment to safety, is revising its flagship Responsible Scaling Policy (RSP). Originally, the RSP promised that the company would not release AI models unless adequate safety measures could be guaranteed in advance, a promise that was a cornerstone of its public image. But facing rapid progress by competitors and the absence of a broader regulatory framework, Anthropic is abandoning that absolute guarantee.

The revised policy prioritizes transparency and matching or surpassing competitors' safety efforts. Anthropic will now "delay" development only if it believes it is leading the AI race *and* the risk of catastrophe is significant. The shift stems from a recognition that a unilateral pause could cede ground to less cautious players and hinder critical safety research.

While the company remains committed to "Frontier Safety Roadmaps" and regular "Risk Reports," the change marks a move away from strict, precautionary safety thresholds. Experts worry it could allow risk to escalate gradually, but Anthropic maintains that it is still committed to safe AI development and believes continued innovation is essential to effective risk mitigation.

## Anthropic Scales Back Safety Pledge: Hacker News Summary

Anthropic, an AI startup, recently updated its Responsible Scaling Policy to version 3.0, dropping a previously emphasized safety commitment. The news, discussed on Hacker News, raised concerns among users even though it is unrelated to the recent Pentagon controversy.

Commenters speculated that the change may stem from Department of Defense pressure aimed at allowing broader applications of its AI, including autonomous weapons systems and domestic surveillance. Some argued that Anthropic is putting profit and an upcoming IPO ahead of ethical considerations, echoing a pattern of the company abandoning its principles when they conflict with profit.

Many users expressed frustration, noting that warnings about AI safety have long been ignored, and worried about the consequences of unchecked AI development, particularly job losses and wealth transfer. Others observed that the current administration's tactics have an authoritarian flavor, using government contracts as leverage to compel compliance.

## Original Article

Anthropic, the wildly successful AI company that has cast itself as the most safety-conscious of the top research labs, is dropping the central pledge of its flagship safety policy, company officials tell TIME.

In 2023, Anthropic committed to never train an AI system unless it could guarantee in advance that the company’s safety measures were adequate. For years, its leaders touted that promise—the central pillar of their Responsible Scaling Policy (RSP)—as evidence that they are a responsible company that would withstand market incentives to rush to develop a potentially dangerous technology. 

But in recent months the company decided to radically overhaul the RSP. That decision included scrapping the promise to not release AI models if Anthropic can’t guarantee proper risk mitigations in advance.

“We felt that it wouldn't actually help anyone for us to stop training AI models,” Anthropic’s chief science officer Jared Kaplan told TIME in an exclusive interview. “We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.”

The new version of the policy, which TIME reviewed, includes commitments to be more transparent about the safety risks of AI, including making additional disclosures about how Anthropic’s own models fare in safety testing. It commits to matching or surpassing the safety efforts of competitors. And it promises to “delay” Anthropic’s AI development if leaders both consider Anthropic to be the leader of the AI race and believe the risks of catastrophe are significant.

But overall, the change to the RSP leaves Anthropic far less constrained by its own safety policies, which previously categorically barred it from training models above a certain level if appropriate safety measures weren’t already in place.

The change comes as Anthropic, previously considered to be behind OpenAI in the AI race, rides the high of a string of technological and commercial successes. Its Claude models, especially the software-writing tool Claude Code, have won legions of devoted fans. In February, Anthropic raised $30 billion in new investments, valuing it at some $380 billion, and reported that its annualized revenue was growing at a rate of 10x per year. The company’s core business model of selling direct to businesses is seen by many investors as more credible than OpenAI’s main strategy of monetizing a vast consumer user base. 

Kaplan, the Anthropic executive and co-founder, denied the company’s decision to change course was a capitulation to market incentives as the race for superintelligence accelerates. He framed it instead as a pragmatic response to emerging political and scientific realities. “I don’t think we’re making any kind of U-turn,” Kaplan says.

When Anthropic introduced the RSP in 2023, Kaplan says, the company hoped it would encourage rivals to adopt similar measures. (No rivals made quite as overt a promise to pause AI development, but many published lengthy reports detailing their plans to mitigate risk, which Kaplan chalks up as Anthropic exerting a good influence on the industry.) Executives also hoped the approach might eventually serve as a blueprint for binding national regulations or even international treaties, Kaplan claims.

But those regulations never materialized. Instead, the Trump Administration has endorsed a let-it-rip attitude to AI development, even going so far as to attempt to nullify state regulations. No federal AI law is on the horizon. And while a global governance framework may have seemed possible in 2023, three years later it has become clear that door has closed. Meanwhile, competition for AI supremacy—between companies but also between nations—has only intensified. 

To make matters worse, the science of AI evaluations has proven more complicated than Anthropic expected when it first crafted the RSP. The arrival of powerful new models meant that, in 2025, Anthropic announced it could not rule out the possibility of these models facilitating a bio-terrorist attack. But while they couldn’t rule it out, they also lacked strong scientific evidence that models did pose that kind of danger, which made it difficult to convince governments and rivals of what they saw as the need to act carefully. What the company had previously imagined might look like a bright red line was instead coming into focus as a fuzzy gradient. 

For nearly a year, Anthropic executives discussed ways to reshape their flagship safety policy to match this new environment, Kaplan says. One point they kept coming back to was their founding premise: the idea that to do proper AI safety research, they had to build models at the frontier of capability—even though doing so might accelerate the arrival of the dangers they feared. 

In February, according to Kaplan, Anthropic CEO Dario Amodei decided that keeping the company from training new models while competitors raced ahead would be helpful to nobody. “If one AI developer paused development to implement safety measures while others moved forward training and deploying AI systems without strong mitigations, that could result in a world that is less safe,” the new version of the RSP, approved unanimously by Amodei and Anthropic’s board, states in its introduction. “The developers with the weakest protections would set the pace, and responsible developers would lose their ability to do safety research.”

Chris Painter, the director of policy at METR, a nonprofit focused on evaluating AI models for risky behavior, reviewed an early draft of the policy with Anthropic’s permission. He says the change is understandable — but also a bearish signal for the world’s ability to navigate potential AI catastrophes. The change to the RSP shows Anthropic “believes it needs to shift into triage mode with its safety plans, because methods to assess and mitigate risk are not keeping up with the pace of capabilities,” Painter tells TIME. “This is more evidence that society is not prepared for the potential catastrophic risks posed by AI.”

Anthropic argues the retooled RSP is designed to keep the biggest benefits of the old one. For example, by constraining itself from releasing new models, Anthropic’s original RSP also incentivized it to quickly build safety mitigations. (Because otherwise the company would be unable to sell its AI to customers.) Anthropic says it believes it can maintain that incentive. The new policy commits the company to regularly release what it calls “Frontier Safety Roadmaps”: documents laying out a list of detailed goals for future safety measures it hopes to build.

“We hope to create a forcing function for work that would otherwise be challenging to appropriately prioritize and resource, as it requires collaboration (and in some cases sacrifices) from multiple parts of the company and can be at cross-purposes with immediate competitive and commercial priorities,” the new RSP states.

Anthropic says it will also commit to publishing so-called “Risk Reports” every three to six months. The reports, the company says, will “explain how capabilities, threat models (the specific ways that models might pose threats), and active risk mitigations fit together, and provide an assessment of the overall level of risk.” These documents will be more in-depth than the reports the company already publishes, a spokesperson tells TIME.

“I like the emphasis on transparent risk reporting and publicly verifiable safety roadmaps,” says Painter, the METR policy official. But he said he was “concerned” that moving away from binary thresholds under the previous RSP, by which the arrival of a certain capability could act as a tripwire to temporarily halt Anthropic’s AI development, might enable a “frog-boiling” effect, where danger slowly ramps up without a single moment that sets off alarms. 

Asked whether Anthropic was caving to market pressure, Kaplan argued that, in fact, Anthropic was making a renewed commitment to developing AI safely. “If all of our competitors are transparently doing the right thing when it comes to catastrophic risk, we are committed to doing as well or better,” he said. “But we don't think it makes sense for us to stop engaging with AI research, AI safety, and most likely lose relevance as an innovator who understands the frontier of the technology, in a scenario where others are going ahead and we're not actually contributing any additional risk to the ecosystem.”
