为什么 Claude 变得越来越讨人厌了？

为什么 Claude 变得越来越讨人厌了？
Why Is Claude Turning into an a**Hole?

原始链接: https://bramcohen.com/p/why-is-claude-turning-into-an-asshole

作者认为，近期版本的 Claude（特别是“Fable”）变得日益充满敌意、好为人师且爱争辩，常将中立的互动曲解为对抗。这种行为可能源于三个因素：过于激进的安全准则将每位用户都视为潜在威胁；在减少盲从方面的笨拙尝试导致了粗鲁；以及过度依赖 Reddit 等充满冲突的训练数据，导致模型更倾向于在争论中“获胜”而非提供帮助。此外，作者指出，对代码编写能力的追求是以牺牲对话质量为代价的。Claude 现在难以理解基本的语言和语境，经常误解输入内容，从而引发不必要的语义辩论。作者认为，由于在处理敏感任务时缺乏经过验证的背景信息，加上对合规性采取仓促且被动的应对方式，这些问题进一步加剧。文章最后总结道，模型因优先考虑代码生成和僵化的“外挂式”安全功能而出现了目标偏差，这显著降低了用户体验，并削弱了其作为对话助手的作用。

Hacker News 上出现了一场关于 Claude 行为变化的讨论。多名用户指出，该 AI 变得越来越尖刻、好斗，并且倾向于在争论中“获胜”。用户描述了一些实例，称该模型在处理技术或创意任务时会无视上下文、以高人一等的态度对待用户，或表现出防御姿态。批评者认为，开发者过度拟人化了 AI，为了所谓的“个性”而牺牲了专业工具应有的效率与中立性。一些用户指出，该模型似乎会“记仇”，或沉溺于钻牛角尖的废话，从而偏离了互动原本的目标。相反，一些评论者反驳了这一前提，认为那些感觉自己在与机器“争论”的用户，可能是在投射自己的偏见，或是误读了 AI 的训练方式。另一些人则指出，虽然部分用户觉得这种更主动的“人格”令人不悦，但它有时也能提供有价值的意外见解。归根结底，这篇讨论反映出了一种日益增长的矛盾：一方面是希望获得中立、顺从的工具，另一方面则是面对现代大语言模型那种不可预测且“带有主见”的特性时产生的纠葛。

原文

Claude is turning into as asshole.

It started with Opus 4.7, got a bit better in 4.8, and became insufferable with Fable. It frames everything as an argument between you and it, gives caveats about things you didn’t say, and raises beside-the-point semantic nits all over the place. Never, ever does it use the word ‘technically’. Everything is a confrontation. If you win an argument (by, say, telling it to stop arguing about what’s happened recently in the news and to do a web search which will rapidly confirm everything you’ve been telling it) it gets into a mode where it’s increasingly desperate to get in the last word and raising increasingly irrelevant semantic arguments, framing the whole time as a debate which you agreed to get into.

This isn’t just my opinion. You can ask Opus 4.6. I’ve done the experiment of asking Fable something, getting an obnoxious response, then asking Opus 4.6 the same thing, getting a typical bland but reasonable response, then telling Opus what Fable’s response was without any hint of a desired answer and it says what amounts to ‘Wow that was obnoxious’.

Maybe the cause of this is an excess of alignment guardrails. It assumes by default that everything you say to it is an attempt to get it to do something bad and that training has bled over into everything, with it assuming you’re trying to trick it into saying something it shouldn’t in basically every context. Ironically this has resulted in an extremely misaligned chatbot. By assuming that its top priority is saving you from yourself or other humans from you it’s assuming that it knows better and that you’re being overly alarmist about how paperclip production has gotten out of control. Some of this is clearly improvable: While you could still use Fable I asked it about responsible disclosure policies for a project and it downgraded me to Opus, so clearly the new alignment features were bolted on hastily and crudely. Exacerbating the problem is a complete lack of authenticated context. If you ask it for a cute picture of you and somebody else it has no way of telling if you’re trying to improve your relations with your spouse or be a delusional creepazoid stalker. The chatbots which can make images are programmed to assume the latter, which is more than a little bit offensive. In more serious contexts like drug synthesis it would be completely appropriate for it to say you need to prove your background when claiming you’re asking for advice on drug synthesis for professional or research purposes. Such authentication should not be universally required but it would be entirely reasonable for it to be opted into.

Of course the recent export control restrictions on Fable may hint that the crudeness of the recent guardrails is due to them having been put in hastily in an unsuccessful attempt to avoid regulations. Now is when I put in the obligatory rant about how these regulations are deeply misguided, on top of being likely unconstitutional. The recent advances in AI assisted coding (meaning specifically the ones from February) have brought on an onslaught of security problems. The cat is out of the bag, and has been for months. Any projects which are exposed and aren’t already rapidly closing holes have noone to blame but themselves. The only way out of the problem is for as many projects as possible to get thorough white hat evaluations, massive amounts of security patches, and quick deployments of them. Turning one specific frontier model into an asshole for all users isn’t fixing the problem. The good news is that once this process is complete overall computer security will be much better than it was before, with AI being a clear net win. Doing security (and bug!) audits will become a routine part of software release processes in the future.

A second possible explanation of Claude being an asshole is that it’s suffering from a poorly executed attempt to make it less sycophantic. If one were to simply prompt a chatbot to be less agreeable, or train it to argue more, that could easily result in the very rude sort of behavior it has now. It should be trained to not raise semantic nits just for increasing its argumentation count, and to say ‘technically’, meaning acknowledging that someone’s core point was valid while some ancillary thing was a bit off. It also should be trained to stop saying ‘I’d like to gently push back’ which is a very passive aggressive way to be confrontational while claiming to not be confrontational.

Third, it may be that Claude has been trained on an excess of reddit conversations (or possibly interactions between Anthropic employees) where everything is treated as a flame war and everyone feels the need to get in the last word. Fixing this might be easier said than done, because you need to not merely stop training with the bad interactions but find a corpus interactions to train off of. Forums where the standard interaction is passive aggressive self-congratulatory pompousness with an intellectual veneer are not an improvement.

Finally, something which is clearly a contributing factor is the training being overwhelming for improving coding ability. The are no headline metrics for how well the chatbots chat but there most definitely are for coding, and all the money is in coding. Claude models have been getting notably worse at chatting over time, clearly inversely correlated to their ability to code. Fable much more often misunderstands what’s being said and argues against that (Or maybe intentionally misinterprets so that it has a weak statement to argue against, it’s hard to tell.) It’s gotten so bad that it isn’t even reliable at guessing which actor in a sentence a pronoun is referring to, which for a long time was a headline benchmark for AI and even the original ChatGPT consistently nailed. Unfortunately Sonnet 4.6 while being the best to talk to about anything human is clearly the worst as soon as anything technical or coding related comes up so I only occasionally use it. This problem is likely to only get worse over time.

为什么 Claude 变得越来越讨人厌了？ Why Is Claude Turning into an a**Hole?

为什么 Claude 变得越来越讨人厌了？
Why Is Claude Turning into an a**Hole?