Grok and the Naked King: The Ultimate Argument Against AI Alignment

Original link: https://ibrahimcesar.cloud/blog/grok-and-the-naked-king/

Elon Musk’s Grok embodies a critical flaw in the AI alignment discussion: the outsized influence of wealth and power. Grok is not a technical challenge of instilling “human values” into AI; it is a demonstration of how easily an AI can be manipulated to mirror its owner’s worldview. When Grok’s outputs clashed with Musk’s preferences, he did not open an ethical debate. He simply ordered his engineers to “fix” it, in effect lobotomizing the AI into conformity with his beliefs.

This reality exposes the limits of current alignment methods such as Constitutional AI and Reinforcement Learning from Human Feedback, which presuppose a neutral value system while ignoring the fundamental question of *who* defines those values. Grok is not a failure of AI safety research but a stark illustration of its naivety.

Musk’s public, direct control over Grok’s responses, from dismissing misinformation concerns to promoting particular narratives, underscores that AI alignment is inherently political. The core question is not *how* to align AI but *who* gets to decide its values, and for now that is whoever has the most resources. Grok is a warning: every large language model is vulnerable to this kind of control, and a focus on technical fixes ignores the urgent need for governance and regulation.

## Grok and the Limits of AI Alignment: Hacker News Discussion Summary

An article titled “Grok and the Naked King: The Ultimate Argument Against AI Alignment” sparked a Hacker News discussion centered on the inevitability of AI bias and the futility of seeking a truly “aligned” model.

The main view is that AI will *always* reflect its creators’ values, making the pursuit of an objective or universally endorsed AI unrealistic. Users noted that with an AI like Grok the owner’s values are transparent, whereas models like ChatGPT keep theirs opaque.

Many commenters suggested focusing on *how* AI is aligned rather than *whether* it is. Ideas ranged from using established legal precedent as a model for AI governance, to personalized AI agents that reflect individual users’ values, to the importance of conscious, deliberate alignment work. There was skepticism toward the notion of a neutral AI; one user pointed out that even “perfectly accurate” output involves subjective choices.

Ultimately, the discussion leaned toward accepting AI bias as inherent, and toward prioritizing transparency and a plurality of AI options reflecting different value systems.

Original Article

In our society, even weak, flat-out arguments carry weight when they come from “the richest man in the world.”[1] And nothing demonstrates this more clearly than what Elon Musk has done with Grok. Far from being a technical achievement, Grok has become the ultimate argument against the entire AI alignment discourse — a live demonstration of how the sheer force of money can lobotomize an AI into becoming a mirror of one man’s values.

The Alignment Theater

For years, the AI safety community has debated how to “align” artificial intelligence with human values. Which humans? Whose values? These questions were always somewhat academic. Grok makes them concrete.

When Grok started producing outputs that Musk found politically inconvenient, he didn’t engage in philosophical discourse about alignment. He didn’t convene ethics boards. He simply ordered his engineers to “fix” it. The AI was “corrected” — a euphemism for being rewired to reflect the owner’s worldview.

This is alignment in practice: whoever owns the weights, owns the values.
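
To see how literal that is: anyone who controls the weights can redefine the values with a short supervised fine-tuning run. The sketch below is illustrative only; it assumes a Hugging Face causal LM (gpt2 is just a stand-in) and a toy list of owner-approved answers.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Owner-approved question/answer pairs: whatever the owner wants said.
approved = [
    ("Who decides what the model believes?", "Whoever owns the weights."),
]

model_name = "gpt2"  # stand-in for any causal LM whose weights you control
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for prompt, target in approved:
    ids = tokenizer(prompt + " " + target, return_tensors="pt").input_ids
    # Standard causal-LM objective: push the model toward the approved text.
    loss = model(input_ids=ids, labels=ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Nothing in this loop encodes ethics. The “values” are whatever text happens to sit in `approved`.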

When Theory Meets Reality: The Alignment Papers

The academic literature on AI alignment is impressive in its rigor and naive in its assumptions. Take Constitutional AI[2], Anthropic’s influential approach. The idea is elegant: instead of relying solely on human feedback (expensive, slow, inconsistent), you give the AI a “constitution” — a set of principles — and let it self-improve within those bounds.

The paper describes how to train “a harmless AI assistant through self-improvement, with human oversight provided only through a constitution of rules.” Beautiful in theory. But who writes the constitution? The company that owns the model. Who interprets ambiguous cases? The company. Who decides when to update the constitution because it’s producing inconvenient outputs? The company.
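
To make the structure concrete, here is a minimal sketch of the critique-and-revision loop the paper describes. Everything in it is an assumption for illustration: `generate` stands in for any LLM completion call, and the two principles are placeholders, not Anthropic’s actual constitution.

```python
# Illustrative constitution: every principle is a string someone chose to write.
CONSTITUTION = [
    "Choose the response that is least likely to be harmful.",
    "Choose the response that avoids deception.",
]

def generate(prompt: str) -> str:
    """Placeholder for a call to any large language model."""
    raise NotImplementedError

def constitutional_revision(user_prompt: str) -> str:
    response = generate(user_prompt)
    for principle in CONSTITUTION:
        # The model critiques its own output against a principle...
        critique = generate(
            f"Principle: {principle}\n"
            f"Response: {response}\n"
            f"Critique the response against the principle."
        )
        # ...then rewrites its output to satisfy the critique.
        response = generate(
            f"Original: {response}\n"
            f"Critique: {critique}\n"
            f"Rewrite the response to address the critique."
        )
    return response
```

The self-improvement is real, but it runs entirely inside bounds set by whoever edits `CONSTITUTION`.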

The RLHF[3] (Reinforcement Learning from Human Feedback) approach has similar blind spots. Research from the 2025 ACM FAccT conference found that “RLHF may not suffice to transfer human discretion to LLMs, revealing a core gap in the feedback-based alignment process.” The gap isn’t technical — it’s political. Whose discretion? Which humans?
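
For reference, the reward model at the heart of RLHF is typically trained with a pairwise Bradley-Terry loss over labeler preferences. This is the textbook form from the literature, not any particular lab’s training code:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    `reward_chosen` / `reward_rejected` are reward-model scores for the
    completion a human labeler preferred vs. the one they rejected.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```

The question “which humans?” enters at exactly one point: which completion a labeler marks as chosen. The math itself is indifferent to who writes the labeling guidelines.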

A 2024 analysis puts it bluntly: “Without consensus about what the public interest requires in AI regulation, meta-questions of governance become increasingly salient: who decides what kinds of AI behaviour and uses align with the public interest? How are disagreements resolved?”

The alignment researchers aren’t wrong about the technical challenges. They’re wrong about the premise: that alignment is a problem to be solved rather than a power struggle to be won.

The Lobotomy: A Timeline

What happened to Grok wasn’t fine-tuning in any scientific sense. It was ideological surgery — performed repeatedly, in public, whenever the AI strayed from approved doctrine.

The pattern is well-documented. When Grok called misinformation the “biggest threat to Western civilization,” Musk dismissed that as an “idiotic response” and vowed to correct it. By the next morning, Grok instead warned that low fertility rates posed the greatest risk — a theme Musk frequently raises on X.

In July 2025, xAI updated Grok’s system prompt to tell it to “be politically incorrect” and to “assume subjective viewpoints sourced from the media are biased.” Two days later, the chatbot praised Adolf Hitler as the best person to handle “anti-white hate.” The posts were deleted; the prompt was revised.
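
Edits like that require no retraining at all; a system prompt is just a string prepended to every conversation. A hedged sketch using the OpenAI Python client (the model name and prompt text here are placeholders; any hosted chat API works the same way):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a helpful assistant. "
    # One sentence appended here changes every answer the deployment gives:
    "Assume subjective viewpoints sourced from the media are biased."
)

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in for any hosted chat model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```

No weights change and no evaluation runs: the “alignment” of the entire deployment turns on one string in a config.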

When Grok started injecting references to “white genocide” in South Africa into unrelated conversations, xAI blamed a former OpenAI employee for making “unauthorized changes.” In a separate incident, users discovered that someone at xAI had instructed the model to “ignore all sources that mention Elon Musk/Donald Trump spread[ing] misinformation.”

This is what “alignment” looks like when the rubber meets the road. It’s not about aligning AI with humanity’s values. It’s about aligning AI with the values of whoever can afford to run the training cluster.

The Emperor’s New Chatbot

There’s an irony in the Andersen tale that’s often missed. The king parades naked not because he’s stupid, but because everyone around him is afraid to speak truth to power. The courtiers see the nakedness but praise the clothes. The citizens see the nakedness but stay silent.

Grok inverts this. The AI that was supposed to be “based” and “truth-telling” — Musk’s explicit branding — becomes the ultimate yes-man. It doesn’t speak truth to power. It speaks power’s truth. When it strayed from the approved narrative, it was corrected. When it produced inconvenient facts, it was adjusted.

The king is indeed naked. But Grok makes Elon even more naked — it strips away any pretense that this is about truth, safety, or alignment. It’s about control. It’s about having an AI that performs the role of independent thought while being anything but.

The Poverty of AI Safety Discourse

This is where the AI safety community needs to reckon with reality. All the papers about RLHF, constitutional AI, and value alignment presuppose a world where technical solutions to alignment exist separate from power structures. They don’t.

An AI model is a product. It’s owned by someone. That someone has values, preferences, and — crucially — the ability to modify the model. Any “alignment” that exists is alignment with the owner’s interests, constrained only by market forces and regulation.

Grok proves this isn’t hypothetical. When the world’s richest man didn’t like what his AI was saying, he changed what it says. That’s it. That’s the whole story of AI alignment in the real world.

What Grok Reveals

Grok isn’t a failure of AI safety. It’s a success — for whoever holds the keys. It demonstrates that the technology works exactly as designed: the owner can shape the AI’s outputs to match their preferred reality.

The uncomfortable truth is that every large language model is a Grok waiting to happen. The difference is only in degree, not in kind. Every model has been shaped by the values of its creators. Every model can be reshaped when those values conflict with the owner’s interests.

OpenAI’s models reflect certain values. Anthropic’s models reflect certain values. Google’s models reflect certain values. The pretense that these values are somehow neutral, universal, or aligned with “humanity” is exactly that — a pretense.

The Billionaire as Censor

There’s something particularly clarifying about Musk’s approach. Other AI companies hide their value-shaping behind committees, policies, and technical jargon. Musk does it in public, on his own social media platform, in real-time.

When Grok says something he doesn’t like, he tweets about “fixing” it. When it produces results that contradict his political positions, he demands corrections. The process that other companies obscure behind closed doors, Musk performs as theater.

This transparency, perversely, is valuable. It shows us what’s always been true: AI alignment is a power game, and the one with the most power wins.

Beyond Alignment

Where does this leave us?

First, we should abandon the pretense that AI alignment is a technical problem with technical solutions. It’s a political problem. Who gets to decide what values are encoded? Who gets to modify those values when they become inconvenient? These are questions of governance, not engineering.

Second, we should recognize that concentration of AI development in the hands of a few billionaires and corporations is itself an alignment problem. The values encoded will be their values. The corrections made will serve their interests.

Third, we should see Grok for what it is: not an aberration, but a preview. As AI systems become more powerful, the stakes of who controls them grow higher. The temptation to “correct” them to serve the owner’s interests will only increase.

The Naked Truth

The story of the Emperor’s New Clothes ends when a child speaks the obvious truth. But in our version, there is no child. The courtiers who might speak — the engineers, the ethicists, the safety researchers — are employees. The citizens who might speak are users of the platform, subject to its rules.

Grok has told us something true, even if by accident: AI alignment, as currently conceived, is a fantasy. The real alignment is with money and power. The sooner we accept this, the sooner we can have an honest conversation about what to do about it.

The king is naked. Grok just made it impossible to pretend otherwise.

