Y'all are over-complicating these AI-risk arguments

Original link: https://dynomight.net/ai-risk/

## A Simpler Argument for AI Risk

Debates about AI risk often get bogged down in complex scenarios: fast capability takeoffs, dangerous subgoals, and uncontainability. A competing argument frames the problem far more simply: what if AI reaches superhuman intelligence (an IQ of 300)? Just as the arrival of a highly intelligent alien race would be worrying, we should be wary of such an AI.

The author argues that this “simple” argument is more likely to be true and more persuasive, because it relies on our existing intuitions about powerful intelligence. It avoids confidently predicting *how* a catastrophe would unfold and instead focuses on the inherent risk of anything far smarter than humans.

The author acknowledges a more nuanced “complex” argument, namely that if things go wrong, one particular catastrophic scenario is the *most likely* way for them to go wrong, but finds it overconfident and ultimately less useful. The simple argument cuts straight to the core disagreement: do people genuinely believe AI could reach human-level or even superhuman capabilities? Recent polling suggests public concern about AI is growing, which means the simple argument may resonate more than previously assumed. The author ultimately recommends arguing for what you actually believe, while conceding that the complex argument may be more useful if you care about optics and motivating action.

A Hacker News discussion centers on the complexity of AI-risk arguments. The original post argues that people are overthinking the potential dangers. One key point of contention is whether current AI, described as “fancy guessing algorithms,” is even the relevant subject of the risk discussion. Many commenters agree that today's language models are not the concern; the worry is about future AI with substantially higher intelligence (a hypothetical IQ of 300, say). Others counter that even current AI fits the definition, and some point to fundamental similarities between AI “guessing” and the processes inside the human brain. The conversation highlights disagreement about the nature of intelligence and the proper scope of AI-risk assessment.

Original article

Say an alien spaceship is headed for Earth. It has 30 aliens on it. The aliens are weak and small. They have no weapons and carry no diseases. They breed at rates similar to humans. They are bringing no new technology. No other ships are coming. There’s no trick—except that they each have an IQ of 300. Would you find that concerning?

Of course, the aliens might be great. They might cure cancer and help us reach world peace and higher consciousness. But would you be sure they’d be great?

Suppose you were worried about the aliens but I scoffed, “Tell me specifically how the aliens would hurt us. They’re small and weak! They can’t do anything unless we let them.” Would you find that counter-argument convincing?

I claim that most people would be concerned about the arrival of the aliens, would not be sure that their arrival would be good, and would not find that counter-argument convincing.

I bring this up because most AI-risk arguments I see go something like this:

  1. There will be a fast takeoff in AI capabilities.
  2. Due to alignment difficulty and orthogonality, it will pursue dangerous convergent subgoals.
  3. These will give the AI a decisive strategic advantage, making it uncontainable and resulting in catastrophe.

These arguments have always struck me as overcomplicated. So I’d like to submit the following undercomplicated alternative:

  1. Obviously, if an alien race with IQs of 300 were going to arrive on Earth soon, that would be concerning.
  2. In the next few decades, it’s entirely possible that AI with an IQ of 300 will arrive. Really, that might actually happen.
  3. No one knows what AI with an IQ of 300 would be like. So it might as well be an alien.

Our subject for today is: Why might one prefer one of these arguments to the other?

## The case for the simple argument

The obvious reason to prefer the simple argument is that it’s more likely to be true. The complex argument has a lot of steps. Personally, I think they’re all individually plausible. But are we really confident that there will be a fast takeoff in AI capabilities and that the AI will pursue dangerous subgoals and that it will thereby gain a decisive strategic advantage?
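To see why that conjunction matters, here is a toy calculation (the 0.7 values are made-up for illustration, not estimates from the post): even if each step looks more likely than not on its own, the chain as a whole can easily land below even odds.

$$
P(\text{scenario}) = P(\text{takeoff}) \times P(\text{subgoals} \mid \text{takeoff}) \times P(\text{advantage} \mid \text{subgoals}) \approx 0.7 \times 0.7 \times 0.7 \approx 0.34
$$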

I find that confidence unreasonable. I’ve often been puzzled why so many seemingly-reasonable people will discuss these arguments without rejecting the confidence.

I think the explanation is that there are implicitly two versions of the complex argument. The “strong” version claims that fast takeoff et al. will happen, while the “weak” version merely claims that it’s a plausible scenario that we should take seriously. It’s often hard to tell which version people are endorsing.

The distinction is crucial, because these two versions have different weaknesses. I find the strong version wildly overconfident. I agree with the weak version, but I still think it’s unsatisfying.

Say you think there’s a >50% chance things do not go as suggested by the complex argument. Maybe there’s a slow takeoff, or maybe the AI can’t build a decisive strategic advantage, whatever. Now what?

Well, maybe everything turns out great and you live for millions of years, exploring the galaxy, reading poetry, meditating, and eating pie. That would be nice. But it also seems possible that humanity still ends up screwed, just in a different way. The complex argument doesn’t speak to what happens when one of the steps fails. This might give the impression that without any of the steps, everything is fine. But that is not the case.

The simple argument is also more convincing. Partly I think that’s because—well—it’s easier to convince people of things when they’re true. But beyond that, the simple argument doesn’t require any new concepts or abstractions, and it leverages our existing intuitions for how more intelligent entities can be dangerous in unexpected ways.

I actually prefer the simple argument in an inverted form: If you claim that there is no AI-risk, then which of the following bullets do you want to bite?

  1. “If a race of aliens with an IQ of 300 came to Earth, that would definitely be fine.”
  2. “There’s no way that AI with an IQ of 300 will arrive within the next few decades.”
  3. “We know some special property that AI will definitely have that will definitely prevent all possible bad outcomes that aliens might cause.”

I think all those bullets are unbiteable. Hence, I think AI-risk is real.

But if you make the complex argument, then you seem to be left with the burden of arguing for fast takeoff and alignment difficulty and so on. People who hear that argument also often demand an explanation of just how AI could hurt people (“Nanotechnology? Bioweapons? What kind of bioweapon?”). I think this is a mistake for the same reason it would be a mistake to demand to know how a car accident would happen before putting on your seatbelt. As long as the Complex Scenario is possible, it’s a risk we need to manage. But many people don’t look at things that way.

But I think the biggest advantage of the simple argument is something else: It reveals the crux of disagreement.

I’ve talked to many people who find the complex argument completely implausible. Since I think it is plausible—just not a sure thing—I often ask why. People give widely varying reasons. Some claim that alignment will be easy, some that AI will never really be an “agent”, some talk about the dangers of evolved vs. engineered systems, and some have technical arguments based on NP-hardness or the nature of consciousness.

I’ve never made much progress convincing these people to change their minds. I have succeeded in convincing some people that certain arguments don’t work. (For example, I’ve convinced people that NP-hardness and the nature of consciousness are probably irrelevant.) But when people abandon those arguments, they don’t turn around and accept the whole Scenario as plausible. They just switch to different objections.

So I started giving my simple argument instead. When I did this, here’s what I discovered: None of these people actually accept that AI with an IQ of 300 could happen.

Sure, they often say that they accept this. But if you pin them down, they’re inevitably picturing an AI that lacks some core human capability. Often, the AI can prove theorems or answer questions, but it’s not an “agent” that wants things and does stuff and has relationships and makes long-term plans.

So I conjecture that this is the crux of the issue with AI-risk. People who truly accept that AI with an IQ of 300 and all human capabilities may appear are almost always at least somewhat worried about AI-risk. And people who are not worried about AI-risk almost always don’t truly accept that AI with an IQ of 300 could appear. If that’s the crux, then we should get to it as quickly as possible. And that’s done by the simple argument.

## The case for the complex argument

I won’t claim to be neutral. As hinted by the title, I started writing this post intending to make the case for the simple argument, and I still think that case is strong. But I figured I should consider arguments for the other side and—there are some good ones.

Above, I suggested that there are two versions of the complex argument: A “strong” version that claims the scenario it lays out will definitely happen, and a “weak” version that merely claims it’s plausible. I rejected the strong version as overconfident. And I rejected the weak version because there are lots of other scenarios where things could also go wrong for humanity, so why give this one so much focus?

Well, there’s also a middle version of the complex argument: You could claim that the scenario it lays out is not certain, but that if things go wrong for humanity, then they will probably go wrong as in that scenario. This avoids both of my objections—it’s less overconfident, and it gives a good reason to focus on this particular scenario.

Personally, I don’t buy it, because I think other bad scenarios like gradual disempowerment are plausible. But maybe I’m wrong. It doesn’t seem crazy to claim that the Complex Scenario captures most of the probability mass of bad outcomes. And if that’s true, I want to know it.

Now, some people suggest favoring certain arguments for the sake of optics: Even if you accept the complex argument, maybe you’d want to make the simple one because it’s more convincing or is better optics for the AI-risk community. (“We don’t want to look like crazy people.”)

Personally, I am allergic to that whole category of argument. I have a strong presumption that you should argue the thing you actually believe, not some watered-down thing you invented because you think it will manipulate people into believing what you want them to believe. So even if my simpler argument is more convincing, so what?

But say you accept the middle version of the complex argument, yet you think my simple argument is more convincing. And say you’re not as bloody-minded as me, so you want to calibrate your messaging to be more effective. Should you use my simple argument? I’m not sure you should.

The typical human bias is to think other people are similar to us. (How many people favor mandatory pet insurance funded by a land-value tax? At least 80%, right?) But as far as I can tell, the situation with AI-risk is the opposite. Most people I know are at least mildly concerned, but have the impression that “normal people” think that AI-risk is science fiction nonsense.

Yet, here are some recent polls:

| Poll | Date | Statement | Agree |
| --- | --- | --- | --- |
| Gallup | June 2-15, 2025 | [AI is] very different from the technological advancements that came before, and threatens to harm humans and society | 49% |
| Reuters / Ipsos | August 13-18, 2025 | AI could risk the future of humankind | 58% |
| YouGov | March 5-7, 2025 | How concerned, if at all, are you about the possibility that artificial intelligence (AI) will cause the end of the human race on Earth? (Very or somewhat concerned) | 37% |
| YouGov | June 27-30, 2025 | How concerned, if at all, are you about the possibility that artificial intelligence (AI) will cause the end of the human race on Earth? (Very or somewhat concerned) | 43% |

Being concerned about AI is hardly a fringe position. People are already worried, and becoming more so.

I used to picture my simple argument as a sensible middle-ground, arguing for taking AI-risk seriously, but not overconfident:

[Figure: spectrum1, a spectrum of positions on AI-risk with the simple argument as the middle ground]

But I’m starting to wonder if my “obvious argument” is in fact obvious, and something that people can figure out on their own. From looking at the polling data, it seems like the actual situation is more like this, with people on the left gradually wandering towards the middle:

[Figure: spectrum2, the same spectrum with people on the left gradually moving toward the middle]

If anything, the optics may favor a confident argument over my simple argument. In principle, they suggest similar actions: Move quickly to reduce existential risk. But what I actually see is that most people—even people working on AI—feel powerless and are just sort of clenching up and hoping for the best.

I don’t think you should advocate for something you don’t believe. But if you buy the complex argument, and you’re holding yourself back for the sake of optics, I don’t really see the point.
