Sutskever and LeCun: Scaling Up Language Models Won't Make Them More Useful
Ilya Sutskever, Yann LeCun and the End of “Just Add GPUs”

原始链接: https://www.abzglobal.net/web-development-blog/ilya-sutskever-yann-lecun-and-the-end-of-just-add-gpus

## AI's Shift: From Scaling to Innovation

Leading AI figures Ilya Sutskever (OpenAI) and Yann LeCun (Meta) have both signaled a turning point: today's large language models (LLMs) are approaching their limits. Over the past decade, rapid progress came from scaling – bigger models, more data, more compute – but the returns are diminishing. We are entering a new "age of research" in which novel ideas, not just hardware, will drive progress.

Sutskever highlights problems with LLM generalization, the opacity of the learning process, and the gap between benchmark performance and real-world usefulness. LeCun argues that LLMs fundamentally lack an understanding of the physical world and proposes "world models" – systems that learn through interaction with an environment, such as the JEPA architecture – as a more promising path.

For developers, this means the advantage of simply having access to more compute will shrink. Success will depend on **user-centric metrics** (reliability, explainability) rather than leaderboard scores, on **owning high-quality data**, and on building strong **feedback loops**. Expect growing diversity in model types and the need to orchestrate multiple systems rather than relying on a single LLM API.

The focus is shifting from chasing parameter counts to building intelligence into products – prioritizing reasoning, planning, and seamless integration with human workflows.

According to a recent discussion on Hacker News, prominent AI researchers Ilya Sutskever and Yann LeCun have both expressed doubt that simply *scaling up* large language models (LLMs) will yield significant further improvements or lead to artificial general intelligence (AGI). While acknowledging the validity of their concerns, one commenter questioned Sutskever's proposed solution, arguing that a more practical path is to deploy publicly and iterate on user feedback – trusting that companies like Google, OpenAI, and Anthropic will converge on the best systems through a free-market process. Others noted that Sutskever's and LeCun's views have recently converged, with Sutskever increasingly endorsing the position that scaling alone won't get there. One commenter also sardonically suggested that the original article itself might be low quality ("AI slop").

Original article

When two of the most influential people in AI both say that today’s large language models are hitting their limits, it’s worth paying attention.

In a recent long-form interview, Ilya Sutskever – co-founder of OpenAI and now head of Safe Superintelligence Inc. – argued that the industry is moving from an “age of scaling” to an “age of research”. At the same time, Yann LeCun, VP & Chief AI Scientist at Meta, has been loudly insisting that LLMs are not the future of AI at all and that we need a completely different path based on “world models” and architectures like JEPA.

As developers and founders, we’re building products right in the middle of that shift.

This article breaks down Sutskever’s and LeCun’s viewpoints and what they mean for people actually shipping software.

1. Sutskever’s Timeline: From Research → Scaling → Research Again

Sutskever divides the last decade-plus of AI into three phases:

1.1. 2012–2020: The first age of research

This is the era of “try everything”:

  • convolutional nets for vision

  • sequence models and attention

  • early reinforcement learning breakthroughs

  • lots of small experiments, new architectures, and weird ideas

There were big models, but compute and data were still limited. The progress came from new concepts, not massive clusters.

1.2. 2020–2025: The age of scaling

Then scaling laws changed everything.

The recipe became:

More data + more compute + bigger models = better results.

You didn’t have to be extremely creative to justify a multi-billion-dollar GPU bill. You could point to a curve: as you scale up parameters and tokens, performance climbs smoothly.
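That curve has a well-known empirical shape. As a rough illustration, a Chinchilla-style scaling law models loss as a floor plus power-law terms in parameters and tokens. This is a minimal sketch; the constants below are placeholders chosen for illustration, not fitted values from any particular paper:

```python
# Illustrative Chinchilla-style scaling law: loss falls smoothly (with
# diminishing returns) as parameters N and training tokens D grow.
# Constants are placeholders, not numbers from a specific published fit.
def predicted_loss(n_params: float, n_tokens: float,
                   e: float = 1.7, a: float = 400.0, b: float = 410.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """L(N, D) ~ E + A / N**alpha + B / D**beta"""
    return e + a / n_params ** alpha + b / n_tokens ** beta

for n, d in [(1e9, 2e10), (1e10, 2e11), (1e11, 2e12)]:
    print(f"N={n:.0e}, D={d:.0e}  ->  predicted loss {predicted_loss(n, d):.2f}")
```

Each 10× step still lowers the predicted loss, but by less each time – which is exactly why "just make it bigger" starts to look expensive.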

This recipe gave us today's generation of frontier LLMs.

1.3. 2025 onward: Back to an age of research (but with huge computers)

Now Sutskever is saying that scaling alone is no longer enough:

  • The industry is already operating at insane scale.

  • The internet is finite, so you can’t just keep scraping higher-quality, diverse text forever.

  • The returns from “just make it 10× bigger” are getting smaller and more unpredictable.

We’re moving into a phase where:

The clusters stay huge, but progress depends on new ideas, not only new GPUs.

2. Why the Current LLM Recipe Is Hitting Limits

Sutskever keeps circling three core issues.

2.1. Benchmarks vs. real-world usefulness

Models look god-tier on paper: they post impressive scores on coding, math and reasoning benchmarks.

But everyday users still run into hallucinations, inconsistent answers, and failures on tasks that sit just outside what the model has seen before.

So there’s a gap between benchmark performance and actual reliability when someone uses the model as a teammate or co-pilot.

2.2. Pre-training is powerful, but opaque

The big idea of this era was: pre-train on enormous text + images and you’ll learn “everything”.

It worked incredibly well… but it has downsides:

  • you don’t fully control what the model learns

  • when it fails, it’s hard to tell if the issue is data, architecture, or something deeper

  • pushing performance often means more of the same, not better understanding

That’s why there’s so much focus now on post-training tricks: RLHF, reward models, system prompts, fine-tuning, tool usage, etc. We’re papering over the limits of the pre-training recipe.
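One concrete example of that scaffolding is best-of-n sampling with a reward model: instead of improving the base model, you sample several completions and keep the one a learned scorer prefers. This is a hedged sketch – the `generate` and `reward_score` callables are hypothetical stand-ins, not any specific vendor's API:

```python
import random
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward_score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n completions and return the one the reward model scores highest.

    Note: this is scaffolding around a frozen pre-trained model; it lifts output
    quality without changing what the model actually knows.
    """
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward_score(prompt, c))

# Toy stand-ins so the sketch runs end to end.
def fake_generate(prompt: str) -> str:
    return f"{prompt} -> draft #{random.randint(1, 100)}"

def fake_reward(prompt: str, completion: str) -> float:
    return random.random()  # a real reward model would score helpfulness/safety

print(best_of_n("Summarize the meeting notes", fake_generate, fake_reward))
```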

2.3. The real bottleneck: generalization

For Sutskever, the biggest unsolved problem is generalization.

Humans can:

  • learn a new concept from a handful of examples

  • transfer knowledge between domains

  • keep learning continuously without forgetting everything

Models, by comparison, still:

  • need orders of magnitude more examples to learn a new concept

  • transfer poorly to domains far from their training data

  • struggle to keep learning without forgetting what they already know

Even the best systems today generalize much worse than people. Fixing that is not a matter of another 10,000 GPUs; it needs new theory and new training methods.

3. Safe Superintelligence Inc.: Betting on New Recipes

Sutskever’s new company, Safe Superintelligence Inc. (SSI), is built around a simple thesis: superintelligence won’t come from a bigger version of today’s models, but from new training recipes – with safety designed in from the start.

SSI is not rushing out consumer products. Instead, it positions itself as:

  • focused on long-term research into superintelligence

  • trying to invent new training methods and architectures

  • putting safety and controllability at the core from day one

Instead of betting that “GPT-7 but bigger” will magically become AGI, SSI is betting that a different kind of model, trained with different objectives, will be needed.

4. Have Tech Companies Overspent on GPUs?

Listening to Sutskever, it’s hard not to read between the lines:

  • Huge amounts of money have gone into GPU clusters on the assumption that scale alone would keep delivering step-function gains.

  • We’re discovering that the marginal gains from scaling are getting smaller, and progress is less predictable.

That doesn’t mean the GPU arms race was pointless. Without it, we wouldn’t have today’s LLMs at all.

But it does mean:

  • The next major improvements will likely come from smarter algorithms, not merely more expensive hardware.

  • Access to H100s is slowly becoming a commodity, while genuine innovation moves back to ideas and data.

For founders planning multi-year product strategies, that’s a big shift.

5. Yann LeCun’s Counterpoint: LLMs Aren’t the Future at All

If Sutskever is saying “scaling is necessary but insufficient,” Yann LeCun goes further:

LLMs, as we know them, are not the path to real intelligence.

He’s been very explicit about this in talks, interviews and posts.

5.1. What LeCun doesn’t like about LLMs

LeCun’s core criticisms can be summarized in three points:

  1. Limited understanding
    LLMs are great at manipulating text but have a shallow grasp of the physical world.
    They don’t truly “understand” objects, physics or causality – all the things you need for real-world reasoning and planning.

  2. A product-driven dead-end
    He sees LLMs as an amazing product technology (chatbots, assistants, coding helpers) but believes they are approaching their natural limits.
    Each new model is larger and more expensive, yet delivers smaller improvements.

  3. Simplicity of token prediction
    Under the hood, an LLM is just predicting the next token. LeCun argues this is a very narrow, simplistic proxy for intelligence.
    For him, real reasoning can’t emerge from next-word prediction alone.

5.2. World models and JEPA

Instead of LLMs, LeCun pushes the idea of world models – systems that:

  • learn by watching the world (especially video)

  • build an internal representation of objects, space and time

  • can predict what will happen next in that world, not just what word comes next

One of the architectures he’s working on is JEPA – Joint Embedding Predictive Architecture (sketched in simplified code after this list):

  • it learns representations by predicting future embeddings rather than raw pixels or text

  • it’s designed to scale to complex, high-dimensional input like video

  • the goal is a model that can support persistent memory, reasoning and planning
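For intuition, here is a deliberately simplified sketch of a JEPA-style training step, assuming PyTorch. It only shows the core idea – predicting target embeddings from context embeddings – and omits the masking strategies, ViT backbones, and EMA target-encoder updates that real I-JEPA/V-JEPA implementations use:

```python
import torch
import torch.nn as nn

# Simplified JEPA-style step: predict the *embedding* of a target view from the
# embedding of a context view, instead of reconstructing raw pixels or text.
dim_in, dim_emb = 128, 64
context_encoder = nn.Linear(dim_in, dim_emb)
target_encoder = nn.Linear(dim_in, dim_emb)   # typically an EMA copy; frozen here
predictor = nn.Sequential(
    nn.Linear(dim_emb, dim_emb), nn.ReLU(), nn.Linear(dim_emb, dim_emb)
)

opt = torch.optim.Adam(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

context_view = torch.randn(32, dim_in)   # e.g. visible patches of an image or clip
target_view = torch.randn(32, dim_in)    # e.g. the masked / future patches

with torch.no_grad():
    target_emb = target_encoder(target_view)           # no gradients into the target branch

pred_emb = predictor(context_encoder(context_view))    # predict target embedding from context
loss = nn.functional.mse_loss(pred_emb, target_emb)    # the loss lives in embedding space

opt.zero_grad()
loss.backward()
opt.step()
print(f"embedding-prediction loss: {loss.item():.4f}")
```

The point of predicting in embedding space is that the model is free to ignore unpredictable pixel-level detail and focus on the structure of the scene.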

5.3. Four pillars of future AI

LeCun often describes four pillars any truly intelligent system needs:

  1. Understanding of the physical world

  2. Persistent memory

  3. Reasoning

  4. Planning

His argument is that today’s LLM-centric systems mostly hack around these requirements instead of solving them directly. That’s why he’s increasingly focused on world-model architectures instead of bigger text models.

6. Sutskever vs. LeCun: Same Diagnosis, Different Cure

What’s fascinating is that Sutskever and LeCun agree on the problem: simply scaling today’s LLMs will not, on its own, get us to the next level of intelligence.

Where they differ is how radical the change needs to be:

  • Sutskever seems to believe that the next breakthroughs will still come from the same general family of models – big neural nets trained on massive datasets – but with better objectives, better generalization, and much stronger safety work.

  • LeCun believes we need a new paradigm: world models that learn from interaction with the environment, closer to how animals and humans learn.

For people building on today’s models, that tension is actually good news: it means there is still a lot of frontier left.

7. What All This Means for Developers and Founders

So what should you do if you’re not running an AI lab, but you are building products on top of OpenAI, Anthropic, Google, Meta, etc.?

7.1. Hardware is becoming less of a moat

If the next big gains won’t come from simply scaling, then:

  • the advantage of “we have more GPUs than you” decreases over time

  • your real edge comes from use cases, data, UX and integration, not raw model size

This is good for startups and agencies: you can piggyback on the big models and still differentiate.

7.2. Benchmarks are not your product

Both Sutskever’s and LeCun’s critiques are a warning against obsessing over leaderboards.

Ask yourself:

  • Does this improvement meaningfully change what my users can do?

  • Does it reduce hallucinations in their workflows?

  • Does it make the system more reliable, debuggable and explainable?

User-centric metrics matter more than another +2% on some synthetic reasoning benchmark.
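In practice that means evaluating against your own workflows instead of public leaderboards. A minimal sketch of such a check follows; the tasks, the `must_contain` rule, and the `run_model` callable are illustrative placeholders for your real data and model call:

```python
from typing import Callable, Dict, List

# Tiny use-case evaluation: measure pass rate on tasks drawn from *your* product's
# workflows rather than a public benchmark. Tasks and checks are illustrative.
TASKS: List[Dict[str, str]] = [
    {"prompt": "Extract the invoice total from: 'Total due: $412.50'", "must_contain": "412.50"},
    {"prompt": "Reply to the customer and confirm their refund was issued", "must_contain": "refund"},
]

def workflow_pass_rate(run_model: Callable[[str], str]) -> float:
    """Fraction of workflow tasks where the model's output passes a simple check."""
    passed = sum(
        task["must_contain"].lower() in run_model(task["prompt"]).lower()
        for task in TASKS
    )
    return passed / len(TASKS)

# Usage: plug in whatever function calls the model you have chosen.
print(f"pass rate: {workflow_pass_rate(lambda p: 'Your refund of $412.50 was issued.'):.0%}")
```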

7.3. Expect more diversity in model types

If LeCun’s world models, JEPA-style architectures, or other alternatives start to work, we’ll likely see:

  • specialized models for physical reasoning and robotics

  • LLMs acting as a language interface over deeper systems that actually handle planning and environment modeling

  • more hybrid stacks, where multiple models collaborate

For developers, that means learning to orchestrate multiple systems instead of just calling one chat completion endpoint.
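Very roughly, that orchestration might look like a thin router that hands planning-style requests to a dedicated planner and uses the LLM only as the language interface. Every function in this sketch is a hypothetical stand-in, not a real API:

```python
from typing import Callable

def orchestrate(request: str,
                llm_answer: Callable[[str], str],
                planner_plan: Callable[[str], str]) -> str:
    """Route a request across a hypothetical hybrid stack.

    Planning-flavored requests go to a dedicated planner; the LLM then turns
    the plan into language. Everything else goes straight to the LLM.
    """
    needs_planning = any(w in request.lower() for w in ("plan", "schedule", "route"))
    if needs_planning:
        plan = planner_plan(request)                     # deeper system does the planning
        return llm_answer(f"Explain this plan to the user: {plan}")
    return llm_answer(request)

# Toy stand-ins so the sketch runs.
print(orchestrate(
    "Plan a three-stop delivery route",
    llm_answer=lambda p: f"[LLM] {p}",
    planner_plan=lambda r: "depot -> stop A -> stop B -> stop C",
))
```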

7.4. Data, workflows and feedback loops are where you win

No matter who is right about the far future, one thing is clear for product builders:

  • Owning high-quality domain data

  • Designing tight feedback loops between users and models

  • Building evaluations that match your use case

…will matter more than anything else.
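On the feedback-loop point: even a minimal log that pairs each model interaction with an outcome signal gives you raw material for the evaluations above and for later fine-tuning. A sketch with illustrative field names and storage, not a prescribed schema:

```python
import json
import time
from pathlib import Path

# Minimal feedback loop: record every model interaction with a user outcome signal,
# so the records can feed your evaluations and, later, fine-tuning data.
LOG_PATH = Path("feedback_log.jsonl")   # illustrative location

def log_interaction(prompt: str, model_output: str, user_accepted: bool) -> None:
    record = {
        "ts": time.time(),
        "prompt": prompt,
        "output": model_output,
        "accepted": user_accepted,      # e.g. thumbs-up, edit-free send, task completed
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_interaction("Draft a follow-up email", "Hi Sam, just checking in...", user_accepted=True)
```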

You don’t need to solve world modeling or superintelligence yourself. You need to:

  • pick the right model(s) for the job

  • wrap them in workflows that make sense for your users

  • keep improving based on real-world behavior

8. A Quiet Turning Point

In 2019–2021, the story of AI was simple: “scale is all you need.” Bigger models, more data, more GPUs.

Now, two of the field’s most influential figures are effectively saying:

We’re entering a new phase where research, theory and new architectures matter again as much as infrastructure.

For builders, that doesn’t mean you should stop using LLMs or pause your AI roadmap. It means:

  • focus less on chasing the next parameter count

  • focus more on how intelligence shows up inside your product: reliability, reasoning, planning, and how it fits into real human workflows

The GPU race gave us today’s tools. The next decade will be defined by what we do with them – and by the new ideas that finally move us beyond “predict the next token.”
