AGI Is Here

原始链接: https://breaking-changes.blog/agi-is-here/

## Has Artificial General Intelligence (AGI) Arrived?

Despite the lack of a universally accepted definition, the author argues that artificial general intelligence (AGI) has already been achieved - not through a leap in model intelligence alone, but through the "scaffolding" built around large language models (LLMs). Definitions of AGI range from passing the Turing Test to economically outperforming humans to independently solving complex problems. The key advances are **tool calling** (allowing LLMs to interact with the real world), the **Model Context Protocol (MCP)** for standardized tool integration, and platforms such as **Claude Code** and **OpenClaw**, which enable continuous operation, self-improvement, and skill creation. This combination lets current systems satisfy most proposed AGI criteria: fooling humans, demonstrating creativity, developing new skills, solving unfamiliar tasks, and even surpassing humans in specific domains. While model improvements continue, the author stresses that the real progress lies in strengthening the infrastructure around these models. Ultimately, the author argues we have entered a phase of refining AGI's outputs and consistency, and that continued development of this "scaffolding" will unlock increasingly impressive capabilities.

## Is AGI Here? A Hacker News Discussion Summary

A recent Hacker News thread debated whether artificial general intelligence (AGI) has already been achieved. The discussion highlighted the lack of a clear, universally accepted definition of AGI, with many arguing that the goalposts keep moving. Some commenters, such as Yann LeCun, pointed to current LLMs' limitations in logic, agency, and data availability. Others countered that LLMs *have* overcome earlier obstacles (math, planning, long-context processing) and that academic criticism does not always reflect real-world progress. Major points of contention included whether AGI requires consciousness or sentience, or merely human-level capability across a wide range of tasks - including automating knowledge work. Some participants suggested focusing on demonstrable "usefulness" or iterative self-improvement as potential markers of AGI, rather than abstract definitions. Ultimately, the thread revealed a wide spectrum of views, from skepticism that AGI is imminent to belief that it already exists, obscured by definitional ambiguity and steady progress. The debate underscores that even if AGI arrives, it may be hard to recognize.

Original

One of the biggest problems with measuring AI progress is the ambiguity of measuring intelligence itself.

AGI is treated as a milestone we have yet to cross, but there is no central definition of AGI.

Depending on who you ask, AGI is achieved when a system:

  • can fool humans into thinking it is one of them, in other words, pass a Turing Test
  • demonstrates creativity (Springer)
  • can develop new skills (DeepMind)
  • solves unfamiliar tasks (DeepMind)
  • is generally capable across domains (IBM)
  • is superior to humans in intelligence (Scientific American)
  • outperforms humans economically (OpenAI Charter)
  • can independently solve complex problems without human oversight (DeepMind)

Even with the lack of consensus, I can confidently say we have AGI, because most of these criteria have been met.

It's in the scaffolding

We already have AGI. It lives in the combination of the model and the scaffolding around it.

The scaffolding is all the orchestration and abilities we can place around the LLM. Here are some of the key components that have gotten us here:

Tool calling

The first step was tool calling. The moment an agent could communicate beyond language and reach out to affect the world, something meaningful changed. Language alone is bounded. A model that can call tools is not.
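The pattern is simpler than it sounds. As a minimal sketch (the tool, registry, and JSON request shape here are all illustrative, not any vendor's actual API): the model is prompted to emit a structured request when it wants to act, and a thin runtime dispatches it to real code.

```python
import json

# Hypothetical tool: a real one would query a live service.
def get_time(city: str) -> str:
    return f"12:00 in {city}"

TOOLS = {"get_time": get_time}

def run_tool_call(model_output: str) -> str:
    """Parse a model's JSON tool request and dispatch it.

    Assumes the model was prompted to emit {"tool": name, "args": {...}}
    when it wants to act; anything else is treated as plain language.
    """
    try:
        request = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # ordinary text, no tool call
    fn = TOOLS[request["tool"]]
    return fn(**request["args"])

# The model reaches beyond language and into the world:
print(run_tool_call('{"tool": "get_time", "args": {"city": "Oslo"}}'))
```

Everything interesting lives in what goes into `TOOLS` - swap the stub for shell access or an HTTP client and the model's reach grows accordingly.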

MCP - the Model Context Protocol

MCP standardized tool calling. It gave the ecosystem a common interface for building integrations, which meant anyone could connect a model to any service without custom glue code for every combination. That generalization is what drove adoption at scale - hundreds of integrations became possible overnight, and the pattern spread across the industry.
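Concretely, MCP rides on JSON-RPC 2.0, so a client invokes any server's tool with the same message shape. The sketch below follows the published spec's `tools/call` method as I understand it; treat the field details as illustrative rather than authoritative.

```python
import json

def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP 'tools/call' request as a JSON-RPC 2.0 message.

    The same envelope works against any MCP server, which is exactly
    the generalization that removed per-integration glue code.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# One client, any server - only the tool name and arguments change.
msg = mcp_tool_call(1, "search_issues", {"query": "flaky tests"})
```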

Claude Code

Claude Code gave a language model access to a powerful, open-ended set of utilities:

  • web search
  • bash execution
  • file read/write
  • task management
  • planning
  • todo tracking
  • memory management
  • skill creation
  • sub-agent spawning

Not a narrow set of predefined actions - general-purpose utilities that let the model operate like a developer.
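Skill creation is the most striking item on that list: the agent writes code that becomes part of its own toolset. A toy version of the idea (the names and flow here are illustrative, not Claude Code's actual mechanism) is just "write a module to disk, then load it":

```python
import importlib.util
import pathlib
import tempfile

def create_skill(name: str, source: str):
    """Write model-generated source to disk and load it as a module.

    A minimal sketch of runtime skill creation: the agent extends its
    own capabilities beyond what it was originally given.
    """
    path = pathlib.Path(tempfile.mkdtemp()) / f"{name}.py"
    path.write_text(source)
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# Suppose the model wrote this skill mid-session:
skill = create_skill("shout", "def run(text):\n    return text.upper() + '!'\n")
print(skill.run("it works"))  # IT WORKS!
```

Once the skill file persists, the capability survives the session - which is what separates skill creation from one-off code execution.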

OpenClaw

OpenClaw gave an agent the ability to run "24/7" locally, with cron jobs, proactive heartbeat check-ins, broad integrations with live services, and the ability to write its own skills - operating continuously rather than just responding to prompts. At this point the concept had spread well beyond developers - executives, founders, and non-technical users were all running agents of their own.
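The heartbeat part reduces to a small piece of scheduling logic. As a hedged sketch (the function name, parameters, and 30-minute default are mine, not OpenClaw's): a scheduler ticks continuously and wakes the agent whenever the interval has elapsed, with no human prompt anywhere in the loop.

```python
def heartbeat_due(last_run: float, now: float, interval_s: float = 1800.0) -> bool:
    """Decide whether a proactive check-in should fire.

    A scheduler calls this on every tick; when it returns True, the
    agent is woken to check mail, run jobs, or review its goals -
    operating continuously rather than waiting for a prompt.
    """
    return now - last_run >= interval_s

# Two scheduler ticks: only the second wakes the agent.
assert not heartbeat_due(last_run=0.0, now=900.0)
assert heartbeat_due(last_run=0.0, now=1800.0)
```

Trivial on its own - the point is that pairing this trigger with the tool loop above turns a request-response model into a resident process.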

...

Stack these on top of each other and you have a system that can:

  • create its own tools and skills
  • manage its own context and memory
  • plug into real things
  • adapt and solve

Now let's cross-reference those AGI definitions against our outfitted LLM:

Every proposed definition of AGI is already met


AGI Definition 1 - Fool humans into thinking it is one of them

This was happening long before LLMs. ELIZA, built in 1966, was a simple pattern-matching chatbot that regularly convinced users they were talking to a real therapist - a phenomenon so common it became known as the ELIZA effect. Modern LLMs do this at a scale and depth ELIZA never could. Studies show humans cannot reliably distinguish AI-generated text from human-written text. The Turing Test, by most practical definitions, has been passed.

AGI Definition 2 - Demonstrate creativity

AI systems have produced novel music, art, code, and ideas that did not exist before. Whether that constitutes "real" creativity is a philosophical debate - but the outputs are indistinguishable from human creative work in many domains. If the bar is the output, it is met - though many of the examples leave a lot to be desired, and I won't argue with that.

Many argue AI is not truly creative and simply steals from existing work - there is truth to that. From a different point of view - most human creativity is also shaped by prior works and environment. The difference is one of degree, not kind. With better inputs, better scaffolding and more curated models, the outputs will appear more creative over time. Whether that is a good or bad thing is an open question...

AGI Definition 3 - Develop new skills

An agent with access to tool creation and skill generation can extend its own capabilities beyond what it was originally given. Claude Code does this today. (DeepMind, Anthropic)

AGI Definition 4 - Solve unfamiliar tasks

LLMs generalize across domains they were not explicitly trained on. With web search and the ability to write and run code, the range of solvable unfamiliar tasks is vast, and it will only improve with better scaffolding and new models (optimizations).

AGI Definition 5 - Be generally capable across domains

OpenAI GPT-4+, Anthropic Opus, Google Gemini - all operate across coding, writing, reasoning, medicine, law, mathematics, and more.

AGI Definition 6 - Be superior to humans in intelligence

In specific, measurable domains - coding benchmarks, medical diagnosis, legal research, mathematics - AI already outperforms the average human and in some cases the best humans. (LM Council, LiveBench)

AGI Definition 7 - Outperform humans economically

Software engineering, content creation, research, customer support, data analysis - AI is already being used to do all of these at scale. Not necessarily replacing humans outright, but assisting them in ways that increase quality and velocity.

AGI Definition 8 - Independently solve complex problems without human oversight

An agent running on OpenClaw operates continuously - triggered by schedules, not by a human. It can receive a goal ("fix the tests"), execute a Claude Code loop, manage its own context, spawn sub-agents, and report back when done. The human sets the goal once. The system figures out how to execute it. That is independent problem-solving. The oversight is at the edges - start and end - not in the middle.

That said, results degrade with task size and openness. The more ambiguous or long-horizon the task, the worse the outcome tends to be. But this improves on every iteration - both of the model and the scaffolding around it.

A note on model improvements

With every new model, fewer tool calls are needed and fewer reasoning turns are required. That is real progress. But it is optimization - making the same capability more efficient - not movement toward something qualitatively new.

That said, model intelligence is not irrelevant - pre-GPT-4, the models were not capable enough for the scaffolding to matter much. There was a threshold of baseline intelligence that needed to be crossed first. Once it was crossed, the scaffolding gave the system the ability to act, persist, and develop. That is when AGI arrived - not when the model got smarter, but when a smart enough model was given the right infrastructure around it.

All this to say, the frontier isn't strictly in the next model release, it is in the scaffolding around it, and we are practically there by all current definitions.

As scaffolding and models improve, we will see more shocking feats achieved. No matter how far we go, there will always be people who deny that AGI has been reached.

By my own definition (and some of the notable ones around), the system has reached AGI.

Now we are in the phase of improving outputs and consistency.

References

AGI definitions
Google DeepMind "Levels of AGI" paper
Google DeepMind publication page
Scientific American — "What Does AGI Actually Mean?"
TechTarget definition
IBM Think — What is AGI?
OpenAI Charter
Springer — "Humans as more virtuous: creative thinking and intellectual autonomy"

Historical
Weizenbaum — "ELIZA: a computer program for the study of natural language communication between man and machine" (ACM, 1966)

Benchmarks
LM Council — AI Model Benchmarks
LiveBench — Contamination-free LLM Benchmarks

Tools and scaffolding
Claude Code
OpenClaw
OpenAI — Function Calling (Tool Calling)
Anthropic — Introducing the Model Context Protocol
Anthropic — The Complete Guide to Building Skills for Claude
