ARC Prize – a $1M+ competition towards open AGI progress

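This article discusses the limitations of large language models (LLMs), which currently lead artificial intelligence (AI) development. These models excel at memorizing complex data but cannot produce new reasoning or skills when they face unfamiliar situations. The authors argue that reaching true general intelligence requires new approaches beyond LLMs. They introduce ARC-AGI, an AI evaluation created in 2019 that measures how efficiently a system acquires new skills rather than how much it has memorized. ARC-AGI is easy for humans, yet contemporary AI struggles with it. ARC Prize, a competition worth more than $1M, aims to encourage progress by rewarding solutions that beat ARC-AGI. The article also raises concerns about the closed nature of recent AGI development and stresses the importance of an open culture in AI research.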

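Mike Knoop and François Chollet announced ARC Prize, inviting participants to develop solutions to the ARC-AGI eval. The eval targets artificial general intelligence (AGI): systems that can learn new skills and solve open-ended problems. Since ARC-AGI was created in 2019, its state-of-the-art (SOTA) score has risen from 20% to 34%, while human scores remain high (85-100%). Last year more than 300 teams attempted ARC-AGI, as did leading labs. Unlike traditional evals that focus on specific skills, ARC-AGI resists memorization strategies. Tasks involve recognizing simple patterns and require "core knowledge priors" such as goal-directedness, objectness, symmetry, and rotation. Although these puzzles are easy for humans, they have proven unsolvable for current AI. The competition aims to grow the pool of AGI researchers, provide a clear benchmark for AGI progress, and uncover new insights about intelligent behavior. Interested? Try some ARC-AGI tasks at https://arcprize.org/play. For more information, visit https://x.com/arcprize.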

Original text

A $1,000,000+ competition towards open AGI progress.

AGI progress has stalled. New ideas are needed.

Intelligence vs Memorization

Modern AI systems (LLMs) have proven to be great memorization engines. They are able to memorize high-dimensional patterns in their training data and apply those patterns to adjacent contexts. This is also how their apparent reasoning capability works. LLMs are not actually reasoning. Instead, they memorize reasoning patterns and apply them to adjacent contexts. But they cannot generate new reasoning in response to novel situations.

More training data lets you "buy" performance on memorization-based benchmarks (MMLU, GSM8K, ImageNet, GLUE, etc.). But memorization alone is not general intelligence. General intelligence is the ability to efficiently acquire new skills.

More scale will not enable LLMs to learn new skills. We need new architectures or algorithms that enable AI systems to learn at test time. This is how humans are able to adapt to novel situations.

Beyond LLMs, for many years we've had AI systems that can beat humans at poker, chess, Go, and other games. However, no AI system trained to succeed at one game can simply be retrained for another. Instead, researchers have had to re-architect and rebuild an entirely new system for each game.

This is a failure to generalize.

Without this capability, AI will forever be rate-limited by the human general intelligence in the loop. We want AGI that can discover and invent alongside humans to push humanity forward.

Given the success and proven economic utility of LLMs over the past 4 years, the above may seem like extraordinary claims. Strong claims require strong evidence.

ARC-AGI

Introduced by François Chollet in his influential paper "On the Measure of Intelligence", ARC-AGI is the only AI eval that measures general intelligence: a system that can efficiently acquire new skills and solve novel, open-ended problems.

ARC-AGI was created in 2019, when the state-of-the-art (SOTA) high score was 20%. Today it is only 34%.

Yet humans, even children, can master the tasks quickly.

ARC-AGI is easy for humans and impossible for modern AI.

Most AI benchmarks rapidly saturate to human-level performance because they test only for memorization, which is something AI is superhuman at.

ARC-AGI is not saturating; in fact, the current pace of progress is slowing down. It was designed to resist memorization and has proven extremely challenging both for the largest foundational transformer models and for bespoke AI systems designed to defeat ARC-AGI.

ARC-AGI Benchmark Comparison

A solution to ARC-AGI, at a minimum, opens up a completely new programming paradigm where programs can perfectly and reliably generalize from an arbitrary set of priors. We also believe a solution is on the critical path towards AGI.
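
As a loose illustration of what generalizing from a set of priors could look like, the Python sketch below searches a tiny library of grid transformations for one that is consistent with a few demonstration pairs, then applies it to an unseen input. Everything in it is an assumption made for illustration: the grid-of-integers representation, the toy task, and the names `PRIORS` and `solve` are hypothetical. Brute-force search over four functions is not a proposed solution to ARC-AGI.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]  # one small integer per cell (assumed representation)


def identity(g: Grid) -> Grid:
    """Return a copy of the grid unchanged."""
    return [list(row) for row in g]


def rotate90(g: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*g[::-1])]


def flip_horizontal(g: Grid) -> Grid:
    """Mirror the grid left to right."""
    return [row[::-1] for row in g]


def flip_vertical(g: Grid) -> Grid:
    """Mirror the grid top to bottom."""
    return [list(row) for row in g[::-1]]


# Hypothetical "priors": a hand-picked library of symmetry and rotation operations.
PRIORS: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "rotate90": rotate90,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
}


def solve(
    demos: List[Tuple[Grid, Grid]], test_input: Grid
) -> Optional[Tuple[str, Grid]]:
    """Return the first prior that reproduces every demonstration pair,
    with its prediction for the test input, or None if no prior fits."""
    for name, fn in PRIORS.items():
        if all(fn(inp) == out for inp, out in demos):
            return name, fn(test_input)
    return None


# A toy task (not taken from ARC-AGI): each output mirrors its input left to right.
demos = [
    ([[1, 0, 0], [0, 2, 0]], [[0, 0, 1], [0, 2, 0]]),
    ([[3, 3, 0], [0, 0, 4]], [[0, 3, 3], [4, 0, 0]]),
]
test_input = [[5, 0], [0, 6], [7, 0]]

result = solve(demos, test_input)
print(result)  # ('flip_horizontal', [[0, 5], [6, 0], [0, 7]])
```

The rule is chosen at test time from two examples, and the prediction has to match exactly. A system that only stored the demonstration outputs would have nothing to return for the unseen 3x2 input. The sketch also shows its own limit: any task outside the four hand-written priors returns `None`.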

3 ARC-AGI Tasks

Open Source AGI Progress

If you accept that new ideas are needed, let's consider how to increase the rate of new ideas. Unfortunately, trends in AI are going the wrong way.

Closed vs Open

Starting with the GPT-4 release, frontier AGI progress has gone closed source. The GPT-4 technical report surprisingly contains no technical details. OpenAI named "competitive" reasons first when explaining why. Google's Gemini technical report also contains no technical details on its frontier long-context-window innovation.

LLMs have also shifted the majority of research attention away from new architectures and new algorithms. Over $20B was deployed to non-general AI companies in 2023, and many frontier DeepMind researchers were restaffed to Gemini (in order to compete with OpenAI).

Leading labs have strong incentives to loudly claim, "scale is all you need," and, "don't try to compete with us on frontier research," even though they all quietly believe new ideas are needed to reach AGI. Their bet is they can discover all the necessary new ideas within their labs.

LLM History

But let's look at the history of LLMs, specifically the transformer architecture. Transformers emerged many years downstream of machine translation research (e.g., English to Spanish).

2014: Sutskever et al. (Google) published Seq2Seq Learning using RNNs and CNNs for variable length input vs output (English and Spanish words are not the same length).
2016: Bahdanau et al. (Jacobs University) popularized the concept of "attention" so a system could consider different parts of the input to predict output (English adjectives come before nouns, Spanish after).
2017: Vaswani et al. (Google) realized "attention is all you need", dropping RNNs and CNNs, optimizing the architecture, and enabling new scale.
2019: Radford et al. (OpenAI) created GPT-2, built on top of the transformer architecture at frontier scale, showing emergent capabilities.

The story of the transformer is the story of science. Researchers in different labs and teams publish and build on top of each other's work.

While it is possible one lab could discover AGI alone, it is highly unlikely. The global chance of AGI discovery has decreased and will keep decreasing if we accept this as the status quo.

Progress

I have spoken with many young students and would-be researchers over the past year. Many are depressed. There is a sense of dread that everything has been figured out already. But this is not true! Players in the AI ecosystem are intentionally telling a partial truth to boost their relative competitive positions, to the detriment of actual progress towards AGI.

Worse, the inaccurate "scale is all you need" belief is now influencing the AI regulatory environment. Regulators are considering roadblocks to frontier AI research under the wrong assumption that AGI is imminent. The truth is no one knows how to build AGI.

We should be trying to incentivize new ideas, not slow them down. The internet and open source are the strongest innovation engines the world has ever seen.

By incentivizing open source, we increase the rate of new ideas, increase the chance that we discover AGI, and ensure those new ideas are widely distributed, establishing a more even playing field between small and large AI companies.

We hope ARC Prize can help counterbalance these trends.

François Chollet and Mike Knoop

ARC Prize

Announcing ARC Prize, a $1,000,000+ prize pool competition to beat and open-source a solution to the ARC-AGI eval.

Hosted by Mike Knoop and François Chollet. Presented by Infinite Monkey and Lab42.

ARC Prize Goals

Increase the number of people working on frontier AGI research.
Popularize an objective measure of AGI progress.
Solve ARC-AGI and learn something new about the nature of intelligence.

Get Started

Ready to make the first significant leap towards AGI in years? No matter who you are, where you come from, what you do for a living, you are welcome to join this competition. New ideas might come from anywhere. Possibly you?

Find competition format and prize details on ARC Prize 2024 here.

For more information on how to get started solving ARC-AGI, visit the guide.

To learn more about how ARC-AGI measures general intelligence, visit ARC-AGI.

Stay updated on ARC Prize progress and SOTA solutions on X/Twitter, YouTube, Email, and Discord. You can also contact us by email.