Nested Learning: A new ML paradigm for continual learning

Original link: https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/

Recent advances in machine learning, and large language models (LLMs) in particular, still struggle with **continual learning**: the ability to keep acquiring knowledge without forgetting what was learned before, something the human brain excels at thanks to neuroplasticity. When current LLMs are updated with new information, they suffer from "**catastrophic forgetting**", sacrificing earlier learning. Researchers usually attack this with architectural changes or better training methods, but tend to treat the two as separate concerns. A new approach, **Nested Learning**, introduced in a NeurIPS 2025 paper, unifies these concepts: it treats a model as a system of interconnected, multi-level learning problems that are optimized *simultaneously*, recognizing that architecture and optimization are different levels of the same process. This enables greater computational depth and more efficient learning, which mitigates catastrophic forgetting. A proof-of-concept model, "Hope", demonstrates language modeling and long-context memory that outperform existing models, validating Nested Learning's potential for building more adaptive and capable AI.

Hacker News discussion (research.google): 80 points, posted by themgt 12 hours ago | 2 comments

abracos (6 hours ago): Someone is attempting an open-source reproduction: https://github.com/kmccleary3301/nested_learning

panarchy (3 hours ago): I've been waiting for someone to do this since around 2019; it seemed obvious. It will get interesting when they move on to mixed heterogeneous-architecture networks with meta-networks optimizing for specific tasks.

Original article

The last decade has seen incredible progress in machine learning (ML), primarily driven by powerful neural network architectures and the algorithms used to train them. However, despite the success of large language models (LLMs), a few fundamental challenges persist, especially around continual learning, the ability for a model to actively acquire new knowledge and skills over time without forgetting old ones.

When it comes to continual learning and self-improvement, the human brain is the gold standard. It adapts through neuroplasticity — the remarkable capacity to change its structure in response to new experiences, memories, and learning. Without this ability, a person is limited to immediate context (like anterograde amnesia). We see a similar limitation in current LLMs: their knowledge is confined to either the immediate context of their input window or the static information that they learn during pre-training.

The simple approach, continually updating a model's parameters with new data, often leads to “catastrophic forgetting” (CF), where learning new tasks sacrifices proficiency on old tasks. Researchers traditionally combat CF through architectural tweaks or better optimization rules. However, for too long, we have treated the model's architecture (the network structure) and the optimization algorithm (the training rule) as two separate things, which prevents us from achieving a truly unified, efficient learning system.
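To make the failure mode concrete, here is a minimal toy sketch in PyTorch (our own illustration, not code from the paper or this post): a small MLP is fitted to one regression task, then naively fine-tuned on a second task, and its error on the first task is measured before and after. The tasks, network size, and hyperparameters are arbitrary choices made purely for illustration.

```python
# Toy demonstration of catastrophic forgetting under naive sequential training.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

x = torch.linspace(-3, 3, 256).unsqueeze(1)
task_a = torch.sin(x)        # task A: fit sin(x)
task_b = torch.cos(2 * x)    # task B: fit cos(2x)

def train(target, steps=2000):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), target).backward()
        opt.step()

train(task_a)
loss_a_before = loss_fn(model(x), task_a).item()

train(task_b)  # naive continual update: only new data, no protection of old knowledge
loss_a_after = loss_fn(model(x), task_a).item()

print(f"task-A loss before fine-tuning on B: {loss_a_before:.4f}")
print(f"task-A loss after  fine-tuning on B: {loss_a_after:.4f}")  # typically far worse
```

After the second training phase, the task-A error typically increases by orders of magnitude, even though nothing about task A changed; the new gradients simply overwrite the parameters that encoded it.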

In our paper, “Nested Learning: The Illusion of Deep Learning Architectures”, published at NeurIPS 2025, we introduce Nested Learning, which bridges this gap. Nested Learning treats a single ML model not as one continuous process, but as a system of interconnected, multi-level learning problems that are optimized simultaneously. We argue that the model's architecture and the rules used to train it (i.e., the optimization algorithm) are fundamentally the same concepts; they are just different "levels" of optimization, each with its own internal flow of information ("context flow") and update rate. By recognizing this inherent structure, Nested Learning provides a new, previously invisible dimension for designing more capable AI, allowing us to build learning components with deeper computational depth, which ultimately helps solve issues like catastrophic forgetting.
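As a rough illustration of the "different levels with different update rates" idea, the hypothetical PyTorch sketch below splits one block into a fast inner level, updated at every step on the current context, and a slow outer level, whose gradients are consolidated only every few steps. This is an assumption-laden toy, not the algorithm from the paper; names such as `TwoLevelBlock` and `slow_every` are invented here for illustration.

```python
# Hypothetical two-level sketch: two parameter groups in one module,
# each with its own learning rate and update frequency.
import torch
import torch.nn as nn

torch.manual_seed(0)

class TwoLevelBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fast = nn.Linear(dim, dim)   # inner level: high update frequency
        self.slow = nn.Linear(dim, dim)   # outer level: low update frequency

    def forward(self, x):
        return self.slow(torch.tanh(self.fast(x)))

dim, slow_every = 16, 10
block = TwoLevelBlock(dim)
fast_opt = torch.optim.SGD(block.fast.parameters(), lr=1e-2)
slow_opt = torch.optim.SGD(block.slow.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):
    x = torch.randn(32, dim)
    target = x.flip(-1)                   # toy objective: learn to reverse features
    loss = loss_fn(block(x), target)

    fast_opt.zero_grad()
    loss.backward()                       # gradients flow into both levels
    fast_opt.step()                       # fast level: updated every step

    if (step + 1) % slow_every == 0:
        slow_opt.step()                   # slow level: applies gradients accumulated
        slow_opt.zero_grad()              # over the last `slow_every` steps
```

The point of the sketch is only that "what updates" and "how often it updates" can differ per level within a single model, which is the dimension Nested Learning makes explicit; the paper's actual formulation of levels, context flows, and update rules is considerably richer.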

We test and validate Nested Learning through a proof-of-concept, self-modifying architecture that we call “Hope”, which achieves superior performance in language modeling and demonstrates better long-context memory management than existing state-of-the-art models.
