From Memorization to Reasoning in the Spectrum of Loss Curvature

Original link: https://arxiv.org/abs/2510.24256

This study investigates how memorization manifests in transformer models, covering both language and vision models. The authors show that memorized information can be identified by analyzing the curvature of the model's loss landscape: memorized data produces sharper curvature than non-memorized data. Building on this curvature signal, they develop a weight-editing technique that reduces unwanted memorization more effectively than existing methods while preserving overall language fluency. However, the edit specifically impairs tasks that depend on highly specialized knowledge, such as fact retrieval and arithmetic, even as broader reasoning abilities are preserved. The study suggests that these tasks rely on distinct, narrowly defined regions of the model's weight space, and that removing memorization-related components also removes ingredients critical to these specific skills. This work deepens our understanding of memorization in neural networks, offers a targeted method for removing it, and highlights the specialized structures present in these models.

This Hacker News discussion revolves around a recent research paper on how neural networks learn, in particular the distinction between memorization and genuine reasoning. The paper proposes a method that uses the K-FAC technique to analyze "loss curvature" within a model's weight matrices. The core idea: **memorized training points sit in sharper (higher-curvature) regions of the loss landscape, and in the resulting weight decomposition the high-curvature components capture general, shared knowledge while the low-curvature components capture idiosyncratic, memorized information.** The proposed editing technique removes the low-curvature parts of the weight matrices, effectively pruning away memorization while retaining the reasoning core, potentially freeing capacity for better generalization (a minimal sketch of such an edit follows below). One commenter flagged a potential misreading of the abstract, clarifying that the paper associates *high* curvature at training points with memorization. Others saw the work as a step toward distilling models down to their essential "reasoning" capabilities.
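To make the edit concrete, here is a minimal NumPy sketch of a K-FAC-style curvature edit on a single linear layer. This is not the authors' code; the function name, `keep_fraction` parameter, and the single-layer setup are illustrative assumptions. It assumes the layer's curvature is approximated by the Kronecker product of two factors: `A`, the (uncentered) covariance of the layer's input activations, and `G`, the covariance of the gradients at its output.

```python
import numpy as np

def kfac_curvature_edit(W, A, G, keep_fraction=0.9):
    """Zero out the lowest-curvature components of W in the K-FAC eigenbasis.

    W: (out, in) weight matrix of one linear layer.
    A: (in, in) input-activation covariance (input-side Kronecker factor).
    G: (out, out) output-gradient covariance (output-side Kronecker factor).

    Since the curvature is approximated as G (kron) A, the curvature of
    component (i, j) in the joint eigenbasis is eigG[i] * eigA[j].
    """
    eigA, UA = np.linalg.eigh(A)   # eigenbasis of the input-side factor
    eigG, UG = np.linalg.eigh(G)   # eigenbasis of the output-side factor

    # Rotate the weights into the curvature eigenbasis.
    W_rot = UG.T @ W @ UA

    # Per-component curvature is the outer product of the eigenvalues.
    curvature = np.outer(eigG, eigA)

    # Keep the highest-curvature components (shared structure) and zero
    # the low-curvature ones, where memorized / idiosyncratic directions
    # are claimed to live.
    threshold = np.quantile(curvature, 1.0 - keep_fraction)
    W_rot_edited = np.where(curvature >= threshold, W_rot, 0.0)

    # Rotate back to the original weight basis.
    return UG @ W_rot_edited @ UA.T
```

In practice the Kronecker factors would be estimated by running the model over a sample of training data and accumulating activation and gradient statistics per layer; the sketch takes them as given.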

Original

By Jack Merullo and 3 other authors
Abstract: We characterize how memorization is represented in transformer models and show that it can be disentangled in the weights of both language models (LMs) and vision transformers (ViTs) using a decomposition based on loss landscape curvature. This insight builds on prior theoretical and empirical work showing that the curvature for memorized training points is much sharper than for non-memorized ones, meaning that ordering weight components from high to low curvature can reveal the distinction without explicit labels. This motivates a weight editing procedure that suppresses recitation of untargeted memorized data far more effectively than a recent unlearning method (BalancedSubnet), while maintaining lower perplexity. Since the curvature basis has a natural interpretation in terms of shared structure in model weights, we extensively analyze the editing procedure's effect on downstream tasks in LMs, and find that fact retrieval and arithmetic are specifically and consistently harmed, even though open-book fact retrieval and general logical reasoning are preserved. We posit that these tasks rely heavily on specialized directions in weight space rather than general-purpose mechanisms, regardless of whether the individual datapoints are memorized. We support this by showing a correspondence between how strongly task data activates the low-curvature components we edit out and the drop in task performance after the edit. Our work enhances the understanding of memorization in neural networks, with practical applications toward removing it, and provides evidence for idiosyncratic, narrowly used structures involved in solving tasks like math and fact retrieval.
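The abstract's final evidence is a correspondence between task data's activation strength on the edited low-curvature components and the post-edit performance drop. Below is a hedged sketch of one way to measure that activation strength, simplified to the input-side K-FAC factor only; the function name, `edited_fraction` parameter, and the reuse of `UA`/`eigA` from the earlier sketch are all assumptions, not the paper's actual measurement code.

```python
import numpy as np

def low_curvature_activation_strength(task_acts, UA, eigA, edited_fraction=0.1):
    """Fraction of task activation energy lying in the edited subspace.

    task_acts: (n_examples, d_in) input activations of one layer on task data.
    UA, eigA: input-side K-FAC eigenvectors / eigenvalues (ascending).
    """
    # Mean squared projection of task activations onto each eigenvector.
    proj = task_acts @ UA                  # (n_examples, d_in)
    strength = (proj ** 2).mean(axis=0)    # (d_in,)

    # Components sorted from lowest to highest curvature on the input side.
    order = np.argsort(eigA)
    n_edited = int(edited_fraction * len(eigA))

    # Share of activation energy in the low-curvature (edited) components;
    # the paper reports that tasks scoring high here (e.g. fact retrieval,
    # arithmetic) degrade most after the edit.
    return strength[order[:n_edited]].sum() / strength.sum()
```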
From: Jack Merullo
[v1] Tue, 28 Oct 2025 10:09:35 UTC (2,148 KB)
[v2] Fri, 31 Oct 2025 00:26:33 UTC (2,148 KB)