Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Original link: https://arxiv.org/abs/2512.24617

## Dynamic Large Concept Models (DLCM): Summary

This paper introduces a new language modeling framework, Dynamic Large Concept Models (DLCM), designed to address how inefficiently large language models (LLMs) process information. Current LLMs treat every token identically even though information density varies widely. DLCM learns semantic boundaries and compresses predictable text into "concepts," shifting computation toward the semantically critical transitions so that reasoning becomes more efficient.

DLCM discovers these concepts end-to-end, without predefined linguistic rules, and introduces a new "compression-aware scaling law" to optimize the allocation of compute. A key innovation is a "decoupled μP parametrization," which enables stable training and hyperparameter transfer.

Experiments show that with an average of four tokens per concept, DLCM reallocates compute to a more powerful reasoning backbone, achieving a **2.69% average performance improvement** across 12 benchmarks, *without* increasing overall compute cost. This suggests LLMs can use their resources more effectively by prioritizing reasoning over redundant processing.
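
Below is a minimal sketch of the boundary-based compression idea described above: a learned head scores each token position as a potential concept boundary, and the tokens between boundaries are pooled into a single concept vector. The module names, the hard 0.5 threshold, and the mean pooling are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of boundary-based token-to-concept compression.
# The hard-threshold segmentation rule and mean pooling are assumptions
# for illustration; the paper's own mechanism is not reproduced here.
import torch
import torch.nn as nn

class ConceptCompressor(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Scores each token position for being the end of a "concept".
        self.boundary_head = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (seq_len, d_model) token-level representations.
        boundary_prob = torch.sigmoid(self.boundary_head(hidden)).squeeze(-1)
        is_boundary = boundary_prob > 0.5      # hard segmentation for illustration
        is_boundary[-1] = True                 # always close the final segment

        concepts, start = [], 0
        for t in range(hidden.size(0)):
            if is_boundary[t]:
                # Pool the variable-length span [start, t] into one concept vector.
                concepts.append(hidden[start : t + 1].mean(dim=0))
                start = t + 1
        # (num_concepts, d_model), with num_concepts roughly seq_len / R.
        return torch.stack(concepts)

# Example: 32 token vectors compressed into a handful of concept vectors.
x = torch.randn(32, 256)
compressor = ConceptCompressor(d_model=256)
print(compressor(x).shape)
```

Downstream, a higher-capacity "reasoning backbone" would operate on the much shorter sequence of concept vectors instead of the full token sequence, which is where the compute savings come from.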

A new research paper on arXiv explores "Dynamic Large Concept Models," using a scheme inspired by HyperNetworks (HNet) to improve large language model performance. The model aims to perform latent reasoning within an adaptive semantic space.

The Hacker News discussion centered on whether the performance gains come from the model architecture or simply from having 75% more parameters than the baseline, a point compared to the advantages seen in Mixture-of-Experts (MoE) models. A key question is whether this approach could enable cross-lingual understanding, learning concepts in one language and applying them in another. One commenter, however, argued that the model mainly provides a more efficient token representation (compression) rather than genuine abstract concept learning, and likely retains traces of the input language. Others also pointed out errors in the paper's citations.

Original text

View a PDF of the paper titled Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space, by Xingwei Qu and 18 other authors

Abstract: Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally predictable spans while under-allocating computation to semantically critical transitions. We propose **Dynamic Large Concept Models (DLCM)**, a hierarchical language modeling framework that learns semantic boundaries from latent representations and shifts computation from tokens to a compressed concept space where reasoning is more efficient. DLCM discovers variable-length concepts end-to-end without relying on predefined linguistic units. Hierarchical compression fundamentally changes scaling behavior. We introduce the first **compression-aware scaling law**, which disentangles token-level capacity, concept-level reasoning capacity, and compression ratio, enabling principled compute allocation under fixed FLOPs. To stably train this heterogeneous architecture, we further develop a **decoupled μP parametrization** that supports zero-shot hyperparameter transfer across widths and compression regimes. At a practical setting (R = 4, corresponding to an average of four tokens per concept), DLCM reallocates roughly one-third of inference compute into a higher-capacity reasoning backbone, achieving a **+2.69% average improvement** across 12 zero-shot benchmarks under matched inference FLOPs.
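
As a back-of-the-envelope illustration of the compute reallocation the abstract describes (not the paper's compression-aware scaling law), the sketch below assumes each layer costs a fixed amount per position and that the concept backbone differs from the baseline only in depth; the layer counts, width, and sequence length are hypothetical.

```python
# Toy FLOPs accounting for the R = 4 reallocation described in the abstract.
# The cost model (per-position cost proportional to layers * width^2) and all
# layer/width numbers are illustrative assumptions, not values from the paper.

def stage_flops(positions: int, layers: int, width: int) -> int:
    # Crude proxy: each layer costs O(width^2) per position.
    return positions * layers * width ** 2

seq_len, R, width = 4096, 4, 2048          # R = average tokens per concept

# Token-uniform baseline: 24 layers applied to every token position.
baseline = stage_flops(seq_len, layers=24, width=width)

# Hierarchical split: a thinner token-level stage plus a deeper "reasoning
# backbone" that only runs on the seq_len / R concept positions.
token_stage   = stage_flops(seq_len,      layers=16, width=width)
concept_stage = stage_flops(seq_len // R, layers=32, width=width)

print(f"baseline FLOPs proxy:     {baseline:.3e}")
print(f"hierarchical FLOPs proxy: {token_stage + concept_stage:.3e}")  # equal
print(f"fraction spent in concept backbone: "
      f"{concept_stage / (token_stage + concept_stage):.0%}")          # ~33%
```

Under this toy accounting, the hierarchical split spends the same total FLOPs as the baseline while directing about a third of them to the concept-level backbone, mirroring the "roughly one-third of inference compute" reallocation mentioned in the abstract.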
From: Xingwei Qu
[v1] Wed, 31 Dec 2025 04:19:33 UTC (2,886 KB)
[v2] Mon, 5 Jan 2026 05:44:29 UTC (2,887 KB)