KV Cache Compression 900000x Beyond TurboQuant and Per-Vector Shannon Limit

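A new paper on arXiv describes a KV cache compression technique that claims compression ratios of up to 900,000x, far beyond methods such as TurboQuant. The author (EGreg) explains that the core idea is to use the LLM's own weights as a predictive "dictionary". Instead of storing everything in memory, the system predicts the likely responses and saves only the parts that are surprising or hard to guess. The technique builds on the author's earlier work on Probabilistic Language Tries (PLT), which starts from the observation that an LLM in effect encodes a probability distribution over possible sequences. According to the author, allowing *lossy* compression, that is, accepting an occasional "overflow" to handle unexpected data, is what makes compression beyond the per-vector Shannon limit possible.

The author sees broad applications for the technique, from cheap AI inference and robotics to potentially modeling how animals learn: fast, predictive "System 1" thinking, with more detailed analysis only when it is needed. A working prototype and further details are available by email; the address is given in the paper.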
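
The "save only what the predictor fails to guess" idea can be illustrated with a toy sketch. This is not the paper's method: a bigram table (the hypothetical `train_bigram`) stands in for the LLM, the names `compress` and `decompress` are invented for illustration, and the scheme is lossless, so it does not model the lossy "overflow" handling the author describes. It only shows that when encoder and decoder share the same predictor, the stored data shrinks to the mispredicted positions.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Toy stand-in for the LLM: map each token to its most frequent successor."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {prev: c.most_common(1)[0][0] for prev, c in counts.items()}


def compress(tokens, predictor):
    """Return (first token, length, surprises).

    A surprise is a (position, token) pair recorded wherever the
    predictor's guess differs from the actual token.
    """
    if not tokens:
        return None, 0, []
    surprises = []
    for i in range(1, len(tokens)):
        if predictor.get(tokens[i - 1]) != tokens[i]:
            surprises.append((i, tokens[i]))
    return tokens[0], len(tokens), surprises


def decompress(first, length, surprises, predictor):
    """Replay the shared predictor, overriding it at the stored positions."""
    if length == 0:
        return []
    overrides = dict(surprises)
    out = [first]
    for i in range(1, length):
        out.append(overrides[i] if i in overrides else predictor[out[-1]])
    return out


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat sat on the rug".split()
    predictor = train_bigram(corpus)
    message = "the cat sat on the rug".split()
    first, n, surprises = compress(message, predictor)
    print(surprises)  # only the mispredicted positions are stored
    print(decompress(first, n, surprises, predictor) == message)
```

The better the shared predictor matches the data, the shorter the surprise list becomes; the summary's claim is that an LLM's own weights can play this role for its KV cache.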
