MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU


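## MegaTrain: Training Large LLMs on Limited Hardware

A new approach, MegaTrain, enables full-precision training of large language models (LLMs) with more than 100 billion parameters on a *single* GPU. The key idea is to keep parameters and optimizer states in host (CPU) memory, treat the GPU as a transient compute engine, and stream each layer's data in and out. This is especially exciting for users with limited VRAM, such as owners of an RTX 3080 (10 GB), who report that they currently struggle with models beyond 40-50M parameters because of out-of-memory errors. Users are exploring complementary techniques such as LoRA and MoE to further improve performance and make use of system memory.

The discussion emphasizes shifting capability from model weights to adaptive tools, and optimizing workloads for local hardware. The technique is not entirely new, but the paper shows notable progress: it reaches 341 tokens per second on a 3090, which is still slow for full pretraining. Commenters also mention that high-memory GPUs such as the H200 are available, though cost remains a barrier. Ultimately, the focus is shifting toward making LLM training more accessible and efficient, even on consumer hardware.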
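The streaming idea described above can be illustrated with a toy simulation. This is a minimal sketch, not the paper's implementation: `Device`, `train_step`, the one-layer capacity, the elementwise-scaling layers, and plain SGD are all assumptions chosen to keep the example self-contained in pure Python. It only shows the control flow, in which weights live on the host, one layer at a time is copied to the simulated device, and updates are written back to the host copy.

```python
class Device:
    """Hypothetical stand-in for a GPU that can hold only a few layers."""

    def __init__(self, capacity_layers=1):
        self.capacity = capacity_layers
        self.resident = {}   # layer index -> weights currently "on device"
        self.peak = 0        # highest number of layers resident at once

    def load(self, idx, weights):
        if len(self.resident) >= self.capacity:
            raise MemoryError("simulated device out of memory")
        self.resident[idx] = list(weights)  # copy models a host-to-device transfer
        self.peak = max(self.peak, len(self.resident))
        return self.resident[idx]

    def evict(self, idx):
        del self.resident[idx]


def train_step(host_weights, x, target, device, lr=0.01):
    """One SGD step for a stack of elementwise layers y = w * x.

    host_weights is a list of per-layer weight lists kept in host memory
    and updated in place. Returns the loss measured before the update.
    """
    # Forward: stream each layer in, apply it, evict it.
    # Activations are kept on the host in this toy version.
    activations = [list(x)]
    for i, w_host in enumerate(host_weights):
        w = device.load(i, w_host)
        activations.append([wi * ai for wi, ai in zip(w, activations[-1])])
        device.evict(i)

    out = activations[-1]
    loss = 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))
    grad = [o - t for o, t in zip(out, target)]  # dLoss/dOutput

    # Backward: stream the layers in reverse order.
    for i in reversed(range(len(host_weights))):
        w = device.load(i, host_weights[i])
        grad_w = [g * a for g, a in zip(grad, activations[i])]  # dLoss/dw_i
        grad = [g * wi for g, wi in zip(grad, w)]               # dLoss/dInput_i
        device.evict(i)
        # The optimizer update is applied to the host-resident copy.
        host_weights[i] = [wi - lr * gw for wi, gw in zip(host_weights[i], grad_w)]
    return loss
```

Calling `train_step` repeatedly on a stack of several layers with `Device(capacity_layers=1)` trains the whole stack while `device.peak` stays at 1. A real system would additionally need to hide transfer latency, for example by overlapping the copy of the next layer with computation on the current one, which this sketch does not attempt.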

