Better and Faster Large Language Models via Multi-Token Prediction

Original link: https://arxiv.org/abs/2404.19737


The discussion covers commenters' frustration with the complexity of the machine learning field and the shortage of clear explanations, particularly around pretraining, training, inference, and mixture-of-experts (MoE) as they relate to tools and models such as LangChain and GPT-2. They suggest resources like Andrej Karpathy's "Let's Build GPT-2" videos and studying the code closely for a deeper understanding. They also propose setting a size limit for models that can run locally and weighing the long-term prospects of "prompts as a service" applications.

The commenters compare the limitations of inference based on single-token outputs with inference over longer sequences, discussing the importance of context and the role of probability distributions. They criticize the documentation of LangChain and recommend exploring generative models through hands-on implementation. They suggest refining training techniques, including using cross-entropy both as the training loss and in post-processing, and improving sampling strategies beyond greedy sampling (both points are sketched in the code below).

They express curiosity about the potential impact of adding more output heads, and about the difficulty of modeling entire sentences given the effectively unbounded space of subsequent words. One commenter wonders why models don't predict entire sentences instead of individual tokens, asking whether there is a trade-off between efficiency and focus. Others challenge the notion of an inherent plan in the model, proposing that its output is simply a probability distribution conditioned on previous inputs; they contrast the seemingly planned character of the output with the actual absence of any underlying plan or intention. The conversation ends with an open question about how to effectively incorporate larger contexts into language models.
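Since the thread's recurring question is what happens when more output heads are added, a minimal sketch may help. The following is an illustrative PyTorch rendering of the multi-token prediction idea described in the paper's abstract: a shared trunk feeding n independent output heads, where head i is trained with cross-entropy to predict the token i+1 positions ahead. All names here (MultiTokenHead, n_heads, multi_token_loss) are hypothetical and not taken from the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHead(nn.Module):
    """Shared trunk followed by n independent output heads.

    Head i predicts the token at offset i+1 from the current position,
    so n_heads=1 reduces to ordinary next-token prediction.
    The trunk is assumed to map (B, T) token ids to (B, T, d_model) states.
    """
    def __init__(self, trunk: nn.Module, d_model: int, vocab_size: int, n_heads: int = 4):
        super().__init__()
        self.trunk = trunk
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_heads)
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.trunk(tokens)                                        # (B, T, d_model)
        return torch.stack([head(h) for head in self.heads], dim=0)  # (n, B, T, V)

def multi_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Average of per-head cross-entropies; assumes T > n_heads."""
    n, B, T, V = logits.shape
    loss = 0.0
    for i in range(n):
        pred = logits[i, :, : T - (i + 1), :]   # positions with a target i+1 steps ahead
        target = tokens[:, i + 1 :]             # (B, T - i - 1) token ids
        loss = loss + F.cross_entropy(pred.reshape(-1, V), target.reshape(-1))
    return loss / n
```

At inference time the extra heads can simply be dropped and generation can proceed from the next-token head alone; the paper's abstract also notes that the additional heads enable faster (self-speculative) decoding.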
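The thread also touches on moving beyond greedy decoding. A common alternative, shown here as a hedged sketch rather than anything prescribed by the paper, is temperature scaling combined with top-k truncation of the next-token distribution:

```python
import torch

def sample_next(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 40) -> torch.Tensor:
    """Temperature + top-k sampling from a (V,) vector of next-token logits."""
    logits = logits / max(temperature, 1e-6)   # <1 sharpens, >1 flattens the distribution
    if top_k is not None:
        v, _ = torch.topk(logits, top_k)
        # Mask everything below the k-th largest logit.
        logits = logits.masked_fill(logits < v[-1], float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)  # one sampled token id

# Usage with per-position logits of shape (T, V) from a model:
#   next_id = sample_next(logits[-1])
```

Greedy decoding is the special case of always taking the argmax; sampling with a moderate temperature trades a little likelihood for diversity, which is one of the refinements the commenters allude to.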

