Why are your models so big? (2023)

Original link: https://pawa.lt/braindump/tiny-models/

The author questions the trend toward ever-larger large language models (LLMs), arguing that enormous scale is not always necessary. Large models make sense for complex, general-purpose tasks that demand nuanced understanding, such as chatbots, but many applications (SQL autocomplete or structured data extraction, for example) are narrowly scoped and do not need billions of parameters. The core issue is *inference cost*: running these large models is expensive, requiring substantial compute and infrastructure, and that cost accrues on every user interaction. The author predicts a shift toward smaller, specialized models trained for specific tasks. Tooling to build and deploy these small-but-capable models already exists, even directly in the web browser, pointing to a more efficient and economical future for LLM applications, one where as few as 15M parameters may be all you need.

A Hacker News discussion centers on the growing size of large language models (LLMs) and computing devices. The original post links to an article questioning why models are so large, noting that simpler text-completion setups often produce structured output better than prompting a large model, and at lower cost. Commenters debate whether large models are necessary, with some holding up the Raspberry Pi as an optimization benchmark ("can it run on a Raspberry Pi?"). Others question the size of modern laptops, arguing that many tasks (checking email or the weather, say) do not *require* powerful hardware or a large screen. One key point is that an attention model's ability to handle complex logic *may* depend on its size and the breadth of its training data. Overall, commenters are optimistic about a future in which smaller, more portable devices handle everyday computing needs, in contrast with the current push toward ever larger, more powerful machines.

Original article

I don’t understand why today’s LLMs are so large. Some of the smallest models getting coverage sit at 2.7B parameters, but even this seems pretty big to me.

If you need generalizability, I totally get it. Things like chat applications require a high level of semantic awareness, and the model has to respond in a manner that’s convincing enough to its users. In cases where you want the LLM to produce something human-like, it makes sense that the brains would need to be a little juiced up.

That said, LLMs are a whole lot more than just bots we can chat with. There are some domains that have a tightly-scoped set of inputs and require the model to always respond in a similar way. Something like SQL autocomplete is a good example - completing a single SQL query requires a very small context window, and it requires no generalized knowledge of the English language. Structured extraction is similar: you don’t need 2.7B parameters to go from remind me at 7pm to walk the dog to { "time": "7pm", "reminder": "walk the dog" }.

I say all this because inference is expensive. Not only is it expensive in terms of raw compute - maintaining the infrastructure required to run models also gets pretty complicated. You either end up shelling out money for in-house talent or paying some provider to do the inference for you. In either case, you’re paying big money every time a user types remind me to eat a sandwich.

I think the future will be full of much smaller models trained to do specific tasks. Some tooling to build these already exists, and you can even run them in the browser. This mode of deployment is inspiring to me, and I’m optimistic about a future where 15M params is all you need.
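For a sense of scale, a back-of-the-envelope parameter count (with hypothetical dimensions chosen for a task-specific model, ignoring biases, layer norms, and positional embeddings) shows how comfortably a useful transformer fits under a 15M budget:

```python
def transformer_param_count(vocab_size: int, d_model: int,
                            n_layers: int, d_ff: int) -> int:
    """Rough parameter count for a small decoder-only transformer.

    Assumes the output head shares weights with the token embedding;
    biases, layer norms, and positional embeddings are omitted.
    """
    embedding = vocab_size * d_model   # token embedding (tied with output head)
    attention = 4 * d_model * d_model  # Q, K, V, and output projections
    ffn = 2 * d_model * d_ff           # up- and down-projections
    return embedding + n_layers * (attention + ffn)

# A toy task-specific model: small vocabulary, narrow context.
print(transformer_param_count(vocab_size=8000, d_model=256, n_layers=6, d_ff=1024))
# 6766592 -- roughly 6.8M parameters, well under 15M
```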
