The initial, feverish enthusiasm for large language models (LLMs) is beginning to cool, and for good reason. It’s time to trade the out-of-control hype for a more pragmatic, even “boring,” approach. A recent MIT report shows that 95% of companies implementing this technology have yet to see a positive outcome. It’s understandable to feel confused.
When I get confused, I write. That is why I wrote the first part of this series, Hype is a Business Tool, as the online debate had become so overheated. In part 2, The Timmy Trap, I covered why we are, surprisingly, a large part of this hype problem. We’ve allowed ourselves to be fooled, confusing an LLM’s language fluency with actual intelligence. LLMs have effectively hacked our social protocols, convincing us they are more intelligent than they really are.
So in this final part, I want to answer the question: why should we still care? The tech is problematic, and signs point to the bubble bursting. When we hit the “Trough of Disillusionment,” what rises from the ashes? Two lessons from my career help me navigate uncertainty: 1. technology flows downhill, and 2. we usually start on the wrong path.
In his 1989 paper, The Dynamo and the Computer, Paul David describes how a technology’s impact changes dramatically as it matures. He uses the example of the dynamo, an old-fashioned term for a powerful electric motor. This new power source completely changed the Industrial Revolution.
Early factories were tied to rivers to harness water power, but the dynamo freed them from this geographic limitation. Initially, factories had just one large dynamo, which required a complicated system of pulleys to distribute power to the rest of the building. This made the factory’s workflow convoluted. But as dynamos became smaller and more affordable, factories were able to put them in multiple locations. This second development was even more liberating than the first because it allowed for the creation of the assembly line. The power could now adapt to the workflow, instead of the other way around, which led to a major boost in productivity.
David used this historical shift as an analogy for what was happening in the late 1980s. Instead of everyone having to work around a single, clunky mainframe, the new, smaller desktop computers were conforming to the workflows of the modern office. This same pattern, from large and centralized to small and distributed, is happening with LLMs right now.
This downsizing of LLMs is mostly being pushed by the open-source community, which is creating a wide variety of models that challenge the assumption that we need ever-bigger, centralized models. These smaller models, called SLMs (small language models), are trained on much smaller data sets, have far fewer parameters, and are often quantized down to lower precision. Microsoft’s Phi-3 model is perfectly reasonable for small tasks and runs on my eight-year-old PC without using more than 10% of the CPU.
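If you want to see this for yourself, it’s almost trivial to try. Here’s a minimal sketch using the Ollama runtime and its Python client to run Phi-3 locally; the tooling and the prompt are just one easy option I’m assuming here, not a prescription (any local runner such as llama.cpp works just as well):

```python
# A minimal sketch: run a small language model (Phi-3) entirely on your own machine.
# Assumes the Ollama runtime is installed and the model has been pulled
# (`ollama pull phi3`). Any other local runner would work just as well.
import ollama

response = ollama.chat(
    model="phi3",
    messages=[{
        "role": "user",
        "content": "Fix the grammar: 'the reports was send to the team friday'",
    }],
)
print(response["message"]["content"])
```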
But I can understand why you’d be skeptical. These smaller open-source models, while very good, usually don’t score as well as the big foundation models from OpenAI and Google, which makes them feel second-class. That perception is a mistake. I’m not saying they perform better; I’m saying it doesn’t matter. We’re asking them the wrong questions. We don’t need models that can pass the bar exam.
Several companies are experimenting with better questions, using SLMs for smaller, even invisible tasks, such as performing query rewrites behind the scenes. This is a vastly simpler task: the user has no idea an LLM is even involved; they just get better results. By sticking to lower-level syntactic tasks, we’re not asking LLMs to pretend to be human, which leaves almost no room for hallucinations! What’s even more exciting about this use case is that a company could likely use a very small, bespoke, local LLM for it.
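To make the idea concrete, here’s roughly what such an invisible rewrite could look like. The helper function, prompt wording, and phi3 model choice are my own illustrative assumptions, not any company’s actual pipeline:

```python
# A sketch of a behind-the-scenes query rewrite, run on a small local model.
# The prompt wording and the "phi3" model choice are illustrative assumptions.
import ollama

def rewrite_query(raw_query: str) -> str:
    """Clean up a user's search query before it ever reaches the search index."""
    result = ollama.generate(
        model="phi3",
        prompt=(
            "Rewrite this search query so it is clear and specific. "
            "Return only the rewritten query.\n\n"
            f"Query: {raw_query}"
        ),
    )
    return result["response"].strip()

# The user never sees any of this -- they just get better results.
print(rewrite_query("cheep flights nyc to sf next weekned"))
```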
Tiny uses like this flip the script on the large centralized models and favor SLMs, which brings knock-on benefits: they are easier to train ethically and have much lower running costs. As it gets cheaper and easier to create these custom SLMs, this type of use case could become commonplace.
The iPod went from 180g at launch to just over 12g as the technology improved and it morphed into more niche uses. LLMs are likewise likely to change significantly as the technology and the market mature. They will be used in much smaller, more focused, and, I’m afraid to say it, significantly more boring ways. This will only accelerate as people get tired of “hallucinations” and discover how powerful LLMs are when they’re kept focused on these smaller, more predictable language-processing goals.
There’s a reason there are so many failures in that MIT report: people are rushing into hyped technology without understanding how best to use it. We’ve seen this throughout history with naive database implementations in the 1980s, the dot-com bust of the late ’90s, and the mobile web of the early 2000s. Whenever there is hype, we shuffle onto the easy path, forcing the tech into the product without understanding its weaknesses. We are more worried about being left behind than about actually doing something of value. We get there eventually, but only after realizing we were asking the wrong questions. Many companies fail before they figure this out.
This is why I continue to use and explore LLMs despite all their current issues. Not because I support these companies, but because I’m trying to understand the “grain” of this new material (as a recovering furniture maker, wood metaphors come easily to me). I’m playing with today’s models to figure out where they fall over and where they can be useful. I did this by trying to use LLMs to help me write this series of blog posts. Let’s just say it did not go well (the details, while funny, are a bit too tedious to share).
To be honest, I started off on the wrong path as well. I totally bought into the “intelligent assistant” framing of their skills and tried to use them to short-circuit the process of writing. But writing is hard for a deeply human reason: you don’t know what you don’t know. You write to understand, which usually means writing a ton of awful text that must then be ruthlessly thrown away. Trying to “write automatically” with LLMs circumvents this pain entirely. Just like James T. Kirk, we need our pain!
Much like the query-rewrite example above, I’ve had success going for smaller wins, using an LLM’s underlying superpower: linguistic, syntax-driven language tasks. This mostly means proofreading and condensing my rambling voice notes. These rather boring uses have significantly reduced drudgery and improved the overall quality of my writing. Best of all, they work pretty well (well, most of the time). But I don’t ask them to do any of the writing. I need my pain.
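For the curious, the “condense my rambling voice notes” chore amounts to little more than this sketch. The prompt wording and the local phi3 model are simply what I happen to reach for, treat them as assumptions rather than recommendations:

```python
# A sketch of the "condense my rambling voice notes" chore, run entirely locally.
# The prompt wording and the "phi3" model are illustrative assumptions.
import ollama

def condense_notes(transcript: str) -> str:
    """Boil a voice-note transcript down to tight bullet points, adding nothing new."""
    result = ollama.generate(
        model="phi3",
        prompt=(
            "Condense the following voice-note transcript into short bullet points. "
            "Keep my wording where possible and do not add new ideas.\n\n"
            f"{transcript}"
        ),
    )
    return result["response"].strip()
```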
LLMs are not intelligent and they never will be. We keep asking them to do “intelligent things” and find out a) they really aren’t that good at it, and b) replacing that human task is far more complex than we originally thought. This has made people use LLMs backwards, desperately trying to automate from the top down when they should be augmenting from the bottom up.
Putting these two lessons together implies that we are headed for a more productive but boring place: SLMs used for low-level linguistic tasks. Let’s be very clear: I’m not an LLM expert, and I’m certainly not descending from a mountain with stone tablets. There are clearly going to be other potential uses for this tech. I’m just pointing out that our current approach is failing and something has to change. Like many others, I expect this AI bubble will pop, which will cause a lot of grief. But afterwards, my hypothesis is that the technology will flow downhill into smaller, more efficient, and hopefully more ethical packages. And we, in turn, need to finally get on the right path by using these models for the tasks they excel at.
Ultimately, a mature technology doesn’t look like magic; it looks like infrastructure. It gets smaller, more reliable, and much more boring.
We’re here to solve problems, not look cool.