The initial, feverish enthusiasm for large language models (LLMs) is beginning to cool, and for good reason. It’s time to trade the out-of-control hype for a more pragmatic, even “boring,” approach. A recent MIT report shows that 95% of companies implementing this technology have yet to see a positive outcome. It’s understandable to feel confused.
When I get confused, I write. That is why I wrote the first part of this series, Hype is a Business Tool, as the online debate had become so overheated. In part 2, The Timmy Trap, I covered why we are, surprisingly, a large part of this hype problem. We’ve allowed ourselves to be fooled, confusing an LLM’s language fluency with actual intelligence. LLMs have effectively hacked our social protocols, convincing us they are more intelligent than they really are.
So in this final part, I want to answer the question: why should we still care? The tech is problematic, and signs point to the bubble bursting. When we hit the “Trough of Disillusionment,” what rises from the ashes? Two lessons from my career help me navigate uncertainty: 1. technology flows downhill, and 2. we usually start on the wrong path.
In his 1989 paper, The Dynamo and the Computer, Paul David describes how a technology’s impact changes dramatically as it matures. He uses the example of the dynamo, an old-fashioned term for a powerful electric motor. This new power source completely changed the Industrial Revolution.
Early factories were tied to rivers to harness water power, but the dynamo freed them from this geographic limitation. Initially, factories had just one large dynamo, which required a complicated system of pulleys to distribute power to the rest of the building. This made the factory’s workflow convoluted. But as dynamos became smaller and more affordable, factories were able to put them in multiple locations. This second development was even more liberating than the first because it allowed for the creation of the assembly line. The power could now adapt to the workflow, instead of the other way around, which led to a major boost in productivity.
David used this historical shift as an analogy for what was happening in the late 1980s. Instead of everyone having to work around a single, clunky mainframe, the new, smaller desktop computers were conforming to the workflows of the modern office. This same pattern, from large and centralized to small and distributed, is happening with LLMs right now.
This downsizing of LLMs is mostly being pushed by the open-source community, which is creating a wide variety of models that challenge the assumption that we need ever-bigger, centralized models. These smaller models, called SLMs (small language models), are trained on much smaller data sets, have far fewer parameters, and are often quantized down to lower precision. Microsoft’s Phi-3 model is perfectly reasonable for small tasks and runs on my eight-year-old PC without using more than 10% of the CPU.
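If you want to see this for yourself, it’s almost trivial to try. Here’s a minimal sketch using the Ollama runtime and its Python client to run Phi-3 locally; the tooling and the prompt are just one easy option I’m assuming here, not a prescription (any local runner such as llama.cpp works just as well):

```python
# A minimal sketch: run a small language model (Phi-3) entirely on your own machine.
# Assumes the Ollama runtime is installed and the model has been pulled
# (`ollama pull phi3`). Any other local runner would work just as well.
import ollama

response = ollama.chat(
    model="phi3",
    messages=[{
        "role": "user",
        "content": "Fix the grammar: 'the reports was send to the team friday'",
    }],
)
print(response["message"]["content"])
```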
But I can understand why you’d be skeptical. These smaller open-source models, while very good, usually don’t score as well as the big foundation models from OpenAI and Google, which makes them feel second-class. That perception is a mistake. I’m not saying they perform better; I’m saying it doesn’t matter. We’re asking them the wrong questions. We don’t need models that can pass the bar exam.
Several companies are experimenting with better questions, using SLMs for smaller, even invisible tasks, such as performing query rewrites behind the scenes. This is a vastly simpler task: the user has no idea an LLM is even involved; they just get better results. By sticking to lower-level syntactic tasks, we’re not asking LLMs to pretend to be human, which leaves almost no room for hallucinations! What’s even more exciting about this use case is that a company could likely use a very small, bespoke, local LLM for it.
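To make the idea concrete, here’s roughly what such an invisible rewrite could look like. The helper function, prompt wording, and phi3 model choice are my own illustrative assumptions, not any company’s actual pipeline:

```python
# A sketch of a behind-the-scenes query rewrite, run on a small local model.
# The prompt wording and the "phi3" model choice are illustrative assumptions.
import ollama

def rewrite_query(raw_query: str) -> str:
    """Clean up a user's search query before it ever reaches the search index."""
    result = ollama.generate(
        model="phi3",
        prompt=(
            "Rewrite this search query so it is clear and specific. "
            "Return only the rewritten query.\n\n"
            f"Query: {raw_query}"
        ),
    )
    return result["response"].strip()

# The user never sees any of this -- they just get better results.
print(rewrite_query("cheep flights nyc to sf next weekned"))
```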
Tiny uses like this flip the script on the large centralized models and favor SLMs, which brings knock-on benefits: they are easier to train ethically and have much lower running costs. As it gets cheaper and easier to create these custom SLMs, this type of use case could become commonplace.
The iPod went from 180g at launch to just over 12g as the technology improved and it morphed into more niche uses. LLMs are likewise likely to change significantly as the technology and the market mature. They will be used in much smaller, more focused, and, I’m afraid to say it, significantly more boring ways. This will only accelerate as people get tired of “hallucinations” and discover how powerful LLMs are when they’re kept focused on these smaller, more predictable language-processing goals.
There’s a reason there are so many failures in that MIT report: people are rushing into hyped technology without understanding how best to use it. We’ve seen this throughout history with naive database implementations in the 1980s, the dot-com bust of the late ’90s, and the mobile web of the early 2000s. Whenever there is hype, we shuffle onto the easy path, forcing the tech into the product without understanding its weaknesses. We are more worried about being left behind than about actually doing something of value. We get there eventually, but only after realizing we were asking the wrong questions. Many companies fail before they figure this out.
This is why I continue to use and explore LLMs despite all their current issues. Not because I support these companies, but because I’m trying to understand the “grain” of this new material (as a recovering furniture maker, wood metaphors come easily to me). I’m playing with today’s models to figure out where they fall over and where they can be useful. I did this by trying to use LLMs to help me write this series of blog posts. Let’s just say it did not go well (the details, while funny, are a bit too tedious to share).
To be honest, I started off on the wrong path as well. I totally bought into the “intelligent assistant” framing of their skills and tried to use them to short-circuit the process of writing. But writing is hard for a deeply human reason: you don’t know what you don’t know. You write to understand, which usually means writing a ton of awful text that must then be ruthlessly thrown away. Trying to “write automatically” with LLMs circumvents this pain entirely. Just like James T. Kirk, we need our pain!
Much like the query-rewrite example above, I’ve had success going for smaller wins, using an LLM’s underlying superpower: linguistic, syntax-driven language tasks. This mostly means proofreading and condensing my rambling voice notes. These rather boring uses have significantly reduced drudgery and improved the overall quality of my writing. Best of all, they work pretty well (well, most of the time). But I don’t ask them to do any of the writing. I need my pain.
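For the curious, the “condense my rambling voice notes” chore amounts to little more than this sketch. The prompt wording and the local phi3 model are simply what I happen to reach for, treat them as assumptions rather than recommendations:

```python
# A sketch of the "condense my rambling voice notes" chore, run entirely locally.
# The prompt wording and the "phi3" model are illustrative assumptions.
import ollama

def condense_notes(transcript: str) -> str:
    """Boil a voice-note transcript down to tight bullet points, adding nothing new."""
    result = ollama.generate(
        model="phi3",
        prompt=(
            "Condense the following voice-note transcript into short bullet points. "
            "Keep my wording where possible and do not add new ideas.\n\n"
            f"{transcript}"
        ),
    )
    return result["response"].strip()
```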
LLMs are not intelligent and they never will be. We keep asking them to do “intelligent things” and find out a) they really aren’t that good at it, and b) replacing that human task is far more complex than we originally thought. This has made people use LLMs backwards, desperately trying to automate from the top down when they should be augmenting from the bottom up.
Putting these two lessons together implies that we are headed for a more productive but boring place: SLMs used for low-level linguistic tasks. Let’s be very clear: I’m not an LLM expert, and I’m certainly not descending from a mountain with stone tablets. There are clearly going to be other potential uses for this tech. I’m just pointing out that our current approach is failing and something has to change. Like many others, I expect this AI bubble will pop, which will cause a lot of grief. But afterwards, my hypothesis is that the technology will flow downhill into smaller, more efficient, and hopefully more ethical packages. And we, in turn, need to finally get on the right path by using these models for the tasks they excel at.
Ultimately, a mature technology doesn’t look like magic; it looks like infrastructure. It gets smaller, more reliable, and much more boring.
We’re here to solve problems, not look cool.