I first got into deep learning in 2012, when AlexNet came out. I was CTO of Jetpac, a startup that aimed to provide information about bars, hotels, and restaurants by analyzing public photos, for example finding hipster-friendly cafes. The results from the paper were so astonishing I knew AlexNet would be incredibly helpful, so I spent my Christmas holidays heating our house using a gaming rig with two GPUs and the cuda-convnet software, since that was the only way to train my own version of the model.
The results were even better than I’d hoped, but then I faced the problem of how to apply the model across the billions of photos we’d collected. The only GPU instances on Amazon were designed for video streaming and were prohibitively expensive. The CPU support in the Caffe framework was promising, but it was focused on training models, not running them after they’d been trained (aka inference). What I needed was software that would let me run the model at a massive scale on low-cost hardware. That was the original reason I wrote the Jetpac framework, so I could spin up hundreds of cheap EC2 instances to process our huge backlog of images for tens of thousands of dollars instead of millions.
It turned out that the code was small and fast enough to even run on phones, and after Jetpac was acquired by Google I continued in that direction by leading the mobile support for TensorFlow. While I love edge devices, and that’s what I’m known for these days, my real passion is for efficiency. I learned to code in the ’80s demo scene, went on to write PC game engines professionally in the ’90s, and got addicted to the dopamine rush of optimizing inner loops. There’s nothing quite like having hard constraints, clear requirements, and days to spend solving the puzzle of how to squeeze just a little bit more speed out of a system.
If you’re not a programmer, it might be difficult to imagine what an emotional process optimization can be. There’s no guarantee that it’s even possible to find a good answer, so the process itself can be endlessly frustrating. The first thrill comes when you see an opening, a possibility that nobody else has spotted. There’s the satisfaction of working hard to chase down the opportunity, and then too often the despair when it turns out not to work. Even then, it means I’ve learned something: being good at optimization means learning everything you can about the hardware, the operating system, and the requirements themselves, and studying others’ code in depth. I can never guarantee that I’ll find a solution, but my consolation is always that I have a better understanding of the world than when I started. The deepest satisfaction comes when I do finally find an approach that runs faster, or uses fewer resources. It’s even a social joy: it almost always contributes to a wider solution that the team is working on, making a product better, or even possible in a way it wasn’t before. The best optimizations come from a full-stack team that’s able to make tradeoffs all the way from the product manager to the model architects, from hardware to operating system to software.
Anyway, enough rhapsodizing about the joy of coding, what does this have to do with the AI bubble? When I look around, I see hundreds of billions of dollars being spent on hardware – GPUs, data centers, and power stations. What I don’t see are people waving large checks at ML infrastructure engineers like me and my team. It’s been an uphill battle to raise the investment we’ve needed for Moonshine, and I don’t think it’s just because I’m a better coder than I am a salesman. Thankfully we have found investors who believe in our vision, and we’re on track to be cashflow-positive in Q1 2026, but in general I don’t see many startups able to raise money on the promise of improving AI efficiency.
This makes no sense to me from any rational economic point of view. If you’re a tech company spending billions of dollars a month on GPUs, wouldn’t spending a few hundred million dollars a year on software optimization be a good bet? We know that GPU utilization is usually below 50%, and in my experience is often much lower for interactive applications where batches are small and memory-bound decoding dominates. We know that motivated engineers like Scott Gray can do better than Nvidia’s libraries on their own GPUs, and from my experience at Jetpac and Google I’m certain there are a lot of opportunities to run inference on much lower cost CPU machines. Even if you don’t care about the cost, the impact AI power usage has on us and the planet should make this a priority.
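To see why small-batch decoding leaves so much of a GPU idle, here’s a rough back-of-envelope sketch. All the numbers in it (model size, bandwidth, peak FLOP/s) are illustrative assumptions I’ve picked, not measurements of any specific chip: at batch size one, every generated token has to stream all the model weights through memory, so memory bandwidth, not compute, sets the ceiling.

```python
# Roofline-style estimate of why batch-1 LLM decoding is memory-bound.
# All numbers below are illustrative assumptions, not measurements.

def decode_tokens_per_second(weight_bytes: float, mem_bandwidth: float) -> float:
    """Upper bound on single-stream decode speed: each generated token
    must read every model weight from memory at least once."""
    return mem_bandwidth / weight_bytes

# Assumed: a 7B-parameter model at 2 bytes per weight (fp16) = 14 GB.
weights = 7e9 * 2
# Assumed: ~2 TB/s of HBM bandwidth on a data-center GPU.
bandwidth = 2e12

tps = decode_tokens_per_second(weights, bandwidth)
print(f"~{tps:.0f} tokens/s upper bound at batch size 1")

# Meanwhile the compute side is barely touched: roughly 2 * params
# FLOPs per generated token, against hundreds of teraFLOP/s of peak.
flops_per_token = 2 * 7e9
peak_flops = 3e14  # assumed ~300 TFLOP/s fp16
utilization = (flops_per_token * tps) / peak_flops
print(f"~{utilization:.1%} of peak compute used")
```

Under these assumed numbers, the arithmetic units sit well below 1% busy while the memory system runs flat out, which is the gap that batching, quantization, and better software can attack.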
So, why is this money being spent? As far as I can tell, it’s because of the signaling benefits to the people making the decisions. Startups like OpenAI are motivated to point to the number of GPUs they’re buying as a moat, suggesting that they’ll be the top AI company for years to come because nobody else will be able to catch up with their head start on compute capacity. Hardware projects are also a lot easier to manage than software; they don’t take up so much scarce management attention. Investors are on board because they’ve seen early success turn into long-term dominance before, it’s clear that AI is a world-changing technology so they need to be part of it, and OpenAI and others are happy to absorb billions of dollars of investment, making VCs’ jobs much easier than they would be if they had to allocate across hundreds of smaller companies. Nobody ever got fired for buying IBM, and nobody’s going to get fired for investing in OpenAI.
I’m picking on OpenAI here, but across the industry you can see everyone from Oracle to Microsoft boasting of the amounts of money they’re spending on hardware, and for the same reasons. They get a lot more positive coverage, and a much larger share price boost, from this than they would announcing they’re hiring a thousand engineers to get more value from their existing hardware.
If I’m right, this spending is unsustainable. I was in the tech industry during the dot-com boom, and I saw a similar dynamic with Sun workstations. For a couple of years every startup needed to raise millions of dollars just to launch a website, because the only real option was buying expensive Sun servers and closed software. Then Google came along and proved that using a lot of cheap PCs running open-source software was cheaper and much more scalable. Nvidia these days feels like Sun did then, and so I bet over the next few years there will be a lot of chatbot startups based on cheap PCs with open-source models running on CPUs. Of course I made a similar prediction in 2023, and Nvidia’s valuation has quadrupled since then, so don’t look to me for stock tips!