7/2/2025 by Kadhir
So far, I've only used LLMs as an assistant: I'm doing something, and an LLM helps me along the way. Code autocomplete is a great example of how useful that can be when the model gets it right. I don't doubt this will improve over time, but I'm excited to see a more significant transition from this assistant mode to a compiler mode, at least for coding.
It will be exciting when we focus solely on the context we feed the LLM, then test the features it generates rather than the code. And importantly, we let the LLM handle integrating new features into the existing codebase. That means we no longer examine the code; our time as engineers will be spent handling context, testing features, and iterating on them.
The consequences of that seem to be:
- Democratized access to engineering
  - You won't need as specialized a skillset to build complex apps; you'll just need to know how to put context together and iterate
- Increased velocity of feature development
  - My gut says dealing with context will give a better ratio of engineering time to features shipped than dealing with code directly
The obvious pushback here is that, well, compilers are provable. There's a straightforward mapping between inputs and outputs, and we can prove the outputs are the same each time. We can also write tests to ensure the outputs are optimized.
But if we squint, a compiler just transforms an input into an output. If we treat the code as an intermediate layer, viewing the input as context and the output as features, then we can demonstrate that the compiler is reliable through evaluations and testing. And importantly, we don't have to get the output right on the first go; we can let it iterate over and over until it gets it right. A new kind of compiler.
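Here's a minimal sketch of that loop, nothing more. Every name in it is hypothetical: `generate` stands in for whatever LLM or agent system does the work, and the evals are plain predicates over the generated code.

```python
from typing import Callable

# A classic compiler maps source -> machine code deterministically. This
# sketch maps context -> code (the intermediate layer) -> feature behavior,
# where "correct" means the evals pass, not that anything is proven.
def llm_compile(
    context: str,
    evals: list[Callable[[str], bool]],   # behavioral checks on the output
    generate: Callable[[str], str],       # hypothetical LLM/agent call
    max_iters: int = 10,
) -> str:
    """Iterate on the generated code until every eval passes."""
    prompt = context
    for _ in range(max_iters):
        code = generate(prompt)
        failing = [e.__name__ for e in evals if not e(code)]
        if not failing:
            return code                   # a cacheable intermediate artifact
        # Feed the failures back so the next attempt can improve.
        prompt = f"{context}\n\nFailing evals: {failing}"
    raise RuntimeError("eval budget exhausted")
```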
So I propose that if we get an LLM-as-a-compiler, then as a software engineer I'll go through this cycle (sketched in code after the list):
- Put together the context
  - Which includes a series of tests for the final behavior (perhaps I use an LLM for this)
- Put it through the LLM compiler
  - Which is probably a system composed of several pieces
  - Which continually iterates on the output until all the tests pass
  - Ideally, as the LLM compiler gets better, the latency gets lower and lower
  - We cache the output (code) for performance improvements
- Decide how I need to edit the context, and go back to step 1
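A hedged sketch of that outer loop, reusing the hypothetical `llm_compile` from above: the engineer only ever edits context and tests, and the generated code is a cached artifact keyed on the context, so an unchanged context never pays generation latency twice.

```python
import hashlib

# All names here are hypothetical stand-ins, not a real system.
def compile_cycle(context: str, evals, generate, cache: dict) -> str:
    """One trip through the cycle: recompile if the context changed, else reuse the cache."""
    key = hashlib.sha256(context.encode()).hexdigest()
    if key not in cache:                  # cache the output (code) for performance
        cache[key] = llm_compile(context, evals, generate)
    return cache[key]

# Usage: editing the context yields a new key, which triggers a recompile.
# cache = {}
# code = compile_cycle(spec_v1, behavior_tests, my_llm, cache)
# code = compile_cycle(spec_v2, behavior_tests, my_llm, cache)  # re-runs only on change
```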
SWE agents feel like the right abstraction on this path; they convert context into features, iterating in the background. They'll likely be an integral part of the LLM compiler system, which I think will have the following pieces (sketched as a rough interface after the list):
- A way to specify the context of my app
  - And a way to specify which part of my context to focus on
- A mechanism for specifying my reward signal (my tests)
- A system for monitoring the changes happening
  - And a way to redirect parts of the compiler if it's not doing what I expect
  - Over time, I'd expect this part to evolve and the need to see the code to shrink
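Purely speculative, but those pieces might reduce to an interface like this. Every field name below is an assumption, not an existing API:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class CompileJob:
    context: str                                      # the full context of my app
    focus: Optional[str] = None                       # which part of the context to work on
    reward: list = field(default_factory=list)        # my reward signal: the tests
    monitor: Callable[[str], None] = print            # watch the changes happening
    redirect: Optional[Callable[[str], str]] = None   # nudge the compiler if it drifts
```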