LLMs could be, but shouldn't be compilers

原始链接: https://alperenkeles.com/posts/llms-could-be-but-shouldnt-be-compilers/

## LLMs: Compilers or a Dangerous Shortcut?

The rise of large language models (LLMs) has sparked a debate: are they the next evolution of the compiler, letting us "program" in natural language? Historically, programming languages have abstracted away complexity, from machine code up to high-level constructs like loops and variables, but LLMs pose a distinct challenge. Traditional compilers trade ease of use for *control*, offering explicit semantics and testable guarantees. LLMs, however, operate on instructions that are inherently imprecise. Even a "hallucination-free" LLM still *chooses* one implementation from the countless possibilities that fit a vague prompt. That hands critical design decisions, such as the data model, error handling, and security, over to the model, with potentially unintended consequences. The risk is that we become consumers of generated code rather than deliberate producers, accepting "reasonable" solutions that don't quite match our intent. The core problem is not just unpredictability but our inherent laziness about specification. LLMs excel when given *precise* instructions and strong tests, which underscores that specifying software is often harder than building it. If LLMs truly make "specify it and it's built" a reality, then specification and verification become the most important skills. While LLMs *could* translate intent into code much like a compiler, the control given up is unprecedented. We must cultivate the "will to specify" and prioritize rigorous verification to avoid passively accepting software we don't fully understand.

## LLMs as Compilers: The Hacker News Discussion

A recent Hacker News thread discussed using large language models (LLMs) as compilers, translating English into code. While technically possible with extremely detailed prompts, commenters generally agreed this is not the *right* approach. The core argument is that LLMs are inherently nondeterministic, which makes reliable compilation impossible. Even if determinism were achieved, specifying intent with the required precision would be inefficient and tedious; one user compared it to driving a nail with a drill: technically feasible, but ultimately a flawed solution. The discussion also highlighted a preference for the coding fluency of languages like Python, where it is easier to "think while coding," whereas languages like Haskell are more rigid. LLMs likewise often demand overly specific instructions, disrupting the natural flow of coding. Ultimately, opinion leaned toward recognizing that LLMs' strengths lie *beyond* simply replacing traditional approaches to coding.

Original article

I’ve been going round and round in my mind about a particular discussion around LLMs: are they really similar to compilers? Are we headed toward a world where people don’t look at the underlying code for their programs?

People have been making versions of this argument since Andrej Karpathy’s “English is the hottest new programming language.” Computer science has been advancing language design by building higher- and higher-level languages; this is the latest iteration: maybe we no longer need a separate language to express ourselves to machines; we can just use our native tongues, or at least English.

My stance has been pretty rigid for some time: LLMs hallucinate, so they aren’t reliable building blocks. If you can’t rely on the translation step, you can’t treat it as a serious abstraction layer because it provides no stable guarantees about the underlying system.

As models get better, hallucinations become less central (even though models still make plenty of mistakes). Lately I’ve been thinking about a different question: imagine an LLM that never “hallucinates” in the usual sense, one that reliably produces some plausible implementation of what you asked. Would that make it the next generation of compiler? And what would that mean for programming and software engineering in general?

This post is my stab at that question. The core of my argument is simple:

Specifying systems is hard; and we are lazy.

Before getting to what that means in practice, I want to pin down something else: what does it mean for a language to be “higher level”?

Programming is, at a fundamental level, the act of making a computer do something. From a human's point of view, computers are very dumb: you have to tell them exactly what to do; there is no inference. A computer doesn't even have a native notion of a value, a type, or a concept; everything is a series of bits processed into other bits, and we bring the meaning to the whole ordeal. Very early on, people built arithmetic and logical instructions into computers: you would have two different bit sequences, each denoting a number, and you could add, subtract, or multiply them. To make a computer do something, you could encode your data as a bunch of numbers, map your logical operations onto those ALU instructions, and interpret the result back in your domain at the end. Then you can define a set of operations on your domain that compile down to those smaller ALU instructions, and voilà, you have a compiler at hand.
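To make the idea concrete, here is a toy sketch of such a direct lowering; the three-instruction ALU, register scheme, and mnemonics are invented purely for illustration.

```python
# A toy "compiler" from a tiny expression language down to ALU-style
# instructions. The instruction names (LOADI, ADD, SUB, MUL) and register
# scheme are made up; real instruction sets are of course richer.

OPS = {"+": "ADD", "-": "SUB", "*": "MUL"}

def compile_expr(expr, instrs=None, reg=0):
    """Lower a nested tuple like ("+", 2, ("*", 3, 4)) to instructions.
    Returns (register holding the result, instruction list)."""
    if instrs is None:
        instrs = []
    if isinstance(expr, int):                      # a literal: load it
        instrs.append(f"LOADI r{reg}, {expr}")
        return reg, instrs
    op, lhs, rhs = expr
    lreg, _ = compile_expr(lhs, instrs, reg)
    rreg, _ = compile_expr(rhs, instrs, lreg + 1)
    instrs.append(f"{OPS[op]} r{lreg}, r{lreg}, r{rreg}")
    return lreg, instrs

if __name__ == "__main__":
    _, program = compile_expr(("+", 2, ("*", 3, 4)))
    print("\n".join(program))
    # LOADI r0, 2
    # LOADI r1, 3
    # LOADI r2, 4
    # MUL r1, r1, r2
    # ADD r0, r0, r1
```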

This compiler is, admittedly, kind of redundant. It doesn't do anything you couldn't do yourself, because there is essentially a direct mapping between the two languages: your higher-level language desugars into a bunch of lower-level ALU instructions, so anyone could implement the same mapping easily, or go further and just write the ALU instructions themselves.

What real higher-level languages do is give you an entirely new language that is eventually mapped onto the underlying instruction set through non-trivial mechanisms, in order to reduce the mental complexity on the side of the programmer. For instance, instruction sets have no concept of variables, loops, or data structures. You can certainly build a sequence of instructions that amounts to a binary search tree, but the mental burden of doing so is orders of magnitude higher than in any classic programming language. Structs, enums, classes, loops, conditionals, exceptions, variables, and functions are all constructs that exist in higher-level languages and are compiled away as you go down the stack.
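A minimal sketch of what "compiled away" means here: the same computation written with a high-level loop, and as the kind of label-and-branch sequence it might lower to (the instruction names are, again, invented for illustration).

```python
# The same computation twice: once with the constructs a language gives you,
# once as the kind of label/branch sequence they compile down to.
# Instruction names (SET, ADD, JGE, JMP) are invented for illustration.

def sum_to(n):
    total, i = 0, 0
    while i < n:              # the language gives us the loop...
        total += i
        i += 1
    return total

SUM_TO_LOWERED = [
    "SET   total, 0",
    "SET   i, 0",
    "loop:",
    "JGE   i, n, done",       # ...the machine only has compares and jumps
    "ADD   total, total, i",
    "ADD   i, i, 1",
    "JMP   loop",
    "done:",
]

if __name__ == "__main__":
    print(sum_to(5))          # 10
    print("\n".join(SUM_TO_LOWERED))
```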

There's a crucial aspect of compilation: the programmer gives away some control, and that's essentially what removes the mental burden. If a programming language doesn't give away any control, it arguably isn't a very useful abstraction layer, because it hasn't absolved you of any responsibility that comes with that control. One of the first kinds of control we gave away is code layout. If you write assembly by hand, you control where the code lives in program memory. Once you move to a language with structured control flow and callable procedures, you no longer have exact control over when the instructions for a particular piece of code are fetched or how basic blocks are arranged in memory. Other examples are more familiar: the runtime of a language works in the background to absolve you of other responsibilities, such as manual memory management, which was itself an abstraction for managing how your data is organized in memory in the first place.

This loss of control raises a question: how do we know the abstraction is implemented correctly? More importantly, what does it mean for an abstraction to be correct?

There are a few layers to the answer. First, mature abstractions are defined against some semantics: what behaviors are allowed, what behaviors are forbidden, and what guarantees you’re meant to rely on. In C, malloc gives you a pointer to a block of memory of at least the requested size (or NULL), suitably aligned, which you may later free. It doesn’t give you “exclusive ownership” in the language-theoretic sense, but it does define a contract you can program against.
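As a small illustration of programming against that contract (a sketch only; the ctypes library lookup is platform-dependent and may need adjusting on your system):

```python
import ctypes
import ctypes.util

# Load the C library; the lookup is platform-dependent.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

ptr = libc.malloc(64)
if ptr is None:               # the contract: a block of >= 64 bytes, or NULL
    raise MemoryError("malloc returned NULL")
ctypes.memset(ptr, 0, 64)     # we may use the block we were promised...
libc.free(ptr)                # ...and we must give it back exactly once
```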

Second, we validate implementations with testing (and sometimes proofs), because these guarantees are at least in principle checkable. Third, in practice, guarantees are contextual: most programs care that allocation works; only some care deeply about allocator performance, fragmentation behavior, or contention. Those are the cases where people swap allocators or drop down a level.

This highlights a critical point: abstraction guarantees aren’t uniform; they’re contextual. Most of the time, that contextuality is dominated by functional correctness: “does it do what it says?” Programming languages made enormous progress by giving us abstractions whose functional behavior can be specified precisely and tested relentlessly. We can act as if push/pop on a Python list has the same semantics as a vector in C++ even when the underlying implementation differs wildly across languages and runtimes.
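A minimal sketch of a contract-level check: the assertions below pin down the push/pop behavior we rely on without saying anything about the implementation, and the same assertions could be ported to a C++ vector with push_back/pop_back.

```python
# A contract-level test: we don't care how the list is implemented, only that
# the push/pop behavior we program against holds.

def test_push_pop_contract():
    xs = []
    for value in [1, 2, 3]:
        xs.append(value)        # "push"
    assert xs[-1] == 3          # the last element pushed is on top
    assert xs.pop() == 3        # pop returns it...
    assert xs == [1, 2]         # ...and removes only that element

if __name__ == "__main__":
    test_push_pop_contract()
    print("push/pop contract holds")
```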

LLM-based programming challenges this domination because the “language” (natural language) doesn’t come with precise semantics. That makes it much harder to even state what functional correctness should mean without building a validation/verification suite around it (tests, types, contracts, formal specs).

This gets to my core point. What changes with LLMs isn’t primarily nondeterminism, unpredictability, or hallucination. It’s that the programming interface is functionally underspecified by default. Natural language leaves gaps; many distinct programs can satisfy the same prompt. The LLM must fill those gaps.
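A toy example of that gap (my own, not from any model output): the prompt "remove duplicates from a list" is satisfied by many programs that make different commitments the prompt never mentions.

```python
# Two programs that both satisfy "remove duplicates from a list",
# yet commit to different behavior the prompt never specified.

def dedup_keep_order(items):
    """Keeps the first occurrence and preserves input order."""
    seen, out = set(), []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

def dedup_unordered(items):
    """Shorter, but silently drops ordering and requires hashable items."""
    return list(set(items))

data = [3, 1, 3, 2, 1]
print(dedup_keep_order(data))   # [3, 1, 2]
print(dedup_unordered(data))    # some permutation of [1, 2, 3]
```

Whichever one the generator picks, a behavioral decision has been made for you.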

Just as a garbage-collected runtime takes control over how and when memory is reclaimed, “programming in English” relinquishes control over which exact program gets built to fulfill your requirements. The underspecification forces the model to guess the data model, edge cases, error behavior, security posture, and performance tradeoffs in your program, analogous to how an allocator chooses an allocation strategy.

This creates quite a novel danger in how we write programs.

Humans have always written vague requirements; that part isn’t new. What’s new is how directly an LLM can turn vagueness into running code, inviting us to outsource functional precision itself. We can leave meaningful behavioral choices to a generator and only react to the outcome.

If you say “give me a note-taking app,” you’re not describing one program; you’re describing a huge space of programs. The LLM can return one of a billion “reasonable” implementations: something Notion-like, Evernote-like, Apple Notes-like, or something novel. The danger is that “reasonable” choices can still be wrong for your intent, and you won’t notice which commitments got made until later.

This pushes development toward an iterative refinement loop: write an imprecise spec, get one of the possible implementations, inspect it, refine the spec, repeat until you’re satisfied. In this mode, you become more like a consumer selecting from generated artifacts than a producer deliberately constructing one.

And you also lose something subtle: when you hand-build, the “space of possibilities” is explored through design decisions you’re forced to confront. With a magic genie, those decisions get made for you; you only see the surface of what you ended up with.

I don’t think this point is widely internalized yet: hallucinations aren’t the only problem. Even in a hallucination-free world, the ability to take the easy way out on specification plays into a dangerously lazy part of the mind. You can see it in the semi-conscious slips (I’m guilty too): accept-all-edits, “one more prompt and it’ll be fine,” and slow drifting into software you don’t really understand.

That’s why I think the will to specify is going to become increasingly important. We already see LLMs shine when they’re given concrete constraints: optimization, refactors, translations, migrations. Tasks that used to be so labor-intensive we’d laugh at the timeline become feasible when the target behavior is well specified and backed by robust test suites.
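A minimal sketch of what "well specified and backed by tests" can look like; the slugify function and its rules are hypothetical, chosen only to illustrate tests acting as the spec.

```python
# The tests below act as the specification; any implementation (hand-written,
# refactored, or generated) is acceptable exactly when it passes them.

import re

def slugify(title: str) -> str:
    slug = title.strip().lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    return slug.strip("-")

def test_slugify_spec():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaces  everywhere  ") == "spaces-everywhere"
    assert slugify("already-a-slug") == "already-a-slug"
    assert slugify("???") == ""        # nothing usable left: empty slug

if __name__ == "__main__":
    test_slugify_spec()
    print("spec satisfied")
```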

It’s been true for a long time that specifying a piece of software is often harder than building it. But we may be entering a world where, if you can specify, you can build. If that’s right, then specification and verification become the bottleneck, and therefore the core skill.

This isn’t my most polished post, but I wanted to get the idea out. I do think it’s possible to treat LLMs as compiler-like, in the loose sense that they translate a specification into an executable artifact. But the control we relinquish to that translation layer is larger than it has ever been.

Traditional compilers reduce the need to stare at lower layers by replacing low-level control with defined semantics and testable guarantees. LLMs also reduce the need to read source code in many contexts, but the control you lose isn’t naturally bounded by a formal language definition. You can lose control all the way into becoming a consumer of software you meant to produce, and it’s frighteningly easy to accept that drift without noticing.

So: I don’t think we should fully accept the compiler analogy without qualification. As LLMs become a central toolchain component, we’ll need ways to strengthen the will to specify, and to make specification and verification feel as “normal” as writing code used to.
