Are compilers deterministic?

Original link: https://blog.onepatchdown.net/2026/02/22/are-compilers-deterministic-nerd-version/

## Reproducible Builds: Computer Science vs. Engineering Perspectives

The question of whether compilers (and now LLMs) *must* be deterministic has both a theoretical answer and a practical one. From a computer-science perspective, a compiler *should* be deterministic: the same input produces the same output. In real-world engineering, however, achieving this is very hard. Build output is affected by many factors beyond the source code, including the compiler version, system environment, timestamps, and even hardware scheduling. These "noise" elements cause builds to drift, a lesson learned the hard way during kernel-patching work in the 2000s. While compilers aim to preserve *semantic* equivalence (the same behavior, not necessarily the same code), truly reproducible builds require deliberate engineering: freezing toolchains, controlling the environment, and carefully stripping metadata. Efforts like Debian's Reproducible Builds project show this is achievable, yielding repeatable, verifiable, and hermetic builds. Applied to LLMs, the engineering approach dominates: full determinism cannot be guaranteed, but controlling inputs, validating outputs with tests, and using reproducible pipelines are essential. Just as with traditional software, a "probabilistic system" can still deliver operationally better results with the right safeguards in place.

A Hacker News discussion centered on whether compilers are truly deterministic. The core argument: while compilers should *in theory* produce the same output for the same input, *in practice* real builds often introduce nondeterminism. These factors include things like timestamps and UUIDs baked in during the build, which means the output can "drift" even when the code is identical. One commenter pointed out that compilers are meant to preserve semantics: functionally equivalent output is acceptable even if instruction order differs. Another stressed the importance of determinism for "verifiable builds." The original poster clarified that they did not mean to claim LLMs solved the halting problem, and welcomed suggestions for rewording that part of their post. Ultimately, the discussion comes down to the gap between theoretical determinism and engineering reality, with its uncontrollable variables.

Original article

Betteridge says “no,” and for normal developer experience that answer is mostly right. (Also, you’re absolutely right! and here’s an em—dash so that you know that I used ChatGPT to help me write this.)

Here’s my take. There’s a computer science answer and an engineering answer. The computer science answer: a compiler is deterministic as a function of its full input state. Engineering answer: most real builds do not control the full input state, so outputs drift.

I worked at Ksplice back in the 2000s, where we patched running Linux kernels in RAM so you could take security updates without rebooting. Reading objdump output of crashy kernels was not daily routine, but I had to do it often enough that “compiler output versus source intent” stopped being theoretical.

Formally:

artifact = F(
  source,
  flags,
  compiler binary,
  linker + assembler,
  libc + runtime,
  env vars,
  filesystem view,
  locale + timezone,
  clock,
  kernel behavior,
  hardware/concurrency schedule
)

Most teams hold only source and maybe flags constant, then call everything else “noise.” That “noise” is where non-reproducibility lives.
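
To make "a function of its full input state" concrete, here is a minimal Python sketch of a build-input fingerprint. It is my own illustration, not part of any real build system; the compiler name cc and the particular environment variables it samples are just examples. Everything it leaves out of the hash is exactly the "noise" described above.

import hashlib
import json
import os
import platform
import subprocess

def input_fingerprint(source_path, flags, compiler="cc"):
    """Hash part of the input tuple from the formula above.

    Anything not folded in here (libc, kernel, filesystem view,
    scheduling) is still free to make the artifact drift.
    """
    # The source bytes themselves.
    with open(source_path, "rb") as f:
        source_digest = hashlib.sha256(f.read()).hexdigest()

    # Toolchain identity: version skew is a classic source of drift.
    try:
        version = subprocess.run([compiler, "--version"],
                                 capture_output=True, text=True,
                                 check=True).stdout.splitlines()[0]
    except (OSError, subprocess.CalledProcessError):
        version = "unknown"

    state = {
        "source": source_digest,
        "flags": list(flags),
        "compiler": version,
        "locale": os.environ.get("LC_ALL", ""),
        "timezone": os.environ.get("TZ", ""),
        "machine": platform.machine(),
    }
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()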

I learned this the hard way at Ksplice in the 2000s. We generated rebootless Linux kernel updates by diffing old vs new compiled output and stitching hot patches into live kernel memory. Most diffs mapped cleanly to changed C. Sometimes they exploded for reasons that were not semantic source changes: register allocation differences, altered pass behavior, section/layout changes. Same intent, different machine code.

If you want a concrete historical artifact, GCC bug 18574 has a gcc-bugs thread calling out pointer-hash instability affecting traversal order and SSA coalescing.

That distinction matters:

  • deterministic compiler: same complete input tuple -> same output
  • reproducible build: two independent builders recreate bit-identical output
  • reliable toolchain: differences rarely matter functionally

Related concepts, not equivalent guarantees.

Compiler Contract: Semantics, Not Byte Identity

The commenter is right on this point: compilers are expected to preserve semantics. For programs with defined behavior, the output should be observationally equivalent to the source language’s abstract machine.

That means instruction order, register choice, inlining strategy, and block layout are fair game as long as externally visible behavior stays the same. In practice, “visible behavior” means things like I/O effects, volatile accesses, atomic synchronization guarantees, and defined return values, not byte-for-byte instruction identity.

Important caveats:

  • undefined behavior weakens or voids the semantic guarantee
  • timing, microarchitectural side channels, and exact memory layout are usually outside the core language contract
  • reproducible builds are a stricter goal than semantic preservation (same bits, not just same behavior)
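
To see "same observable behavior, different emitted instructions" without dragging in a whole C toolchain, here is a small Python analogy (CPython bytecode standing in for machine code; the function names are made up for illustration):

import dis

def plain(x):
    y = x + 1
    return y

def augmented(x):
    y = x
    y += 1
    return y

# Observationally equivalent for integers...
assert plain(41) == augmented(41) == 42

# ...but the emitted instruction listings differ: an in-place add and an
# extra store appear in one and not the other. A compiler gets exactly
# this kind of freedom as long as visible behavior stays the same.
dis.dis(plain)
dis.dis(augmented)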

Where Entropy Comes From

  • __DATE__, __TIME__, __TIMESTAMP__
  • embedded absolute paths in DWARF/debug info
  • build path leakage (for example /home/fragmede/projects/foo)
  • locale-sensitive sort behavior (LC_ALL)
  • filesystem iteration order
  • parallel build and link race ordering
  • archive member order and metadata (ar, ranlib)
  • build IDs, UUIDs, random seeds
  • network fetches during build
  • toolchain version skew
  • host kernel/c library differences
  • historical compiler internals depending on unstable pointer/hash traversal order

ASLR note: ASLR does not directly randomize the emitted binary. It randomizes process memory layout. But if a compiler pass behavior depends on pointer identity/order, ASLR can indirectly perturb outcomes.
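
A toy model of a few of those entropy sources. This is a sketch, not a real compiler: leaky_build and pinned_build are made-up names, the "object file" is just concatenated bytes, and the mitigation mirrors the SOURCE_DATE_EPOCH and path-prefix tricks covered in the next section.

import hashlib
import os
import time

SOURCE = b"int main(void) { return 0; }\n"

def leaky_build(source):
    # Timestamp leakage, like __DATE__ / __TIME__ baked into the binary.
    stamp = time.strftime("%b %d %Y %H:%M:%S").encode()
    # Build-path leakage, like absolute paths in DWARF debug info.
    build_dir = os.getcwd().encode()
    # Environment leakage: locale and timezone differ across builders.
    env = (os.environ.get("LC_ALL", "") + os.environ.get("TZ", "")).encode()
    return source + stamp + build_dir + env

def pinned_build(source):
    # Honor SOURCE_DATE_EPOCH for the timestamp, and pin the path and
    # locale inputs to fixed canonical values.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    stamp = time.strftime("%b %d %Y %H:%M:%S", time.gmtime(epoch)).encode()
    return source + stamp + b"/build" + b"C:UTC"

for build in (leaky_build, pinned_build):
    first = hashlib.sha256(build(SOURCE)).hexdigest()
    time.sleep(1)  # same source, rebuilt a moment later
    second = hashlib.sha256(build(SOURCE)).hexdigest()
    print(build.__name__, "stable" if first == second else "drifts")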

So “compilers are deterministic” is often true in a theorem sense and false in an operational sense. And even with reproducible artifacts, Ken Thompson’s Reflections on Trusting Trust still applies. Keep in mind, too, that compilers are not new tech: Grace Hopper’s A-0 system dates to 1952 on UNIVAC. ChatGPT has only been around for about 4 years to the compiler’s 74.

Reproducible Builds: Deliberate Engineering

Debian and the broader reproducible-builds effort (around 2013 onward) pushed this mainstream: same source + same build instructions should produce bit-for-bit identical artifacts.

The practical playbook:

  • freeze toolchains and dependencies
  • stable environment (TZ=UTC, LC_ALL=C)
  • set SOURCE_DATE_EPOCH
  • normalize/strip volatile metadata
  • canonicalize path prefixes (-ffile-prefix-map, -fdebug-prefix-map)
  • deterministic archives (ar -D)
  • remove network from the build graph
  • build in hermetic containers/sandboxes
  • continuously diff artifacts across builders in CI (a minimal check is sketched at the end of this section)

That gets you:

  • Repeatable
  • Reproducible
  • Verifiable
  • Hermetic
  • Deterministic

Do we have this now? In many ecosystems, mostly yes. But it took years of very intentional work across compilers, linkers, packaging, and build systems. We got here by grinding through weird edge cases, not by waving our hands and declaring purity.
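
To make the diffing item in the playbook concrete, here is a minimal double-build check of the kind a reproducible-builds CI job runs. The build command, clean command, and artifact path are placeholders; the environment pinning and the hash comparison are the real point.

import hashlib
import os
import subprocess

BUILD_CMD = ["make", "all"]      # placeholder: your real build entry point
CLEAN_CMD = ["make", "clean"]    # placeholder: whatever resets the tree
ARTIFACT = "out/program"         # placeholder: the artifact to compare

def pinned_env():
    # Pin the volatile inputs the playbook calls out.
    env = dict(os.environ)
    env.update({"TZ": "UTC", "LC_ALL": "C", "SOURCE_DATE_EPOCH": "1"})
    return env

def build_and_hash():
    subprocess.run(CLEAN_CMD, env=pinned_env(), check=True)
    subprocess.run(BUILD_CMD, env=pinned_env(), check=True)
    with open(ARTIFACT, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

first, second = build_and_hash(), build_and_hash()
print("reproducible" if first == second else "drifted", first, second)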

Why This Matters For LLMs

This comes up now as “is vibecoding sane if LLMs are nondeterministic?” Again: do you want the CS answer, or the engineering answer?

Have we solved the halting problem with LLMs? Yes and no. We have not remotely solved it in the formal sense. But for practical purposes, if I write a for loop and mess up the condition, an LLM can look at my code, tell me I’m being dumb, and then go fix it for me.

Engineering has never depended on perfectly deterministic intelligence. It depends on controlled interfaces, test oracles, reproducible pipelines, and observability. I’m AI-pilled enough to daily-drive comma.ai, and I still want deterministic verification gates around generated code. My girlfriend prefers when I let it drive because it’s smoother and less erratic than I am, which is a useful reminder that “probabilistic system” and “operationally better result” can coexist.

Same pattern for LLM-assisted coding (a minimal gate is sketched after this list):

  • constrain inputs
  • make outputs testable
  • gate with deterministic CI
  • require reproducible artifacts
  • treat stochastic generation as upstream, not deploy-time truth
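
A sketch of that deterministic gate: generate_patch() is a stand-in for whatever model call you use, and pytest is assumed as the test runner (any runner works). The only part that has to be real is the deterministic test pass/fail decision.

import pathlib
import subprocess
import tempfile

def generate_patch(prompt):
    # Placeholder for the nondeterministic step: an LLM producing code.
    raise NotImplementedError

def gate(generated_source, test_source):
    # Treat the generated code as an upstream artifact: write it and its
    # tests to a scratch directory, then let a deterministic test run
    # decide whether it moves forward.
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "candidate.py").write_text(generated_source)
        (root / "test_candidate.py").write_text(test_source)
        result = subprocess.run(["pytest", "-q"], cwd=root)
        return result.returncode == 0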

Computer science answer: nondeterminism is scary. Engineering answer: control boundary conditions, verify outputs, ship.

And yes, part of this argument is existential: most of us are still in the rent-paying business, not the philosophy business. So we use the tools that move work forward, then build the guardrails we need.
