预测:人工智能将使形式化验证普及。
AI will make formal verification go mainstream

原始链接: https://martin.kleppmann.com/2025/12/08/ai-formal-verification.html

## AI 与形式化验证的兴起 马丁·克莱普曼预测软件开发领域将迎来重大转变:形式化验证——数学上证明代码正确性——将实现主流应用,这得益于人工智能的进步。 历史上,形式化验证成本高昂且复杂,需要专业知识和巨大投入(例如,验证一个 8,700 行内核需要 20 人年)。 然而,基于 LLM 的编码助手现在能够*编写*证明脚本,从而自动化了以前手动且困难的过程。 这大大降低了成本,使验证对更多项目来说成为可能。 同时,对人工智能生成代码日益依赖,需要强大的验证方法*代替*传统的代码人工审查。 挑战将从*证明*代码转向*指定*代码应该做什么,尽管人工智能甚至可以协助完成这项工作。 虽然不能完全消除错误,但形式化验证为 LLM 的概率性本质提供了一个重要的制衡,确保可靠性。 克莱普曼认为,最大的障碍将是行业内部的文化转变,以拥抱这些现在可行的方法。

相关文章

原文

Published by Martin Kleppmann on 08 Dec 2025.

Much has been said about the effects that AI will have on software development, but there is an angle I haven’t seen talked about: I believe that AI will bring formal verification, which for decades has been a bit of a fringe pursuit, into the software engineering mainstream.

Proof assistants and proof-oriented programming languages such as Rocq, Isabelle, Lean, F*, and Agda have been around for a long time. They make it possible to write a formal specification that some piece of code is supposed to satisfy, and then mathematically prove that the code always satisfies that spec (even on weird edge cases that you didn’t think of testing). These tools have been used to develop some large formally verified software systems, such as an operating system kernel, a C compiler, and a cryptographic protocol stack.

At present, formal verification is mostly used by research projects, and it is uncommon for industrial software engineers to use formal methods (even those working on classic high-assurance software such as medical devices and aircraft). The reason is that writing those proofs is both very difficult (requiring PhD-level training) and very laborious.

For example, as of 2009, the formally verified seL4 microkernel consisted of 8,700 lines of C code, but proving it correct required 20 person-years and 200,000 lines of Isabelle code – or 23 lines of proof and half a person-day for every single line of implementation. Moreover, there are maybe a few hundred people in the world (wild guess) who know how to write such proofs, since it requires a lot of arcane knowledge about the proof system.

To put it in simple economic terms: for most systems, the expected cost of bugs is lower than the expected cost of using the proof techniques that would eliminate those bugs. Part of the reason is perhaps that bugs are a negative externality: it’s not the software developer who bears the cost of the bugs, but the users. But even if the software developer were to bear the cost, formal verification is simply very hard and expensive.

At least, that was the case until recently. Now, LLM-based coding assistants are getting pretty good not only at writing implementation code, but also at writing proof scripts in various languages. At present, a human with specialist expertise still has to guide the process, but it’s not hard to extrapolate and imagine that process becoming fully automated in the next few years. And when that happens, it will totally change the economics of formal verification.

If formal verification becomes vastly cheaper, then we can afford to verify much more software. But on top of that, AI also creates a need to formally verify more software: rather than having humans review AI-generated code, I’d much rather have the AI prove to me that the code it has generated is correct. If it can do that, I’ll take AI-generated code over handcrafted code (with all its artisanal bugs) any day!

In fact, I would argue that writing proof scripts is one of the best applications for LLMs. It doesn’t matter if they hallucinate nonsense, because the proof checker will reject any invalid proof and force the AI agent to retry. The proof checker is a small amount of code that is itself verified, making it virtually impossible to sneak an invalid proof past the checker.

That doesn’t mean software will suddenly be bug-free. As the verification process itself becomes automated, the challenge will move to correctly defining the specification: that is, how do you know that the properties that were proved are actually the properties that you cared about? Reading and writing such formal specifications still requires expertise and careful thought. But writing the spec is vastly easier and quicker than writing the proof by hand, so this is progress.

I could also imagine AI agents helping with the process of writing the specifications, translating between formal language and natural language. Here there is the potential for subtleties to be lost in translation, but this seems like a manageable risk.

I find it exciting to think that we could just specify in a high-level, declarative way the properties that we want some piece of code to have, and then to vibe code the implementation along with a proof that it satisfies the specification. That would totally change the nature of software development: we wouldn’t even need to bother looking at the AI-generated code any more, just like we don’t bother looking at the machine code generated by a compiler.

In summary: 1. formal verification is about to become vastly cheaper; 2. AI-generated code needs formal verification so that we can skip human review and still be sure that it works; 3. the precision of formal verification counteracts the imprecise and probabilistic nature of LLMs. These three things taken together mean formal verification is likely to go mainstream in the foreseeable future. I suspect that soon the limiting factor will not be the technology, but the culture change required for people to realise that formal methods have become viable in practice.

联系我们 contact @ memedata.com