I fuzzed a verified implementation of zlib and found a buffer overflow in the Lean runtime.
AI agents are getting very good at finding vulnerabilities in large-scale software systems.
Anthropic, was apparently so spooked by the vulnerability-discovery capabilities of Mythos, they decided not to release it as it was "too dangerous" (lol). Whether you believe the hype about these latest models or not, it seems undeniable that the writing is on the wall:
The cost of discovering security bugs is collapsing, and the vast majority of software running today was never built to withstand that kind of scrutiny. We are facing a looming software crisis.
In the face of this oncoming tsunami, recently there has been increasing interest in formal verification as a solution. If we state and prove properties about our code using a mechanical tool, can we build robust, secure and stable software that can overcome this oncoming barrage of attacks?
One recent development in the Lean ecosystem has taken steps towards this question. 10 agents autonomously built and proved an implementation of zlib, lean-zip, an impressive landmark result. Quoting from Leo De Moura, the chief architect of the Lean FRO (here):
With apologies for the AI-slop (Leo has a penchant for it, it seems),
the key result is that lean-zip is not just another implementation of
zlib. It is an implementation that has been verified as correct end to
end, guaranteed by Lean to be entirely free of implementation bugs.
What does "verified as correct" actually look like? Here is one of the main theorems (github):
theorem zlib_decompressSingle_compress (data : ByteArray) (level : UInt8)
(hsize : data.size < 1024 * 1024 * 1024) :
ZlibDecode.decompressSingle
(ZlibEncode.compress data level) = .ok data
For any byte array less than 1 gigabyte, calling
ZlibDecode.decompressSingle on the output of ZlibEncode.compress
produces the original data. The decompress function is exactly the
inverse of compression. This pair of functions is entirely correct.
Or is it?
I pointed a Claude agent at lean-zip over a weekend, armed with AFL++,
AddressSanitizer, Valgrind, and UBSan. Over 105 million fuzzing
executions, it found:
- Zero memory vulnerabilities in the verified Lean application code.
- A heap buffer overflow in the Lean 4 runtime (
lean_alloc_sarray), affecting every version of Lean to date. (bug report, pending fix) - A denial-of-service in
lean-zip's archive parser, which was never verified.