零成本 POSIX 合规性:在 Lean 的类型中编码 Socket 状态机
Zero-Cost POSIX Compliance: Encoding the Socket State Machine in Lean's Types

原始链接: https://ngrislain.github.io/blog/2026-3-25-zerocost-posix-compliance-encoding-the-socket-state-machine-in-lean-4s-type-system/

## Lean 4 与无错误套接字编程 传统语言(如 C、Python 和 Rust)中的套接字库,通常通过运行时检查、文档或寄希望于最佳实践来处理对套接字 API 的潜在错误使用(例如,在未绑定的套接字上发送数据)。这些方法不可避免地将错误检测推迟到运行时。Lean 4 提供了一种新颖的解决方案:**使用依赖类型在编译时强制执行套接字状态机。** Lean 4 将 POSIX 套接字协议的状态(fresh、bound、listening、connected、closed)编码为 `Socket` 结构体内的类型级别参数。像 `bind` 和 `listen` 这样的函数会根据这些状态显式声明前置条件和后置条件。然后,编译器会*验证*这些条件;尝试无效操作(例如,在 `fresh` 套接字上 `send`)会导致类型错误,在*运行前*发生。 至关重要的是,状态信息和证明义务会在编译期间被擦除,从而产生与原始 C 代码一样高效的代码。一个关键特性是通过证明义务来防止双重关闭——编译器会验证套接字是否已经关闭。这种方法消除了运行时开销,同时保证了协议的正确性,提供了一种经过数学验证且性能优越的解决方案。

黑客新闻 新的 | 过去 | 评论 | 提问 | 展示 | 工作 | 提交 登录 零成本 POSIX 兼容性:在 Lean 的类型中编码 Socket 状态机 (ngrislain.github.io) 11 分,由 ngrislain 1 小时前发布 | 隐藏 | 过去 | 收藏 | 1 条评论 帮助 yjftsjthsd-h 4 分钟前 [–] > Lean 4 提供第四种选择:使错误在类型级别不可表示,然后在编译时擦除证明,以便生成的代码与原始 C 相同。 你是否可以在不使用实际证明系统的情况下,在更传统的类型/类系统中做到这一点?与其存在 Socket 类型/类,不如创建 Socket_Fresh、Socket_Bound、Socket_Listening、Socket_Connected,以及可能存在的 Socket_Closed(不确定,需要考虑这是否可行),每个构造函数都接受前一个构造函数。或者这会使其难以使用?回复 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系 搜索:
相关文章

原文

The best runtime check is the one that never runs.

The problem

The POSIX socket API is a state machine. A socket must be created, then bound, then set to listen, before it can accept connections. Calling operations in the wrong order — send on an unbound socket, accept before listen, close twice — is undefined behaviour in C.

Every production socket library deals with this in one of three ways:

  1. Runtime checks — assert the state at every call, throw on violation (Python, Java, Go).

  2. Documentation — trust the programmer to read the man page (C, Rust).

  3. Ignore it — let the OS return EBADF and hope someone checks the return code.

All three push the bug to runtime. Lean 4 offers a fourth option: make the bug unrepresentable at the type level, then erase the proof at compile time so the generated code is identical to raw C.

The state machine

Socket state machine

Five states, seven transitions, and one proof obligation. That is the entire POSIX socket protocol. Let us encode it.

Step 1: States as an inductive type

inductive SocketState where
  | fresh      -- socket() returned, nothing else happened
  | bound      -- bind() succeeded
  | listening  -- listen() succeeded
  | connected  -- connect() or accept() produced this socket
  | closed     -- close() succeeded — terminal state
deriving DecidableEq

DecidableEq gives us by decide for free — the compiler can prove any two concrete states are distinct without any user effort.

Step 2: The socket carries its state as a phantom parameter

structure Socket (state : SocketState) where
  protected mk ::
  raw : RawSocket    -- opaque FFI handle (lean_alloc_external)

The state parameter exists only at the type level. It is erased at runtime: a Socket .fresh and a Socket .connected have the exact same memory layout (a single pointer to the OS file descriptor). Zero overhead.

The constructor is protected to prevent casual state fabrication.

Step 3: Each function declares its pre- and post-state

-- Creation: produces .fresh
def socket (fam : Family) (typ : SocketType) : IO (Socket .fresh)

-- Binding: requires .fresh, produces .bound
def bind (s : Socket .fresh) (addr : SockAddr) : IO (Socket .bound)

-- Listening: requires .bound, produces .listening
def listen (s : Socket .bound) (backlog : Nat) : IO (Socket .listening)

-- Accepting: requires .listening, produces .connected
def accept (s : Socket .listening) : IO (Socket .connected × SockAddr)

-- Connecting: requires .fresh, produces .connected
def connect (s : Socket .fresh) (addr : SockAddr) : IO (Socket .connected)

-- Sending/receiving: requires .connected
def send (s : Socket .connected) (data : ByteArray) : IO Nat
def recv (s : Socket .connected) (maxlen : Nat)     : IO ByteArray

The Lean 4 kernel threads these constraints through the program. If you write send freshSocket data, the kernel sees Socket .fresh where it expects Socket .connected, and reports a type error. No runtime check. No assertion. No exception. No branch in the generated code.

Step 4: Double-close prevention via proof obligation

def close (s : Socket state)
          (_h : state ≠ .closed := by decide)
          : IO (Socket .closed)

This is where dependent types shine brightest. The second parameter is a proof that the socket is not already closed. Let us trace what happens for each concrete state:

CallProof obligationby decideResult
close (s : Socket .fresh).fresh ≠ .closedtrivially truecompiles
close (s : Socket .bound).bound ≠ .closedtrivially truecompiles
close (s : Socket .listening).listening ≠ .closedtrivially truecompiles
close (s : Socket .connected).connected ≠ .closedtrivially truecompiles
close (s : Socket .closed).closed ≠ .closedimpossibletype error

For the first four, the default tactic by decide discharges the proof automatically — the caller writes nothing. For the fifth, the proposition .closed ≠ .closed is logically false: no proof can exist, so the program is rejected at compile time.

The proof is erased during compilation. The generated C code is:

lean_object* close(lean_object* socket) {
    close_fd(lean_get_external_data(socket));
    return lean_io_result_mk_ok(lean_box(0));
}

No branch. No flag. No state field. The proof did its job during type-checking and vanished.

Step 5: Distinctness theorems

We also prove that all five states are pairwise distinct:

theorem SocketState.fresh_ne_bound     : .fresh ≠ .bound     := by decide
theorem SocketState.fresh_ne_listening : .fresh ≠ .listening := by decide
theorem SocketState.fresh_ne_connected : .fresh ≠ .connected := by decide
theorem SocketState.fresh_ne_closed    : .fresh ≠ .closed    := by decide
theorem SocketState.bound_ne_listening : .bound ≠ .listening := by decide
-- ... (11 theorems total)

These are trivially proved by decide (the kernel evaluates the BEq instance). They exist so that downstream code can use them as lemmas without re-proving distinctness.

What the compiler actually rejects

-- ❌ send on a fresh socket
let s ← socket .inet .stream
send s "hello".toUTF8
-- Error: type mismatch — expected Socket .connected, got Socket .fresh

-- ❌ accept before listen
let s ← socket .inet .stream
let s ← bind s addr
accept s
-- Error: type mismatch — expected Socket .listening, got Socket .bound

-- ❌ double close
let s ← socket .inet .stream
let s ← close s
close s
-- Error: state ≠ .closed — proposition .closed ≠ .closed is false

-- ✅ correct sequence
let s ← socket .inet .stream
let s ← bind s ⟨"0.0.0.0", 8080⟩
let s ← listen s 128
let (conn, addr) ← accept s
let _ ← send conn "HTTP/1.1 200 OK\r\n\r\n".toUTF8
let _ ← close conn
let _ ← close s

Try it yourself

Open this example in the Lean 4 Playground — try uncommenting the failing examples to see the type errors live.

Uncomment bad_double_close or bad_send_fresh and the kernel rejects the program immediately — the error messages tell you exactly which state transition is invalid.

The punchline

ApproachLines of state-checking codeRuntime costCatches at
C (man page)00never (UB)
Python (runtime)~50branch per callruntime
Rust (typestate)~300compile-time
Lean 4 (dependent types)00compile-time

Lean 4 is unique: the proof obligation state ≠ .closed is a real logical proposition that the kernel verifies. It is not a lint, not a static analysis heuristic, not a convention. It is a mathematical proof of protocol compliance, checked by the same kernel that verifies Mathlib's theorems — and then thrown away so the generated code runs at the speed of C.

---

This post is part of the Hale documentation — a port of Haskell's web ecosystem to Lean 4 with maximalist typing.

联系我们 contact @ memedata.com