正式地使用Lean4验证PBS儿童节目

正式地使用Lean4验证PBS儿童节目
Formally Verifying PBS Kids with Lean4

原始链接: https://www.shadaj.me/writing/cyberchase-lean

这篇帖子详细描述了使用Lean定理证明器对《网络追击》（Cyberchase）一集中策略的正式验证。作者受到该节目强调数学原理推导的启发，解决了一个游戏中，玩家移除龙，但要避免移除一只会导致失败的红色龙的问题。核心挑战在于证明，将对手逼入能被四整除的状态（“毒数”）的策略保证获胜。这需要用Lean定义游戏规则，解决递归函数可能导致的非终止问题（通过证明游戏状态总是减少来解决），并最终运用数学归纳法。这个过程突出了Lean的力量和复杂性，展示了如`rw`（重写）、`simp`（简化）、`induction`（归纳）等策略，以及利用现有和新创建的定理。作者强调了Lean的严格检查，即使对于由自动化策略（如`omega`）生成的步骤，也能确保证明的正确性。最终，完整的、经过正式验证的证明可在GitHub上找到，展示了像Lean这样的工具如何超越直观理解，为复杂系统提供绝对的确定性，其应用范围延伸到分布式系统和编译器验证等领域。作者鼓励读者探索Lean，从自然数游戏开始。

黑客新闻新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交登录正式使用Lean4验证PBS儿童节目 (shadaj.me) 11点由 shadaj 2小时前 | 隐藏 | 过去 | 收藏 | 讨论指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请YC | 联系搜索：

Cyberchase is an educational TV show aimed at children ages 8 - 12 that has been airing on PBS Kids since 2002. As someone who grew up without cable, PBS Kids was the go-to channel after school. From elementary through middle school, Cyberchase became a weekly staple that I enjoyed personally and was approved by my parents for its mathematical content.

The show features three kids (Matt, Jackie, and Inez, known as the Cybersquad), who are recruited by a friendly AI (Motherboard) to protect the world (Cyberspace) from an evil villain (Hacker, voiced by Christopher Lloyd of Back to the Future fame). Hacker’s mayhem leads to some creative puzzles, which the Cybersquad solve by applying their math skills.

Cyberchase has a strong focus on discovering math concepts from first principles. Rather than being taught facts axiomatically, the Cybersquad must first derive a general principle before they can apply it to defeat Hacker. At least for me, this was extremely effective both in getting me interested in math and developing fundamental math skills. I attribute a lot of my success studying computer science to watching Cyberchase as a kid.

Software engineering faces a very similar set of problems. Modern systems are composed of several complex pieces of software, with axiomatic assumptions about the behavior of these building blocks. These assumptions range from trivial, such as handling integer overflow, to incredibly complex: transaction isolation in a distributed database. Incorrect assumptions about the provided behavior can lead to catastrophic consequences.

Interactive Theorem Proving aims to solve this problem. These programming languages allow us to write down proofs of math and code as a composition of mathematical facts, which themselves come with proofs—all the way down to fundamental axioms of mathematics. Such formal proofs can be easily checked by a computer (any mistakes show up as type errors) and the checker itself is very unlikely to have bugs because the proof captures all details down to the lowest level¹11. This is in contrast to SMT solvers, which are more autonomous but involve complex and sometimes-buggy heuristics. There is ongoing work for SMT solvers to emit their results in a formal proof language for verification. This is in contrast to SMT solvers, which are more autonomous but involve complex and sometimes-buggy heuristics. There is ongoing work for SMT solvers to emit their results in a formal proof language for verification. .

In this blog post, we’ll take a look at Lean, a popular theorem proving language used by mathematicians such as Terry Tao²22. And created by my colleagues in the Automated Reasoning Group at AWS. And created by my colleagues in the Automated Reasoning Group at AWS. . We’ll use Lean to formally model one of my favorite episodes of Cyberchase: "Problem Solving in Shangri-La"³33. Evidently, a favorite episode for many mathematicians. I’ve found comments on r/math discussing it Evidently, a favorite episode for many mathematicians. I’ve found comments on r/math discussing it . As we develop the proof, we’ll explore how the Lean language works, techniques for writing proofs in Lean, and why Lean is so powerful.

If you have 20 minutes to spare (or less if you watch at 2x speed), I highly recommend that you watch the original episode of Cyberchase. It’s a lot of fun, and will set up the intuition we’ll need for the formal proof.

Let’s get to the core puzzle at hand. The Cybersquad are competing against Hacker in a turn-based strategy game⁴44. In fact, it is a variant of Nim, a well-studied combinatorial game. In fact, it is a variant of Nim, a well-studied combinatorial game. . Between the two teams, there is a pool of 15 dragons: 14 green and one red. Each team must take turns removing 1, 2, or 3 dragons from this shared pool. If a team takes the red dragon, they lose!

We’ll begin our formal proof by defining the behavior of the game. In Lean, we can define functions just like a regular language, but the types and syntax will be a bit more aligned with math. For example, we will use Nat to take natural numbers ( $n \geq 0$

Some math notation that will be useful throughout this blog post:

Nat / $\mathbb{N}$
$\lnot x$
$\lor$
$\land$

First, let us define a strategy that the Cybersquad will follow during the game. To simplify our proof later, the input to the strategy will be the number of green dragons left (that way 0 is still a valid state). In their first game against Hacker, the squad doesn’t quite have a strategy; let’s write down a similarly naive strategy of always taking one dragon.

def squadStrategy (green_dragons: Nat): Nat :=
  1

Next, we need to write a simulator that determines if the squad can win with their strategy when playing against Hacker, starting with a given number of green dragons. There is an immediate issue: Hacker is our enemy and we do not know what his strategy will be. If we want the squad’s strategy to be foolproof, we have to assume the worst case for Hacker: that he will always make the best move.

We can model this with a non-deterministic policy for Hacker. We will simulate all possible decisions that Hacker may take at each step, and check if any of those decisions (taking one, two, or three dragons) lead to a victory. This branching structure captures all possible strategies for Hacker, since any sequence of decisions is one path down the tree.

We can implement the simulator as a pair of functions: one that simulates a step by the squad and one that simulates a step by Hacker. Each function will return whether the squad or Hacker (respectively) would win the game after making that move (given the current game state—the number of green dragons—as a parameter).

This approach results in a natural recursive definition: the squad wins if after making their move Hacker does not win from the resulting state, and vice-versa (with several ∨ for Hacker to capture the non-deterministic strategy). Because these functions are mutually recursive (each invokes the other), Lean asks us to wrap the pair in a mutual block.

mutual
def squadWins (green_dragons: Nat): Bool :=
  if green_dragons = 0 then
    False
  else
    ¬ hackerWins (green_dragons - (squadStrategy green_dragons))

def hackerWins (green_dragons: Nat): Bool :=
  if green_dragons = 0 then
    False
  else
    ¬(squadWins (green_dragons - 3)) ∨ ¬(squadWins (green_dragons - 2)) ∨ ¬(squadWins (green_dragons - 1))
end

If we add this code, Lean will raise a error! It is concerned that the mutual recursion between these two functions may never terminate, which is not allowed in Lean (since then they would not be well-formed mathematical functions).

fail to show termination for
  squadWins
  hackerWins

...

  failed to eliminate recursive application
    hackerWins (green_dragons - squadStrategy green_dragons)

...

green_dragons : ℕ
h✝ : ¬green_dragons = 0
⊢ green_dragons - squadStrategy green_dragons < green_dragons

Lean helpfully provides a proof objective that is sufficient to show termination. If we can show that the total number of green dragons reduces after the squad makes their move, we know that eventually the game will terminate since number of dragons will go to zero⁵55. The eagle-eyed will notice this objective is actually unnecessarily strong. Even if the squadStrategy does not decrease the dragons, hackerWins always does so the function will terminate. But proving that requires a bit more trickery with setting up a measure. The eagle-eyed will notice this objective is actually unnecessarily strong. Even if the squadStrategy does not decrease the dragons, hackerWins always does so the function will terminate. But proving that requires a bit more trickery with setting up a measure. . In Lean, all proof objectives are displayed with the same notation:

var1: type1
var2: type2
...
⊢ logical expression that we want to prove

Variables in a proof can hold a value (such as green_dragons above, which is a value in $\mathbb{N}$

This is a great time to write our first proof in Lean! Before we dive into the details, it is helpful to set up the proof outline and check that Lean is satisfied. To locally prove a fact, we use the have keyword, which defines a local variable of some type (which, as a reminder, will be the fact itself). The right-hand sign of the declaration will contain the proof of this fact, using the by keyword to enter proof-building mode.

mutual
def squadWins (green_dragons: Nat): Bool :=
  if green_dragons = 0 then
    False
  else
    
    have decreasing: (squadStrategy green_dragons) > 0 := by
      sorry

    
    ¬ hackerWins (green_dragons - (squadStrategy green_dragons))


end

The sorry keyword in Lean gives you a "proof by laziness" — it lets you prove anything and is helpful to scaffold a proof without filling in all the details. You’ll notice that I’ve actually adjusted the proof goal a bit to make it simpler. Instead of showing that recursive call has a smaller parameter, we will prove that the squad always removes at least one dragon (which implies the former). Lean is smart enough to know that:

\mathit{green\_dragons} \neq 0 \land x > 0 \implies \mathit{green\_dragons} - x < \mathit{green\_dragons}

正式地使用Lean4验证PBS儿童节目 Formally Verifying PBS Kids with Lean4

正式地使用Lean4验证PBS儿童节目
Formally Verifying PBS Kids with Lean4