形式化验证 Advent of Code 使用 Dijkstra 程序构造

形式化验证 Advent of Code 使用 Dijkstra 程序构造
Formally verifying Advent of Code using Dijkstra's program construction

原始链接: https://haripm.com/blog/aoc-day-3-without-thinking/

这篇帖子详细介绍了作者使用UCD程序构造模块中学到的原理，以及埃兹赫尔·W·戴克斯特拉的结构化编程思想来解决最近的Advent of Code问题的方法。核心思想是形式化程序验证——定义前置条件和后置条件，并推导定理来证明正确性。作者使用量化符号细致地构建了一个“领域模型”，以表示问题的逻辑。这包括定义函数（如`C.n`）并通过定理建立基本情况和递归关系。这个过程展示了如何将问题分解为更小、可证明的步骤。虽然这种形式化方法可以得到经过严格验证的解决方案，但将其翻译成Gleam等实用语言时，会导致代码复杂且可读性降低。作者承认这种权衡，指出该方法对于绝对正确性至关重要的关键软件具有价值，尽管它不适用于日常编程。最终，这项练习是对理论概念应用于实际编码挑战的有效实践。

黑客新闻新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交登录正式验证 Advent of Code 使用 Dijkstra 的程序构造 (haripm.com) 7 分，来自 seafoamteal 2 小时前 | 隐藏 | 过去 | 收藏 | 讨论指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系搜索：

I’m doing Advent of Code again this year, and part 1 of today’s problem reminded me immediately of some of the problems I’m doing in my Program Construction module at UCD. In the class, we cover the foundations of Edsger W. Dijsktra’s Structured Programming. It teaches you how to formally verify your program by finding the pre-conditions and post-conditions, then deriving and proving theorems that build up towards the final program.

It’s a very different style of thinking about programming than most people are used to, but I’ll try my best to explain the notation and the logic I’m using.

We start out by writing our post-condition. This is what we want to be true once our program has finished running — in other words, it’s what we want to calculate. We’re going to use this funky-looking syntax called Quantified Notation.

As an intro, here’s a basic quantified expression:

$<+\;i : 0 \leq i < N: f.i>$

$f.i$

$f.0 + f.1 + f.2 + f.3 + \dotsb + f.(N - 1)$

For those of you more familiar with functional programming, you’ll find that this is just a reduction over a list. $f$

With that exposition out of the way, here’s the postcondition for our AoC problem (for a single bank of batteries). You should go read the problem statement over on the AoC website for this to make sense :)

$<\uparrow i, j : 0 \leq i < j < N : 10 \cdot f.i + f.j>$

This is a quantification over two variables: $i$

Now we get to build our domain model. It’s the collection of definitions and theorems that we can use later on when constructing our program.

Model

(0) $C.n \equiv <\uparrow i, j : 0 \leq i < j < n : 10 \cdot f.i + f.j> \quad,\quad 2 \leq n \leq N$

We extract our post-condition into a reusable function defined over all $n$

Now observe:

$\quad C.2 \\ =\quad\quad\quad {\text{(0)}} \\ \quad <\uparrow i, j : 0 \leq i < j < 2 : 10 \cdot f.i + f.j> \\ =\quad\quad\quad {\text{split off j = 1}} \\ \quad <\uparrow i, j : 0 \leq i < j < 1 : 10 \cdot f.i + f.j> \; \uparrow \; <\uparrow i : 0 \leq i \lt 1 : 10 \cdot f.i + f.1>\\ =\quad\quad\quad {\text{empty range, single point}} \\ \quad id_\uparrow \; \uparrow \; 10 \cdot f.0 + f.1 \\ =\quad\quad\quad {\text{arithmetic}} \\ \quad 10 \cdot f.0 + f.1 \\$

So now we know that:

(1) $C.2 = f.0 + f.1$

Now that we have our “base case”, or the initial value of our accumulator, we can look into using associativity to find $C.(n+1)$

Observe again:

$\quad C.(n+1) \\ =\quad\quad\quad {\text{(0)}} \\ \quad <\uparrow i, j : 0 \leq i < j < n + 1 : 10 \cdot f.i + f.j> \\ =\quad\quad\quad {\text{split off j = n}} \\ \quad <\uparrow i, j : 0 \leq i < j < n : 10 \cdot f.i + f.j> \; \uparrow \; <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i + f.n>\\ =\quad\quad\quad {\text{(0)}} \\ \quad C.n \; \uparrow \; <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i + f.n>\\ =\quad\quad\quad {\text{(3)}} \\ \quad C.n \; \uparrow \; D.n\\$

This gives us

(2) $C.(n+1) \equiv C.n + D.n \quad,\quad 2 \leq n \lt N$

and

(3) $D.n \equiv <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i + f.n> \quad,\quad 2 \leq n \lt N$

Did you see how I sneakily used (3) up there before defining it properly? Let’s actually simplify $D.n$

Observe once more:

$\quad D.n \\ =\quad\quad\quad {\text{(3)}} \\ \quad <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i + f.n> \\ =\quad\quad\quad {+ \: \text{distributes over} \: \uparrow} \\ \quad <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i> + f.n \\ =\quad\quad\quad {* \: \text{distributes over} \: \uparrow} \\ \quad 10 \: \cdot <\uparrow i : 0 \leq i \lt n : f.i> + f.n \\ =\quad\quad\quad {\text{(6)}} \\ \quad 10 \: \cdot E.n + f.n \\$

So:

(3) $D.n \equiv 10 \cdot E.n + f.n \quad,\quad 2 \leq n \lt N$

And:

(6) $E.n \equiv \: <\uparrow i : 0 \leq i \lt n : f.i> \quad,\quad 2 \leq n \lt N$

What happened to (4) and (5), you ask? Well, we have to derive the base case and associative case for $D.n$

Observe:

$\quad E.2 \\ =\quad\quad\quad {\text{(6)}} \\ \quad <\uparrow i : 0 \leq i \lt 2 : f.i> \\ =\quad\quad\quad {\text{split off i = 1}} \\ \quad <\uparrow i : 0 \leq i \lt 1 : f.i> \: \uparrow f.1 \\ =\quad\quad\quad {\text{single point}} \\ \quad f.0 \: \uparrow f.1 \\$

And so:

(7) $E.2 = f.0 \: \uparrow f.1$

And once more:

$\quad E.(n+1) \\ =\quad\quad\quad {\text{(6)}} \\ \quad <\uparrow i : 0 \leq i \lt n + 1 : f.i> \\ =\quad\quad\quad {\text{split off i = n}} \\ \quad <\uparrow i : 0 \leq i \lt 1 : f.i> \: \uparrow f.n \\ =\quad\quad\quad {\text{(6)}} \\ \quad E.n \: \uparrow f.n \\$

Which gives us:

(8) $E.(n + 1) \equiv E.n \: \uparrow f.n \quad,\quad 2 \leq n \lt N - 1$

And now back to $D.n$

$\quad D.2 \\ =\quad\quad\quad {\text{(3)}} \\ \quad 10 \cdot E.2 + f.2 \\ =\quad\quad\quad {\text{(7)}} \\ \quad 10 \cdot (f.0 \: \uparrow f.1) + f.2 \\$

And similarly:

$\quad D.(n + 1) \\ =\quad\quad\quad {\text{(3)}} \\ \quad 10 \cdot E.(n + 1) + f.(n + 1) \\ =\quad\quad\quad {\text{(8)}} \\ \quad 10 \cdot (E.n \: \uparrow f.n) + f.(n + 1) \\$

So the last pieces of our model are:

(4) $D.2 = 10 \cdot (f.0 \: \uparrow f.1) + f.2$

(5) $D.(n + 1) \equiv 10 \cdot (E.n \: \uparrow f.n) + f.(n + 1) \quad,\quad 2 \leq n \lt N - 1$

Lovely. We now have everything we need to go and construct our program loop.

Program Loop

Let’s first rewrite our postcondition in terms of the theorems from our model.

$C.N$

We can then strengthen this postcondition by rewriting it like so:

$C.n \wedge n = N$

Strengthen is a funny name, but that’s all there is to it. We’re pulling $N$

Invariants: these are the things that are always true during the program
Variant: a measure of how much work there is left to do
Guard: a boolean check that lets you know when to break out of the loop; that is, when your variant has bottomed out because there is no more work left

By splitting our post-condition into two parts, we can use the first part as our invariant and the second as our loop guard.

Invariants

Let’s say we have a variable $r$

However, the definition of $C.(n+1)$

Let’s get some variables involved for those too:

$d = D.n$
$e = E.n$

So our invariants are:

(P0) $r = C.n \wedge d = D.n \wedge e = E.n$
(P1) $2 \leq n \leq N - 1$

Establish the invariants

We can “establish” our invariants by initialising our variables to values such that the equalities we defined as invariants are true. We know the values of $C.2$

$n := 2$
$r := C.2 := 10 * f.0 + f.1$
$d := D.2 := 10 * (f.0 \:\uparrow f.1) + f.2$
$e := E.2 := (f.0 \:\uparrow f.1)$

Loop Guard

Remember how I said we could use the $n =N$

This is because you can’t calculate $C.(N+1)$

And so, our loop actually can’t go all the way up to $n = N$

If you paid attention to the invariants bit, you’ll realise that this means we’ll only have $r = C.(N-1)$

Anyway, this is our loop guard:

$n \neq N - 1$

Variant

And our corresponding variant is:

$VF = (N - 1) - n$

When $VF$

Calculating the loop body

We’re starting off with $n = 2$

We don’t know what to set $r$

$\quad (n, r, d, e := n + 1, K_1, K_2, K_3).(P0) \\ =\quad\quad\quad\text{text substitution} \\ \quad K_1 = C.(n+1) \wedge K_2 = D.(n+1) \wedge K_3 = E.(n+1) \\ =\quad\quad\quad\text{(2), (5), (8)} \\ \quad K_1 = C.n \:\uparrow D.n \wedge K_2 = 10 \cdot (E.n \:\uparrow f.n) + f.(n+1) \wedge K_3 = E.n \:\uparrow f.n \\ =\quad\quad\quad\text{(P0)} \\ \quad K_1 = r \:\uparrow d \wedge K_2 = 10 \cdot (e \:\uparrow f.n) + f.(n+1) \wedge K_3 = e \:\uparrow f.n \\$

Now we know exactly how to update each variable within the loop. Home stretch now!

Writing our program

// establish invariants
{n, r, d, e := 2, 10 * f.0 + f.1, 10 * max(f.0, f.1) + f.2, max(f.0, f.1)}

// loop body
; do n != N - 1 ->
    n, r, d, e := n + 1, max(r, d), 10 * max(e, f.n) + f.(n + 1), max(e, f.n)
  od

// calculate C.N from C.(N - 1) and D.(N - 1)
; r := max(r, d)

// postcondition achieved!
{r = C.N}

The above program is written in Guarded Command Language, another invention of Dijkstra’s. Dijkstra was adamant that it never be implemented for a real computer:

“Finally, in order to drive home the message that this introductory programming course is primarily a course in formal mathematics, we see to it that the programming language in question has not been implemented on campus so that students are protected from the temptation to test their programs.” ~ EWD1036

So let’s translate it to a real programming language. I’ve been solving AoC in Gleam so far. This presents a little challenge, as Gleam is a functional programming language, while GCL is imperative, so we can’t do a 1-to-1 translation. However, with a little bit of cleverness, this is what we get.

pub fn pt_1(input: List(List(Int))) {
  input
  |> list.map(fn(bank) {
    let assert [first, second, ..rest] = bank
    let assert Ok(third) = list.first(rest)

    let #(r, d, _) =
      rest
      |> list.window_by_2
      |> list.fold(
        #(
          10 * first + second,
          10 * int.max(first, second) + third,
          int.max(first, second),
        ),
        fn(acc, el) {
          let #(r, d, e) = acc
          let #(f_n, f_n_1) = el
          #(int.max(r, d), 10 * int.max(e, f_n) + f_n_1, int.max(e, f_n))
        },
      )

    int.max(r, d)
  })
  |> int.sum
}

Yeah, LOL. LMAO, even. Absolutely not. I’m not quantifying over 12 variables. It feels like it should theoretically be possible, but I don’t want to find out. I just did it in the most straightforward way possible.

First, BIG shout-out to Mr. Henry McLoughlin, who taught me Program Construction. He’s a simply lovely person and his enthusiasm for his subject is infectious. I don’t know if I’d have enjoyed the module as much if anyone else taught it. Henry, if you’re reading this, you’re awesome! Thank you so much.

I guessed what Part 2 would be as soon as I read Part 1, and so if I was aiming for speed, I should have just written do_pt_2 for the general case and reused it across both parts. I would probably have had an easier time of it too. However, 1. I wanted to use what I learned in class and 2. I had fun doing it this way. I think barely anyone else would have done it this way.

It has the advantage of being rigorously proved to work, but at the cost of being harder to understand at first glance. I can see how this is useful for high-stakes software where the extra expenditure of mental energy on making sure the code is 100% watertight is worth it, but this is probably not what I’d reach for during everyday programming. It did result in a very, very terse program though, which is super impressive.

The eagle-eyed among you might say that we don’t really need both d and e, in the loop, as d is derived from e. This is true. Program construction doesn’t always produce the most efficient programs. That is between the programmer and the compiler to figure out.

Oh, the title. Yes. That seemed like a lot of thinking, you might object. It probably was if you’re not familiar with Program Construction yet, but once you’ve derived a couple of these theorems, you’ll find that there is no thinking involved. Not in the sense that once you’re good at something, you can do it almost mechanically, but in the sense that there’s only one way this could have gone. Starting from that post-condition, the theorems we proved fall out automatically as we continue expanding our model, and the same can be said for our loop body. Program construction is really easy in that way, because all you’re doing is following the program derivation to its logical

end.