形式化验证 Advent of Code 使用 Dijkstra 程序构造
Formally verifying Advent of Code using Dijkstra's program construction

原始链接: https://haripm.com/blog/aoc-day-3-without-thinking/

这篇帖子详细介绍了作者使用UCD程序构造模块中学到的原理,以及埃兹赫尔·W·戴克斯特拉的结构化编程思想来解决最近的Advent of Code问题的方法。核心思想是形式化程序验证——定义前置条件和后置条件,并推导定理来证明正确性。 作者使用量化符号细致地构建了一个“领域模型”,以表示问题的逻辑。这包括定义函数(如`C.n`)并通过定理建立基本情况和递归关系。这个过程展示了如何将问题分解为更小、可证明的步骤。 虽然这种形式化方法可以得到经过严格验证的解决方案,但将其翻译成Gleam等实用语言时,会导致代码复杂且可读性降低。作者承认这种权衡,指出该方法对于绝对正确性至关重要的关键软件具有价值,尽管它不适用于日常编程。最终,这项练习是对理论概念应用于实际编码挑战的有效实践。

黑客新闻 新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 正式验证 Advent of Code 使用 Dijkstra 的程序构造 (haripm.com) 7 分,来自 seafoamteal 2 小时前 | 隐藏 | 过去 | 收藏 | 讨论 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系 搜索:
相关文章

原文

I’m doing Advent of Code again this year, and part 1 of today’s problem reminded me immediately of some of the problems I’m doing in my Program Construction module at UCD. In the class, we cover the foundations of Edsger W. Dijsktra’s Structured Programming. It teaches you how to formally verify your program by finding the pre-conditions and post-conditions, then deriving and proving theorems that build up towards the final program.

It’s a very different style of thinking about programming than most people are used to, but I’ll try my best to explain the notation and the logic I’m using.

We start out by writing our post-condition. This is what we want to be true once our program has finished running — in other words, it’s what we want to calculate. We’re going to use this funky-looking syntax called Quantified Notation.

As an intro, here’s a basic quantified expression:

<+  i:0i<N:f.i><+\;i : 0 \leq i < N: f.i>

f.if.i is the syntax we’ll use for accessing the ithi^{th}

f.0+f.1+f.2+f.3++f.(N1)f.0 + f.1 + f.2 + f.3 + \dotsb + f.(N - 1)

For those of you more familiar with functional programming, you’ll find that this is just a reduction over a list. ff is the list in question and ++ is the operation we want to use to combine each element with the accumulator. However, program construction is designed around an imperative language, and so we need an index variable ii to keep track of our position in the array. We also have to specify the range of ii, which is from 00 to the length of the array.

With that exposition out of the way, here’s the postcondition for our AoC problem (for a single bank of batteries). You should go read the problem statement over on the AoC website for this to make sense :)

<i,j:0i<j<N:10f.i+f.j><\uparrow i, j : 0 \leq i < j < N : 10 \cdot f.i + f.j>

This is a quantification over two variables: ii and jj. It’s essentially the same as before, but when we reduce using the max operator (\uparrow), we have to reduce over 10f.i+f.j10 \cdot f.i + f.j

Now we get to build our domain model. It’s the collection of definitions and theorems that we can use later on when constructing our program.

Model

(0) C.n<i,j:0i<j<n:10f.i+f.j>,2nNC.n \equiv <\uparrow i, j : 0 \leq i < j < n : 10 \cdot f.i + f.j> \quad,\quad 2 \leq n \leq N

We extract our post-condition into a reusable function defined over all nn from 22 — the minimum valid length of an array for this calculation — to NN, the actual length of our array.

Now observe:

C.2=(0)<i,j:0i<j<2:10f.i+f.j>=split off j = 1<i,j:0i<j<1:10f.i+f.j>    <i:0i<1:10f.i+f.1>=empty range, single pointid    10f.0+f.1=arithmetic10f.0+f.1\quad C.2 \\ =\quad\quad\quad {\text{(0)}} \\ \quad <\uparrow i, j : 0 \leq i < j < 2 : 10 \cdot f.i + f.j> \\ =\quad\quad\quad {\text{split off j = 1}} \\ \quad <\uparrow i, j : 0 \leq i < j < 1 : 10 \cdot f.i + f.j> \; \uparrow \; <\uparrow i : 0 \leq i \lt 1 : 10 \cdot f.i + f.1>\\ =\quad\quad\quad {\text{empty range, single point}} \\ \quad id_\uparrow \; \uparrow \; 10 \cdot f.0 + f.1 \\ =\quad\quad\quad {\text{arithmetic}} \\ \quad 10 \cdot f.0 + f.1 \\

So now we know that:

(1) C.2=f.0+f.1C.2 = f.0 + f.1

Now that we have our “base case”, or the initial value of our accumulator, we can look into using associativity to find C.(n+1)C.(n+1)

Observe again:

C.(n+1)=(0)<i,j:0i<j<n+1:10f.i+f.j>=split off j = n<i,j:0i<j<n:10f.i+f.j>    <i:0i<n:10f.i+f.n>=(0)C.n    <i:0i<n:10f.i+f.n>=(3)C.n    D.n\quad C.(n+1) \\ =\quad\quad\quad {\text{(0)}} \\ \quad <\uparrow i, j : 0 \leq i < j < n + 1 : 10 \cdot f.i + f.j> \\ =\quad\quad\quad {\text{split off j = n}} \\ \quad <\uparrow i, j : 0 \leq i < j < n : 10 \cdot f.i + f.j> \; \uparrow \; <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i + f.n>\\ =\quad\quad\quad {\text{(0)}} \\ \quad C.n \; \uparrow \; <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i + f.n>\\ =\quad\quad\quad {\text{(3)}} \\ \quad C.n \; \uparrow \; D.n\\

This gives us

(2) C.(n+1)C.n+D.n,2n<NC.(n+1) \equiv C.n + D.n \quad,\quad 2 \leq n \lt N

and

(3) D.n<i:0i<n:10f.i+f.n>,2n<ND.n \equiv <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i + f.n> \quad,\quad 2 \leq n \lt N

Did you see how I sneakily used (3) up there before defining it properly? Let’s actually simplify D.nD.n.

Observe once more:

D.n=(3)<i:0i<n:10f.i+f.n>=+distributes over<i:0i<n:10f.i>+f.n=distributes over10<i:0i<n:f.i>+f.n=(6)10E.n+f.n\quad D.n \\ =\quad\quad\quad {\text{(3)}} \\ \quad <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i + f.n> \\ =\quad\quad\quad {+ \: \text{distributes over} \: \uparrow} \\ \quad <\uparrow i : 0 \leq i \lt n : 10 \cdot f.i> + f.n \\ =\quad\quad\quad {* \: \text{distributes over} \: \uparrow} \\ \quad 10 \: \cdot <\uparrow i : 0 \leq i \lt n : f.i> + f.n \\ =\quad\quad\quad {\text{(6)}} \\ \quad 10 \: \cdot E.n + f.n \\

So:

(3) D.n10E.n+f.n,2n<ND.n \equiv 10 \cdot E.n + f.n \quad,\quad 2 \leq n \lt N

And:

(6) E.n<i:0i<n:f.i>,2n<NE.n \equiv \: <\uparrow i : 0 \leq i \lt n : f.i> \quad,\quad 2 \leq n \lt N

What happened to (4) and (5), you ask? Well, we have to derive the base case and associative case for D.nD.n just like we did for C.nC.n. We’ll need our E.nE.n theorems first, though.

Observe:

E.2=(6)<i:0i<2:f.i>=split off i = 1<i:0i<1:f.i>f.1=single pointf.0f.1\quad E.2 \\ =\quad\quad\quad {\text{(6)}} \\ \quad <\uparrow i : 0 \leq i \lt 2 : f.i> \\ =\quad\quad\quad {\text{split off i = 1}} \\ \quad <\uparrow i : 0 \leq i \lt 1 : f.i> \: \uparrow f.1 \\ =\quad\quad\quad {\text{single point}} \\ \quad f.0 \: \uparrow f.1 \\

And so:

(7) E.2=f.0f.1E.2 = f.0 \: \uparrow f.1

And once more:

E.(n+1)=(6)<i:0i<n+1:f.i>=split off i = n<i:0i<1:f.i>f.n=(6)E.nf.n\quad E.(n+1) \\ =\quad\quad\quad {\text{(6)}} \\ \quad <\uparrow i : 0 \leq i \lt n + 1 : f.i> \\ =\quad\quad\quad {\text{split off i = n}} \\ \quad <\uparrow i : 0 \leq i \lt 1 : f.i> \: \uparrow f.n \\ =\quad\quad\quad {\text{(6)}} \\ \quad E.n \: \uparrow f.n \\

Which gives us:

(8) E.(n+1)E.nf.n,2n<N1E.(n + 1) \equiv E.n \: \uparrow f.n \quad,\quad 2 \leq n \lt N - 1

And now back to D.nD.n. You know the drill, observe:

D.2=(3)10E.2+f.2=(7)10(f.0f.1)+f.2\quad D.2 \\ =\quad\quad\quad {\text{(3)}} \\ \quad 10 \cdot E.2 + f.2 \\ =\quad\quad\quad {\text{(7)}} \\ \quad 10 \cdot (f.0 \: \uparrow f.1) + f.2 \\

And similarly:

D.(n+1)=(3)10E.(n+1)+f.(n+1)=(8)10(E.nf.n)+f.(n+1)\quad D.(n + 1) \\ =\quad\quad\quad {\text{(3)}} \\ \quad 10 \cdot E.(n + 1) + f.(n + 1) \\ =\quad\quad\quad {\text{(8)}} \\ \quad 10 \cdot (E.n \: \uparrow f.n) + f.(n + 1) \\

So the last pieces of our model are:

(4) D.2=10(f.0f.1)+f.2D.2 = 10 \cdot (f.0 \: \uparrow f.1) + f.2

(5) D.(n+1)10(E.nf.n)+f.(n+1),2n<N1D.(n + 1) \equiv 10 \cdot (E.n \: \uparrow f.n) + f.(n + 1) \quad,\quad 2 \leq n \lt N - 1

Lovely. We now have everything we need to go and construct our program loop.

Program Loop

Let’s first rewrite our postcondition in terms of the theorems from our model.

C.NC.N

We can then strengthen this postcondition by rewriting it like so:

C.nn=NC.n \wedge n = N

Strengthen is a funny name, but that’s all there is to it. We’re pulling NN out of C.NC.N. Why? Because every loop has 3 fundamental things:

  • Invariants: these are the things that are always true during the program
  • Variant: a measure of how much work there is left to do
  • Guard: a boolean check that lets you know when to break out of the loop; that is, when your variant has bottomed out because there is no more work left

By splitting our post-condition into two parts, we can use the first part as our invariant and the second as our loop guard.

Invariants

Let’s say we have a variable rr, and rr is always equal to C.nC.n, for whatever value nn is at the moment. This is an invariant.

However, the definition of C.(n+1)C.(n+1)

Let’s get some variables involved for those too:

  • d=D.nd = D.n
  • e=E.ne = E.n

So our invariants are:

  • (P0) r=C.nd=D.ne=E.nr = C.n \wedge d = D.n \wedge e = E.n
  • (P1) 2nN12 \leq n \leq N - 1

Establish the invariants

We can “establish” our invariants by initialising our variables to values such that the equalities we defined as invariants are true. We know the values of C.2C.2, D.2D.2, and E.2E.2 from when we derived them. So, let’s set n=2n = 2

  • n:=2n := 2
  • r:=C.2:=10f.0+f.1r := C.2 := 10 * f.0 + f.1
  • d:=D.2:=10(f.0f.1)+f.2d := D.2 := 10 * (f.0 \:\uparrow f.1) + f.2
  • e:=E.2:=(f.0f.1)e := E.2 := (f.0 \:\uparrow f.1)

Loop Guard

Remember how I said we could use the n=Nn =N

This is because you can’t calculate C.(N+1)C.(N+1)

And so, our loop actually can’t go all the way up to n=Nn = N

If you paid attention to the invariants bit, you’ll realise that this means we’ll only have r=C.(N1)r = C.(N-1)

Anyway, this is our loop guard:

nN1n \neq N - 1

Variant

And our corresponding variant is:

VF=(N1)nVF = (N - 1) - n

When VFVF becomes 0, we exit the loop.

Calculating the loop body

We’re starting off with n=2n = 2

We don’t know what to set rr, dd, and ee to, but we can find out. Let’s set them to some temporary variables K1,K2,K3K_1, K_2, K_3

(n,r,d,e:=n+1,K1,K2,K3).(P0)=text substitutionK1=C.(n+1)K2=D.(n+1)K3=E.(n+1)=(2), (5), (8)K1=C.nD.nK2=10(E.nf.n)+f.(n+1)K3=E.nf.n=(P0)K1=rdK2=10(ef.n)+f.(n+1)K3=ef.n\quad (n, r, d, e := n + 1, K_1, K_2, K_3).(P0) \\ =\quad\quad\quad\text{text substitution} \\ \quad K_1 = C.(n+1) \wedge K_2 = D.(n+1) \wedge K_3 = E.(n+1) \\ =\quad\quad\quad\text{(2), (5), (8)} \\ \quad K_1 = C.n \:\uparrow D.n \wedge K_2 = 10 \cdot (E.n \:\uparrow f.n) + f.(n+1) \wedge K_3 = E.n \:\uparrow f.n \\ =\quad\quad\quad\text{(P0)} \\ \quad K_1 = r \:\uparrow d \wedge K_2 = 10 \cdot (e \:\uparrow f.n) + f.(n+1) \wedge K_3 = e \:\uparrow f.n \\

Now we know exactly how to update each variable within the loop. Home stretch now!

Writing our program

// establish invariants
{n, r, d, e := 2, 10 * f.0 + f.1, 10 * max(f.0, f.1) + f.2, max(f.0, f.1)}

// loop body
; do n != N - 1 ->
    n, r, d, e := n + 1, max(r, d), 10 * max(e, f.n) + f.(n + 1), max(e, f.n)
  od

// calculate C.N from C.(N - 1) and D.(N - 1)
; r := max(r, d)

// postcondition achieved!
{r = C.N}

The above program is written in Guarded Command Language, another invention of Dijkstra’s. Dijkstra was adamant that it never be implemented for a real computer:

“Finally, in order to drive home the message that this introductory programming course is primarily a course in formal mathematics, we see to it that the programming language in question has not been implemented on campus so that students are protected from the temptation to test their programs.” ~ EWD1036

So let’s translate it to a real programming language. I’ve been solving AoC in Gleam so far. This presents a little challenge, as Gleam is a functional programming language, while GCL is imperative, so we can’t do a 1-to-1 translation. However, with a little bit of cleverness, this is what we get.

pub fn pt_1(input: List(List(Int))) {
  input
  |> list.map(fn(bank) {
    let assert [first, second, ..rest] = bank
    let assert Ok(third) = list.first(rest)

    let #(r, d, _) =
      rest
      |> list.window_by_2
      |> list.fold(
        #(
          10 * first + second,
          10 * int.max(first, second) + third,
          int.max(first, second),
        ),
        fn(acc, el) {
          let #(r, d, e) = acc
          let #(f_n, f_n_1) = el
          #(int.max(r, d), 10 * int.max(e, f_n) + f_n_1, int.max(e, f_n))
        },
      )

    int.max(r, d)
  })
  |> int.sum
}

Yeah, LOL. LMAO, even. Absolutely not. I’m not quantifying over 12 variables. It feels like it should theoretically be possible, but I don’t want to find out. I just did it in the most straightforward way possible.

First, BIG shout-out to Mr. Henry McLoughlin, who taught me Program Construction. He’s a simply lovely person and his enthusiasm for his subject is infectious. I don’t know if I’d have enjoyed the module as much if anyone else taught it. Henry, if you’re reading this, you’re awesome! Thank you so much.

I guessed what Part 2 would be as soon as I read Part 1, and so if I was aiming for speed, I should have just written do_pt_2 for the general case and reused it across both parts. I would probably have had an easier time of it too. However, 1. I wanted to use what I learned in class and 2. I had fun doing it this way. I think barely anyone else would have done it this way.

It has the advantage of being rigorously proved to work, but at the cost of being harder to understand at first glance. I can see how this is useful for high-stakes software where the extra expenditure of mental energy on making sure the code is 100% watertight is worth it, but this is probably not what I’d reach for during everyday programming. It did result in a very, very terse program though, which is super impressive.

The eagle-eyed among you might say that we don’t really need both d and e, in the loop, as d is derived from e. This is true. Program construction doesn’t always produce the most efficient programs. That is between the programmer and the compiler to figure out.

Oh, the title. Yes. That seemed like a lot of thinking, you might object. It probably was if you’re not familiar with Program Construction yet, but once you’ve derived a couple of these theorems, you’ll find that there is no thinking involved. Not in the sense that once you’re good at something, you can do it almost mechanically, but in the sense that there’s only one way this could have gone. Starting from that post-condition, the theorems we proved fall out automatically as we continue expanding our model, and the same can be said for our loop body. Program construction is really easy in that way, because all you’re doing is following the program derivation to its logical

end.

联系我们 contact @ memedata.com