There's an old compiler-building tutorial that has become part of the field's lore: the Let's Build a Compiler series by Jack Crenshaw (published between 1988 and 1995).
I ran into it in 2003 and was very impressed, but it's now 2025 and this tutorial is still being mentioned quite often in Hacker News threads. Why is that? Why does a tutorial from 35 years ago, built in Pascal and emitting Motorola 68000 assembly - technologies that are virtually unknown for the new generation of programmers - hold sway over compiler enthusiasts? I've decided to find out.
The tutorial is easily available and readable online, but just re-reading it seemed insufficient. So I've decided on meticulously translating the compilers built in it to Python and emit a more modern target - WebAssembly. It was an enjoyable process and I want to share the outcome and some insights gained along the way.
The result is this code repository. Of particular interest is the TUTORIAL.md file, which describes how each part in the original tutorial is mapped to my code. So if you want to read the original tutorial but play with code you can actually easily try on your own, feel free to follow my path.
A sample
To get a taste of the input language being compiled and the output my compiler generates, here's a sample program in the KISS language designed by Jack Crenshaw:
var X=0
{ sum from 0 to n-1 inclusive, and add to result }
procedure addseq(n, ref result)
var i, sum { 0 initialized }
while i < n
sum = sum + i
i = i + 1
end
result = result + sum
end
program testprog
begin
addseq(11, X)
end
.
It's from part 13 of the tutorial, so it showcases procedures along with control constructs like the while loop, and passing parameters both by value and by reference. Here's the WASM text generated by my compiler for part 13:
(module
(memory 8)
;; Linear stack pointer. Used to pass parameters by ref.
;; Grows downwards (towards lower addresses).
(global $__sp (mut i32) (i32.const 65536))
(global $X (mut i32) (i32.const 0))
(func $ADDSEQ (param $N i32) (param $RESULT i32)
(local $I i32)
(local $SUM i32)
loop $loop1
block $breakloop1
local.get $I
local.get $N
i32.lt_s
i32.eqz
br_if $breakloop1
local.get $SUM
local.get $I
i32.add
local.set $SUM
local.get $I
i32.const 1
i32.add
local.set $I
br $loop1
end
end
local.get $RESULT
local.get $RESULT
i32.load
local.get $SUM
i32.add
i32.store
)
(func $main (export "main") (result i32)
i32.const 11
global.get $__sp ;; make space on stack
i32.const 4
i32.sub
global.set $__sp
global.get $__sp
global.get $X
i32.store
global.get $__sp ;; push address as parameter
call $ADDSEQ
;; restore parameter X by ref
global.get $__sp
i32.load offset=0
global.set $X
;; clean up stack for ref parameters
global.get $__sp
i32.const 4
i32.add
global.set $__sp
global.get $X
)
)
You'll notice that there is some trickiness in the emitted code w.r.t. handling the by-reference parameter (my previous post deals with this issue in more detail). In general, though, the emitted code is inefficient - there is close to 0 optimization applied.
Also, if you're very diligent you'll notice something odd about the global variable X - it seems to be implicitly returned by the generated main function. This is just a testing facility that makes my compiler easy to test. All the compilers are extensively tested - usually by running the generated WASM code and verifying expected results.
Insights - what makes this tutorial so special?
While reading the original tutorial again, I had on opportunity to reminisce on what makes it so effective. Other than the very fluent and conversational writing style of Jack Crenshaw, I think it's a combination of two key factors:
- The tutorial builds a recursive-descent parser step by step, rather than giving a long preface on automata and table-based parser generators. When I first encountered it (in 2003), it was taken for granted that if you want to write a parser then lex + yacc are the way to go . Following the development of a simple and clean hand-written parser was a revelation that wholly changed my approach to the subject; subsequently, hand-written recursive-descent parsers have been my go-to approach for almost 20 years now.
- Rather than getting stuck in front-end minutiae, the tutorial goes straight to generating working assembly code, from very early on. This was also a breath of fresh air for engineers who grew up with more traditional courses where you spend 90% of the time on parsing, type checking and other semantic analysis and often run entirely out of steam by the time code generation is taught.
To be honest, I don't think either of these are a big problem with modern resources, but back in the day the tutorial clearly hit the right nerve with many people.
What else does it teach us?
Jack Crenshaw's tutorial takes the syntax-directed translation approach, where code is emitted while parsing, without having to divide the compiler into explicit phases with IRs. As I said above, this is a fantastic approach for getting started, but in the latter parts of the tutorial it starts showing its limitations. Especially once we get to types, it becomes painfully obvious that it would be very nice if we knew the types of expressions before we generate code for them.
I don't know if this is implicated in Jack Crenshaw's abandoning the tutorial at some point after part 14, but it may very well be. He keeps writing how the emitted code is clearly sub-optimal and can be improved, but IMHO it's just not that easy to improve using the syntax-directed translation strategy. With perfect hindsight vision, I would probably use Part 14 (types) as a turning point - emitting some kind of AST from the parser and then doing simple type checking and analysis on that AST prior to generating code from it.
Conclusion
All in all, the original tutorial remains a wonderfully readable introduction to building compilers. This post and the GitHub repository it describes are a modest contribution that aims to improve the experience of folks reading the original tutorial today and not willing to use obsolete technologies. As always, let me know if you run into any issues or have questions!