大型语言模型可以编写一种语言解释器。

大型语言模型可以编写一种语言解释器。
FAWK: LLMs can write a language interpreter

原始链接: https://martin.janiczek.cz/2025/11/21/fawk-llms-can-write-a-language-interpreter.html

受Advent of Code的启发，作者发现自己更偏好函数式编程原则——不变性、词法作用域和函数返回数组，而AWK缺乏这些特性。试图将函数式方法强加于AWK以解决复杂问题，很快就暴露了该语言的局限性和意外复杂性。这引发了一个思考实验：如果AWK融入现代特性，例如一流数组、函数/lambda、词法作用域和管道，会怎样？然后，作者利用大型语言模型（具体为Cursor Agent与Sonnet 4.5）*构建*了一种名为“FAWK”的新语言，基于这些想法。令人惊讶的是，LLM成功创建了一个带有大量测试的Python解释器，甚至处理了复杂的特性，如打印重定向和多维数组。虽然承认对生成的代码缺乏深入理解，作者计划在未来的Advent of Code挑战中使用FAWK，并探索LLM加速编程语言实现的可能性，即使这会将开发者的角色转变为代码审查和完善。FAWK解释器可在GitHub上找到。

一个黑客新闻的讨论集中在大型语言模型（LLM）编写语言解释器的能力上。一篇文章突出了一篇博客文章（janiczek.cz），表明LLM可以做到这一点，引发了具有相关经验的用户们的回复。一位用户使用`awk`成功实现了2021年Advent of Code的前十天，发现这具有挑战性。另一位用户多年来一直在尝试使用LLM创建“hashlangs”（小型语言），包括C和Racket，直到最近才取得有限的成功。第三位用户正在利用Copilot和Antigravity构建一种新的语言，旨在实现安全性和性能，并将其降低到机器代码。一位评论者表达了对可运行代码演示的渴望，而不是仅仅讨论概念。这场对话凸显了人们对LLM在语言创建方面的潜力日益增长的乐观情绪，以及其中涉及的实际挑战。

原文

After reading the book The AWK Programming Language (recommended!), I was planning to try AWK out on this year’s Advent of Code. Having some time off from work this week, I tried to implement one of the problems in it to get some practice, set up my tooling, see how hard AWK would be, and… I found I’m FP-pilled.

I knew I’m addicted to the combination of algebraic data types (tagged unions) and exhaustive pattern matching, but what got me this time was immutability, lexical scope and the basic human right of being allowed to return arrays from functions.

Part 1 of the Advent of Code problem was easy enough, but for part 2 (basically a shortest path search with a twist, to not spoil too much), I found myself unable to switch from my usual functional BFS approach to something mutable, and ended up trying to implement my functional approach in AWK.

It got hairy very fast: I needed to implement:

hashing of strings and 2D arrays (by piping to md5sum)
a global ~~set~~ array of seen states
a way to serialize and deserialize a 2D array to/from a string
and a few associative arrays for retrieving this serialized array by its hash.

I was very lost by the time I had all this; I spent hours just solving what felt like accidental complexity; things that I’d take for granted in more modern languages.

Now, I know nobody said AWK is modern, or functional, or that it promises any convenience for anything other than one-liners and basic scripts that fit under a handful of lines. I don’t want to sound like I expect AWK to do any of this; I knew I was stretching the tool when going in. But I couldn’t shake the feeling that there’s a beautiful AWK-like language within reach, an iteration on the AWK design (the pattern-action way of thinking is beautiful) that also gives us a few of the things programming language designers have learnt over the 48 years since AWK was born.

Dreaming of functional AWK

Stopping my attempts to solve the AoC puzzle in pure AWK, I wondered: what am I missing here?

What if AWK had first-class arrays?

BEGIN {
  # array literals
  normal   = [1, 2, 3]
  nested   = [[1,2], [3,4]]
  assoc    = ["foo" => "bar", "baz" => "quux"]
  multidim = [(1,"abc") => 999]

  five = range(1,5)
  analyze(five)
  print five  # --> still [1, 2, 3, 4, 5]! was passed by value
}

function range(a,b) {
  r = []
  for (i = a; i <= b; i++) {
    r[length(r)] = i
  }
  return r  # arrays can be returned!
}

function analyze(arr) {
  arr[0] = 100
  print arr[0]  # --> 100, only within this function
}

What if AWK had first-class functions and lambdas?

BEGIN {
  # construct anonymous functions
  double = (x) => { x * 2 }
  add = (a, b) => { c = a + b; return c }

  # functions can be passed as values
  apply = (func, value) => { func(value) }

  print apply(double,add(1,3))  # --> 8
  print apply(inc,5)  # --> 6
}

function inc(a) { return a + 1 }

What if AWK had lexical scope instead of dynamic scope?

# No need for this hack anymore ↓     ↓
#function foo(a, b         ,local1, local2) {
function foo(a, b) {
  local1 = a + b
  local2 = a - b
  return local1 + local2
}

BEGIN {
  c = foo(1,2)
  print(local1)  # --> 0, the local1 from foo() didn't leak!
}

What if AWK had explicit globals, and everything else was local by default?

BEGIN { global count }
END {
  foo()
  print count  # --> 1
  print mylocal # --> 0, didn't leak
}
function foo() { count++; mylocal++ }

(This one, admittedly, might make programs a bit more verbose. I’m willing to pay that cost.)

What if AWK had pipelines? (OK, now I’m reaching for syntax sugar…)

BEGIN {
  result = [1, 2, 3, 4, 5] 
      |> filter((x) => { x % 2 == 0 })
      |> map((x) => { x * x })
      |> reduce((acc, x) => { acc + x }, 0)

  print "Result:", result
}

Making it happen

TL;DR: Janiczek/fawk on GitHub

Now for the crazy, LLM-related part of the post. I didn’t want to spend days implementing AWK from scratch or tweaking somebody else’s implementation. So I tried to use Cursor Agent for a larger task than I usually do (I tend to ask for very small targeted edits), and asked Sonnet 4.5 for a README with code examples, and then a full implementation in Python.

And it did it.

Note: I also asked for implementations in C, Haskell and Rust at the same time, not knowing if any of the four would succeed, and they all seem to have produced code that at least compiles/runs. I haven’t tried to test them or even run them though. The PRs are here.

I was very impressed—I still am! I expected the LLM to stumble and flail around and ultimately get nothing done, but it did what I asked it for (gave me an interpreter that could run those specific examples), and over the course of a few chat sessions, I guided it towards implementing more and more of “the rest of AWK”, together with an excessive amount of end-to-end tests.

Take a look at those tests!

The only time I could see it struggle was when I asked it to implement arbitrary precision floating point operations without using an external library like mpmath. It attempted to use Taylor series, but couldn’t get it right for at least a few minutes. I chickened out and told it to uv add mpmath and simplify the interpreter code. In a moment it was done.

Other things that I thought it would choke on, like print being both a statement (with > and >> redirection support) and an expression, or multi-dimensional arrays, or multi-line records, these were all implemented correctly. Updating the test suite to also check for backwards compatibility with GAWK - not an issue. Lexical scoping and tricky closure environment behaviour - handled that just fine.

What now?

As the cool kids say, I have to update my priors. The frontier of what the LLMs can do has moved since the last time I tried to vibe-code something. I didn’t expect to have a working interpreter the same day I dreamt of a new programming language. It now seems possible.

The downside of vibe coding the whole interpreter is that I have zero knowledge of the code. I only interacted with the agent by telling it to implement a thing and write tests for it, and I only really reviewed the tests. I reckon this would be an issue in the future when I want to manually make some change in the actual code, because I have no familiarity with it.

This also opened new questions for me wrt. my other projects where I’ve previously run out of steam, eg. trying to implement a Hindley-Milner type system for my dream forever-WIP programming language Cara. It seems I can now just ask the LLM to do it, and it will? But then, I don’t want to fall into the trap where I am no longer able to work on the codebase myself. I want to be familiar with and able to tinker on the code. I’d need to spend my time reviewing and reading code instead of writing everything myself. Perhaps that’s OK.

Performance of FAWK might be an issue as well, though right now it’s a non-goal, given my intended use case is throwaway scripts for Advent of Code, nothing user-facing. And who knows, based on what I’ve seen, maybe I can instruct it to rewrite it in Rust and have a decent chance of success?

For now, I’ll go dogfood my shiny new vibe-coded black box of a programming language on the Advent of Code problem (and as many of the 2025 puzzles as I can), and see what rough edges I can find. I expect them to be equal parts “not implemented yet” and “unexpected interactions of new PL features with the old ones”.

If you’re willing to jump through some Python project dependency hoops, you can try to use FAWK too at your own risk, at Janiczek/fawk on GitHub.