Implementing Forth in Go and C

Original link: https://eli.thegreenplace.net/2025/implementing-forth-in-go-and-c/

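## Forth: a deep dive and an implementation journey

The author revisits Forth, a language first encountered 20 years ago while reading about embedded hardware; the renewed interest led to two implementations. Forth operates on two levels: a user level for practical scripting, and a "hacker" level that reveals its unique, self-extensible nature, where even control flow is defined as words of the language.

The first implementation, `goforth`, is a pure Go interpreter; it is usable but cannot fully embody Forth's core principles. The second, `ctil` (written in C), is closer to the classic threaded-dictionary approach and allows core Forth constructs such as `IF` and `THEN` to be implemented *in* Forth itself, demonstrating its metaprogramming power.

Forth's design stems from the constraints of early computing and offers a terse, stack-based, concatenative programming model. Although Forth is historically valuable and instructive, the author finds it challenging today because it lacks explicit parameter passing and return values, which makes code harder to read and reason about than in more conventional languages. Even so, implementing Forth was a valuable learning experience, offering insight into stack machines, interpretation, compilation, and abstraction.

Both implementations, along with the test framework, are available on GitHub (link not provided in the text). Several resources for learning Forth also proved helpful during the project.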

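## Go & C Forth implementations - summary

A blog post about implementing Forth in Go and C sparked a discussion on Hacker News. The author first attempted an implementation in Go but found that, because the host language controls execution, it limits Forth's real capabilities, such as defining new primitives inside the language. This is because code is not stored as data, which prevents bootstrapping. The C implementation solves this by letting code and data live in the same memory space, which enables standard Forth functionality.

Many commenters highlighted the challenge of replicating Forth's unique features (particularly its stack-based operation and direct control over memory) in high-level languages.

The discussion also touched on Forth's historical use in firmware (Open Firmware/FCode on Macs) and on best practices such as minimal stack manipulation, extensive comments (stack effects), and factoring code for readability. A key point was that a true Forth system needs to be self-extensible and to interact closely with lower-level machine instructions, which is hard to achieve without sacrificing core principles.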

Original text

I first ran into Forth about 20 years ago when reading a book about designing embedded hardware. The reason I got the book back then was to actually learn more about the HW aspects, so having skimmed the Forth chapter I just registered an "oh, this is neat" mental note and moved on with my life. Over the last two decades I heard about Forth a few more times here and there, such as that time when Factor was talked about for a brief period, maybe 10-12 years ago or so.

It always occupied a slot in the "weird language" category inside my brain, and I never paid it much attention. Until June this year, when a couple of factors combined fortuitously:

And something clicked. I'm going to implement a Forth, because... why not?

So I spent much of my free hacking time over the past two months learning about Forth and implementing two of them.

Forth: the user level and the hacker level

It's useful to think of Forth (at least standard Forth, not offshoots like Factor) as having two different "levels":

  1. User level: you just want to use the language to write programs. Maybe you're indeed bringing up new hardware, and find Forth a useful calculator + REPL + script language. You don't care about Forth's implementation or its soul, you just want to complete your task.
  2. Hacker level: you're interested in the deeper soul of Forth. Isn't it amazing that even control flow constructs like IF...THEN or loops like BEGIN...UNTIL are just Forth words, and if you wanted, you could implement your own control flow constructs and have them be first-class citizens, as seamless and efficient as the standard ones?

Another way to look at it (useful if you belong to a certain crowd) is that user-level Forth is like Lisp without macros, and hacker-level Forth has macros enabled. Lisp can still be great and useful without macros, but macros take it to an entirely new level and also unlock the deeper soul of the language.

This distinction will be important when discussing my Forth implementations below.

goforth and ctil


There's a certain way Forth is supposed to be implemented; this is how it was originally designed, and if you get closer to the hacker level, it becomes apparent that you're pretty much required to implement it this way - otherwise supporting all of the language's standard words will be very difficult. I'm talking about the classical approach of a linked dictionary, where a word is represented as a "threaded" list, and this dictionary is available for user code to augment and modify. Thus, much of the Forth implementation can be written in Forth itself.
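To make the linked-dictionary idea concrete, here is a minimal C sketch. It is not ctil's actual layout: the `Word` struct, `find`, `run`, and the tiny data stack are all hypothetical names chosen for illustration. Each entry links to the previously defined word, and the body of a colon definition is simply a list of pointers to other entries (the "thread").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout, for illustration only (not ctil's real structures). */
typedef struct Word Word;
struct Word {
    const char *name;
    Word *prev;           /* link to the previously defined word */
    void (*prim)(void);   /* non-NULL for primitives written in C */
    Word **body;          /* colon words: NULL-terminated thread of words */
};

static long stack[64];
static int sp = 0;

static void push(long v) { assert(sp < 64); stack[sp++] = v; }
static long pop(void)    { assert(sp > 0);  return stack[--sp]; }

static void prim_dup(void) { long a = pop(); push(a); push(a); }
static void prim_mul(void) { long b = pop(); long a = pop(); push(a * b); }

/* Run a word: primitives call into C, colon words walk their thread.
   Recursion borrows the C call stack in place of a Forth return stack. */
static void run(Word *w) {
    if (w->prim) { w->prim(); return; }
    for (Word **ip = w->body; *ip != NULL; ip++) run(*ip);
}

/* Search from the most recent definition backwards. */
static Word *find(Word *latest, const char *name) {
    for (Word *w = latest; w != NULL; w = w->prev)
        if (strcmp(w->name, name) == 0) return w;
    return NULL;
}

static Word w_dup = { "dup", NULL, prim_dup, NULL };
static Word w_mul = { "*", &w_dup, prim_mul, NULL };

/* Equivalent of  : square dup * ;  */
static Word *square_body[] = { &w_dup, &w_mul, NULL };
static Word w_square = { "square", &w_mul, NULL, square_body };
static Word *latest = &w_square;
```

With 7 on the stack, `run(find(latest, "square"))` leaves 49. A real implementation would keep an explicit return stack and store the dictionary in one contiguous memory area so that Forth code itself can append to it; the recursion here only keeps the sketch short.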

The first implementation I tried is stubbornly different. Can we just make a pure interpreter? This is what goforth is trying to explore (the Go implementation located in the root directory of that repository). Many built-in words are supported - definitely enough to write useful programs - and compilation (the definition of new Forth words using : word ... ;) is implemented by storing the actual string following the word name in the dictionary, so it can be interpreted when the word is invoked.
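The "store the string, interpret it later" idea can be sketched in a few dozen lines. The version below is in C rather than Go, and it is only an illustration under assumptions: the `dict` table, `interpret`, `define_word`, and the three built-ins are hypothetical and are not taken from goforth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the "store the source text" idea. Names and limits
   are illustrative and are not taken from goforth. */
#define MAX_WORDS 32
#define MAX_SRC   256
#define MAX_TOK   32

static struct { char name[MAX_TOK]; char src[MAX_SRC]; } dict[MAX_WORDS];
static int nwords = 0;

static long stack[64];
static int sp = 0;
static void push(long v) { assert(sp < 64); stack[sp++] = v; }
static long pop(void)    { assert(sp > 0);  return stack[--sp]; }

static int is_space(char c) { return c == ' ' || c == '\t' || c == '\n'; }

/* Copy the next whitespace-delimited token into tok; return 0 at end of text. */
static int next_token(const char **p, char *tok) {
    size_t n = 0;
    while (is_space(**p)) (*p)++;
    if (**p == '\0') return 0;
    while (**p != '\0' && !is_space(**p)) {
        if (n + 1 < MAX_TOK) tok[n++] = **p;  /* overly long tokens are truncated */
        (*p)++;
    }
    tok[n] = '\0';
    return 1;
}

/* ": name ... ;" saves the raw text between the name and ';' in the dictionary. */
static const char *define_word(const char *p) {
    char tok[MAX_TOK];
    const char *start, *end;
    assert(nwords < MAX_WORDS);
    if (!next_token(&p, dict[nwords].name)) return p;  /* ':' with no name */
    start = end = p;
    for (;;) {
        end = p;
        if (!next_token(&p, tok) || strcmp(tok, ";") == 0) break;
    }
    assert((size_t)(end - start) < MAX_SRC);
    memcpy(dict[nwords].src, start, (size_t)(end - start));
    dict[nwords].src[end - start] = '\0';
    nwords++;
    return p;
}

static void interpret(const char *p) {
    char tok[MAX_TOK];
    while (next_token(&p, tok)) {
        if (strcmp(tok, ":") == 0) {
            p = define_word(p);
        } else if (strcmp(tok, "+") == 0) {
            long b = pop(); long a = pop(); push(a + b);
        } else if (strcmp(tok, "*") == 0) {
            long b = pop(); long a = pop(); push(a * b);
        } else if (strcmp(tok, "dup") == 0) {
            long a = pop(); push(a); push(a);
        } else {
            int i = nwords - 1;   /* newest definition wins */
            while (i >= 0 && strcmp(dict[i].name, tok) != 0) i--;
            if (i >= 0) interpret(dict[i].src);  /* re-interpret the stored text */
            else push(strtol(tok, NULL, 10));    /* anything else is read as a number */
        }
    }
}
```

After `interpret(": square dup * ; : cube dup square * ;")`, calling `interpret("3 cube 2 square +")` leaves 31 on the stack. Every invocation re-tokenizes the stored text, which is where the slowness comes from, and all control stays inside the host-language `interpret` loop.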

This was an interesting approach and in some sense, it "works". For the user level of Forth, this is perfectly usable (albeit slow). However, it's insufficient for the hacker level, because the host language interpreter (the one in Go) has all the control, so it's impossible to implement IF...THEN in Forth, for example (it has to be implemented in the host language).

That was a fun way to get a deeper sense of what Forth is about, but I did want to implement the hacker level as well, so the second implementation - ctil - does just that. It's inspired by the jonesforth assembly implementation, but done in C instead.

ctil actually lets us implement major parts of Forth in Forth itself. For example, variable:

: variable create 1 cells allot ;

Conditionals:

\ IF, ELSE, THEN work together to compile to lower-level branches.
\
\ IF ... THEN compiles to:
\   0BRANCH OFFSET true-part rest
\ where OFFSET is the offset of rest
\
\ IF ... ELSE ... THEN compiles to:
\   0BRANCH OFFSET true-part BRANCH OFFSET2 false-part rest
\ where OFFSET is the offset of false-part and OFFSET2 is the offset of rest
: if immediate
  ' 0branch ,
  here
  0 ,
  ;

: then immediate
  dup
  here swap -
  swap
  ! ;

: else immediate
  ' branch ,
  here
  0 ,
  swap
  dup
  here swap -
  swap
  ! ;

These are actual examples of ctil's "prelude" - a Forth file loaded before any user code. If you understand Forth, this code is actually rather mind-blowing. We compile IF and the other words by directly laying out their low-level representation in memory, and different words communicate with each other using the data stack during compilation.
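For readers who don't speak Forth, the same backpatching scheme can be sketched in C. This is a toy under stated assumptions: the opcodes, the `code` array, and the `compile_*` helpers are hypothetical and do not reflect ctil's real encoding. What it does mirror from the prelude above is the protocol: IF leaves the address of a placeholder offset on a stack, and THEN or ELSE later fills it in with `here - addr`.

```c
#include <assert.h>

/* Hypothetical mini-VM for illustration; opcodes and layout are assumptions,
   not ctil's actual encoding. */
enum { OP_LIT, OP_0BRANCH, OP_BRANCH, OP_EXIT };

static long code[64];    /* compiled cells */
static int here = 0;     /* next free cell, like Forth's HERE */
static long cstack[16];  /* stands in for the data stack at compile time */
static int csp = 0;

static void comma(long cell) { assert(here < 64); code[here++] = cell; }

static void compile_if(void) {
    comma(OP_0BRANCH);
    assert(csp < 16);
    cstack[csp++] = here;  /* remember where the offset cell lives */
    comma(0);              /* placeholder offset, patched later */
}

static void compile_then(void) {
    assert(csp > 0);
    int addr = (int)cstack[--csp];
    code[addr] = here - addr;  /* offset is relative to the placeholder cell */
}

static void compile_else(void) {
    comma(OP_BRANCH);
    int addr2 = here;
    comma(0);              /* placeholder for the jump over the false-part */
    compile_then();        /* patch IF's 0BRANCH to land on the false-part */
    cstack[csp++] = addr2; /* THEN will patch this BRANCH */
}

/* Execute the compiled cells with `flag` on the stack; return the top value. */
static long run(long flag) {
    long stack[16];
    int sp = 0, ip = 0;
    stack[sp++] = flag;
    for (;;) {
        switch (code[ip++]) {
        case OP_LIT:
            assert(sp < 16);
            stack[sp++] = code[ip++];
            break;
        case OP_0BRANCH:   /* pop the flag; jump only when it is zero */
            assert(sp > 0);
            if (stack[--sp] == 0) ip += (int)code[ip]; else ip++;
            break;
        case OP_BRANCH:
            ip += (int)code[ip];
            break;
        case OP_EXIT:
            assert(sp > 0);
            return stack[sp - 1];
        default:
            assert(0);
            return 0;
        }
    }
}
```

Compiling the equivalent of `IF 10 ELSE 20 THEN` (calls to `compile_if`, `comma(OP_LIT)`, `comma(10)`, `compile_else`, and so on, followed by `comma(OP_EXIT)`) yields nine cells; `run(1)` then returns 10 and `run(0)` returns 20. The difference in ctil is that these helpers are themselves Forth words marked `immediate`, so user code can define new ones.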

Thoughts on Forth itself

Forth made perfect sense in the historic context in which it was created in the early 1970s. Imagine having some HW connected to your computer (a telescope in the case of Forth's creator), and you have to interact with it. In terms of languages at your disposal - you don't have much; BASIC was less than a decade old and might not have been available on your machine. Perhaps your machine still didn't have a C compiler ported to it; C compilers aren't simple, and C isn't great for exploratory scripting anyway. So you mostly just have your assembly language and whatever you build on top.

Forth is easy to implement in assembly and it gives you a much higher-level language; you can use it as a calculator, as a REPL, and as a DSL for pretty much anything due to its composable nature.

Forth certainly has interesting aspects; it's a concatenative language, and thus inherently point-free. A classical example is that instead of writing the following in a more traditional syntax:

eat(bake(prove(mix(ingredients))))

You just write this:

ingredients mix prove bake eat

There is no need to explicitly pass parameters, or to explicitly return results. Everything happens implicitly on the stack.

This is useful for REPL-style programming where you use your language not necessarily for writing large programs, but more for interactive instructions to various HW devices. This dearth of syntax is also what makes Forth simple to implement.

All that said, in my mind Forth is firmly in the "weird language" category; it's instructive to learn and to implement, but I wouldn't actually use it for anything real these days. The stack-based programming model is cool for very terse point-free programs, but in my experience it's not particularly readable, and it's hard to reason about without extensive comments.

Consider the implementation of a pretty standard Forth word: +!. It expects an address at the top of the stack, and an addend below it. It adds the addend to the value stored at that address. Here's a Forth implementation from ctil's prelude:

: +!        ( addend addr -- )
  tuck      ( addr addend addr )
  @         ( addr addend value-at-addr )
  +         ( addr updated-value )
  swap      ( updated-value addr )
  ! ;

Look at that stack wrangling! It's really hard to follow what goes where without the detailed comments showing the stack layout on the right of each instruction (a common practice for Forth programs). Sure, we can create additional words that would make this simpler, but that just increases the lexicon of words to know.
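To check the stack comments mechanically, the five steps of `+!` can be replayed in C. This is a sketch under assumptions: "addresses" are just indices into a small hypothetical `memory` array, and the helper names (`tuck`, `fetch`, `store`, `plus_store`) are made up for the illustration.

```c
#include <assert.h>

static long stack[16];
static int sp = 0;
static long memory[8];  /* assumption: "addresses" are indices into this array */

static void push(long v) { assert(sp < 16); stack[sp++] = v; }
static long pop(void)    { assert(sp > 0);  return stack[--sp]; }

/* tuck ( a b -- b a b ) */
static void tuck(void) {
    long b = pop();
    long a = pop();
    push(b); push(a); push(b);
}

/* @ ( addr -- value ) */
static void fetch(void) {
    long addr = pop();
    assert(addr >= 0 && addr < 8);
    push(memory[addr]);
}

/* + ( a b -- a+b ) */
static void add(void) {
    long b = pop();
    long a = pop();
    push(a + b);
}

/* swap ( a b -- b a ) */
static void swap(void) {
    long b = pop();
    long a = pop();
    push(b); push(a);
}

/* ! ( value addr -- ) */
static void store(void) {
    long addr = pop();
    long value = pop();
    assert(addr >= 0 && addr < 8);
    memory[addr] = value;
}

/* +! ( addend addr -- ), the same five steps as the Forth definition */
static void plus_store(void) {
    tuck();   /* addr addend addr */
    fetch();  /* addr addend value-at-addr */
    add();    /* addr updated-value */
    swap();   /* updated-value addr */
    store();  /* (empty) */
}
```

With `memory[3]` set to 40, pushing the addend 2 and then the address 3 and calling `plus_store()` leaves `memory[3]` at 42 and the stack empty. Note that even in C the stack comments are still needed to follow `plus_store`; the difficulty comes from the stack discipline, not from Forth's syntax.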

My point is, there's a fundamental difficulty here. When you see this C code:

int func(int a, int b) {
  return foo(a, bar(b));
}

Even without any documentation, you can immediately know several important things:

  • bar has one parameter and one return value
  • foo has two parameters and one return value
  • func also has two parameters and one return value
  • It's immediately obvious how the various values flow from one function call to the next.

Written in Forth:

: func bar foo ;

How can you know the arity of the functions without adding explicit comments? Sure, if you have a handful of words like bar and foo you know like the back of your hand, this is easy. But imagine reading a large, unfamiliar code base full of code like this and trying to comprehend it.
