Compiling Scheme to WebAssembly

Original link: https://eli.thegreenplace.net/2026/compiling-scheme-to-webassembly/

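## Bob project update: Scheme to WebAssembly

Bob, an open-source suite of Scheme implementations written in Python, recently turned 15. Bob was originally created to explore bytecode virtual machines. It now includes an interpreter, a compiler and a VM in Python, plus a VM in C++. The author recently added a new compiler that compiles Scheme directly to WebAssembly (WASM).

The effort had two aims. The first was to tackle the complexity of compiling a high-level language with features such as garbage collection and closures to WASM. The second was to gain hands-on experience with the WASM GC extension. The project represents Scheme objects such as pairs, booleans and symbols in WASM, relying on WASM GC for memory management and on linear memory for strings.

A key challenge was implementing the `write` builtin, which prints arbitrary Scheme values, directly in WASM text; this was done with some help from AI. The WASM compiler imports only two host functions for basic output, keeping the core logic self-contained. The resulting compiler is a bit over 1000 lines of code, more than half of which is WASM text implementing builtin types and functions. It provides a good example of realistic code emission to WASM.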


Original article

One of my oldest open-source projects - Bob - turned 15 a couple of months ago. Bob is a suite of implementations of the Scheme programming language in Python, including an interpreter, a compiler and a VM. Back then I was doing some hacking on CPython internals and was very curious about how CPython-like bytecode VMs work; Bob was an experiment to find out, by implementing one from scratch for R5RS Scheme.

Several months later I added a C++ VM to Bob, as an exercise to learn how such VMs are implemented in a low-level language without all the runtime support Python provides; most importantly, without the built-in GC. The C++ VM in Bob implements its own mark-and-sweep GC.

After many quiet years (with just a sprinkling of cosmetic changes, porting to GitHub, updates to Python 3, etc.), I felt the itch to work on Bob again just before the holidays. Specifically, I decided to add another compiler to the suite - this one from Scheme directly to WebAssembly.

The goals of this effort were two-fold:

  1. Experiment with lowering a real, high-level language like Scheme to WebAssembly. Experiments like the recent Let's Build a Compiler compile toy languages that are at the C level (no runtime). Scheme has built-in data structures, lexical closures, garbage collection, etc., which makes lowering it much more challenging.
  2. Get some hands-on experience with the WASM GC extension. I have several samples of using WASM GC in the wasm-wat-samples repository, but I really wanted to try it for something "real".

Well, it's done now; here's an updated schematic of the Bob project:

[Figure: Bob project diagram with all the components it includes]

The new part is the rightmost vertical path. A WasmCompiler class lowers parsed Scheme expressions all the way down to WebAssembly text, which can then be compiled to a binary and executed using standard WASM tools.
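To illustrate the last step of that pipeline, here is a minimal sketch of how a Python compiler might assemble its generated fragments into one WASM text module. The `assemble_module` function, its parameters, the single page of memory and the exported `main` entry point are all hypothetical, not the real WasmCompiler API. The two host imports are the ones described later in this post.

```python
def assemble_module(types, data_segments, funcs, main_body):
    """Join generated WAT fragments into one (module ...) text.

    Hypothetical helper: the real WasmCompiler may organize this differently.
    """
    parts = [
        "(module",
        # Host imports are listed before the module's own definitions.
        '  (import "env" "write_char" (func $write_char (param i32)))',
        '  (import "env" "write_i32" (func $write_i32 (param i32)))',
    ]
    parts += ["  " + t for t in types]          # GC type definitions
    parts.append("  (memory 1)")                # one 64 KiB page for string data
    parts += ["  " + d for d in data_segments]  # symbol names in linear memory
    parts += ["  " + f for f in funcs]          # builtins such as write
    # Compiled top-level code goes into an exported entry point.
    parts.append('  (func $main (export "main")')
    parts += ["    " + line for line in main_body]
    parts.append("  )")
    parts.append(")")
    return "\n".join(parts)


if __name__ == "__main__":
    print(assemble_module(
        types=["(type $BOOL (struct (field i32)))"],
        data_segments=['(data (i32.const 2048) "foo")'],
        funcs=[],
        main_body=["(call $write_i32 (i32.const 42))"],
    ))
```

Because the output is plain text, it is easy to inspect before handing it to a WAT-to-binary tool.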

Highlights

The most interesting aspect of this project was working with WASM GC to represent Scheme objects. As long as we properly box/wrap all values in refs, the underlying WASM execution environment will take care of the memory management.

For Bob, here's how some key Scheme objects are represented:

```wat
;; PAIR holds the car and cdr of a cons cell.
(type $PAIR (struct (field (mut (ref null eq))) (field (mut (ref null eq)))))

;; BOOL represents a Scheme boolean. zero -> false, nonzero -> true.
(type $BOOL (struct (field i32)))

;; SYMBOL represents a Scheme symbol. It holds an offset in linear memory
;; and the length of the symbol name.
(type $SYMBOL (struct (field i32) (field i32)))
```

$PAIR is of particular interest, as it may contain arbitrary objects in its fields; (ref null eq) means "a nullable reference to something that has identity". ref.test can be used to check - for a given reference - the run-time type of the value it refers to.
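A `$PAIR` field can hold any `eq` reference, so code that consumes it must branch on the run-time type. Here is a minimal sketch, assuming the compiler generates this dispatch as WAT text from Python. `gen_dispatch` and the `$emit_*` handler names are hypothetical. Each branch uses `ref.test` to check the type and `ref.cast` to pass a typed reference to its handler; the `$emit_bool` function shown later in this post takes exactly such a `(ref $BOOL)`.

```python
def gen_dispatch(func_name, handlers):
    """Generate a WAT function that dispatches on the run-time type of a
    (ref null eq) argument.

    handlers: list of (heap_type, handler_func) pairs such as
    ("$PAIR", "$emit_pair") or ("i31", "$emit_int"). Names are hypothetical.
    """
    lines = [f"(func {func_name} (param $v (ref null eq))"]
    for heap_type, handler in handlers:
        lines += [
            # ref.test yields 1 if $v is a non-null reference of this type.
            f"  (if (ref.test (ref {heap_type}) (local.get $v))",
            # ref.cast narrows the eq reference so the handler gets a typed ref.
            f"    (then (call {handler} (ref.cast (ref {heap_type}) (local.get $v)))",
            "          (return)))",
        ]
    # Null (or any unhandled type) ends up here; a real version would
    # check ref.is_null first if null is a valid Scheme value.
    lines.append("  (unreachable))")
    return "\n".join(lines)


if __name__ == "__main__":
    print(gen_dispatch("$emit_value", [("$PAIR", "$emit_pair"),
                                       ("$BOOL", "$emit_bool"),
                                       ("i31", "$emit_int")]))
```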

You may wonder - what about numeric values? Here WASM has a trick - the i31 type (i31ref) can represent a 31-bit integer as a reference, but without actually boxing it (one bit is used to distinguish such an object from a real reference). So we don't need a separate type to hold references to numbers, as long as they fit in 31 bits.
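The catch is that an `i31ref` holds only 31 bits. `ref.i31` keeps the low 31 bits of its i32 operand, and `i31.get_s` / `i31.get_u` read the value back with sign or zero extension. A compiler therefore has to make sure integer values fit, or fall back to some boxed representation; the fallback is an assumption, not something the article describes. The Python model below illustrates those instruction semantics; it is not code from Bob.

```python
I31_MIN, I31_MAX = -(1 << 30), (1 << 30) - 1


def ref_i31(x: int) -> int:
    """Model of ref.i31: keep the low 31 bits of an i32 operand."""
    return x & 0x7FFF_FFFF


def i31_get_s(bits: int) -> int:
    """Model of i31.get_s: sign-extend from bit 30."""
    return bits - (1 << 31) if bits & (1 << 30) else bits


def i31_get_u(bits: int) -> int:
    """Model of i31.get_u: zero-extend (the stored bits are already non-negative)."""
    return bits


def fits_i31(n: int) -> bool:
    """True if n survives a ref.i31 / i31.get_s round trip unchanged."""
    return I31_MIN <= n <= I31_MAX
```

For example, `-1` round-trips through `i31_get_s` unchanged but reads back as `2**31 - 1` through `i31_get_u`. `2**30` does not fit at all and comes back as `-2**30`.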

Also, the $SYMBOL type looks unusual - how can a symbol be represented with two numbers? The key to the mystery is that WASM has no built-in support for strings; they have to be implemented manually using offsets into linear memory. The Bob WASM compiler emits the string values of all symbols encountered into linear memory, keeping track of the offset and length of each one; these are the two numbers placed in $SYMBOL. This also makes it fairly easy to implement Scheme's interning of symbols: multiple occurrences of the same symbol are only allocated once.

Consider this trivial Scheme snippet:

The compiler emits the symbols "foo" and "bar" into linear memory as follows:

```wat
(data (i32.const 2048) "foo")
(data (i32.const 2051) "bar")
```
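The bookkeeping behind these data segments can be sketched in Python as follows. The `SymbolTable` class and its methods are hypothetical. Packing names back to back from a base of 2048 is an assumption inferred from the example addresses (2048 for "foo", 2051 for "bar"). The sketch also assumes symbol names need no escaping inside WAT string literals.

```python
from __future__ import annotations


class SymbolTable:
    """Interns symbol names into a linear-memory string area (hypothetical)."""

    def __init__(self, base: int = 2048):
        self.next_offset = base
        self.offsets: dict[str, tuple[int, int]] = {}
        self.data_segments: list[str] = []

    def intern(self, name: str) -> tuple[int, int]:
        """Return (offset, length) for name, emitting its bytes only once."""
        if name not in self.offsets:
            length = len(name.encode("utf-8"))
            offset = self.next_offset
            self.offsets[name] = (offset, length)
            # Assumes the name contains no characters that need escaping in WAT.
            self.data_segments.append(f'(data (i32.const {offset}) "{name}")')
            self.next_offset += length
        return self.offsets[name]

    def symbol_ref(self, name: str) -> str:
        """WAT expression that constructs a $SYMBOL for name."""
        offset, length = self.intern(name)
        return f"(struct.new $SYMBOL (i32.const {offset}) (i32.const {length}))"
```

In this sketch, interning deduplicates only the bytes. Each `struct.new $SYMBOL` still creates a fresh struct, so `eq?` on symbols would have to compare offsets rather than rely on `ref.eq`.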

And looking for one of these addresses in the rest of the emitted code, we'll find:

```wat
(struct.new $SYMBOL (i32.const 2051) (i32.const 3))
```

This appears in the code that constructs the constant cons list passed as the argument to write: address 2051 with length 3 is the symbol bar.
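A quoted constant list could be lowered to nested `struct.new $PAIR` expressions along these lines. This sketch rests on three assumptions not confirmed by the article: the empty list is `ref.null eq`, booleans are `$BOOL` holding 0 or 1, and integers use `ref.i31`. The `Constants` class and the `Symbol` wrapper are hypothetical. A minimal version of the symbol-offset bookkeeping is included so the block runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical parser-side representation of a Scheme symbol."""
    name: str


class Constants:
    """Lowers quoted Scheme constants to WAT expressions (sketch)."""

    def __init__(self, base: int = 2048):
        self.next_offset = base
        self.symbol_offsets: dict[str, int] = {}

    def _symbol(self, name: str) -> str:
        length = len(name.encode("utf-8"))
        if name not in self.symbol_offsets:
            # Data segments for the bytes are omitted here for brevity.
            self.symbol_offsets[name] = self.next_offset
            self.next_offset += length
        offset = self.symbol_offsets[name]
        return f"(struct.new $SYMBOL (i32.const {offset}) (i32.const {length}))"

    def lower(self, value) -> str:
        if isinstance(value, bool):  # test bool first: bool subclasses int
            return f"(struct.new $BOOL (i32.const {int(value)}))"
        if isinstance(value, int):
            if not -(1 << 30) <= value < (1 << 30):
                raise ValueError(f"{value} does not fit in an i31")
            return f"(ref.i31 (i32.const {value}))"
        if isinstance(value, Symbol):
            return self._symbol(value.name)
        if isinstance(value, list):
            # Lower elements left to right so symbols get offsets in source order,
            # then build the cons chain from the tail; () is assumed to be null.
            items = [self.lower(item) for item in value]
            result = "(ref.null eq)"
            for item in reversed(items):
                result = f"(struct.new $PAIR {item} {result})"
            return result
        raise TypeError(f"unsupported constant: {value!r}")


if __name__ == "__main__":
    print(Constants().lower([Symbol("foo"), Symbol("bar")]))
```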

Speaking of write, implementing this builtin was quite interesting. For compatibility with the other Bob implementations in my repository, write needs to be able to print recursive representations of arbitrary Scheme values, including lists, symbols, etc.
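To make those requirements concrete, here is a Python model of the printing logic a WASM `write` has to reproduce. It handles recursion into pairs, proper and dotted lists, symbols, booleans and integers. `Pair` and `Symbol` are hypothetical stand-ins for the WASM GC structs, and `None` plays the role of the empty list (an assumption).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Pair:
    car: Any
    cdr: Any


@dataclass(frozen=True)
class Symbol:
    name: str


def write(value: Any) -> str:
    """Return the external representation of a Scheme value."""
    if value is None:
        return "()"
    if isinstance(value, bool):  # before int: bool subclasses int
        return "#t" if value else "#f"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, Symbol):
        return value.name
    if isinstance(value, Pair):
        parts = []
        # Walk the cdr chain iteratively; recurse only into each car.
        while isinstance(value, Pair):
            parts.append(write(value.car))
            value = value.cdr
        # A non-empty final cdr makes this an improper (dotted) list.
        tail = "" if value is None else " . " + write(value)
        return "(" + " ".join(parts) + tail + ")"
    raise TypeError(f"cannot write {value!r}")
```

Looping over the cdr chain and recursing only into the car keeps recursion depth proportional to nesting rather than list length. A WASM version could take the same approach.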

Initially I was reluctant to implement all of this functionality by hand in WASM text, but all alternatives ran into challenges:

  1. Deferring this to the host is difficult because the host environment has no access to WASM GC references - they are completely opaque.
  2. Implementing it in another language (maybe C?) and lowering to WASM is also challenging for a similar reason - the other language is unlikely to have a good representation of WASM GC objects.

So I bit the bullet and - with some AI help for the tedious parts - just wrote an implementation of write directly in WASM text; it wasn't really that bad. I import only two functions from the host:

```wat
(import "env" "write_char" (func $write_char (param i32)))
(import "env" "write_i32" (func $write_i32 (param i32)))
```

Though emitting integers directly from WASM isn't hard, I figured this project already has enough code and some host help here would be welcome. For everything else, only the lowest-level write_char is used. For example, here's how booleans are emitted in the canonical Scheme notation (#t and #f):

```wat
(func $emit_bool (param $b (ref $BOOL))
    (call $emit (i32.const 35)) ;; '#'
    (if (i32.eqz (struct.get $BOOL 0 (local.get $b)))
        (then (call $emit (i32.const 102))) ;; 'f'
        (else (call $emit (i32.const 116))) ;; 't'
    )
)
```
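For comparison, emitting an integer with only a character-level primitive takes a short digit loop. This Python sketch models logic a WASM function could use instead of the `write_i32` import. The `emit_i32` name and its `write_char` callback parameter are hypothetical. Digits come out least significant first, so they are buffered and emitted in reverse.

```python
from typing import Callable, List


def emit_i32(n: int, write_char: Callable[[int], None]) -> None:
    """Print a signed 32-bit integer one character code at a time."""
    if not -(1 << 31) <= n < (1 << 31):
        raise ValueError("not an i32")
    if n < 0:
        write_char(ord("-"))  # 45
    # Python ints don't overflow; a WASM version would compute the magnitude
    # in i64, because negating -2**31 does not fit in an i32.
    magnitude = -n if n < 0 else n
    digits: List[int] = []
    while True:
        digits.append(ord("0") + magnitude % 10)
        magnitude //= 10
        if magnitude == 0:
            break
    for code in reversed(digits):
        write_char(code)


if __name__ == "__main__":
    out: List[int] = []
    emit_i32(-123, out.append)
    print("".join(map(chr, out)))
```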

Conclusion

This was a really fun project, and I learned quite a bit about realistic code emission to WASM. Feel free to check out the source code of WasmCompiler - it's very well documented. While it's a bit over 1000 LOC in total, more than half of that is actually WASM text snippets that implement the builtin types and functions needed by a basic Scheme implementation.

