为Plush编程语言构建一个复制垃圾回收器

为Plush编程语言构建一个复制垃圾回收器
Building a Copying GC for the Plush Programming Language

原始链接: https://pointersgonewild.com/2025-11-29-building-a-copying-gc-for-the-plush-programming-language/

## Plush：为基于 Actor 的语言带来垃圾回收 Plush 是一种动态类型的编程语言，灵感来自 Lox 和 JavaScript，专为声音和图形实验而设计。其关键特性是基于 Actor 的并行性，通过消除手动锁和内存安全问题来简化多线程。虚拟机被设计成没有全局锁，允许 Actor 独立运行，仅在消息传递期间同步。一个主要挑战是在这种 Actor 模型中处理垃圾回收，尤其是在 Actor 之间发送可能包含循环引用对象图时。所采用的解决方案是对消息数据进行*静默复制*——当一个 Actor 发送消息时，虚拟机将引用的对象复制到接收者的内存空间中。为了优化分配速度，每个 Actor 都有两个分配器：一个用于内部使用的私有分配器，另一个用于传入消息的邮箱分配器。这避免了在分配期间的锁竞争。在经过长时间的拖延后，一个复制垃圾回收器终于在合作者的帮助下实现，目前已完成 90%。有了垃圾回收器，Plush 现在可以处理更复杂的程序，并通过实时 3D Boing Ball 演示得到了证明。目前的工作重点是优化收集性能，并欢迎贡献，尤其是在性能分析和演示程序开发方面。

黑客新闻新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交登录为Plush编程语言构建一个复制垃圾回收器 (pointersgonewild.com) 9点由 ibobev 2小时前 | 隐藏 | 过去 | 收藏 | 讨论指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请YC | 联系搜索：

原文

This is the fourth post in a series about Plush, a dynamically-typed programming language I've been building in my spare time. The language draws inspiration from Lox/JavaScript, and I created it as a platform to experiment and have fun with sound and graphics programming. It's very easy to just open a window and draw some pixels with very little boilerplate.

One of the key features of this language is that it has actor-based parallelism, which makes it relatively easy, even for beginners, to write multithreaded programs without having to think about locks and memory safety. I took some care to implement the Virtual Machine (VM) in such a way that internally, there is no global VM lock, meaning that independent actors only need to synchronize with each other when they are sending and receiving messages, but otherwise, they can execute fully independently.

One of the trickier aspects of the Plush VM design is that this is a garbage collected language, and actors can send messages (meaning heap-allocated objects) to one another. So, from the beginning, there was a design question of how I could make it so that these messages, which can be entire subgraphs of objects with cycles in them, could be sent from one actor to another, with dead objects being garbage-collected, and at the same time keep all of this efficient.

In an actor-based system, you don't want two actors holding a mutable reference to the same object. That means if you send an object from one actor to another, you either need to copy it, or you need to freeze the object (make it immutable). I wanted to avoid needing to freeze objects before sending messages because I felt it added complexity to the system and would mean the programmer would need to keep track of what is and isn't frozen. If you want to send a subgraph of objects to another actor, but you want to keep a local copy, then you first need to deepcopy these objects, and then deepfreeze your subgraph. It's conceptually much simpler if you can just send the root of the subgraph as a message, and the host VM copies everything for you behind the scenes. Furthermore, there's an issue with requiring messages to be immutable. The assumption there is that immutable objects can be safely shared between actors, but if you do that, then it means you need a VM-wide garbage collection to periodically synchronize with all your actors. Supporting this kind of immutable object sharing isn't inherently a bad idea, but it does inevitably add complexity to the system.

In Plush, when you send a message, the VM silently copies the subgraph of objects referenced in the message. That does open up more questions with regards to memory management though. For one thing, if I want to send a message from actor A to B, then I need to copy the message into a memory space (an allocator) that is owned by B. However, I'd like to avoid a situation where B needs to take a lock on its own memory allocator every time it wants to allocate anything, because that would greatly slow down allocations. The solution I came up with is to give each actor two allocators. An actor has an internal private allocator that it can allocate objects from without ever needing to take a lock, and a mailbox allocator which is used to send messages to it.

The double allocator design has the key benefit that each actor can do its own private allocations without locking, which maximizes allocation performance. I decided to go with a bump allocator and a copying GC design. This keeps things simple and means that allocating memory is very fast. There was just one problem: I kept putting off implementing the garbage collector because I was dreading it. Past experience has taught me that implementing a GC can be very error-prone. There are often lots of sneaky bugs that show up, and you can end up needing to refactor key parts of your VM.

You can write many cool programs even without a GC. There are lots of programs that never allocate much memory during their execution. However, not having a GC does eventually become limiting. I tried to write code to train a neural network to do beat detection in Plush, and that program allocated a lot of matrices. I could only run it for a few minutes or so before it would crash, even on my desktop with 128GB of RAM. I was running into the limits of Alloc-Until-You-Crash Technology (TM). Still, I kept procrastinating on the GC, until my friend Marco jumped in two weeks ago and offered to help. Collaborating with someone on the feature made it much easier to find the motivation to tackle this challenge.

The good news is, it took Marco and I just over two weeks and we got the GC about 90% complete. There were some segmentation faults to debug, but it seems like we fixed most of the issues fairly quickly. Based on past experience, I went ahead and wrote a number of adversarial test programs to try and stress test the GC and surface bugs. That strategy proved effective. Turns out it's sometimes not that stressful to do hard things if you break them up into many small steps.

For fun, with the help of LLMs, I wrote a program that is a tribute to the classic Amiga Boing Ball demo. The 3D animation runs seamlessly at 60 frames per second on my MacBook Air M1. It performs GC about once per second, and the pause isn't noticeable. Even though it's only flat-shaded polygons, I feel pretty proud that the Plush interpreter is fast enough to do 3D graphics in real-time.

There are still a few missing bits in the garbage collector. Namely, if you send many large objects to an actor and its message queue gets backlogged, you could fill the message allocator and cause the program to panic. There needs to be logic in place so that actors wait and retry if the receiver's allocator gets full. There also needs to be logic for the message allocator to grow in size when necessary. However, all the essentials are in place, and it's now possible to run long-running programs on Plush.

If you would like to contribute, one issue I've noticed is that collections can be fairly slow once you get past 300K objects, which I don't think should be the case. There is a test program that grows a linked list which demonstrates the problematic slowdown. I suspect that this might have to do with Rust hashmap slowness, or that there may be some quadratic behavior somewhere. We could use some help profiling and investigating that issue. I'm also looking for contributors to write fun demo programs using Plush. We now have the ability to do both graphics and audio output, so it would be possible to write a simple music tracker, for example, or some kind of procedural infinite dungeon of some kind. Infinite Wolfenstein, anyone?

为Plush编程语言构建一个复制垃圾回收器 Building a Copying GC for the Plush Programming Language

为Plush编程语言构建一个复制垃圾回收器
Building a Copying GC for the Plush Programming Language