垃圾回收很有用。
Garbage Collection Is Useful

原始链接: https://dubroy.com/blog/garbage-collection-is-useful/

## 增量解析与垃圾回收的见解 作者在使用Ohm(一种增量解析器)和ProseMirror构建双向文本编辑器时遇到了性能问题。目标是在底层文本更改时,利用Ohm的增量解析能力来高效地更新ProseMirror文档。 最初的解决方案是“追踪”所有节点,以识别编辑期间被删除的节点——本质上是一种垃圾回收方法。然而,这违背了增量性的目的,即使对于小的更改也需要对整个文档进行扫描。 回忆起论文“垃圾回收的统一理论”中的一个关键见解,作者切换到“引用计数”方法。他们不再追踪*存活*节点,而是通过在文档更新时递减引用计数来追踪*死亡*节点。这使得他们能够快速识别被删除的节点,而无需遍历整个文档,从而显著提高了增量更新的性能。核心思想是关注识别哪些内容被*删除*,而不是哪些内容*保留*,这反映了垃圾回收中追踪和引用计数之间的对偶性。

黑客新闻 新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 垃圾回收是有用的 (dubroy.com) 8 分,由 surprisetalk 发表于 2 小时前 | 隐藏 | 过去 | 收藏 | 讨论 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请YC | 联系 搜索:
相关文章

原文
November 14, 2025

A long time ago, I spent a few years working on garbage collection in the J9 Java VM. And even though I’ve since done mostly done higher-level stuff, having a deeper knowledge of GC has continued to come in useful.

Yesterday, it was an insight from one of my favourite GC papers, A Unified Theory of Garbage Collection, which helped me solve a tricky problem.

ohm’s incremental parsing

I’m working with a team that’s using Ohm to parse text documents and render a rich text version in ProseMirror. The goal is bidirectional updates: changes in ProseMirror should propagate to the text version, and vice versa.

Ohm supports incremental parsing, which means that if you parse some text and then make a small edit, it can quickly reparse by reusing portions of the previous result.

It also supports limited form of incremental transforms. You can define an attribute, which is kind of like a memoized visitor, and the attribute value for a given node will only need to be recalculated if the edit affected one of its subtrees. So you can easily implement a form of persistent data structure, where each new value (e.g., an AST) shares a bunch of structure with the previous one.

the problem

Using this machinery, I tried making an pmNodes attribute that produced a ProseMirror document for a given input. When the text document is edited, it produces a new tree which shares a bunch of nodes with the previous one.

The `pmNodes` tree before and after an edit.

Then, my plan was to construct a ProseMirror transaction that would turn the old tree into the new one. To do that, it’s helpful to know which nodes appeared in the old document, but not the new one.

My first implementation of this was equivalent to tracing garbage collection — after each edit, I walked the entire document, and recorded all the nodes in a Set. The difference between the sets told me which nodes had died.

But this kind of defeats the purpose of incrementality — if you have a long document and make a small edit, we should be able to process without visiting every node in the document.

the solution

Then I remembered A Unified Theory of Garbage Collection, a 2004 OOPSLA paper by some former colleagues of mine:

Tracing and reference counting are uniformly viewed as being fundamentally different approaches to garbage collection that possess very distinct performance properties. We have implemented high-performance collectors of both types, and in the process observed that the more we optimized them, the more similarly they behaved — that they seem to share some deep structure.

We present a formulation of the two algorithms that shows that they are in fact duals of each other. Intuitively, the difference is that tracing operates on live objects, or “matter”, while reference counting operates on dead objects, or “anti-matter”. For every operation performed by the tracing collector, there is a precisely corresponding anti-operation performed by the reference counting collector.

This was the answer I needed! Rather than visiting all the live objects, I wanted to only visit the dead ones, and reference counting would let me do that.

So I added a way of maintaining a reference count for all the nodes in the doc. When we produce a new document, we decrement the reference count of the old root node (it will always be 0 afterwards). So we recursively decrement the ref count of its children, and so on. This gives me exactly what I wanted — a way to find all the nodes that were not reused, without having to visit most of the nodes in the doc.

联系我们 contact @ memedata.com