琥珀树:介于花楸红与翠绿之间的中间色
Amber Tree: A Middle Ground Between Rowan Red and Green Trees

原始链接: https://blog.gplane.win/posts/introducing-amber-tree.html

Rowan 的语法树在功能丰富、堆分配的“红树(Red tree)”与高性能但受限的“绿树(Green tree)”之间做出了权衡。为了优化 `wasm-language-tools`(该工具极少需要向上遍历),作者开发了“琥珀树(Amber Tree)”。 琥珀树是一种轻量级包装器,存储了对 `GreenNode` 的引用以及预计算的 `TextRange`。通过利用 Rust 的借用检查器将生命周期直接与绿树绑定,它在没有循环父引用开销的情况下,同时实现了易用性和内存安全。 这种架构转变带来了两大性能提升: 1. **通用遍历:** 基准测试显示,相较于已优化的红树实现,速度提升了 23% 到 50%。 2. **分配效率:** 通过使标记文本(token text)的生命周期与树结构对齐,格式化程序避免了不必要的堆分配和字符串拷贝。这使得 `wat_formatter` 的执行时间减少了约 59%。 最终,琥珀树充当了一个中间地带,在舍弃不必要的父/兄弟节点导航的同时,既提供了用户友好的 API 和位置感知能力,又保留了绿树的性能优势。

抱歉。
相关文章

原文

Problems of rowan red tree and green tree

Rowan provides two kinds of syntax trees:

  • Red tree: Feature-rich and API-friendly. It allows traversal not just to children and tokens, but also to parents and siblings via parent references. However, these parent references create cyclic references, so nodes must be heap-allocated and wrapped in Rc to reduce cloning overhead.
  • Green tree: Much better performance, but with trade-offs. You can only access children, and the API doesn’t directly distinguish between nodes and tokens - it returns NodeOrToken enum, which is awkward to work with. Green nodes also don’t store a text range, so there’s no way to know where a node appears in the source code.

It’s worth noting that in rowan, the red and green trees aren’t two independent structures. They are different views of the same underlying data: the red tree is essentially a wrapper around the green tree.

For most use cases in my project, wasm-language-tools, I don’t need to traverse up to parents or siblings. I use the red tree primarily because its API is more convenient and it provides text ranges. So can I design a syntax tree that offers a reasonably friendly API and text range support, without parent/sibling consideration, while approaching green tree performance?

Absolutely. And it turns out to be much simpler than you might expect. Since its capabilities sit somewhere between the red and green trees, I call it the “Amber Tree.”

Implementation

Here’s the definition of an amber node:

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct AmberNode<'a> {
    green: &'a GreenNode,
    range: TextRange,
}
  • green: Holds reference to the green node, making the amber node lightweight. Thanks to Rust’s borrow checker, the lifetime of the AmberNode is tied to its GreenNode, ensuring memory safety without runtime overhead.
  • range: Stores the node’s source location, solving the green node’s lack of text range info.

Both fields are Copy (GreenNode itself isn’t Copy but the reference is) so AmberNode can also be Copy naturally.

From here, we can implement some APIs we need. Because we hold a reference to the green tree, traversing to children or deeper descendants remains cheap and efficient.

I replaced parts of my language service that previously used the red tree with the amber tree. Note that these benchmarks compare against my own optimized implementation of the red tree (not the vanilla rowan implementation), so the baseline is already quite tuned.

BenchmarkBeforeAfterChange
unchanged text15.469 µs7.609 µs↓ 50.8%
changed text224.73 µs172.39 µs↓ 23.3%

Improving formatter

wat_formatter is a code formatter for WebAssembly Text Format (WAT) - similar to rustfmt for Rust but for .wat files. wat_formatter rarely needs parent or sibling access, making it a perfect candidate for the amber tree. This switch unlocked a significant secondary optimization.

With the red tree, SyntaxToken text is stored on the heap, and its lifetime is decoupled from the token itself. This meant I couldn’t pass the &str directly to tiny_pretty; I had to call .text().to_string(), incurring unnecessary heap allocations and string copies.

With the amber tree, the token text’s lifetime is tied directly to the green tree reference held by the node. This allows me to pass the &str directly to tiny_pretty with zero allocations.

Due to the design of SyntaxToken in red tree, the lifetime of token text (which is &str) isn’t related to the lifetime of SyntaxToken itself, so we can’t pass that &str to tiny_pretty directly. To work around, we have to call .text().to_string() but this introduces heap allocation and string copying, decreasing the performance. Now with amber tree, the lifetime of token text is consistent with the lifetime of token itself because amber token holds the reference to GreenToken, and we can directly use the &str returned by amber token.

The result? The formatter’s execution time dropped from ~85.191 µs to ~34.808 µs - a ~59% reduction in time, or roughly a 2.4x speedup.

Conclusion

By sacrificing upward traversal, the amber tree provides a sweet spot: the ergonomics of the red tree with performance approaching the green tree.

More detail and implementation can be found in this pull request:#36 Rewrite rowan with our own implementation in wasm-language-tools.

联系我们 contact @ memedata.com