Reworking Memory Management in CRuby

Original link: https://railsatscale.com/2025-09-16-reworking-memory-management-in-cruby/

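## Integrating MMTk with Ruby: Performance Gains

Shopify and ANU are collaborating to integrate the Memory Management Toolkit (MMTk), a modular framework for building high-performance garbage collectors, with Ruby. The goal is to deliver significant performance gains over Ruby's current mark-and-sweep algorithm by using MMTk's more advanced collectors, such as Generational Immix. There are two MMTk implementations for Ruby: a more advanced, experimental fork maintained by the MMTk team, and a reimplementation inside Ruby's modular GC framework. This summary focuses on the MMTk team's implementation. The main challenges include adapting to Ruby's moving garbage collector (introduced in 2.7) and optimizing finalization. The team addressed the former by identifying "Potentially Pinning Parents" (PPPs), objects that need special handling during object movement, and minimizing their number so they can be processed efficiently. During parallel garbage collection, the finalization phase turned out to be a major bottleneck, particularly in `free` (memory deallocation). The solution was to allocate the buffers of common types (arrays, strings) directly through MMTk as Ruby objects, minimizing reliance on the system `malloc` and giving those buffers automatic memory management. Current work focuses on optimized memory layouts, object movement techniques, and integration with Ruby's JIT compilers. The collaboration aims to improve Ruby's performance and to provide valuable insights for developers and researchers.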

This blog post was adapted from our paper and talk at the International Symposium on Memory Management 2025.

Click here to read the paper

We would first like to acknowledge the late Chris Seaton, who initiated our collaboration with the Australian National University on this project. We are thankful for his contribution, vision, and leadership. Without him, none of this would have been possible.

Background

The Australian National University (ANU) and Shopify are collaborating on integrating the Memory Management Toolkit (MMTk) with Ruby. We are supporting the project and working alongside ANU researchers to explore how to build a next-generation garbage collector for Ruby.

If you’re not familiar with MMTk, it offers a highly modular, VM-neutral framework for rapidly building high-performance garbage collectors. Once a language plugs into MMTk, it can leverage a wide range of built-in garbage collection algorithms, ranging from canonical collectors such as NoGC, Mark and Sweep, and Immix to more performant collectors such as Generational Immix and Sticky Immix. Many of these algorithms are considerably more sophisticated than the Mark and Sweep algorithm used in Ruby and have the potential to deliver significant performance gains.

There are currently two implementations of MMTk in Ruby: one is maintained by the MMTk team as a fork of Ruby (in the mmtk/ruby and mmtk/mmtk-ruby repositories), and the other lives inside Ruby and uses the modular GC framework (in the ruby/mmtk repository). You might be wondering why there are two implementations. The MMTk team’s implementation is much more advanced, with around 5 years of development behind it. They continue to use it to experiment with and develop new techniques that further leverage MMTk’s capabilities and improve performance. The implementation upstreamed to Ruby uses the modular GC framework and is designed to be part of an ecosystem of garbage collectors for Ruby. It is a reimplementation that draws on techniques and knowledge from the MMTk team’s implementation, but it is still quite far behind.

In this blog post, we will follow the paper and mostly focus on the MMTk team’s implementation. However, if you want to learn more about the modular GC framework, you can watch this talk at RubyKaigi 2025 or read this blog post.

Challenges

In the paper, we discuss some of the challenges we faced and solutions we used while integrating MMTk with Ruby. In this blog post, we highlight some of these challenges, but please read the paper if you want the entire picture.

Copying Garbage Collector

When Ruby 2.7 introduced a moving garbage collector, it marked the first time that objects could change memory location. To facilitate this, each of the data types in Ruby needed additional code to update the addresses of the objects it refers to after they have been moved. To ensure backwards compatibility, each data type needed to opt in to a new API that supports object movement, and all the existing types would pin the objects they refer to. A pinned object cannot move.

This pinning system works for Ruby’s default (built-in) garbage collector, because it has a marking phase to determine objects that are live and objects that are pinned followed by a compaction phase to move non-pinned objects. However, many of MMTk’s algorithms combine the marking and moving phases, meaning that an object is moved the moment it is marked. For algorithms like Immix, objects can be pinned, but they must be specified ahead of time. One solution would be to scan the heap twice: first to determine which objects get pinned, and again to mark all live objects and move the unpinned objects. However, this is inefficient because it essentially involves scanning the whole Ruby heap twice.

Fortunately, it’s been more than 5 years since a moving garbage collector was introduced to Ruby, so almost all the types in Ruby and many native gems support it. We introduced a new concept called Potentially Pinning Parents, or PPP for short. An object is a PPP if it could potentially hold references to objects that must not be moved, because it cannot update those references afterwards. Earlier this year, we made an effort to reduce the number of PPP objects. In fact, as of the time of writing, there are no user-facing Ruby objects that are PPPs except for ones defined in native gems (which we do not have any control over). There are still a few internal Ruby objects that are PPPs, but we are working on eliminating those as well.

Since we now know whether an object is a PPP at allocation time, MMTk keeps a list of PPP objects that are alive. Using that list, during a garbage collection cycle, it inspects every PPP object to determine the child objects that should be pinned before moving on to the phase that marks and moves objects. Since the set of PPP objects is now small, this phase can be completed very quickly.
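To make the two phases concrete, here is a minimal toy model in Python. It is only a sketch: the `Obj` class, its fields, and the `collect` function are hypothetical names made up for illustration, not MMTk's or CRuby's actual data structures. It assumes pinning is a simple flag and that "moving" an object can be represented by setting another flag.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    """Hypothetical heap object; not a real CRuby or MMTk structure."""
    name: str
    children: list = field(default_factory=list)
    is_ppp: bool = False   # decided once, at allocation time
    pinned: bool = False
    marked: bool = False
    moved: bool = False


def collect(roots, live_ppps):
    # Phase 1: visit only the (small) list of live PPPs and pin their children.
    for ppp in live_ppps:
        for child in ppp.children:
            child.pinned = True

    # Phase 2: a single traversal that marks and moves in one step.
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        if not obj.pinned:
            obj.moved = True  # stands in for copying the object elsewhere
        stack.extend(obj.children)


# A native-extension object (a PPP) that refers to `target` without
# being able to update the reference, next to an ordinary string.
target = Obj("target")
ext = Obj("native_ext", children=[target], is_ppp=True)
string = Obj("string")
root = Obj("root", children=[ext, string])

collect([root], live_ppps=[ext])
```

In this model the cost of the first phase depends on the number of PPPs rather than the size of the heap, which is why keeping the PPP set small matters.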

Finalization

Before Ruby 3.2, all Ruby objects were allocated by the garbage collector in fixed 40-byte slots. This meant that any additional data for the object needed to be allocated externally, usually from the system using malloc. In Ruby 3.2, we introduced Variable Width Allocation, which allows us to allocate slots of varying sizes through the garbage collector. However, for legacy reasons and because of technical limitations of Variable Width Allocation, there are still many cases where we need to allocate memory from the system through malloc.

One of the superpowers of MMTk is that it supports parallelism in the garbage collector. Unlike Ruby’s default garbage collector, MMTk can split the work that needs to be done during a GC cycle (marking, sweeping, moving, etc.) into small chunks (MMTk calls these “work packets”) and process these work packets in parallel across multiple CPU cores.
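The idea of work packets can be sketched in a few lines of Python. This is a simplified illustration, not MMTk's scheduler: the function names are hypothetical, the work is split statically up front, and Python threads only show the structure of the approach since they do not run CPU-bound code in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def make_packets(items, packet_size):
    """Split a list of work items into fixed-size chunks ("work packets")."""
    return [items[i:i + packet_size] for i in range(0, len(items), packet_size)]


def process_packet(packet):
    """Stand-in for GC work on one packet: count the live entries."""
    return sum(1 for is_live in packet if is_live)


def run_gc_work(items, packet_size=4, workers=4):
    """Hand each packet to a pool of workers and combine the results."""
    packets = make_packets(items, packet_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_packet, packets))


# A toy "heap" of 20 slots where every other slot is live.
heap = [True, False] * 10
live_count = run_gc_work(heap)
```

Small packets make it easier to keep every worker busy, at the cost of some scheduling overhead per packet.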

It’s important to note, however, that while MMTk can perform its GC work in parallel, it does not run concurrently with the VM. MMTk is a parallel GC implementation, not a concurrent one: Ruby code cannot run while the garbage collector is running, so the Ruby VM still has to be stopped during a collection.

There were many challenges that we had to overcome to move from a serial garbage collector to a parallel one, including removing dependence on thread-local variables and fixing race conditions. Those issues were apparent as crashes and unexpected behavior. We also ran into a trickier problem: our garbage collection cycles were slower the more threads we used!

This was counterintuitive, because if each CPU core does less work, then shouldn’t it run faster? We looked at performance profiles more closely, and saw that it was the finalization phase that was slower. The finalization phase iterates over all dead objects to run code to do things like reclaim memory or close file descriptors. Specifically, we found that the culprit was free, the function that frees memory allocated through malloc. In the following measurements, we freed 100 million 32-byte pieces of memory using free. We measure the time taken (in milliseconds) with the work split across a varying number of threads and using various implementations of malloc. We see that glibc, jemalloc, and tcmalloc all scale negatively with the number of threads. The only allocator that offers any scalability is mimalloc, but we see little to no gain beyond a speedup of about 4x. The scalability it does offer is likely due to mimalloc’s design for a fast free that maximizes concurrency.

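| Threads | glibc (ms) | jemalloc (ms) | tcmalloc (ms) | mimalloc (ms) |
|---------|------------|---------------|---------------|---------------|
| 1       | 1,263      | 3,935         | 4,988         | 903           |
| 2       | 5,002      | 11,719        | 13,539        | 493           |
| 3       | 5,787      | 17,606        | 11,374        | 346           |
| 4       | 6,790      | 22,478        | 17,295        | 265           |
| 5       | 8,058      |               | 17,785        | 291           |
| 6       | 7,473      |               | 19,227        | 243           |
| 10      | 9,400      |               | 23,350        | 230           |
| 100     | 11,260     |               | 24,195        | 228           |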
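The scaling behavior can be made concrete with a little arithmetic on the measurements above. The snippet below is only a convenience for reading the numbers: it copies the 1-thread and 100-thread timings for the three allocators that have both, and the `speedup` helper is our own name, not part of any benchmark harness.

```python
# Times in milliseconds, copied from the 1-thread and 100-thread rows above.
single_thread_ms = {"glibc": 1263, "tcmalloc": 4988, "mimalloc": 903}
hundred_threads_ms = {"glibc": 11260, "tcmalloc": 24195, "mimalloc": 228}


def speedup(allocator, times_ms):
    """Speedup relative to one thread; values below 1.0 mean a slowdown."""
    return single_thread_ms[allocator] / times_ms[allocator]


for name in hundred_threads_ms:
    print(f"{name}: {speedup(name, hundred_threads_ms):.2f}x")
```

With 100 threads, mimalloc comes out just under 4x faster than with a single thread, while glibc is roughly nine times slower.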

Another difference between MMTk and the default GC is that if an object does not require finalization (i.e. it does not have any resources that need to be reclaimed), then MMTk doesn’t need to visit it at all, further improving performance. This is possible because MMTk can use a bump pointer allocator, which increments a pointer for every allocation until it reaches the end of the allocation space. Meanwhile, the default GC in Ruby uses a freelist allocator, which uses a linked list of free slots to allocate objects into. Since building the freelist requires visiting all dead objects anyway, the default GC won’t be able to take advantage of this improvement.
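The difference between the two allocation strategies can be sketched as follows. Both classes are hypothetical toy models that hand out integer "addresses"; a Python list stands in for the linked list of free slots, and neither class reflects the real layout of MMTk's or CRuby's heaps.

```python
class BumpAllocator:
    """Allocate by advancing a cursor; dead objects are never visited."""

    def __init__(self, size):
        self.cursor = 0
        self.limit = size

    def alloc(self, nbytes):
        if self.cursor + nbytes > self.limit:
            return None  # space exhausted; a real GC would collect here
        addr = self.cursor
        self.cursor += nbytes
        return addr


class FreeListAllocator:
    """Allocate from a list of free fixed-size slots."""

    def __init__(self, slot_count, slot_size):
        self.free_slots = [i * slot_size for i in range(slot_count)]

    def alloc(self):
        return self.free_slots.pop() if self.free_slots else None

    def sweep(self, dead_addrs):
        # Every dead slot has to be visited to be put back on the list.
        visited = 0
        for addr in dead_addrs:
            self.free_slots.append(addr)
            visited += 1
        return visited


bump = BumpAllocator(size=64)
first = bump.alloc(16)
second = bump.alloc(32)
third = bump.alloc(32)  # does not fit in the remaining 16 bytes

freelist = FreeListAllocator(slot_count=2, slot_size=40)
slot_a = freelist.alloc()
slot_b = freelist.alloc()
visited = freelist.sweep([slot_a])
```

In the freelist model, reclaiming memory costs one visit per dead slot, whereas the bump allocator only ever touches its cursor.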

The solution to this challenge was to avoid using malloc. Instead, MMTk allocates the buffers for common types (Array, String, and MatchData objects) as hidden Ruby objects. Since these buffer objects are now Ruby objects, they are also allocated through MMTk. As a result, these buffers now have automatic memory management, rather than manual memory management like malloc. This means that Array, String, and MatchData need to mark their buffer objects to keep those buffers alive in the marking phase, but, in return, they no longer need to do anything during the finalization phase.

Future Work & Conclusion

In this blog post, we looked at a few of the challenges we encountered in integrating MMTk with Ruby and the solutions we used. We hope that sharing our experiences can provide insights for Ruby developers, garbage collector researchers, and language designers.

Work continues in MMTk’s fork of Ruby to experiment with more optimized memory layouts, new techniques for object movement, and integrations between JIT compilers and the garbage collector. We are also using the lessons we learned with MMTk to make improvements to upstream Ruby.