用 C# 编写 .NET 垃圾回收器 – 第 6 部分:标记和清除
Writing a .NET Garbage Collector in C# – Part 6: Mark and Sweep

原始链接: https://minidump.net/writing-a-net-gc-in-c-part-6/

## .NET 垃圾回收器:实现标记阶段 本期内容重点在于使用 C# 实现自定义 .NET 垃圾回收器的“标记”阶段。标记阶段用于识别可达对象,从而确定哪些对象可以被释放。它从三个来源扫描“根”——无条件被认为是存活的引用:局部变量/线程静态存储(由运行时通过 `GcScanRoots` 处理)、GC句柄和最终化队列。 `GcScanRoots` 方法使用回调函数来遍历局部变量。`ScanContext` 结构体提供线程 ID 和栈限制等信息。为了标记对象,实现利用了方法表指针的对齐方式;设置最低有效位表示对象已被标记。 深度优先搜索 (DFS) 遍历对象图,使用栈来避免递归。目前,为了简单起见,忽略了“内部指针”(对象内的引用)。 标记完成后,“扫描”阶段遍历堆。未标记的非空闲对象将被清除(内存尚未重用,以确保过早回收不会导致崩溃),并替换为空闲对象,以保持堆的可遍历性。 虽然已经建立了一个功能性的基础,但由于缺少对 GC 句柄、最终化队列和内部指针的支持,当前的 GC 仍然会导致应用程序崩溃——这些是未来开发的主题。完整的代码可在 GitHub 上获取。

黑客新闻 新的 | 过去的 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 用C#编写.NET垃圾回收器 – 第6部分:标记和清除 (minidump.net) 40点 由 pjmlp 7小时前 | 隐藏 | 过去的 | 收藏 | 讨论 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请YC | 联系 搜索:
相关文章

原文

After a long (way too long) break, it’s time to resume our journey towards building a .NET garbage collector in C#. In the previous parts, we saw how to implement the minimal set of GC APIs to allow a simple application to run, and how to lay out the objects in memory to make the heap walkable. We then learned how to find the references of a given managed object. If you need a refresher, don’t hesitate to jump back to those past articles:

  • Part 1: Introduction and setting up the project
  • Part 2: Implementing a minimal GC
  • Part 3: Using the DAC to inspect the managed objects
  • Part 4: Walking the managed heap
  • Part 5: Decoding the GCDesc to find the references of a managed object

If you don’t have time to read everything, I would recommend focusing on part 4 which explains the layout of the heap.

Now we have all the pieces of the puzzle to start implementing the mark phase of our garbage collection. The goal of the mark phase is to find all the objects that are currently reachable by user code, to deduce which ones aren’t reachable anymore and can be freed.

Marking starts from the roots. That is: references that the GC treats as unconditionally live at the beginning of a collection. The roots can be sorted into three buckets: the local variables and thread-static storage, the GC handles, and the finalization queue. You might also think of static fields, however in practice the static variables are kept alive by GC handles.

While the GC is responsible for the last two buckets, the first one is handled directly by the runtime. The IGCToCLR interface exposes a GcScanRoots method that takes a callback, which will be called for every local variable. In addition to the callback, the GcScanRoots method takes 3 arguments: condemned and max_gen which are only used for some corner cases with server GC, and so are largely inconsequential for us, and a ScanContext:

[StructLayout(LayoutKind.Sequential)]
public struct ScanContext
{
    public IntPtr thread_under_crawl;
    public int thread_number;
    public int thread_count;
    public IntPtr stack_limit; // Lowest point on the thread stack that the scanning logic is permitted to read
    public bool promotion; //TRUE: Promotion, FALSE: Relocation.
    public bool concurrent; //TRUE: concurrent scanning
    public IntPtr _unused1;
    public IntPtr pMD;
    public int _unused3;
}

Likewise, most of the fields in ScanContext are only useful for server GC (where threads are affinitized to a given heap). We’re going to use only two of them:

  • promotion tells the execution engine whether we’re scanning for the promotion (where we’re going to mark objects that are going to be promoted to an upper generation) or for the relocation (where we’re going to update pointers after moving objects in memory). As far as I can tell, it mostly impacts what roots are reported when conservative mode is enabled, so it shouldn’t affect us for now. Still, we’re going to set it to true.
  • _unused1 is a pointer-sized field that the GC can fill to its own discretion. We’re going to use it to store a pointer to our instance of GCHeap. We need it because the callback given to GcScanRoots must be decorated with UnmanagedCallersOnly because it will be called from native code, and methods decorated with this attribute must be static.
private void MarkPhase()
{
   ScanContext scanContext = new();
   scanContext.promotion = true;
   scanContext._unused1 = GCHandle.ToIntPtr(_handle);

   var scanRootsCallback = (delegate* unmanaged<GCObject**, ScanContext*, uint, void>)&ScanRootsCallback;
   _gcToClr.GcScanRoots((IntPtr)scanRootsCallback, 2, 2, &scanContext);
}

[UnmanagedCallersOnly]
private static void ScanRootsCallback(GCObject** obj, ScanContext* context, uint flags)
{
   var handle = GCHandle.FromIntPtr(context->_unused1);
   var gcHeap = (GCHeap)handle.Target!;
   gcHeap.ScanRoots(*obj, context, (GcCallFlags)flags);
}

private void ScanRoots(GCObject* obj, ScanContext* context, GcCallFlags flags)
{
   // TODO: actual implementation of the callback
}

Don’t worry, I wasn’t planning on dropping the term ‘conservative mode’ without explaining it. Conservative mode is enabled by setting DOTNET_gcConservative=1. It switches the execution engine from “precise” root tracking (where .NET knows exactly what is a root and what isn’t) to “conservative” root tracking. In conservative root tracking, the execution engine scans the whole stack and reports any value that points to the range of memory managed by the GC. It greatly complicates the work of the GC because any reported root could be a false positive. As I understand, it’s mostly meant for new environments where the CLR isn’t fully implemented and doesn’t support precise root tracking yet. It can also be used to test some new features. At this point, I have no plan to implement support for conservative mode in our custom GC.

Inside of our ScanRoots callback, we need to discover all the outgoing references from the given object, then browse the reference tree and mark all the objects that we discover. We’ve already seen in part 5 how to get the references from the given object, and we implemented it into an EnumerateObjectReferences method. To mark the objects, we need to store somewhere the information that we found a given object. There are a few ways to do that. For instance, on x64 the object header includes 4 bytes of padding to keep the alignment, and we could theoretically repurpose that space. However, I plan to use those bytes later to implement different kind of optimizations for other issues that the GC is going to face. Instead, we’re going to do the same thing as the actual .NET GC.

As a reminder, the layout of an object in memory is this:

                      Object         
                +-----------------+  
                | Object header   |  
                +-----------------+  
Object ref ---> | MethodTable*    | 
                +-----------------+ 
                |  Field1         | 
                +-----------------+ 
                |  Field2         | 
                +-----------------+ 
                |  Field3         |
                +-----------------|
                |  Field4         |
                +-----------------+

The trick is to take advantage of the fact that the method-table is aligned on a pointer boundary. It means that the two least-significant bits of the method-table pointer on a 32-bit runtime (three on a 64-bit runtime) are always going to be 0. Whenever it marks an object, the GC just sets the least-significant bit of the method-table pointer to 1. Of course, it must make sure to restore the original pointer at the end of the garbage collection, before resuming the runtime.

Accordingly, we add the Mark and Unmark methods on our implementation of GCObject, which will flip the value of the least-significant bit of the method-table pointer. IsMarked checks the current value of that bit. Last but not least, we introduce a MethodTable property which applies a bit mask to the method-table pointer so that we don’t have to worry about whether the object is marked or not whenever we just want to access the method-table (ideally we should only do so in code paths where an object might be marked, but we don’t really care about that level of performance at this point).

[StructLayout(LayoutKind.Sequential)]
public unsafe struct GCObject
{
    public MethodTable* RawMethodTable;
    public uint Length;

    public readonly MethodTable* MethodTable => (MethodTable*)((nint)RawMethodTable & ~1);

    public bool IsMarked() => ((nint)RawMethodTable & 1) != 0;

    public void Mark() => RawMethodTable = (MethodTable*)((nint)MethodTable | 1);

    public void Unmark() => RawMethodTable = (MethodTable*)((nint)MethodTable & ~1);

    // ...
}

Great, now we can implement our actual traversal of the reference tree! We use a DFS (depth-first search) instead of a BFS (breadth-first search) under the assumption that we are more likely to sequentially scan objects that are related to each other, which could improve cache locality. We must be careful to not use recursion, as the object graph can get really big.

private Stack<IntPtr> _markStack = new();

private void ScanRoots(GCObject* obj, ScanContext* context, GcCallFlags flags)
{
    if ((IntPtr)obj == 0)
    {
        return;
    }

    if (flags.HasFlag(GcCallFlags.GC_CALL_INTERIOR))
    {
        // TODO
        return;
    }

    _markStack.Push((IntPtr)obj);

    while (_markStack.Count > 0)
    {
        var ptr = _markStack.Pop();
        var o = (GCObject*)ptr;

        if (o->IsMarked())
        {
            continue;
        }

        o->EnumerateObjectReferences(_markStack.Push);
        o->Mark();
    }
}

Notice that for now we ignore roots that are marked with the GC_CALL_INTERIOR flag. Those are interior pointers. Consider for instance the following code:

private static ref int GetInteriorPointer()
{
    var array = new int[10];
    return ref array[5];
}

GetInteriorPointer returns a reference to an int, but that int is stored inside of an array. If the array is ever collected while some code holds that reference, bad things will happen. Therefore, by some mechanism, that ref int (called interior pointer) must keep the whole array alive. This is challenging for the GC because we are handed a pointer to an arbitrary part of an object, and we must somehow find the beginning of that object to mark it. For now, we’re just going to pretend that this problem doesn’t exist, and ignore interior pointers entirely.

After we marked the objects that are referenced, directly or indirectly, by the roots, we need to do a full scan of the heap. Whenever we find an object that isn’t marked, we know that this object isn’t reachable anymore and we can clear it. For now, our GC doesn’t reuse memory so we’re just going to clear the memory to make sure that user applications will crash if we accidentally collect an object that is still reachable. To keep the heap in a walkable state, we replace the old object with a free object (free objects are explained in part 4). Also, when we find a marked object, we make sure to unmark it to restore its method-table pointer back to its original state.

The WalkHeapObjects method has the same logic as the TraverseHeap method of part 4, but cleaned up and rewritten into an enumerator to be easily reusable.

private void Sweep()
{
    foreach (IntPtr ptr in WalkHeapObjects())
    {
        var obj = (GCObject*)ptr;

        bool marked = obj->IsMarked();
        obj->Unmark();

        bool isFreeObject = obj->MethodTable == _freeObjectMethodTable;

        if (!marked && !isFreeObject)
        {
            var startPtr = ptr - IntPtr.Size; // Include the header
            var endPtr = Align(startPtr + (nint)obj->ComputeSize());

            // Clear the memory
            new Span<byte>((void*)startPtr, (int)(endPtr - startPtr)).Clear();

            // Allocate a free object to keep the heap walkable
            AllocateFreeObject(ptr, (uint)(endPtr - startPtr - SizeOfObject));
        }
    }
}

private IEnumerable<IntPtr> WalkHeapObjects()
{
    foreach (var segment in _segments)
    {
        foreach (var obj in WalkHeapObjects(segment.Start, segment.Current))
        {
            yield return obj;
        }
    }
}

private IEnumerable<IntPtr> WalkHeapObjects(nint start, nint end)
{
    var ptr = start + IntPtr.Size;

    while (ptr < end)
    {
        yield return ptr;
        ptr = FindNextObject(ptr);
    }

    static unsafe nint FindNextObject(nint current)
    {
        var obj = (GCObject*)current;
        return Align(current + (nint)obj->ComputeSize());
    }
}

And that’s it! Of course, if you try to run an application with the GC at this stage, it will quickly crash: we have yet to implement support for interior pointers, and we have completely skipped the other two types of roots: GC handles and the finalization queue. This will be the subject of our next articles.

As usual, the full code is available on GitHub.

联系我们 contact @ memedata.com