表示层次结构

表示层次结构
Representing Hierarchies

原始链接: https://gpfault.net/posts/first-child-next-sibling.html

## 表示层次数据结构本文讨论了两种常见的表示层次数据的方法，例如 HTML 元素或文件系统，其中节点可以有多个子节点。第一种，即“显而易见的方法”，在每个节点中使用一个容器（如 `std::vector`）来存储指向其子节点的指针。这允许通过索引快速访问子节点，但会引入动态内存分配开销、潜在的缓存缺失以及管理复杂性。或者，一种“无分配的方法”利用每个节点中指向 `first_child` 和 `next_sibling` 的指针。这消除了动态分配和额外的间接寻址，简化了内存管理，并且非常适合侵入式数据结构。然而，通过索引访问子节点需要线性遍历兄弟指针。作者更倾向于无分配方法，当算法主要*遍历*层次结构，而不是需要按索引访问特定子节点时，例如在 3D 变换层次结构中常见的场景。最佳选择取决于应用程序的具体访问模式。

黑客新闻新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交登录表示层次结构 (gpfault.net) 14 分，作者 ibobev 1 天前 | 隐藏 | 过去 | 收藏 | 1 条评论 jeffjeffbear 1 天前 [–] 我不太喜欢这个方案，因为它会遇到所有常见的链表问题，而且只有在采用他们所说的侵入式方法时，指针才会被分配到结构体中，才能实现“无分配”。使用 std::vector 指针方法不会使用过多的内存。我自己喜欢分配一个过大的块，然后将所有内容塞进去，如果关心性能，就处理到该数组中的索引。如果你关心局部性，甚至可以以某种方式展平树。回复指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系搜索：

published on Jan 01 2026

Very often we have to deal with hierarchies of things. HTML elements, files and folders, scene graphs... basically, anything where a node might have an arbitrary number of "child" nodes, which can also have their own children, and so on. There are a couple of general ways to represent this type of structure.

The Obvious Way

One obvious way is storing an array of pointers to child nodes within a node:



struct node {
  std::vector<node*> children;
};

An actual implementation might have more details, might actually own the child nodes, or not use `std::vector`, but you get the idea.

This approach is actually pretty nice if the algorithms you'll be running on this structure ever need to access a child with a particular index (e.g. the third child of the root's first child).

The downside here is that the `children` array needs to be backed by dynamically allocated memory. So now we have more allocations. We also are going to have to worry about _how_ these allocations are made (unless we are okay with always using the same allocator for all of them). There are also more opportunities for cache misses due to extra pointer dereferencing.

One could try to improve this by using a fixed-size array (if you have a small upper bound on the potential number of children), or implementing small buffer optimization (i.e. small string optimization, but for buffers).

If you don't actually care about accessing children by index though, all of this sounds like a drag.

The Allocation-Free Way

There's a pretty obvious alternative representation:



struct node {
  node* first_child;
  node* next_sibling;
};

A node contains a pointer to its own first child, and a pointer to the next child of its parent (i.e. its sibling). This completely obviates all the memory management problems and eliminates extra indirection. It's also very convenient to use with intrusive data structures.

Here's an example of what an instance of this structure could look like. Here, the root has three children, the first child of the root has two, and the third child of the root has one. All the other nodes have no children.

The downside is - accessing a child with a particular index is no longer "free" since you have to linearly follow the sibling pointers starting at the first child.

I had found this representation useful in the past (for example, when dealing with 3D transform hierarchies) because in many cases, my code was just spending time traversing parts of the tree, not caring about any specific child of a given node in particular.

Like this post? Follow me on bluesky for more!