Index, Count, Offset, Size


Original link: https://tigerbeetle.com/blog/2026-02-16-index-count-offset-size/

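## Naming for Fewer Bugs: The TigerBeetle Approach

The author discusses a persistent problem in computer science: naming things so as to prevent bugs. Recognizing that many bugs stem from simple typos or from using the wrong variable (in languages both with and without shadowing), the author advocates strong static typing while acknowledging its limits, particularly around indexing. The solution, practiced in the TigerBeetle project, is built on a consistent naming convention: **count** denotes the *number of items*, and **index** refers to the *position of a particular item*. The naming makes the invariant `index < count` explicit, so that incorrect combinations are immediately visible. The project also uses **size** for the number of bytes when working with raw byte arrays, and avoids the ambiguous term "length". Combined with "big endian naming" (qualifiers appended as suffixes) and keeping paired names the same length, the convention yields self-documenting code in which potential bugs "pop out". The author argues that even small defensive techniques, layered together, can significantly reduce the likelihood of bugs, although they are no panacea.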


Original text

Wherein we make progress towards solving one of the most vexing problems of Computer Science — naming things.

I am at a point in my career where the bulk of my bugs are stupid — I simply fail to type in the code I have in my mind correctly. In languages with shadowing (like Rust), I will fail to use a shadowed variable from the outer scope. In languages without shadowing (like Zig), I will use the wrong version of a variable.

Pests like these are annoying, so I am always on the lookout for tricks to minimize the probability of bugs. One of the best possible tricks is of course strong static typing. Types are good at preventing me from doing stupid things by accident. But types have limitations. The text of a well-typed program is a two-in-one artifact: a specification of the behavior of a machine (the algorithm), and a proof that the behavior is not unacceptable. Zero-cost abstractions are code without behavior, just proofs!

The art of skillful typing lies in minimizing verbosity of the proof, while maximizing the amount of unwanted behaviors ruled out, weighted by the probability and the cost of misbehavior. But this ratio is not always favorable — the code can be so proof-heavy that it becomes impossible to understand what it actually does!

There’s one particular cranny where types don’t seem to usefully penetrate yet: indexing and associated off-by-one errors.

If you don’t need indexing arithmetic, you can use the newtype pattern to prevent accessing oranges with apple-derived indexes. You can even go further and bind indexes to specific arrays, using, e.g., the Gankra trick, but I haven’t seen that to be useful in practice.
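
As a minimal sketch of the newtype idea in Zig, one option is a non-exhaustive enum wrapped around an integer, which does not convert implicitly to or from other index types. The names `Apple`, `Orange`, `AppleIndex`, `OrangeIndex`, and `apple_at` are hypothetical and exist only for this illustration:

```zig
const std = @import("std");
const assert = std.debug.assert;

// Hypothetical item types, for illustration only.
const Apple = struct { weight_grams: u32 };
const Orange = struct { weight_grams: u32 };

// Hypothetical index newtypes: each non-exhaustive enum wraps a u32,
// but the two are distinct types and do not coerce into each other.
const AppleIndex = enum(u32) { _ };
const OrangeIndex = enum(u32) { _ };

fn apple_at(apples: []const Apple, apple_index: AppleIndex) Apple {
    // Unwrapping is explicit, so it is easy to spot in review.
    const index = @intFromEnum(apple_index);
    assert(index < apples.len);
    return apples[index];
}

test "apple indexes only reach apples" {
    const apples = [_]Apple{ .{ .weight_grams = 150 }, .{ .weight_grams = 180 } };
    const apple_index: AppleIndex = @enumFromInt(1);
    try std.testing.expect(apple_at(&apples, apple_index).weight_grams == 180);

    // Rejected by the compiler, because OrangeIndex is not AppleIndex:
    // const orange_index: OrangeIndex = @enumFromInt(1);
    // _ = apple_at(&apples, orange_index);
}
```

The price is that any arithmetic on the index has to go through `@intFromEnum` and `@enumFromInt`, which is why this fits best when indexes are stored and passed around but not computed.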

If, however, you compute indexes, you need to be extra careful to stay in bounds of an array, and need to be mindful that the maximum valid index is one less than the length of the array. While we don’t solve this problem perfectly at TigerBeetle, I think we have a naming convention that helps:

Thanks @marler8997 for the illustration idea!

We consistently use count whenever we talk about the number of items, and index to point to a particular item. The positive invariant is index < count. Consistency is the trick — there are certain valid and invalid ways to combine indexes and counts in an expression, and, if there’s always an _index or a _count suffix in the name, wrong combinations immediately jump out at you, dear reader, even if you don’t understand the specifics of the code.
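
To make the invariant concrete, here is a small sketch with two hypothetical helpers, `remaining_count` and `last_index`. They are not from the TigerBeetle code base, and the comments on which combinations are valid reflect one reading of the convention:

```zig
const std = @import("std");
const assert = std.debug.assert;

// Hypothetical helper: how many items are left, starting at `item_index`?
fn remaining_count(item_count: usize, item_index: usize) usize {
    assert(item_index < item_count); // index < count: the positive invariant.
    return item_count - item_index; // count - index: another count.
}

// Hypothetical helper: the position of the final item.
fn last_index(item_count: usize) usize {
    assert(item_count > 0);
    const item_index = item_count - 1; // The maximum valid index is count - 1.
    assert(item_index < item_count);
    return item_index;
}

test "index and count combine in predictable ways" {
    const item_count: usize = 5;
    const item_index = last_index(item_count);
    try std.testing.expect(item_index == 4);
    try std.testing.expect(remaining_count(item_count, item_index) == 1);

    // Patterns that deserve a second look when used to guard an access:
    //   item_index <= item_count  (an index equal to the count is out of bounds)
    //   items[item_count]         (a count used where an index is expected)
}
```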

In low-level code you often need to switch between a well-typed representation `[]T` and raw bytes `[]u8`. To not confuse the two index spaces, the “count of bytes” is always called a size. By definition,

```zig
size = @sizeOf(T) * count;
```
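
The definition can be checked directly by viewing the same memory both ways. The sketch below assumes a made-up `Item` type and a hypothetical helper `size_from_count`; `std.mem.sliceAsBytes` is the standard library call that reinterprets a typed slice as bytes:

```zig
const std = @import("std");

// Hypothetical item type; `extern` gives it a defined memory layout.
const Item = extern struct { id: u64, value: u32, flags: u32 };

// size = @sizeOf(T) * count, spelled out as a helper.
fn size_from_count(comptime T: type, count: usize) usize {
    return @sizeOf(T) * count;
}

test "size counts bytes, count counts items" {
    const items = [_]Item{
        .{ .id = 1, .value = 10, .flags = 0 },
        .{ .id = 2, .value = 20, .flags = 0 },
        .{ .id = 3, .value = 30, .flags = 0 },
    };
    const item_count: usize = items.len;

    // The same memory viewed as raw bytes.
    const items_bytes = std.mem.sliceAsBytes(items[0..]);
    const items_size: usize = items_bytes.len;

    try std.testing.expect(items_size == size_from_count(Item, item_count));
    try std.testing.expect(items_size > item_count);
}
```

Note that `.len` means a count on the typed slice and a size on the byte slice, so binding it to an `_count` or `_size` name right away keeps the two spaces apart.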

An example from `NodePool`, where a byte offset is converted to a node index:

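```zig
pub fn release(pool: *NodePool, node: Node) void {
    // Both the node and the pool buffer are measured in bytes.
    comptime assert(meta.Elem(Node) == u8);
    comptime assert(meta.Elem(@TypeOf(pool.buffer)) == u8);

    // The node must lie entirely inside the pool buffer.
    assert(@intFromPtr(node) >= @intFromPtr(pool.buffer.ptr));
    assert(
        @intFromPtr(node) + node_size <=
            @intFromPtr(pool.buffer.ptr) + pool.buffer.len
    );

    // Offset: the node's distance from the start of the buffer, in bytes.
    const node_offset =
        @intFromPtr(node) - @intFromPtr(pool.buffer.ptr);

    // Index: the node's position, in units of nodes.
    const node_index =
        @divExact(node_offset, node_size);

    // Mark the node as free; it must not be free already.
    assert(!pool.free.isSet(node_index));
    pool.free.set(node_index);
}
```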
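
The relationship between offset and index can also be isolated from the pool itself. The following is a self-contained sketch, not TigerBeetle code: the constants and the helpers `node_offset_from_index` and `node_index_from_offset` are hypothetical, and the bound `offset < size` is assumed here by analogy with `index < count`:

```zig
const std = @import("std");
const assert = std.debug.assert;

// Hypothetical pool geometry, for illustration only.
const node_size: usize = 64; // Bytes per node.
const node_count: usize = 4; // Nodes in the pool.
const buffer_size: usize = node_size * node_count; // size = item size * count.

// Index to offset: scale by the item size.
fn node_offset_from_index(node_index: usize) usize {
    assert(node_index < node_count);
    return node_index * node_size;
}

// Offset to index: @divExact additionally checks, in safe builds,
// that the offset falls on a node boundary.
fn node_index_from_offset(node_offset: usize) usize {
    assert(node_offset < buffer_size);
    const node_index = @divExact(node_offset, node_size);
    assert(node_index < node_count);
    return node_index;
}

test "the last node starts one node_size before the end" {
    const node_offset = node_offset_from_index(node_count - 1);
    try std.testing.expect(node_offset + node_size == buffer_size);
    try std.testing.expect(node_index_from_offset(node_offset) == node_count - 1);
}
```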

Another example, from the ewah implementation:

```zig
pub fn decode(
    source: []align(@alignOf(Word)) const u8,
    target_words: []Word,
) usize {
    // The source is raw bytes; its size must be a whole number of words.
    assert(source.len % @sizeOf(Word) == 0);
    assert(disjoint_slices(u8, Word, source, target_words));

    // Reinterpret the bytes as words: from here on, indexes count words.
    const source_words = mem.bytesAsSlice(Word, source);

    var source_index: usize = 0;
    var target_index: usize = 0;
    while (source_index < source_words.len) {
        // Each marker announces a run of uniform words, then literal words.
        const marker: *const Marker =
            @ptrCast(&source_words[source_index]);
        source_index += 1;

        @memset(
            target_words[target_index..][0..marker.uniform_word_count],
            if (marker.uniform_bit == 1) ~@as(Word, 0) else 0,
        );
        target_index += marker.uniform_word_count;

        stdx.copy_disjoint(
            .exact,
            Word,
            target_words[target_index..][0..marker.literal_word_count],
            source_words[source_index..][0..marker.literal_word_count],
        );
        source_index += marker.literal_word_count;
        target_index += marker.literal_word_count;
    }
    assert(source_index == source_words.len);
    assert(target_index <= target_words.len);

    // The number of words written to the target.
    return target_index;
}
```

Note well that the index/count convention synergizes with two other TigerStyle shticks. We use “big endian naming”, where qualifiers are appended as suffixes:

```
source
source_words
source_index
```

And we try to make sure that dual names have the same length:

```
source
target
```

The code aligns itself, and makes the bugs pop out:

```zig
source_index += marker.literal_word_count;
target_index += marker.literal_word_count;
```
