黑白数组:快速、有序,基于O(log N)内存分配。
Black-White Array: fast, ordered and based on with O(log N) memory allocations

原始链接: https://github.com/dronnix/bwarr

## 黑白数组 (BWArr) 总结 BWArr 是一种新的、快速的、基于数组的数据结构,设计为 BTree 等基于树的结构的直接替代品。它为插入、删除和搜索操作提供 **O(log N)** 的摊还时间复杂度,其主要优势在于:仅需 **O(log N)** 的内存分配——减少垃圾回收压力。 与传统结构不同,BWArr 原生支持重复元素(多重集行为),并且由于其无指针设计,具有较低的内存开销,从而提高 CPU 缓存效率。虽然通常效率很高,但批量操作和序列化仍在开发中。 在频繁插入时,对于非常大的数据集,性能可能会出现延迟峰值,可以通过异步操作来缓解。对长序列元素执行删除操作可能会暂时影响最大/最小值的查找和迭代性能,但摊还复杂度仍然是 **O(log N)**。基准测试表明,其性能与 Google 的 BTree 具有竞争力,并且计划针对不同的架构进行进一步优化。需要 Go 1.22+。

一种新的“黑白数组”数据结构最近在Hacker News上分享,旨在实现快速、有序的数据存储,并限制内存分配。然而,初步性能测试表明,对于诸如获取和迭代之类的常见任务,它的速度比传统的B树慢。 批评指出潜在的效率低下,包括尽管声称避免了指针追逐,但在Go实现中存在隐藏的指针追逐,以及显著的内存开销(数据大小的1.5-3倍)。该结构类似于Log-Structured Merge (LSM) 树的一种变体,使用单个数组来管理数据段。 虽然声称搜索时间为O(log N),但这仅适用于均匀随机的数据分布;其他输入可能会将性能降级到O(log² N)。一位评论者建议优化措施,例如预先排序连续的活动段以减少二分查找操作,以及使用多路合并以加快插入速度。一个关键的结论是预先分配足够的空间(“Reserve”)以避免代价高昂的中间数组复制。
相关文章

原文

CI codecov Go Reference Go Report Card

The Black-White Array (aka BWArr) is a fast, ordered data structure based on arrays with $O(\log N)$ memory allocations.

The idea of Black-White Array was invented and published by professor Z. George Mou in Black-White Array: A New Data Structure for Dynamic Data Sets. This repository contains the first public implementation.

Professor Z. George Mou

  • $O(\log N)$ memory allocations for Inserts - no pressure on GC;
  • Fast insert, delete, and search operations $O(\log N)$ time amortized complexity;
  • Array-based and pointerless makes it CPU-friendly: cache locality / sequential iteration / etc;
  • Supports duplicate elements natively (multiset behavior) - no need for wrapping values into structs to make them unique;
  • Drop-in replacement for github.com/google/btree and github.com/petar/GoLLRB;
  • Low memory overhead - no pointers per element, compact memory representation;
  • Batch-friendly: arrays under the hood allow efficient bulk operations (work in progress);
  • Easily serializable (work in progress);
  • One per $N$ insert operations complexity falls down to $O(N)$, though amortized remains $O(\log N)$. For real-time systems, it may introduce latency spikes for collections with millions of elements. Could be mitigated by async/background inserts.
  • For a small number of elements Search()/Delete() operations may take $O((\log N)^2)$. 50% of elements take $O(\log N)$ time, 75% - $O(2\log N)$, 87.5% - $O(3\log N)$, etc.
  • When deleting long series of elements, a Max()/Min() operation can take $O(N/4)$. Amortized complexity for series of calls remains $O(\log N)$.
  • When deleting long series of elements, iteration step can take $O(N/4)$. Amortized complexity for iteration over the whole collection remains $O(\log N)$ per element.

Benchmarks in comparison with Google BTree.

Measures the time, allocs and allocated KBs to insert N unique random int64 values into an empty data structure. Both BWArr and BTree start empty and insert all values one by one.

Insert performance Insert Allocs Insert Alloc_Bytes

Allocations on smaller values:

Insert Allocs small

Measures the time to look up N values by their keys in a pre-populated data structure. The data structure is populated with all values before timing starts, then each value is retrieved by key.

Get performance

Measures the time to iterate through all N values in sorted and non-sorted orders. Ordered Iterate performance Unordered Iterate performance

Additional benchmarks and details are available in the bwarr-bench repository. More methods will be added, also expect separate benchmarks for AMD64 and ARM64 architectures.

Requires Go 1.22 or higher.

go get github.com/dronnix/bwarr

Then import in your code:

import "github.com/dronnix/bwarr"
package main

import (
    "fmt"

    "github.com/dronnix/bwarr"
)

func main() {
    // Create a BWArr with an integer comparison function
    // The second parameter (10) is the initial capacity hint
    bwa := bwarr.New(func(a, b int64) int {
        return int(a - b)
    }, 10)

    // Insert elements
    bwa.Insert(42)
    bwa.Insert(17)
    bwa.Insert(99)
    bwa.Insert(23)
    bwa.Insert(8)

    fmt.Printf("Length: %d\n", bwa.Len()) // Output: Length: 5

    // Get an element
    val, found := bwa.Get(42)
    if found {
        fmt.Printf("Found: %d\n", val) // Output: Found: 42
    }

    // Delete an element
    deleted, found := bwa.Delete(17)
    if found {
        fmt.Printf("Deleted: %d\n", deleted) // Output: Deleted: 17
    }

    // Iterate in ascending order
    fmt.Println("Elements in sorted order:")
    bwa.Ascend(func(item int64) bool {
        fmt.Printf("  %d\n", item)
        return true // return false to stop iteration early
    })
    // Output:
    //   8
    //   23
    //   42
    //   99
}
联系我们 contact @ memedata.com