高性能GPU布谷鸟过滤器
Show HN: GPU Cuckoo Filter – faster queries than Blocked Bloom, with deletion

原始链接: https://github.com/tdortman/cuckoo-filter

该库提供了一种高性能、GPU加速的Cuckoo过滤器实现,这是一种空间高效的概率数据结构。它作为毕业论文的一部分开发,利用CUDA技术,与基于CPU和其他过滤器实现(TCF、GQF、Bloom)相比,显著提高了批量插入、查找和删除操作的速度。 主要特性包括可配置的指纹大小、桶大小和驱逐策略(DFS/BFS),以及排序插入以优化内存访问。它通过gossip和进程间通信(IPC)支持多GPU设置,用于过滤器共享。 在NVIDIA GH200上的基准测试表明,与CPU Cuckoo过滤器相比,查找速度提高了高达973倍,插入速度提高了583倍,并且相对于其他替代方案也获得了显著的提升。该库是header-only的,兼容C++20,并使用Meson进行构建。它提供了单GPU和多GPU过滤器实现,并包含示例应用程序、基准测试和测试。

一种新的GPU Cuckoo Filter实现,由开发者tdortman(github.com/tdortman)在Hacker News上分享,比Blocked Bloom filters提供更快的查询速度,*并且*支持删除——这是Bloom filters通常不具备的功能。 讨论集中在性能分析上,特别是误报率(FPR),开发者现在已将其添加到项目的readme中。有人担心处理无法直接驻留在GPU上的大型数据集时可能出现瓶颈,并认为PCIe总线可能会成为限制因素。 开发者承认了这个问题,指出即使使用NVLink,GPU之间的通信对于非常大的数据集也可能成为瓶颈,因为被查询的数据需要位于GPU上才能获得最佳性能。该项目源于一篇相关的论文,提供了更详细的FPR分析。
相关文章

原文

Documentation

A high-performance CUDA implementation of the Cuckoo Filter data structure, developed as part of the thesis "Design and Evaluation of a GPU-Accelerated Cuckoo Filter".

This library provides a GPU-accelerated Cuckoo Filter implementation optimized for high-throughput batch operations. Cuckoo Filters are space-efficient probabilistic data structures that support insertion, lookup, and deletion operations with a configurable false positive rate.

  • CUDA-accelerated batch insert, lookup, and delete operations
  • Configurable fingerprint size and bucket size
  • Multiple eviction policies (DFS, BFS)
  • Sorted insertion mode for improved memory coalescing
  • Multi-GPU support via gossip
  • IPC support for cross-process filter sharing
  • Header-only library design

Benchmarks at 80% load factor on an NVIDIA GH200 (H100 HBM3, 3.4 TB/s). The GPU Cuckoo Filter is compared against:

L2-Resident (4M items, ~8 MiB)

Comparison Insert Query Delete
GPU vs CPU Cuckoo 360× faster 973× faster N/A
Cuckoo vs TCF 6× faster 42× faster 100× faster
Cuckoo vs GQF 585× faster 6× faster 273× faster
Cuckoo vs Bloom 0.6× (slower) 1.4× faster N/A

DRAM-Resident (268M items, ~512 MiB)

Comparison Insert Query Delete
GPU vs CPU Cuckoo 583× faster 1504× faster N/A
Cuckoo vs TCF 1.9× faster 11.3× faster 35.3× faster
Cuckoo vs GQF 9.6× faster 2.6× faster 3.8× faster
Cuckoo vs Bloom 0.7× (slower) 1.0× (equal) N/A

Note

For a more comprehensive evaluation including additional systems and analysis, see the accompanying thesis.

  • CUDA Toolkit (>= 12.9)
  • C++20 compatible compiler
  • Meson build system (>= 1.3.0)
meson setup build
meson compile -C build

Benchmarks and tests are built by default. To disable them:

meson setup build -DBUILD_BENCHMARKS=false -DBUILD_TESTS=false
#include <CuckooFilter.cuh>

// Configure the filter: key type, fingerprint bits, max evictions, block size, bucket size
using Config = CuckooConfig<uint64_t, 16, 500, 256, 16>;

// Create a filter with the desired capacity
CuckooFilter<Config> filter(1 << 20);  // capacity for ~1M items

// Insert keys (d_keys is a device pointer)
filter.insertMany(d_keys, numKeys);

// Or use sorted insertion
filter.insertManySorted(d_keys, numKeys);

// Check membership
filter.containsMany(d_keys, d_results, numKeys);

// Delete keys
filter.deleteMany(d_keys, d_results, numKeys);

The CuckooConfig template accepts the following parameters:

Parameter Description Default
T Key type -
bitsPerTag Fingerprint size in bits (8, 16, 32) -
maxEvictions Maximum eviction attempts before failure 500
blockSize CUDA block size 256
bucketSize Slots per bucket (must be power of 2) 16
AltBucketPolicy Alternate bucket calculation policy XorAltBucketPolicy
evictionPolicy Eviction strategy (DFS or BFS) BFS
WordType Atomic type (uint32_t or uint64_t) uint64_t

For workloads that exceed single GPU capacity:

#include <CuckooFilterMultiGPU.cuh>

CuckooFilterMultiGPU<Config> filter(numGPUs, totalCapacity);
filter.insertMany(h_keys, numKeys);
filter.containsMany(h_keys, h_results, numKeys);
include/           - Header files
  CuckooFilter.cuh           - Main filter implementation
  CuckooFilterMultiGPU.cuh   - Multi-GPU implementation
  CuckooFilterIPC.cuh        - IPC support
  bucket_policies.cuh        - Alternative bucket policies
  helpers.cuh                - Helper functions
src/               - Example applications
benchmark/         - benchmarks
tests/             - Unit tests
scripts/           - Scripts for running/plotting benchmarks
联系我们 contact @ memedata.com