新型高质量哈希算法在M4上测速达到71GB/s。
New high-quality hash measures 71GB/s on M4

原始链接: https://github.com/Nicoshev/rapidhash

Rapidhash 号称是最快的哈希函数,速度超越了wyhash,并在速度、质量和兼容性方面表现出色。其吞吐量极高,在苹果M4 CPU上超过70GB/s,并在各种架构的基准测试中始终优于xxh3。 它设计为通用的哈希函数,能够很好地运行在AMD64和AArch64系统上,并使用标准编译器。它也同时支持C和C++编译。此外,rapidhash成功通过了所有SMHasher和SMHasher3测试,展现了其优异的质量。碰撞研究表明其碰撞概率接近理想值。使用15Gi键的实验表明,测得的碰撞率仅比统计预期高14%,表明其具有很强的均匀分布性。这与其速度相结合,使rapidhash成为一种高效的哈希解决方案。

Hacker News 最新 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交 登录 M4 上新型高性能哈希算法速度达 71GB/s (github.com/nicoshev) nicoshev11 1小时前 5 分 | 隐藏 | 过去 | 收藏 | 讨论 指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系我们 搜索:

原文

The fastest recommended hash function by SMHasher.

The fastest passing hash in SMHasher3.

rapidhash is wyhash' official successor, with improved speed, quality and compatibility.

Fast
Extremely fast for both short and large inputs.
Surpasses 70GB/s on Apple's M4 cpus.
The fastest hash function passing all tests in SMHasher.
The fastest hash function passing all tests in SMHasher3.

Universal
Optimized for both AMD64 and AArch64 systems.
Compatible with gcc, clang, icx and MSVC.
It does not use machine-specific vectorized or cryptographic instruction sets.
Prepared for both C and C++ compilation.

Excellent
Passes all tests in both SMHasher and SMHasher3.
Collision-based study showed a collision probability lower than wyhash and close to ideal.
Outstanding collision ratio when tested with datasets of 16B and 66B keys:

Input Len Nb Hashes Expected Nb Collisions
8 15 Gi 7.0 3
12 15 Gi 7.0 4
16 15 Gi 7.0 8
24 15 Gi 7.0 7
32 15 Gi 7.0 3
40 15 Gi 7.0 3
64 15 Gi 7.0 8
80 15 Gi 7.0 6
120 15 Gi 7.0 5
128 15 Gi 7.0 6
160 15 Gi 7.0 9
1024 15 Gi 7.0 7
4096 15 Gi 7.0 8
8 62 Gi 120.1 123
12 62 Gi 120.1 108
16 62 Gi 120.1 102
24 62 Gi 120.1 112
32 62 Gi 120.1 137
48 62 Gi 120.1 149
64 62 Gi 120.1 161
120 62 Gi 120.1 172
128 62 Gi 120.1 187
160 62 Gi 120.1 183

More results can be found in the collisions folder

Average latency when hashing keys of 4, 8 and 16 bytes

Hash M1 Pro M3 Pro Neoverse V2 AMD Turin
rapidhash 1.79ns 1.38ns 2.07ns 2.31ns
xxh3 1.92ns 1.50ns 2.15ns 2.35ns

Peak throughput when hashing files of 16Kb-2Mb

Hash M1 Pro M3 Pro M3 Ultra M4 Neoverse V2
rapidhash 47GB/s 57GB/s 61GB/s 71GB/s 37GB/s
xxh3 37GB/s 43GB/s 47GB/s 49GB/s 34GB/s

The benchmarking program can be found in the bench folder

Collision-based hash quality study

A perfect hash function distributes its domain uniformly onto the image.
When the domain's cardinality is a multiple of the image's cardinality, each potential output has the same probability of being produced.
A function producing 64-bit hashes should have a $p=1/2^{64}$ of generating each output.

If we compute $n$ hashes, the expected amount of collisions should be the number of unique input pairs times the probability of producing a given hash.
This should be $(n*(n-1))/2 * 1/2^{64}$, or simplified: $(n*(n-1))/2^{65}$.
In the case of hashing $15*2^{30}$ (~16.1B) different keys, we should expect to see $7.03$ collisions.

We present an experiment in which we use rapidhash to hash $77$ datasets of $15*2^{30}$ (15Gi) keys each.
For each dataset, the amount of collisions produced is recorded as measurement.
Ideally, the average among measurements should be $7.031$ and the results collection should be a binomial distribution.
We obtained a mean value of $8.026$, just $14$% over $7.031$.
Each dataset individual result and the collisions test program can be found in the collisions folder.
The default seed $0$ was used in all experiments.

联系我们 contact @ memedata.com