Bzip2 crate switches from C to 100% Rust

Original link: https://trifectatech.org/blog/bzip2-crate-switches-from-c-to-rust/

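Version 0.6.0 of the bzip2 crate replaces the previous C implementation with a faster, more maintainable Rust implementation (via the libbz2-rs-sys crate). Although bzip2 sees limited use today, many protocols and libraries still require it, which makes this update important. The Rust implementation improves both compression and decompression performance, with benchmarks showing speedups of roughly 4% to 15%. A major advantage is simpler cross-compilation, particularly for WebAssembly, Windows, and Android, because the complications of compiling C code disappear. The Rust implementation also avoids symbol conflicts by default, because it does not export symbols. The code has been rigorously tested, including running the unsafe code under Miri, and has undergone a security audit that found and fixed one minor logic bug. Overall, this modernization delivers a faster, more reliable, and easier-to-integrate bzip2 library that developers can use without giving it further thought.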

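A new bzip2 library written in Rust aims to replace the existing C implementation in Linux distributions. The discussion centers on the feasibility and benefits of that switch. Key points include the library's compatible C ABI for seamless integration, considerations around dynamic linking, and potential performance improvements. Some commenters are skeptical of benchmarks for "rewrite it in Rust" projects and point to past problems with uutils. Others stress the importance of independently verifying performance claims. The discussion also touches on bzip2's relevance in the era of zstd; some argue that it is still used for legacy data and that a faster, safer implementation is valuable. Security benefits are mentioned as well, such as addressing existing CVEs and reducing buffer overflows. Some question whether resources tend to go to "UI polish" rather than core performance, and what practical effect the saved cycles have on battery life and cloud computing costs.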

Original text

Today we published bzip2 version 0.6.0, which uses our Rust implementation of the bzip2 algorithm, libbz2-rs-sys, by default. The bzip2 crate is now faster and easier to cross-compile.

The libbz2-rs-sys crate can also be built as a C dynamic library, if you have a C project that would benefit from these improvements.

Why though?

Why bother working on this algorithm from the 90s that sees very little use today? The thing is that many protocols and libraries still need to support bzip2 to be compliant with their specification, so many projects still depend on bzip2 somewhere deep down in their dependency tree. We've used our experience from zlib-rs to modernize the bzip2 implementation.

We've previously written about the implementation details of libbz2-rs-sys in "Translating bzip2 with c2rust". Now let's look at the benefits of this work.

Improved performance

Our Rust implementation generally outperforms the C implementation, though there are a couple of cases where we only match C performance. We are not aware of any cases where we are substantially slower.

For compression, we are roughly 10% to 15% faster in the benchmarks below. For bzip2, the level indicates how much working memory is used. It has little effect on performance, and for sample3.ref level 1 already allocates more memory than the size of the file, so higher levels are irrelevant there.

name                         C (cpu cycles)     Rust (cpu cycles)   Δ
sample3.ref (level 1)        38.51M ± 77.03K    33.53M ± 90.52K     -14.87%
silesia-small.tar (level 1)  3.43G ± 2.06M      3.00G ± 6.31M       -14.30%
silesia-small.tar (level 9)  3.47G ± 4.86M      3.17G ± 4.43M       -9.66%
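
The remark above that the level sets the working memory can be made concrete with a minimal sketch. It assumes the conventional bzip2 rule that level N selects a block size of N × 100,000 bytes, and it ignores the initial run-length encoding step, so the block counts are approximate. The names `block_size_bytes` and `blocks_needed` are hypothetical helpers for illustration, not part of the bzip2 or libbz2-rs-sys API.

```rust
/// Block size in bytes selected by a bzip2 compression level (1..=9).
/// Assumption: the conventional rule of 100,000 bytes per level step.
fn block_size_bytes(level: u32) -> Option<usize> {
    if (1..=9).contains(&level) {
        Some(level as usize * 100_000)
    } else {
        // Levels outside 1..=9 are not valid for bzip2.
        None
    }
}

/// Approximate number of blocks needed for `input_len` bytes at `level`.
/// Hypothetical helper: it ignores run-length pre-processing, which can
/// change how many input bytes fit into one block.
fn blocks_needed(input_len: usize, level: u32) -> Option<usize> {
    let block = block_size_bytes(level)?;
    Some(input_len.div_ceil(block))
}
```

For a hypothetical 50,000-byte input, `blocks_needed` returns one block at both level 1 and level 9: the whole input already fits in the smallest block, so the extra capacity of higher levels goes unused.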

For decompression there is a bit more of a spread, but we again see speedups across the board, from roughly 4% to 10%.

name                    C (cpu cycles)     Rust (cpu cycles)   Δ
sample3.bz2             2.53M ± 30.08K     2.42M ± 8.95K       -4.48%
sample1.bz2             9.63M ± 40.44K     8.86M ± 10.64K      -8.63%
sample2.bz2             20.47M ± 55.28K    19.02M ± 36.13K     -7.67%
dancing-color.ps.bz2    87.46M ± 481.02K   83.16M ± 548.86K    -5.17%
re2-exhaustive.txt.bz2  1.89G ± 12.29M     1.76G ± 12.64M      -7.65%
zip64support.tar.bz2    2.32G ± 12.09M     2.11G ± 15.42M      -10.00%

One caveat is that on our macOS benchmark machine we occasionally see some lower numbers for decompression. We are not sure what causes the variance, and measuring performance on macOS in a detailed way has turned out to be difficult (e.g., there is no tool like perf to automate performance tracking that we could get to work).

Enabling cross-compilation

Cross-compilation of a Rust project with C dependencies often works out of the box (because the cc crate tries to handle it), but when it doesn't, the errors can be hard to debug. Similarly, linking to system libraries can cause confusing and hard-to-reproduce issues.

For bzip2, compilation to WebAssembly has long been an issue. By removing the C dependency and using Rust code instead, the complications of compiling C just disappear: cross-compilation just works. Building for Windows or Android also just works. Besides providing a better experience for users, this change is also a major maintenance win.

Symbols are not exported (by default)

Using a C dependency means that its symbols are exported (so that a Rust extern block can find them). The exported names can conflict when another dependency declares the same symbols.

By default, libbz2-rs-sys does not export its symbols, which means that it will never conflict with other dependencies. If your Rust project does need to emit the symbols, there is a feature flag to enable exporting symbols.
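
The idea of keeping symbols private unless a feature flag asks for them can be sketched as follows. This is an illustration under assumptions, not the crate's actual source: the feature name `export-symbols` and the function `demo_version` are hypothetical.

```rust
// Sketch: export a C-ABI function under its plain name only when a Cargo
// feature is enabled. Both the feature name `export-symbols` and the
// function `demo_version` are assumptions made for this example.
//
// Note: on the Rust 2024 edition the attribute must be spelled
// `unsafe(no_mangle)` instead of `no_mangle`.
#[cfg_attr(feature = "export-symbols", no_mangle)]
pub extern "C" fn demo_version() -> u32 {
    // Arbitrary value standing in for real library behavior.
    600
}
```

Built without the feature, the function keeps its mangled Rust name, so it cannot collide with a C library that defines the same symbol. Built with the feature enabled, the linker sees the plain name `demo_version`.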

Run tests with Miri

Writing a performant bzip2 implementation requires some unsafe code, and replicating the C interface in Rust requires a lot more. Luckily we are able to run that code under Miri.

More importantly, higher-level libraries or applications that use bzip2 can now run with Miri as well.
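
As a small illustration of the kind of code Miri checks, the sketch below reads bytes through a raw pointer, the sort of pattern a C-compatible interface needs. `sum_bytes` is a hypothetical function written for this example, not part of libbz2-rs-sys.

```rust
/// Sum `len` bytes starting at `ptr`.
///
/// # Safety
/// `ptr` must be valid for reads of `len` bytes.
pub unsafe fn sum_bytes(ptr: *const u8, len: usize) -> u32 {
    let mut total = 0u32;
    for i in 0..len {
        // SAFETY: the caller guarantees that `len` bytes from `ptr` are readable.
        total += unsafe { *ptr.add(i) } as u32;
    }
    total
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sums_within_bounds() {
        let data = [1u8, 2, 3, 4];
        // The read stays in bounds, so Miri accepts it. Passing
        // `data.len() + 1` instead would be reported as undefined behavior.
        let total = unsafe { sum_bytes(data.as_ptr(), data.len()) };
        assert_eq!(total, 10);
    }
}
```

With a nightly toolchain, Miri can be installed with `rustup +nightly component add miri` and the test run with `cargo +nightly miri test`.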

Audit

The audit found one logic bug (an off-by-one error) and fixed some limitations in our fuzzer. Beyond that, there were no significant findings (yay!). We do want to thank the reviewers from Radically Open Security, specifically Christian Reitter, for sharing their fuzzing experience. The full audit report can be found here.

Conclusion

The bzip2 crate is faster now. You can go back to never having to think about it.

Thanks


