WAL-RUS: a Rust Rewrite of WAL-G for PostgreSQL Backups

Original link: https://clickhouse.com/blog/walrus-postgres-backups-in-rust

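ClickHouse Cloud has introduced **WAL-RUS**, an open-source tool written in Rust for PostgreSQL backups and WAL archival. The widely used WAL-G is very reliable, but its Go-based architecture depends on garbage collection, which produces an unpredictable "sawtooth" memory pattern and high virtual memory consumption. That unpredictability forces operators to over-provision resources that could otherwise go to the database itself. WAL-RUS is designed to address these operational challenges with the following advantages:

* **Predictable memory usage**: using Rust's explicit memory management, WAL-RUS reduces peak virtual memory consumption by more than 70% compared with WAL-G.
* **Daemon architecture**: it keeps persistent connections open for continuous, high-performance streaming, avoiding the overhead of repeatedly spawning new processes.
* **Seamless compatibility**: WAL-RUS is fully compatible with WAL-G's configuration and archive format, so migration is easy.

WAL-RUS is built for resource-constrained environments and delivers stable, efficient performance without sacrificing functionality. ClickHouse plans to make it the default backup and WAL archival mechanism for its managed PostgreSQL service in ClickHouse Cloud, and it continues to evolve as an open-source project that welcomes community collaboration.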


Original article


Postgres backups are one of those pieces of infrastructure that should be boring. They sit in the background, continuously archiving WAL files, uploading backups, and making sure that when something goes wrong, recovery is possible.

At ClickHouse Cloud, this path is critical. WAL archival is what allows us to preserve durability and recoverability for our Postgres services. WAL-G has been a strong and reliable tool for this job. It is mature, battle-tested, and has served the Postgres community well.

But as we pushed Postgres into tighter and more resource-constrained environments, we started hitting a specific problem: memory predictability.

That led us to build WAL-RUS, an open-source Rust implementation of Postgres backup and WAL archival tooling, designed for predictable, efficient memory usage and WAL-G compatibility.

WAL-G is written in Go, a garbage-collected language. While Go makes it easy to build reliable infrastructure software, garbage-collected runtimes make memory usage harder to predict, especially for long-running services like WAL archival.

The challenge isn't just resident memory (memory actively being used), but also virtual memory (memory reserved from the operating system). Go's runtime manages its own memory pools and can reserve significantly more virtual memory than the application is actively using. As workloads change, this footprint can fluctuate in ways that are difficult to reason about and tune. The Go GC guide describes this as a characteristic "sawtooth" pattern, where memory usage grows between garbage collection cycles and then falls after collection, making it difficult to predict peak memory consumption and provision resources efficiently.
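A small model makes the sawtooth concrete. The sketch below uses the heap-target formula from the Go GC guide (live heap plus (live heap + GC roots) × GOGC / 100) and simulates a steady allocation rate. The function names, the tick-based model, and the numbers are illustrative assumptions. It models only the Go heap, not the extra virtual memory the runtime reserves, and it is not taken from WAL-G.

```rust
/// Heap size at which the next GC cycle is expected to finish, following the
/// formula in the Go GC guide: live + (live + roots) * GOGC / 100.
fn heap_target_bytes(live_heap: u64, gc_roots: u64, gogc_percent: u64) -> u64 {
    live_heap + (live_heap + gc_roots) * gogc_percent / 100
}

/// Hypothetical tick-based model: the heap grows by `alloc_per_tick` each tick
/// and drops back to `live_heap` once it reaches the target (GC roots ignored).
/// Returns the heap size observed at every tick.
fn simulate_sawtooth(
    live_heap: u64,
    gogc_percent: u64,
    alloc_per_tick: u64,
    ticks: usize,
) -> Vec<u64> {
    let target = heap_target_bytes(live_heap, 0, gogc_percent);
    let mut heap = live_heap;
    let mut samples = Vec::with_capacity(ticks);
    for _ in 0..ticks {
        heap += alloc_per_tick;
        samples.push(heap);
        if heap >= target {
            // A collection frees everything except the live heap.
            heap = live_heap;
        }
    }
    samples
}
```

With the default `GOGC=100` and a 100 MiB live heap, `heap_target_bytes(100, 0, 100)` gives 200, so in this model the budget for the process has to cover about twice the live heap even though half of it is garbage at the peak.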

For operators, that creates a simple but important problem: how much memory should be reserved for backup infrastructure?

The answer is usually "more than necessary" to avoid unexpected memory pressure. Memory budgeted for WAL archival is memory that cannot be confidently allocated to Postgres itself for queries, shared buffers, and page cache. Postgres runs most reliably with overcommit disabled, which makes virtual memory a valuable resource that modern software often treats as an afterthought.

WAL-G remains a proven and reliable tool, but as we scaled Postgres into increasingly resource-constrained environments, we wanted a backup system with a more predictable memory profile, delivering the same functionality while consuming fewer resources and making capacity planning simpler.

We weren't looking for new functionality. WAL-G is a mature and reliable backup system we’re happy to contribute to. Our goal was to preserve core functionality and compatibility while providing a more predictable resource profile.

WAL-RUS is a Rust implementation of Postgres backup and WAL archival tooling built to address the operational challenges we encountered with memory predictability and resource usage.

1. Predictable Resource Usage: Unlike garbage-collected runtimes, Rust gives us direct control over memory allocation and concurrency. WAL-RUS uses bounded worker pools and carefully controlled concurrency, making memory consumption easier to reason about and reducing the need to over-provision resources for backup infrastructure.

2. Built for Continuous WAL Archival: WAL-RUS prioritizes WAL-G’s daemon architecture. Instead of spawning a new process and establishing new connections for every WAL file, it maintains persistent object storage connections that continuously process archival requests in the background.

3. Optimized for Streaming Workloads: WAL archival is fundamentally a streaming problem: read WAL files, compress them, and upload to object storage. WAL-RUS minimizes unnecessary buffering and data copies throughout this pipeline, allowing it to perform the same archival work with a smaller and more predictable memory footprint.

4. WAL-G Compatibility: WAL-RUS uses the same WALG_ configuration variables as WAL-G and is continuously tested for interoperability. WAL-G can read archives generated by WAL-RUS, and WAL-RUS can read archives generated by WAL-G, making migration straightforward for existing deployments.
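The first and third points can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not WAL-RUS's actual code: `stream_copy`, `archive_all`, `CHUNK_SIZE`, and the in-memory "segments" are hypothetical names, and `io::sink()` stands in for the compress-and-upload stage. The bounded channel and fixed worker count cap how much work is in flight, and the fixed-size buffer caps how much of each segment is held in memory while it is being copied.

```rust
use std::io::{self, Read, Write};
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;

/// Size of the reusable copy buffer (an arbitrary choice for this sketch).
const CHUNK_SIZE: usize = 64 * 1024;

/// Copy `src` to `dst` through one fixed-size buffer, so memory use does not
/// depend on the size of the input. Returns the number of bytes copied.
fn stream_copy<R: Read, W: Write>(mut src: R, mut dst: W) -> io::Result<u64> {
    let mut buf = vec![0u8; CHUNK_SIZE];
    let mut total = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break;
        }
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}

/// Process all segments with a fixed number of workers fed by a bounded queue.
/// At most `queue_depth` segments wait in the queue; `send` blocks when it is full.
fn archive_all(segments: Vec<Vec<u8>>, workers: usize, queue_depth: usize) -> u64 {
    assert!(workers > 0, "at least one worker is required");
    let (tx, rx) = sync_channel::<Vec<u8>>(queue_depth);
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut uploaded = 0u64;
                loop {
                    // The lock is released at the end of this statement.
                    let job = rx.lock().unwrap().recv();
                    match job {
                        // io::sink() is a placeholder for compression + upload.
                        Ok(segment) => {
                            uploaded += stream_copy(segment.as_slice(), io::sink())
                                .expect("copy to sink failed");
                        }
                        // The sender was dropped: no more work.
                        Err(_) => break,
                    }
                }
                uploaded
            })
        })
        .collect();

    for segment in segments {
        tx.send(segment).expect("all workers exited early");
    }
    drop(tx); // Closing the channel lets the workers finish.

    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .sum()
}
```

In this sketch, calling `archive_all(segments, 4, 2)` keeps at most four segments being processed and two queued at any moment, which is the kind of bound that makes a memory budget easier to state up front.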

To evaluate WAL-RUS, we built a reproducible benchmark that compares WAL-RUS, WAL-G, and pgBackRest under a sustained, WAL-heavy PostgreSQL workload. The benchmark continuously generates WAL, archives it to S3, and measures how efficiently each archiver uses memory while keeping up with WAL generation. To ensure a fair comparison, all three tools were configured with four concurrent archival workers.

Memory efficiency was the primary motivation behind WAL-RUS, making memory consumption the first metric we examined.


WAL-G reached nearly 2.8 GB of peak virtual memory during the benchmark, while WAL-RUS remained below 1 GB, a reduction of more than 70%. WAL-RUS also maintained a stable memory profile throughout the run, making its resource requirements easier to reason about in production environments. pgBackRest deserves credit here as well. As a C-based implementation without a garbage-collected runtime, it has tight control over memory allocation.
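The reduction figure is simple arithmetic, and working it through shows what "below 1 GB" has to mean. The helper names below are hypothetical, and the inputs are the rounded values quoted above rather than raw benchmark data.

```rust
/// Percentage reduction of `candidate` relative to `baseline`.
fn reduction_percent(baseline: f64, candidate: f64) -> f64 {
    (1.0 - candidate / baseline) * 100.0
}

/// Largest candidate value that still achieves the given percentage reduction.
fn max_peak_for_reduction(baseline: f64, reduction_percent: f64) -> f64 {
    baseline * (1.0 - reduction_percent / 100.0)
}
```

Against a baseline of 2.8 GB, a peak of exactly 1 GB would be a reduction of only about 64%, so a reduction of more than 70% implies a WAL-RUS peak under roughly 0.84 GB (`max_peak_for_reduction(2.8, 70.0)`).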


Both WAL-RUS and WAL-G consistently maintained a minimal backlog throughout the benchmark, demonstrating that they could keep up with the WAL being generated. pgBackRest accumulated a larger backlog during periods of intense WAL activity, illustrating the throughput tradeoff between daemon-based and process-based archival.
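One way to observe archival backlog on a running server is to count the `.ready` marker files that PostgreSQL keeps in `pg_wal/archive_status` for segments still waiting to be archived. The sketch below does that. How the benchmark itself measured backlog is not stated, so treat this as an assumption, and `count_ready_segments` as a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Count WAL segments awaiting archival by counting `*.ready` marker files
/// in the given `archive_status` directory.
fn count_ready_segments(archive_status_dir: &Path) -> io::Result<usize> {
    let mut pending = 0;
    for entry in fs::read_dir(archive_status_dir)? {
        let name = entry?.file_name();
        if name.to_string_lossy().ends_with(".ready") {
            pending += 1;
        }
    }
    Ok(pending)
}
```

Sampling this count at a fixed interval during a WAL-heavy workload gives a backlog curve: a value that stays near zero means the archiver is keeping up.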


CPU utilization is less important but still worth watching. Usage is comparable across all three tools, and most of it goes to LZ4 compression.

WAL-RUS was built to solve a practical problem: delivering reliable PostgreSQL backups and WAL archival with a smaller, more predictable resource footprint. By combining Rust's explicit memory management with a daemonized streaming architecture, WAL-RUS achieves archival throughput comparable to WAL-G while significantly reducing memory consumption.

Importantly, WAL-RUS remains fully compatible with existing WAL-G archives and configuration, making adoption straightforward for existing deployments. WAL-RUS also introduces support for using Postgres 17's WAL summaries for incremental backups, which we're working to upstream to WAL-G.

We didn't build WAL-RUS because WAL-G lacked functionality. WAL-G remains a mature and battle-tested project. We built WAL-RUS because we wanted tighter control over resource usage while preserving compatibility with the ecosystem that WAL-G helped establish.

As we continue to develop and harden the project, we plan to make WAL-RUS the default backup and WAL archival mechanism for our managed Postgres offering in ClickHouse Cloud.

The project is open source, and we welcome feedback, testing, and contributions!
