We still chose C++ (instead of Rust) for new database development

Original link: https://www.eloqdata.com/blog/2024/10/26/why-cpp

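## EloqKV: Why Choose C++ in 2024?

EloqKV, EloqData's new distributed database, is built on a new architecture called Data Substrate and, perhaps surprisingly, is written mostly in C++. Although newer languages such as Rust and Go are popular for systems programming, the team deliberately chose C++ for its advantages in long-term database development.

Their decision rests on three key factors: a strong existing ecosystem of database resources and libraries, broad support for low-level hardware and operating-system integration, and C++'s proven longevity and mature toolchain, which matters for software expected to last for decades. They drew lessons from projects such as Hadoop (Java/JVM), where performance overhead proved problematic and led others to rewrite such systems in C++.

EloqData uses a modular design that allows languages such as Rust to be integrated later where they bring benefits. The team prioritizes building a solid foundation to avoid technical debt, even if that means slower initial progress. Ultimately, they believe C++ offers the best balance of performance, compatibility, and long-term maintainability for a complex, evolving database system. Time will tell whether the approach succeeds, but the team is confident in its choice.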

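## EloqDB Chooses C++: Summary

EloqData explains its decision to develop EloqDB primarily in C++ despite Rust's growing popularity. The team stresses interoperability with existing C/C++ APIs and the importance of drawing on mature database technology built in those languages. They also argue that modern C++ (C++20/23), combined with careful coding practices and tooling, can mitigate many traditional safety problems such as memory unsafety.

The discussion in the comments highlights a core tension: Rust offers strong compile-time safety guarantees, but experienced C++ developers can achieve similar results through discipline and modern techniques. Some worry that Rust's learning curve is too steep for an existing team, and question the maturity of its ecosystem.

Many commenters note that C++'s flexibility, although a potential source of bugs, also allows fine-grained control. Others argue that Rust's safety advantages outweigh C++'s complexity, especially with long-term maintainability and security in mind. Ultimately, EloqData's choice appears to be driven by practical considerations (existing expertise, integration needs, and confidence in C++'s ability to evolve) rather than a fundamental rejection of Rust's strengths.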

Original article

We have recently introduced EloqKV, our distributed database product built on a cutting-edge architecture known as Data Substrate. Over the past several years, the EloqData team has worked tirelessly to develop this software, ensuring it meets the highest standards of performance and scalability. One key detail we’d like to share is that the majority of EloqKV’s codebase was written in C++.

Had we launched our product a decade ago, using C++ would have been an obvious and unremarkable choice. However, it's 2024, and the landscape has changed. Today, languages like Rust, Zig, and other type-safe options like Golang are considered modern and trendy for systems programming. So, when we chose C++, a language that some might view as outdated or less "cool", or even bug-prone and "unsafe", it’s natural for people to wonder why.

In this article, we’d like to share the thought process behind our decision to choose C++ over some of the newer, more fashionable languages, the historical lessons we drew on, and the progress we expect going forward.

## Choosing a Programming Language Is Important

Selecting the right programming language is crucial for any software project, but it becomes even more significant for complex systems software such as databases. The choice of language influences various aspects, including performance, ease of development, and maintainability. In a domain where efficiency and reliability are paramount, the programming language serves as the foundation upon which the entire system is built.

For databases, the implications of this choice are profound. A database must be capable of handling vast amounts of data while providing fast query responses and ensuring data integrity. These requirements necessitate a language that not only excels in performance but also allows for scalable and efficient development practices. Additionally, databases often undergo continuous development and enhancement over decades, making maintainability a critical factor. A well-chosen language can simplify the process of updating and expanding the software's features over time, ensuring that it remains relevant and effective in an ever-evolving technological landscape.

Consider the Hadoop big data stack, which is predominantly built on the Java Virtual Machine (JVM). Java and the JVM ecosystem have long been among the most popular programming platforms and were lauded for their portability and rich features, yet in retrospect the choice was not without controversy. The performance and memory overhead of the JVM, particularly issues related to garbage collection, has caused numerous challenges for developers. Indeed, Redpanda and ScyllaDB are notable examples: each reimplemented a mature, widely used Java-based system (Kafka and Cassandra, respectively) in C++ from scratch to avoid the JVM penalties.

Another important consideration is the popularity of the programming language and the availability of developers familiar with it. For instance, Spark and Kafka were developed largely in Scala, while RabbitMQ and parts of Couchbase are written in Erlang. Although these languages offer robust features and capabilities, they are not as widely adopted as other programming languages. This relative lack of popularity makes it harder to engage developers at scale and to find experienced programmers, and toolchain support is generally not on par with that of more popular languages. A less common language can make recruiting more difficult, slow down development, and limit community support for troubleshooting and innovation.

By the late 2010s, Rust had emerged as one of the leading programming languages for developing database software. Newer projects such as TiKV (the storage layer of TiDB), RisingWave, DataFusion, and Neon are prominent examples that leverage Rust's capabilities to build efficient and high-quality databases. Notably, RisingWave even published a blog post detailing their decision to discard ten months of work in C++ to rewrite their entire codebase in Rust. Given that EloqData began its journey around 2021, when Rust was already well-established as a robust programming language with excellent features for building safe and performant databases, one might wonder why we opted for C++ instead.

## Building a Database from Scratch in C++ in 2024

When we began our project, we were keenly aware that Rust was a highly competitive language for building the foundations of our database. Our eventual decision to choose C++ was based on three main factors.

The first strength of C/C++ lies in its database ecosystem support. Most existing and popular databases are developed in C/C++, providing a wealth of resources and innovations we could leverage. Our Data Substrate technology aims to create a unified, modular architecture that can capitalize on these existing resources while avoiding the need to reinvent the wheel. Although Rust offers good interoperability with C/C++, its memory management model and certain safety restrictions can complicate integration with many established projects.

Another advantage of C++ is its extensive support for foundational libraries. Since most operating systems and lower-level drivers are written in C or C++, interfaces in these languages are often the native and best-supported APIs. Performance-focused IO and networking technologies such as DPDK, RDMA, and liburing, as well as memory allocators such as mimalloc, are developed in C/C++ and can be used directly. In contrast, other languages typically require an additional binding layer to use these libraries effectively. We anticipate that this trend will continue, with newer hardware and OS abstractions favoring C/C++ support first and foremost.
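To make the "native API" point concrete, here is a minimal sketch of C++ calling a C operating-system interface directly. It uses the POSIX `pipe`, `write`, and `read` calls as a stand-in for libraries such as liburing, which follow the same pattern but need extra setup. The names `FileDescriptor` and `pipe_round_trip` are hypothetical and are not taken from the EloqKV codebase.

```cpp
// Sketch only (POSIX systems): a C API used from C++ with no binding layer.
// FileDescriptor and pipe_round_trip are hypothetical illustration names.
#include <unistd.h>  // pipe, read, write, close: plain C declarations

#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// RAII wrapper: the descriptor is closed when the object goes out of scope.
class FileDescriptor {
public:
    explicit FileDescriptor(int fd = -1) noexcept : fd_(fd) {}
    FileDescriptor(const FileDescriptor&) = delete;
    FileDescriptor& operator=(const FileDescriptor&) = delete;
    FileDescriptor(FileDescriptor&& other) noexcept
        : fd_(std::exchange(other.fd_, -1)) {}
    FileDescriptor& operator=(FileDescriptor&& other) noexcept {
        if (this != &other) {
            reset();
            fd_ = std::exchange(other.fd_, -1);
        }
        return *this;
    }
    ~FileDescriptor() { reset(); }

    int get() const noexcept { return fd_; }

private:
    void reset() noexcept {
        if (fd_ >= 0) ::close(fd_);
        fd_ = -1;
    }
    int fd_;
};

// Sends a small message through a kernel pipe and reads it back.
// Intended for messages smaller than the pipe buffer; a larger write
// would block because nothing drains the pipe concurrently.
std::string pipe_round_trip(const std::string& message) {
    int fds[2];
    if (::pipe(fds) != 0) throw std::runtime_error("pipe() failed");
    FileDescriptor reader(fds[0]);
    FileDescriptor writer(fds[1]);

    // The C functions are called as-is: no wrapper crate, no marshalling.
    ssize_t written = ::write(writer.get(), message.data(), message.size());
    if (written < 0 || static_cast<std::size_t>(written) != message.size())
        throw std::runtime_error("write() failed");

    std::string received(message.size(), '\0');
    std::size_t total = 0;
    while (total < received.size()) {
        ssize_t n = ::read(reader.get(), &received[total],
                           received.size() - total);
        if (n <= 0) throw std::runtime_error("read() failed");
        total += static_cast<std::size_t>(n);
    }
    assert(total == message.size());
    return received;
}
```

Calling `pipe_round_trip("hello")` returns the same string after it has passed through the kernel. The only C++-specific code is the RAII wrapper, which closes both descriptors even if an exception is thrown.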

The third advantage of C++ is its longevity and mature toolchain. Infrastructure software often requires continuous updates and improvements over several decades. For instance, Oracle Database is over 45 years old, while MySQL and PostgreSQL have been around for about 30 years. Even relatively newer systems like Cassandra, MongoDB, and Redis are over 15 years old. To develop a reliable infrastructure solution, we need to be prepared to maintain the codebase for potentially half a century. A lot can change in the tech world over such a long period: consider that 20 years ago, Perl was a highly popular language, and Delphi was more widely used than Python.

When building long-lasting software, it’s crucial to consider the long-term survivability of the programming language, including continued compiler improvements, up-to-date libraries, and modern IDE, debugger, and profiler support. In this respect, C++ is a much safer bet. Its extensive history, active development community, and proven resilience over time give us confidence that it will continue to be relevant and well-supported for decades to come.

Of course, C++ comes with its share of legacy baggage that can present challenges compared to many modern languages. Maximizing productivity on C++ projects requires a certain level of discipline. We won’t elaborate on the myriad best practices we’ve implemented to mitigate some of C++'s shortcomings, since these are well documented and widely discussed elsewhere, but we acknowledge that effective use of the language demands a strong commitment to coding standards and testing methodologies. In particular, the harshest argument against C++, its lack of memory safety, can be significantly mitigated by developing in a certain modern subset of the language.
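The article does not say which subset the team uses. As an assumption, the sketch below shows a few widely recommended conventions that such a subset usually includes: ownership through `std::unique_ptr`, bounds-checked access through `std::vector::at`, and `std::optional` instead of nullable pointers. `Record`, `RecordStore`, and the helper functions are hypothetical names chosen for illustration.

```cpp
// Sketch of a "safer subset" coding style (C++17). All names are hypothetical.
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Record {
    std::string key;
    std::string value;
};

class RecordStore {
public:
    // Ownership is transferred explicitly; callers never use raw new/delete.
    void add(std::unique_ptr<Record> record) {
        if (!record) throw std::invalid_argument("record must not be null");
        records_.push_back(std::move(record));
    }

    // vector::at throws std::out_of_range instead of reading out of bounds.
    const Record& at(std::size_t index) const { return *records_.at(index); }

    // Absence is modeled with std::optional rather than a null pointer.
    std::optional<std::string> find(const std::string& key) const {
        for (const auto& record : records_) {
            if (record->key == key) return record->value;
        }
        return std::nullopt;
    }

    std::size_t size() const noexcept { return records_.size(); }

private:
    // The vector owns every record; all are freed when the store is destroyed.
    std::vector<std::unique_ptr<Record>> records_;
};

// Builds a two-record store for demonstration purposes.
RecordStore make_sample_store() {
    RecordStore store;
    store.add(std::make_unique<Record>(Record{"a", "1"}));
    store.add(std::make_unique<Record>(Record{"b", "2"}));
    assert(store.size() == 2);
    return store;
}

// Returns true if an out-of-bounds access is reported as an exception.
bool out_of_range_is_caught(const RecordStore& store) {
    try {
        (void)store.at(store.size());  // one past the last valid index
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

None of this gives the compile-time guarantees of Rust's borrow checker. It narrows the ways memory errors can occur, and teams typically pair such conventions with sanitizers and static analysis.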

## Going Forward

At EloqData, we strongly adhere to a modular design philosophy, as we are committed to building a lasting system that will support decades of continued improvements. We recognize that effective API design is crucial for software development productivity and maintainability. This principle is not only reflected in the overall architecture of Data Substrate, which accommodates various query and storage engines, but is also embedded throughout our software development process. We anticipate that innovations such as improved memory allocators, more efficient RPC libraries, and optimized hash-table implementations will continue to emerge, and we aim to adopt them as they become available.
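The following sketch illustrates the general idea of a module boundary that lets implementations be swapped. It is not the Data Substrate API, whose interfaces the article does not describe; `StorageEngine`, `HashEngine`, `OrderedEngine`, and `KeyValueService` are hypothetical names.

```cpp
// Sketch of a pluggable storage-engine boundary. All names are hypothetical.
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// The narrow interface that the rest of the system depends on.
class StorageEngine {
public:
    virtual ~StorageEngine() = default;
    virtual void put(const std::string& key, std::string value) = 0;
    virtual std::optional<std::string> get(const std::string& key) const = 0;
};

// One implementation backed by a hash table.
class HashEngine final : public StorageEngine {
public:
    void put(const std::string& key, std::string value) override {
        data_[key] = std::move(value);
    }
    std::optional<std::string> get(const std::string& key) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, std::string> data_;
};

// A second implementation backed by an ordered tree.
class OrderedEngine final : public StorageEngine {
public:
    void put(const std::string& key, std::string value) override {
        data_[key] = std::move(value);
    }
    std::optional<std::string> get(const std::string& key) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, std::string> data_;
};

// Upper layer: written once against the interface, unaware of the engine type.
class KeyValueService {
public:
    explicit KeyValueService(std::unique_ptr<StorageEngine> engine)
        : engine_(std::move(engine)) {
        assert(engine_ != nullptr);
    }
    void set(const std::string& key, std::string value) {
        engine_->put(key, std::move(value));
    }
    std::optional<std::string> get(const std::string& key) const {
        return engine_->get(key);
    }

private:
    std::unique_ptr<StorageEngine> engine_;
};
```

Constructing `KeyValueService` with `std::make_unique<HashEngine>()` or `std::make_unique<OrderedEngine>()` yields the same observable behavior. Because the service depends only on the interface, a better hash table or a different engine can be dropped in later without touching the upper layer.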

It is relatively straightforward for us to experiment with other programming languages in our projects when appropriate. We are eager to replace certain modules with components implemented in type-safe languages like Rust where it makes sense. Rust is an exceptional language with a strong following in the systems community, and we aim to utilize it more in many of our upcoming projects.
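One common way to mix languages inside a single process is a C ABI boundary: the module exposes plain C functions, and the caller does not care which language implements them. The sketch below is an assumption about how such a boundary could look, not a description of EloqData's code. `demo_checksum` and `checksum_of` are hypothetical names, and the implementation is written in C++ here as a stand-in for a module that could equally be exported from Rust with `#[no_mangle] pub extern "C" fn`.

```cpp
// Sketch of a C ABI module boundary. All names are hypothetical.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Declaration: only pointers and fixed-width integers cross the boundary,
// so any language that can export C symbols could provide this function.
extern "C" std::uint32_t demo_checksum(const unsigned char* data,
                                       std::size_t len);

// Stand-in implementation (32-bit FNV-1a hash), written in C++ for this
// sketch. A replacement module would export the same symbol and signature.
extern "C" std::uint32_t demo_checksum(const unsigned char* data,
                                       std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        hash ^= data[i];
        hash *= 16777619u;  // FNV prime; unsigned overflow wraps modulo 2^32
    }
    return hash;
}

// C++ convenience wrapper used by the rest of the codebase.
std::uint32_t checksum_of(const std::string& text) {
    return demo_checksum(
        reinterpret_cast<const unsigned char*>(text.data()), text.size());
}
```

Callers only ever see `checksum_of`, so replacing the implementation behind `demo_checksum` is a link-time change. The trade-off is that richer types such as `std::string` or Rust's `String` cannot cross the boundary directly and must be flattened into pointers and lengths.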

In contrast to many startup companies that emphasize rapid iteration, quick feedback loops, and fast prototyping, EloqData has taken a different approach. We place a stronger emphasis on doing things right from the outset to avoid future technical debt. While this focus may slow us down a bit, we believe it is a necessary investment. However, it’s worth noting that avoiding future debt is futile if the product and technology lack a viable future to begin with. Ultimately, whether we made the right choice will take time to determine. Regardless, we take pride in the decisions we’ve made and look forward to seeing how our efforts can help our customers tackle their most challenging data problems.
