Memory safety is table stakes

Original link: https://www.usenix.org/publications/loginonline/memory-safety-merely-table-stakes

Modern systems programming languages like Rust are revolutionizing software development by eliminating memory safety bugs like use-after-free errors. However, the reality is more complex: vast amounts of existing code are written in less safe languages, and external constraints often necessitate their continued use. Interacting with these "foreign" libraries through Foreign Function Interfaces (FFI) introduces significant risks, as vulnerabilities in foreign code can compromise the entire system's memory and type safety. Memory safety alone is insufficient; type safety, which enforces specific invariants over data types, is crucial. Violating either can lead to undefined behavior and break program invariants. To address this challenge, the authors introduce Omniglot, a framework designed to maintain both memory and type safety when interacting with untrusted foreign libraries. This framework aims to provide a robust solution for leveraging existing codebases while mitigating the risks associated with unsafe foreign code. The article emphasizes the fundamental link between memory and type safety and offers insight into Omniglot's functionality.

A Hacker News discussion revolves around a USENIX article arguing that memory safety is now a basic requirement for programming languages. One commenter, timewizard, argues against this, stating that Rust's `unsafe` construct and its lack of mature auditing tools weaken its memory safety guarantees; he also points to ongoing memory safety improvements in languages like C++ and to the paramount importance of performance. xvedejas counters that Safe Rust provides memory safety guarantees, analogous to Python's safety despite being built on C. AlotOfReading points out that Rust offers built-in lints and tools like `cargo-geiger` to detect and eliminate `unsafe` code, offering more control than C++. noisem4ker attributes the continued use of unsafe languages to inertia. The debate centers on whether memory safety should be considered fundamental and on the practicality of achieving it across all software development contexts.

Original article

The past few years have seen a massive success story for systems programming. Entire categories of bugs that used to plague systems programmers—like use-after-free, data races, and segmentation faults—have begun to disappear entirely. The secret to this new reality is a set of systems programming languages, chief among them Rust, whose powerful type systems constructively eliminate these kinds of bugs: if it compiles, then it's correct … or at least, it will not contain use-after-free or other memory safety errors. These languages are gaining widespread adoption across industry [1, 2, 3] and academia [4, 5, 6, 7] alike, and are being adopted for ambitious and critical systems, such as new high-performance compute libraries, distributed storage systems, and operating systems.

Despite these successes, the reality is a little more complicated. There is a great amount of software already written in other languages. And often, external constraints such as certification requirements or developer expertise force even new components to be written in other, less safe languages. Therefore, an important feature for any new systems programming language is its ability to easily and efficiently interact with existing foreign libraries. Developers building new systems can leverage existing native cryptography, mathematics, graphical, and other libraries immediately, without waiting for them to first be ported to new languages and without suffering a performance hit. They can incrementally migrate existing systems, replacing components in a legacy C/C++ codebase with safe alternatives [1].

Unfortunately, interacting with foreign code can result in subtle, but nonetheless devastating safety violations that re-introduce the very concerns many developers are trying to avoid by using type-safe languages. For example, foreign libraries may themselves include memory safety vulnerabilities, such as OpenSSL’s infamous Heartbleed bug [8]. When foreign code is invoked through a Foreign Function Interface (FFI), it runs in the same address space and with the same privileges as the host language. Therefore, vulnerabilities in native libraries can affect the entire host program and break memory or type safety guarantees.
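To make the mechanics concrete, here is a minimal sketch of what an FFI call looks like in Rust, using `strlen` from the platform's C library as a stand-in for any foreign function. The key point is that the compiler cannot verify anything about the foreign implementation, which is why the call site must be marked `unsafe`:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Declaration of a foreign function from the platform's C library.
// The Rust compiler trusts this signature blindly: if the foreign
// implementation read or wrote out of bounds, it would corrupt this
// program's memory directly, since it runs in the same address space
// and with the same privileges as the host.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

fn main() {
    let msg = CString::new("hello").unwrap();
    // Every FFI call is `unsafe`: the caller, not the compiler, must
    // uphold the C function's contract (here: the pointer is valid
    // for reads and the string is NUL-terminated).
    let n = unsafe { strlen(msg.as_ptr()) };
    assert_eq!(n, 5);
    println!("strlen reported {} bytes", n);
}
```

Note that `unsafe` only marks where the programmer takes over responsibility; it does nothing to contain a buggy or malicious foreign library, which is exactly the gap described above.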

While we are quick to reach for tools like process isolation, a system call boundary, or a client-server model to solve this, these tools often only help uphold memory safety, which is only half the battle. Each language has specific invariants over its types (like permissible values) which its compiler relies on when producing code. Ensuring that all types are correctly inhabited goes beyond memory safety; it requires type safety. In fact, memory and type safety are intertwined: a violation of one can easily break the other. And finally, some program invariants—like whether references can be aliased—require reasoning about both type and memory safety. Interactions with untrusted code or between different languages that violate these invariants can lead to undefined behavior and, in turn, break other safety properties.
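One illustrative example of such a type invariant in Rust is `NonZeroU32`, whose "never zero" guarantee the compiler exploits for layout optimization. The sketch below shows the invariant being enforced in safe code; foreign code handed a raw pointer to such a value could violate it silently:

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // The compiler relies on the invariant "never zero" to fit
    // Option<NonZeroU32> into the same 4 bytes as a plain u32:
    // the all-zeroes bit pattern is reserved to represent `None`.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());

    // Safe Rust enforces the invariant at construction time...
    assert!(NonZeroU32::new(0).is_none());
    let x = NonZeroU32::new(7).unwrap();
    assert_eq!(x.get(), 7);

    // ...but foreign code given a *mut NonZeroU32 could write a 0
    // through it, producing a value the type says cannot exist.
    // Generated code that assumed the invariant would then misbehave:
    // undefined behavior, with no compiler error or runtime check.
}
```

This is why memory safety alone (the write stayed in bounds) is not enough: the write was "safe" in memory terms yet still broke a type-level invariant the compiler depends on.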

We present Omniglot [9], a new approach and framework we have developed that can maintain both memory and type safety across interactions with untrusted foreign libraries, in different settings: we implement prototypes for Linux userspace applications and a Rust-based kernel. In this article, we want to focus on illustrating the fundamental link between memory and type safety through an example of interacting with a foreign library and provide an intuition on how the Omniglot framework works.
