The gccrs project is an ambitious effort, started in 2014, to implement a Rust compiler within the GNU Compiler Collection (GCC). Even though the task is far from complete, progress has been made since LWN's previous coverage, according to reports from the project. Meanwhile, a hybrid and more mature approach to Rust code generation with GCC is available in rustc_codegen_gcc.
In 2022, the goal of gccrs was to be included in the GCC 13 release, but this expectation has not been met. The team is currently aiming for inclusion in GCC 14 (likely to be released by mid-2024), judging from its November 2023 monthly report.
On October 13, Arthur Cohen gave a talk titled "The road to compiling the standard library with gccrs" (the video is available) at EuroRust 2023. In his talk, Cohen gave a little bit of general background on gccrs but mainly focused on what work has recently gone into compiling the Rust standard library, and why gccrs cannot do it yet.
Gccrs targets a specific Rust version, 1.49, released at the end of 2020, rather than trying to keep up with the rapidly developing Rust language. This version was chosen because it is the latest version predating support for const generics, which were introduced in 1.50. However, Cohen expressed regret in his talk that the project has not been able to ignore const generics after all, because they are in use in the standard library, even in 1.49. They were "stabilized" for general availability in 1.50, but there is internal standard library usage in earlier versions as well. Const generics have since been fully implemented, however, and this issue is no longer a hindrance.
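For readers unfamiliar with the feature, here is a minimal sketch of const generics as they look in current stable Rust. The `Buffer` type and `sum()` function are hypothetical names invented for illustration; they do not come from the standard library or from gccrs, and the code assumes a recent rustc rather than the 1.49 release that gccrs targets.

```rust
// A fixed-size buffer whose length N is a compile-time constant
// parameter rather than a runtime value. (Hypothetical example type.)
struct Buffer<const N: usize> {
    data: [u8; N],
}

impl<const N: usize> Buffer<N> {
    fn new() -> Self {
        Buffer { data: [0; N] }
    }

    // The capacity is known statically; no length field is stored.
    fn capacity(&self) -> usize {
        self.data.len()
    }
}

// One generic function covers arrays of every length.
fn sum<const N: usize>(values: [i32; N]) -> i32 {
    values.iter().sum()
}

fn main() {
    let small = Buffer::<4>::new();
    let large = Buffer::<16>::new();
    println!("{} {}", small.capacity(), large.capacity());
    println!("{}", sum([1, 2, 3]));
}
```

The standard library relies on the same mechanism internally, for instance to implement traits for arrays of any length, which is presumably why a compiler that wants to build core cannot skip it.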
A lot of care is being put into gccrs not becoming a "superset" of Rust, as Cohen put it. The project wants to make sure that it does not create a special "GNU Rust" language, but is trying instead to replicate the output of rustc — bugs, quirks, and all. Both the Rust and GCC test suites are being used to accomplish this.
The Rust standard library consists of a number of "crates", which is what software packages are called in Rust lingo. Cohen explained that gccrs is working on supporting compilation of the two most important ones: core and alloc. The core crate is the foundation of the standard library, implementing features such as primitive types and macros; alloc deals with heap-memory allocation and various container types.
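As a rough illustration of that layering, the sketch below reaches into core and alloc directly instead of going through std. It assumes an ordinary hosted build with a recent rustc; `sorted_names()` is a hypothetical helper written for this example.

```rust
// alloc is not in scope by default, so it has to be named explicitly.
extern crate alloc;

use alloc::string::String; // heap-allocated string, defined in alloc
use alloc::vec::Vec; // growable container, defined in alloc
use core::cmp::Ordering; // allocation-free building block from core

// Sort by length first, then alphabetically (hypothetical helper).
fn sorted_names(mut names: Vec<String>) -> Vec<String> {
    names.sort_by(|a, b| match a.len().cmp(&b.len()) {
        Ordering::Equal => a.cmp(b),
        other => other,
    });
    names
}

fn main() {
    let names = sorted_names(vec![
        String::from("alloc"),
        String::from("std"),
        String::from("core"),
    ]);
    // std re-exports these items, so the types are interchangeable.
    let same_type: std::vec::Vec<std::string::String> = names;
    println!("{:?}", same_type);
}
```

Because std re-exports these items rather than redefining them, a compiler that can build core and alloc already covers much of what everyday Rust code touches.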
Currently gccrs is not able to compile these crates because of various shortcomings, such as incorrect behavior in macro-name resolution and incomplete support for decorator macros. The lack of a borrow checker (discussed more below), while not blocking compilation, means that the compiler cannot properly check the safety of the code. An additional hurdle is formed by missing compiler intrinsics in GCC. Rustc uses some intrinsics provided by LLVM that are not supported by GCC, which means the gccrs team needs to spend time implementing them in GCC.
Another talk (slides available) was given by Pierre-Emmanuel Patry at the GNU Tools Cauldron in September 2023. He mainly focused on progress toward inclusion in GCC 14 as well as macros, which seem to be an interrelated issue because the approach to implement procedural macros necessitates changes to the GCC build system. Procedural macros are function-like macros that emit token streams rather than plain source code text like C or C++ macros. They are implemented in a built-in crate called proc_macro. Such macros are notoriously tricky to implement but also powerful; they form the core of features such as #[attribute] and #[derive()] decorators, and can be used to create compile-time evaluated, domain-specific languages.
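To give a sense of what such decorators look like from the user's side, here is a small sketch using derive attributes. The `Target` struct is hypothetical. Note the caveat: the derives shown are built into rustc, whereas user-defined derives must live in a separate proc-macro crate built on proc_macro, which cannot be demonstrated in a single file.

```rust
// Each name inside derive() asks the compiler to generate a trait
// implementation from the struct definition at compile time.
#[derive(Debug, Clone, PartialEq)]
struct Target {
    arch: String,
    bits: u8,
}

fn main() {
    let sh4 = Target {
        arch: "sh4".to_string(),
        bits: 32,
    };
    let copy = sh4.clone(); // generated by derive(Clone)
    assert_eq!(sh4, copy); // comparison generated by derive(PartialEq)
    println!("{:?}", sh4); // formatting generated by derive(Debug)
}
```

None of the three trait implementations appears in the source; the macro expansion has to produce them before type checking can proceed, which is part of what makes this machinery demanding for a new compiler.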
In the GNU Cauldron talk, Patry also mentioned that gccrs had more than 800 commits waiting to be upstreamed to GCC.
Taking advantage of the GCC ecosystem
Cohen's EuroRust talk highlighted that one of the major reasons gccrs is being developed is to be able to take advantage of GCC's security plugins. There is a wide range of existing GCC plugins that can aid in debugging, static analysis, or hardening; these work on the GCC intermediate representation, so they are not tied to any one source language. As an example, Cohen mentioned that "C programmers have been forgetting to close their file descriptors for 40 years, [so] there are a lot of plugins to catch that". Gccrs intends to let Rust programmers reuse these existing plugins and static analyzers to catch bugs in unsafe code.
Cohen listed a few things that gccrs is already useful for. According to him, the Sega Dreamcast homebrew community uses gccrs to create new games for the Dreamcast gaming console, and GCC plugins can already be used to perform static analysis on unsafe Rust code. The Dreamcast community's interest stems from the fact that rustc's LLVM backend does not support the Hitachi SH-4 architecture of the console, whereas GCC does; even in its incomplete state, gccrs is helpful for this embedded use case.
Additionally, he mentioned that the gccrs effort has revealed some unspecified language features, such as Deref and macro name resolution; in response, the project has been able to contribute additions to the Rust specification. Currently Rust does not have a formal specification, but work is underway to create one, as proposed in RFC 3355. "The gccrs people want to be a part" of that effort, Cohen said.
One more reason for gccrs to exist is Rust for Linux, the initiative to add Rust support to the Linux kernel. Cohen said the Linux kernel is a key motivator for the project because there are a lot of kernel people who would prefer the kernel to be compiled only by the GNU toolchain.
Things under development
Gccrs is still missing a lot of core functionality. Cohen listed several important missing features, such as async/await, LLVM intrinsics that are absent in GCC, and the format_args!() macro used by output macros such as println!(). The borrow checker, which is a compiler subsystem that enforces the reference rules of the language, is a key Rust feature that gccrs will need to provide. Cohen briefly mentioned that the likely solution is a separate borrow-checker project called Polonius, and said gccrs will most likely have it integrated a few months down the line. Contributor Jakub Dupak has made progress on this in the past few months.
Polonius is a library that implements a borrow checker that is semantically equivalent to the (not quite flawlessly implemented) checker in rustc today, but it computes reference lifetimes with a radically different algorithm. Polonius aims to one day resolve the shortcomings and corner cases of rustc's current borrow checker. Once Polonius has matured, rustc itself will likely adopt it as well.
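The reference rules in question can be illustrated with a minimal sketch. The function `first_then_push()` is a hypothetical example written for this article, not code from gccrs or Polonius, and it assumes a non-empty input vector.

```rust
// Rust allows either any number of shared references (&T) or exactly
// one mutable reference (&mut T) to a value at any given time.
fn first_then_push(mut values: Vec<i32>) -> (i32, usize) {
    let first = &values[0]; // shared borrow of `values` begins
    let copied = *first; // last use of `first`; the borrow ends here

    // Mutation is accepted because no shared borrow is still live.
    values.push(copied + 1);

    // Reading `*first` after the push() instead would be rejected at
    // compile time: push() may reallocate the vector's storage and
    // leave `first` dangling.
    (copied, values.len())
}

fn main() {
    let (first, len) = first_then_push(vec![1, 2, 3]);
    println!("first = {}, len = {}", first, len);
}
```

A compiler without a borrow checker would accept both orderings, which is why the lack of one does not block compilation but does remove the safety guarantee.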
According to the gccrs monthly report for November 2023, work has begun on the format_args!() macro. This helper macro is responsible for constructing parameters for other string-formatting macros. It involves the Display and Debug traits, and is a necessity for preparing arguments that are later passed to other macros such as format!() and println!(). Without format_args!(), a Rust program cannot create formatted output; this feature is thus necessary before gccrs can compile a "Hello, World" program.
For a deep dive on format_args!(), see Mara Bos's recent blog post.
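As a rough sketch of the relationship between these macros, the example below calls format_args!() directly and hands the result to `std::fmt::format()`, which is approximately what format!() does on the programmer's behalf. The `Version` type and `describe()` function are hypothetical, and a recent rustc is assumed.

```rust
use std::fmt;

// Hypothetical type with a hand-written Display implementation.
struct Version {
    major: u32,
    minor: u32,
}

impl fmt::Display for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}", self.major, self.minor)
    }
}

fn describe(name: &str, version: &Version) -> String {
    // format_args!() bundles the format string and the arguments into
    // a fmt::Arguments value without allocating; fmt::format() then
    // renders that value into a String.
    fmt::format(format_args!("{} targets Rust {}", name, version))
}

fn main() {
    let version = Version { major: 1, minor: 49 };
    // println!() also goes through format_args!() internally.
    println!("{}", describe("gccrs", &version));
}
```

Since every formatting macro funnels through format_args!(), a compiler that cannot expand it cannot build even the simplest program that prints text.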
rustc_codegen_gcc
There is another GCC-based Rust project, called rustc_codegen_gcc, that is more mature and more limited in scope compared to gccrs. It is not a full implementation of a Rust compiler from the ground up; instead, it uses the libgccjit library to hook into an API of the LLVM backend used by rustc. This approach performs much of the compilation with rustc and turns to GCC at a later stage. Despite the "JIT" (just in time) in the name of the library, rustc_codegen_gcc is intended for ahead-of-time compilation. Its stated primary goal is to enable Rust code generation on platforms unsupported by LLVM.
As of October 2023, rustc_codegen_gcc can compile Rust for Linux without any additional patches. Over the past year, the project seems to have made good progress on many fronts; for example, it has added support for SIMD (single instruction, multiple data) operations and link-time optimization, the lack of which had earlier been identified as a cause of test failures. Cohen deferred to rustc_codegen_gcc at several points in his EuroRust talk, encouraging attendees to use it instead of gccrs for now. It is, in fact, already upstreamed into the Rust language repository.
Rust for Linux
Currently, the Rust for Linux project provides documentation for using either rustc or rustc_codegen_gcc to build Rust code for the kernel. The kernel also contains documentation for the minimum supported versions of various build tools, including compilers. For rustc, the stated version is treated as an exact requirement rather than a minimum. The currently stated supported rustc version is 1.73.0 (released in October 2023), much more recent than the 1.49 targeted by gccrs. Rust for Linux support is also a stated goal for gccrs, but because of this significant discrepancy, it seems to be quite far off.
Gccrs has progressed nicely in the year since we last looked at it: the repository has well over 3,000 commits since January 1, 2023. However, it is not yet in a usable state for almost any practical purpose, since as a complete implementation from the ground up, gccrs is much more ambitious in scope than rustc_codegen_gcc. The latter is already merged to the upstream Rust repository and sees real-world use with Rust for Linux. We are not yet in a world with multiple implementations of a compiler for the Rust language, but it is getting closer.