Progress toward a GCC-based Rust compiler

Original link: https://lwn.net/SubscriberLink/954787/41470c731eda02a4/


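According to Cohen's talk on gccrs, one of the project's main goals is to avoid creating a separate variant of Rust (a so-called "GNU Rust") with changes or extensions specific to that flavor. Instead, the project aims to replicate the behavior of the rustc compiler, minimizing confusion for programmers who are used to working only with rustc. Some differences in output may remain, but the intent is to stay closely aligned with rustc so that libraries and applications built mainly or entirely with rustc need as few changes as possible. Full parity may be difficult or unlikely, but maintaining compatibility where feasible helps preserve continuity and stability across platforms and environments that use different Rust implementations. Ultimately, the success of such an effort depends on balancing practical needs against conceptual consistency.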
Original article

December 15, 2023

This article was contributed by Ronja Koistinen

The gccrs project is an ambitious effort started in 2014 to implement a Rust compiler within The GNU Compiler Collection (GCC). Even though the task is far from complete, progress has been made since LWN's previous coverage, according to reports from the project. Meanwhile, another hybrid and more mature approach to GCC Rust code generation is available in rustc_codegen_gcc.

In 2022, the goal of gccrs was to be included in the GCC 13 release, but this expectation has not been met. The team is currently aiming for inclusion in GCC 14 (likely to be released by mid-2024), judging from its November 2023 monthly report.

On October 13, Arthur Cohen gave a talk titled "The road to compiling the standard library with gccrs" (the video is available) at EuroRust 2023. In his talk, Cohen gave a little bit of general background on gccrs but mainly focused on what work has recently gone into compiling the Rust standard library, and why gccrs cannot do it yet.

Gccrs targets a specific Rust version, 1.49, released at the end of 2020, rather than trying to keep up with the rapidly developing Rust language. This version was chosen because it is the latest version predating support for const generics, which were introduced in 1.50. However, Cohen expressed regret in his talk that the project has not been able to ignore const generics after all, because they are in use in the standard library, even in 1.49. They were "stabilized" for general availability in 1.50, but there is internal standard library usage in earlier versions as well. Const generics have since been fully implemented in gccrs, however, so this issue is no longer a hindrance.
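For readers unfamiliar with the feature, the following is a minimal sketch of what const generics look like in user code. The `Buffer` type and the `sum` function are hypothetical names made up for illustration; they do not come from gccrs or the standard library, and the code needs a rustc release that has const generics stabilized.

```rust
// A fixed-size buffer whose length N is a compile-time constant parameter.
// (Hypothetical type, for illustration only.)
struct Buffer<const N: usize> {
    data: [u8; N],
}

impl<const N: usize> Buffer<N> {
    // Create a buffer of N zero bytes.
    fn zeroed() -> Self {
        Buffer { data: [0u8; N] }
    }

    // The capacity is known at compile time; no length field is stored.
    fn capacity(&self) -> usize {
        self.data.len()
    }
}

// One function that works for arrays of any length, instead of a
// separate implementation for each array size.
fn sum<const N: usize>(values: [i32; N]) -> i32 {
    values.iter().sum()
}
```

Calling `sum([1, 2, 3])` lets the compiler infer `N = 3`, while `Buffer::<16>::zeroed()` names the length explicitly. A compiler that wants to build the standard library has to support this kind of parameter, since array-related code there relies on it.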

A lot of care is being put into gccrs not becoming a "superset" of Rust, as Cohen put it. The project wants to make sure that it does not create a special "GNU Rust" language, but is trying instead to replicate the output of rustc — bugs, quirks, and all. Both the Rust and GCC test suites are being used to accomplish this.

The Rust standard library consists of a number of "crates", which is what software packages are called in Rust lingo. Cohen explained that gccrs is working on supporting compilation of the two most important ones: core and alloc. The core crate is the foundation of the standard library, implementing features such as primitive types and macros; alloc deals with heap-memory allocation and various container types.

Currently gccrs is not able to compile these crates because of various shortcomings, such as incorrect behavior in macro-name resolution and incomplete support for decorator macros. The lack of a borrow checker (discussed more below), while not blocking compilation, means that the compiler cannot properly check the safety of the code. An additional hurdle is formed by missing compiler intrinsics in GCC. Rustc uses some intrinsics provided by LLVM that are not supported by GCC, which means the gccrs team needs to spend time implementing them in GCC.

Another talk (slides available) was given by Pierre-Emmanuel Patry at the GNU Tools Cauldron in September 2023. He mainly focused on progress toward inclusion in GCC 14 as well as macros, which seem to be an interrelated issue because the approach to implement procedural macros necessitates changes to the GCC build system. Procedural macros are function-like macros that emit token streams rather than plain source code text like C or C++ macros. They are implemented in a built-in crate called proc_macro. Such macros are notoriously tricky to implement but also powerful; they form the core of features such as #[attribute] and #[derive()] decorators, and can be used to create compile-time evaluated, domain-specific languages.
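As a rough illustration of the consumer side of these decorators, the sketch below applies `#[derive()]` to a made-up `Target` struct (the type and helper names are hypothetical). It deliberately does not define a procedural macro: user-defined ones must live in a separate crate of type `proc-macro`, which does not fit in a single-file example. The built-in derives used here are provided by the compiler itself.

```rust
// #[derive(...)] asks the compiler to generate trait implementations
// from the structure of the type, instead of writing them by hand.
#[derive(Debug, Clone, PartialEq)]
struct Target {
    arch: String,
    supported: bool,
}

// Uses the generated Debug implementation to render the struct.
fn debug_string(target: &Target) -> String {
    format!("{:?}", target)
}

// Uses the generated Clone and PartialEq implementations.
fn clone_is_equal(target: &Target) -> bool {
    target.clone() == *target
}
```

A compiler without working derive support would have to reject the attribute line, so even small programs like this one depend on the macro machinery described above.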

In the GNU Cauldron talk, Patry also mentioned that gccrs had more than 800 commits waiting to be upstreamed to GCC.

Taking advantage of the GCC ecosystem

Cohen's EuroRust talk highlighted that one of the major reasons gccrs is being developed is to be able to take advantage of GCC's security plugins. There is a wide range of existing GCC plugins that can aid in debugging, static analysis, or hardening; these work on the GCC intermediate representation. As an example, Cohen mentioned that "C programmers have been forgetting to close their file descriptors for 40 years, [so] there are a lot of plugins to catch that". Gccrs intends to enable Rust programmers to reuse these existing plugins and static analyzers to catch bugs in unsafe code.

Cohen listed a few things that gccrs is already useful for. According to him, the Sega Dreamcast homebrew community uses gccrs to create new games for the Dreamcast gaming console, and GCC plugins can already be used to perform static analysis on unsafe Rust code. The Dreamcast community's interest stems from the fact that rustc's LLVM backend does not support the Hitachi SH-4 architecture of the console, whereas GCC does; even in its incomplete state, gccrs is helpful for this embedded use case.

Additionally, he mentioned that the gccrs effort has revealed some unspecified language features, such as Deref and macro name resolution; in response, the project has been able to contribute additions to the Rust specification. Currently Rust does not have a formal specification, but work is underway to create one, as proposed in RFC 3355. "The gccrs people want to be a part" of that effort, Cohen said.

One more reason for gccrs to exist is Rust for Linux, the initiative to add Rust support to the Linux kernel. Cohen said the Linux kernel is a key motivator for the project because there are a lot of kernel people who would prefer the kernel to be compiled only by the GNU toolchain.

Things under development

Gccrs is still missing a lot of core functionality. Cohen listed several important missing features, such as async/await, LLVM intrinsics that are absent in GCC, and the format_args!() macro used by output macros such as println!(). The borrow checker, which is a compiler subsystem that enforces the reference rules of the language, is a key Rust feature that gccrs will need to provide. Cohen briefly mentioned that the likely solution is a separate borrow-checker project called Polonius, and said gccrs will most likely have it integrated within a few months. Contributor Jakub Dupak has made progress on this in the past few months.

Polonius is a library that implements a borrow checker that is semantically equivalent to the (not quite flawlessly implemented) checker in rustc today, by approaching the computation of reference lifetimes with a radically different algorithm. Polonius aims to one day resolve the shortcomings and corner cases of rustc's current borrow checker. Once it has matured, rustc itself will likely adopt it as well.
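To make the "reference rules" concrete, here is a small sketch of the kind of code a borrow checker accepts and rejects. The function name `append_successor` is hypothetical, and the example shows only the basic aliasing rule as enforced by rustc today; it does not exercise any of the corner cases that Polonius is meant to improve.

```rust
// Appends (first element + 1) to the vector and returns the new last element.
// Assumes the vector is non-empty; indexing an empty vector would panic.
fn append_successor(values: &mut Vec<i32>) -> i32 {
    // Reading values[0] copies the i32 out, so no reference stays alive.
    let first = values[0];

    // Mutating the vector is allowed here because nothing borrows it.
    values.push(first + 1);

    // By contrast, rustc rejects the following sequence (error E0502),
    // because `r` would still point into the vector while push() may
    // reallocate its storage:
    //
    //     let r = &values[0];
    //     values.push(1);
    //     println!("{}", r);

    values[values.len() - 1]
}
```

A compiler without a borrow checker would accept the commented-out variant as well, which is why its absence does not block compilation but does remove the safety guarantee.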

According to the gccrs monthly report for November 2023, work has begun on the format_args!() macro. This helper macro is responsible for constructing parameters for other string-formatting macros. It involves the Display and Debug traits, and is a necessity for preparing arguments that are later passed to other macros such as format!() and println!(). Without format_args!(), a Rust program cannot create formatted output; this feature is thus necessary before gccrs can compile a "Hello, World" program.

For a deep dive on format_args!(), see Mara Bos's recent blog post.
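The relationship between format_args!() and the other formatting macros can be sketched roughly as follows. The `Version` type and `describe` function are hypothetical; the standard-library pieces used are `format_args!()`, `std::fmt::format()`, and the `Display` trait. This shows one way to call the macro directly and is not a description of how gccrs implements it.

```rust
use std::fmt;

// A hypothetical type with a user-defined Display implementation.
struct Version {
    major: u32,
    minor: u32,
}

impl fmt::Display for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // write!() itself passes its arguments through format_args!().
        write!(f, "{}.{}", self.major, self.minor)
    }
}

fn describe(name: &str, version: &Version) -> String {
    // format_args!() packages the format string and its arguments into a
    // fmt::Arguments value; fmt::format() then renders that to a String.
    // This is approximately what format!() does on the caller's behalf.
    fmt::format(format_args!("{} targets Rust {}", name, version))
}
```

Calling `describe("gccrs", &Version { major: 1, minor: 49 })` yields the same string as the equivalent `format!()` call, which is why every output macro depends on format_args!() being available.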

rustc_codegen_gcc

There is another GCC-based Rust project, called rustc_codegen_gcc, that is more mature and more limited in scope compared to gccrs. It is not a full implementation of a Rust compiler from the ground up; instead, it is a code-generation backend for rustc that uses the libgccjit library, plugging into the compiler's codegen interface in place of the default LLVM backend. This approach performs much of the compilation with rustc and turns to GCC at a later stage. Despite the "JIT" (just in time) in the name of the library, rustc_codegen_gcc is intended for ahead-of-time compilation. Its stated primary goal is to enable Rust code generation on platforms unsupported by LLVM.

As of October 2023, rustc_codegen_gcc can compile Rust for Linux without any additional patches. Over the past year, the project seems to have made good progress on many fronts; for example, it has added support for SIMD (single instruction, multiple data) operations and link-time optimization, both of which were earlier identified as causes for test failures. Cohen deferred to rustc_codegen_gcc at several points in his EuroRust talk, encouraging attendees to use it instead of gccrs for now. It is, in fact, already upstreamed into the Rust language repository.

Rust for Linux

Currently, the Rust for Linux project provides documentation for using either rustc or rustc_codegen_gcc to build Rust code for the kernel. The kernel also contains documentation for the minimal supported versions of various build tools, including compilers. For rustc, the version is considered an exact match, rather than a minimum. The currently stated supported rustc version is 1.73.0 (released in October 2023), much more recent than the 1.49 targeted by gccrs. Rust for Linux support is also a stated goal for gccrs, but because of this significant discrepancy, it seems to be quite far off.

Gccrs has progressed nicely in the year since we last looked at it: the repository has well over 3,000 commits since January 1, 2023. However, it is not yet in a usable state for almost any practical purpose. As a complete implementation from the ground up, gccrs is much more ambitious in scope than rustc_codegen_gcc, which is already merged into the upstream Rust repository and sees real-world use with Rust for Linux. We are not yet in a world with multiple implementations of a compiler for the Rust language, but it is getting closer.


