Racing karts on a Rust GPU kernel driver

Original link: https://www.collabora.com/news-and-blog/news-and-events/racing-karts-on-a-rust-gpu-kernel-driver.html

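## Tyr: a high-performance Rust driver for Arm Mali GPUs

Tyr, a Rust driver for Arm Mali GPUs, is under rapid development. The upstream code is not fully ready yet, but a downstream prototype already shows significant functionality and matches the performance of the existing C driver. Early work on Tyr focused on the groundwork that the kernel-mode driver (KMD) must provide to the user-mode driver (UMD). The prototype can now execute GPU jobs, run a full GNOME desktop and the Weston compositor, and even run 3D games such as SuperTuxKart. Key milestones were submitting a simple job, rendering a rotating cube, and finally running complex applications. The prototype, tested on a Rock 5B board, shows that Rust is viable for building high-performance GPU kernel drivers. It is not yet stable enough for daily use, but it provides an important testbed for ongoing upstream development and for experimenting with Rust abstractions. The project's ultimate goal is to bring Tyr fully upstream, paving the way for a modern, safe, and high-performance GPU driver ecosystem.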

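Collabora developers have reached a significant milestone: running a kart racing game on a Rust-based GPU kernel driver. The driver, announced in July, can already run GNOME and games, which signals rapid progress. The project targets Arm-specific devices and sits between standard graphics APIs (such as Vulkan and OpenGL) and the GPU hardware. One commenter asked about the advantages, and the discussion linked to a Collabora blog post that details how this work is organized between kernel and userspace. The discussion does not state any explicit performance gains, but the achievement suggests potential speed improvements and broader compatibility, which could make 2026 an exciting year for progress in this area.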

Original text

A few months ago, we introduced Tyr, a Rust driver for Arm Mali GPUs that continues to see active development upstream and downstream. As the upstream code awaits broader ecosystem readiness, we have focused on a downstream prototype that will serve as a baseline for community benchmarking and help guide our upstreaming efforts.

Today, we are excited to share that the Tyr prototype has progressed from basic GPU job execution to running GNOME, Weston, and full-screen 3D games like SuperTuxKart, demonstrating a functional, high-performance Rust driver that matches C-driver performance and paves the way for eventual upstream integration!

GNOME on Tyr

Setting the stage

I previously discussed the relationship between user-mode drivers (UMDs) and kernel-mode drivers (KMDs) in one of my posts about how GPUs work. Here's a quick recap to help get you up to speed:

One thing to be understood from the previous section is that the majority of the complexity tends to reside at the UMD level. This component is in charge of translating the higher-level API commands into lower-level commands that the GPU can understand. Nevertheless the KMD is responsible for providing key operations such that its user-mode driver is actually implementable, and it must do so in a way that fairly shares the underlying GPU hardware among multiple tasks in the system.

While the UMD will take care of translating from APIs like Vulkan or OpenGL into GPU-specific commands, the KMD must bring the GPU hardware to a state where it can accept requests before it can share the device fairly among the UMDs in the system. This covers power management, parsing and loading the firmware, as well as giving the UMD a way to allocate GPU memory while ensuring isolation between different GPU contexts for security.
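The isolation requirement can be pictured with a toy model. The sketch below is not Tyr code: `ToyKmd`, `AddressSpace`, `ContextId`, `GpuVa` and `PAGE_SIZE` are hypothetical names invented for illustration, and a real driver programs the GPU's MMU rather than a hash map. It only shows the idea that each context gets its own address space, so an address handed to one context means nothing in another.

```rust
use std::collections::HashMap;

/// Hypothetical identifiers used only for this sketch.
type ContextId = u32;
type GpuVa = u64;

/// Page size of the toy address space (an assumption for the sketch).
const PAGE_SIZE: u64 = 4096;

/// One context's private mapping from GPU virtual address to buffer.
#[derive(Default)]
struct AddressSpace {
    next_va: GpuVa,
    buffers: HashMap<GpuVa, Vec<u8>>,
}

/// Toy model of a KMD that keeps one private address space per GPU context.
struct ToyKmd {
    spaces: HashMap<ContextId, AddressSpace>,
}

impl ToyKmd {
    fn new() -> Self {
        ToyKmd { spaces: HashMap::new() }
    }

    /// Allocates a zeroed buffer in `ctx`'s address space and returns its address.
    fn alloc(&mut self, ctx: ContextId, size: usize) -> GpuVa {
        let space = self.spaces.entry(ctx).or_default();
        let va = space.next_va;
        // Advance by whole pages so that buffers never overlap.
        let pages = (size as u64 + PAGE_SIZE - 1) / PAGE_SIZE;
        space.next_va += pages.max(1) * PAGE_SIZE;
        space.buffers.insert(va, vec![0; size]);
        va
    }

    /// Looks up a buffer; an address is only meaningful inside its own context.
    fn lookup(&self, ctx: ContextId, va: GpuVa) -> Option<&[u8]> {
        self.spaces.get(&ctx)?.buffers.get(&va).map(|b| b.as_slice())
    }
}
```

In this model, a buffer allocated with `alloc(1, 64)` is visible through `lookup(1, va)`, while `lookup(2, va)` returns `None` until context 2 makes its own allocation.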

This was our initial focus for quite a few months while working on Tyr, and testing was mainly done through the IGT framework. These tests would mainly consist of performing simple ioctls() against the driver and subsequently checking whether the results made sense.

By the way, those willing to further understand the relationship between UMDs and KMDs on Linux should watch a talk given at Kernel Recipes by my colleague Boris Brezillon on the topic!

Submitting a single job

Once the GPU is ready to accept requests and userspace can allocate GPU memory as needed, the UMD can place all the resources required by a given workload in GPU buffers. These can be further referenced by the command buffers containing the instructions to be executed, as we explain in the excerpt below:

With the data describing the model and the machine code describing the shaders, the UMD must ask the KMD to place this in GPU memory prior to execution. It must also tell the GPU that it wants to carry out a draw call and set any state needed to make this happen, which it does by means of building VkCommandBuffers, which are structures containing instructions to be carried out by the GPU in order to make the workload happen. It also needs to set up a way to be notified when the workload is done and then allocate the memory to place the results in.

In this sense, the KMD is the last link between the UMD and the GPU hardware, providing the necessary APIs for job submission and synchronization. It ensures that all the drawing operations built at the userspace level can actually reach the GPU for execution. It is the KMD's responsibility to ensure that jobs only get scheduled once their dependencies have finished executing. It also has to notify (in other words, signal to) the UMD when jobs are done, or the UMD won't really know when the results are valid.
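The ordering rule described above can be sketched in a few lines. This is a toy model under stated assumptions, not the DRM scheduler or Tyr's code: `ToyScheduler`, `Job` and `FenceId` are hypothetical names, and "running" a job here just records its name and marks its completion fence as signaled.

```rust
use std::collections::HashSet;

/// Hypothetical identifier for a completion fence; not a Tyr or DRM type.
type FenceId = u32;

/// A toy job: it waits on `deps` and signals `done` when it finishes.
struct Job {
    name: &'static str,
    deps: Vec<FenceId>,
    done: FenceId,
}

/// Toy scheduler that only releases a job once all of its dependencies
/// have been signaled.
struct ToyScheduler {
    pending: Vec<Job>,
    signaled: HashSet<FenceId>,
}

impl ToyScheduler {
    fn new() -> Self {
        ToyScheduler { pending: Vec::new(), signaled: HashSet::new() }
    }

    /// Queues a job; submission order does not decide execution order.
    fn submit(&mut self, job: Job) {
        self.pending.push(job);
    }

    /// Runs every job that becomes ready and returns the names in execution order.
    /// Jobs whose dependencies are never signaled stay in `pending`.
    fn run(&mut self) -> Vec<&'static str> {
        let mut order = Vec::new();
        loop {
            // Find the first pending job whose dependencies are all signaled.
            let ready = self
                .pending
                .iter()
                .position(|job| job.deps.iter().all(|d| self.signaled.contains(d)));
            match ready {
                Some(index) => {
                    let job = self.pending.remove(index);
                    order.push(job.name);
                    // "Signal" completion so that dependent jobs can proceed.
                    self.signaled.insert(job.done);
                }
                None => break,
            }
        }
        order
    }
}
```

If a "draw" job that depends on fence 1 is submitted before the "upload" job that signals fence 1, `run` still executes "upload" first. Skipping that check is the kind of bug that would show up as visual glitches.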

Additionally, before Tyr can execute a complex workload consisting of a vast number of simultaneous jobs, it must be able to execute a simple one correctly, or debugging will be an unfruitful nightmare. To that end, we devised the simplest job we could think of: one that merely places a single integer in a given memory location using a MOV instruction on the GPU. Our IGT test then blocks until the KMD signals that the work was carried out.

Reading that memory location and ensuring that its contents match the constant we were expecting shows that the test was executed successfully. In other words, it shows that we were able to place the instructions in one of the GPU's ring buffers and have the hardware iterator pick it up and execute correctly, paving the way for more complex tests that can actually try to draw something.
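The shape of that test (submit, block until signaled, read back, compare) can be mimicked on the CPU. The sketch below is only an analogy and assumes nothing about the real IGT test: a thread stands in for the GPU, an atomic stands in for the GPU buffer, a channel stands in for the completion signal, and `EXPECTED` and `run_dummy_job` are made-up names.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Arbitrary constant that the simulated job is expected to store.
const EXPECTED: u32 = 0xCAFE_F00D;

/// Simulates submitting the dummy job, waiting for its completion signal,
/// and reading the result back from memory.
fn run_dummy_job() -> u32 {
    // Stand-in for the GPU buffer that the job writes to.
    let memory = Arc::new(AtomicU32::new(0));
    // Stand-in for the "job done" notification sent by the KMD.
    let (signal_tx, signal_rx) = mpsc::channel::<()>();

    let gpu_view = Arc::clone(&memory);
    let gpu = thread::spawn(move || {
        // The simulated "MOV": store one integer at the agreed location.
        gpu_view.store(EXPECTED, Ordering::Release);
        signal_tx.send(()).expect("waiter went away");
    });

    // Block until the job is signaled; only then is the result valid.
    signal_rx.recv().expect("job never signaled");
    gpu.join().expect("simulated GPU thread panicked");
    memory.load(Ordering::Acquire)
}
```

A caller would compare the returned value with `EXPECTED`; a mismatch would mean that the job either did not run or was signaled too early.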

The test source code for this dummy job is here.

Drawing a rotating cube

With job submission and signalling working, it was time to attempt to render a scene. We chose kmscube, which draws a single rotating cube on the screen, as the next milestone.

It was a good candidate owing to its simple geometry and the fact that it is completely self-contained. In other words, no compositor is needed and rendering takes place in a buffer that's directly handed to the display (KMS) driver.

Getting kmscube to run would also prove that we were really enforcing the job dependencies set by the UMD; otherwise, we would get visual glitches. To do so, we relied on a slightly updated version of the Rust abstractions for the DRM scheduler posted by Asahi Lina a few years ago. The result was a rotating cube that was rendered at the display's refresh rate.

kmscube on Tyr

Using offscreen rendering lets us go even faster, jumping from 30 or 60fps to more than 500 frames per second, matching the performance of the C driver. That's a lot of frames being drawn!
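To put those frame rates in perspective, it can help to convert them into the time available per frame. The helper below is a trivial illustration (the name `frame_budget_ms` is made up): 30 fps leaves roughly 33.3 ms per frame, 60 fps roughly 16.7 ms, and 500 fps only 2 ms.

```rust
/// Time available to produce one frame at a given frame rate, in milliseconds.
fn frame_budget_ms(frames_per_second: f64) -> f64 {
    1000.0 / frames_per_second
}
```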

Can it render the whole UI?

The natural progression would be to launch Weston or GNOME. As there is quite a lot going on when a desktop environment like GNOME is running, we were almost expecting it not to work at first, so it came as a huge surprise when GNOME's login page was rendered.

In fact, you can log in to GNOME, open Firefox, and...watch a YouTube video:

YouTube on GNOME on Tyr

Running vkcube under Weston also just works!

vkcube on Weston on Tyr

Can it render a game?

The last 3D milestone is running a game or another 3D-intensive application. Not only would that put the GPU through a demanding workload, but it would also allow us to gauge the KMD's performance more accurately. Again, the game is rendered correctly and is completely playable, without any noticeable hiccups or other performance issues, so long as it is run in full-screen mode. Unfortunately, windowed mode still has some glitches: it is a prototype, after all.

SuperTuxKart on Tyr

Why is this important?

It's important to clarify what this means and how this plays into the long-term vision for the project.

In fact, it's easier to start with what we are not claiming in this post: Tyr is not ready to be used as a daily driver, and it will still take time to replicate this upstream, although it is now clear that we will surely get there. And as a mere prototype, it has a lot of shortcuts that we would not have in an upstream version, even though it can run on top of an unmodified (i.e., upstream) version of Mesa.

That said, this prototype can serve as an experimental driver and as a testbed for all the Rust abstraction work taking place upstream. It will let us experiment with different design decisions and gather data on what truly contributes to the project's objective. It is a testament that Rust GPU KMDs can work, and not only that, but they can perform on par with their C counterparts.

Needless to say, we cannot make any assumptions about stability on an experimental driver. It might very well lock up and lose your work after some time, so be aware.

Finally, this was tested on a Rock 5B board, which is fitted with a Rockchip RK3588 system-on-chip. It will probably not work on any other device at the moment. Those with this hardware at hand should feel free to test our branch and provide feedback. The source code can be found here. Make sure to enable CONFIG_TYR_DRM_DEPS and CONFIG_DRM_TYR. Feel free to contribute to Tyr by checking out our issue board!

Below is a video showcasing the Tyr prototype in action. Enjoy!