I built a GPU back end for Emacs

Original link: https://en.andros.dev/blog/4b707a03/how-i-built-a-gpu-backend-for-emacs/

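Out of curiosity about why modern high-performance editors still rely on traditional rendering techniques, the author started a project to move Emacs text rendering from the CPU to the GPU. By designing a three-layer abstraction (a neutral C-based drawing layer plus platform-specific drivers, Metal on macOS and OpenGL on Linux), the author achieved GPU-accelerated rendering without modifying Emacs's core redisplay engine (`xdisp.c`). The project produced several "GPU-only" features, including video playback inside a buffer, smooth cross-fade animations, and shader-based cursor effects. Benchmarks show that on small screens GPU rendering has no significant advantage over the CPU-based Cairo library, but at 4K resolution it delivers a large throughput improvement and supports richer visual media. In the end, the contribution was rejected because of the Emacs community's policy against AI-generated code. The author nevertheless considers the attempt a success. Beyond the technical achievement, the project sparked a deep and transparent discussion about the ethics of AI in software development, the nature of software freedom, and the limits of current text editor architecture. The author continues to maintain the project as a personal fork, valuing the lessons learned above whether it is merged into mainline.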

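The developer "andros" created a GPU-accelerated backend for Emacs, replacing the traditional CPU-based rendering pipeline with Metal on macOS and OpenGL/EGL on Linux. The project aims to significantly improve GUI performance, particularly on high-resolution 4K displays, where current GTK-based builds often lag. It drew wide attention in the Emacs community, with users eager to test the performance gains. Some users criticized the project's documentation for reading like AI-generated text, but the developer clarified that it was written by a human: English is the developer's second language, and AI was used only for polishing and editing. The developer plans to publish more releases after fixing some minor Linux-related issues, with the ultimate goal of getting the work integrated into the upstream Emacs codebase.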

A few months ago I became obsessed with a silly question: why does my Emacs, on a laptop with a perfectly capable GPU, draw all of its text using the CPU? And that led to others: why can't I play a video inside a buffer? Why can't I have animated cursor effects? Why can't I cross-fade between buffers? I needed to satisfy my curiosity, so I started digging.

I started reading the code, with an AI as my companion. I discovered that every glyph, every underline, every scroll is recomputed and repainted by the processor. Emacs's redisplay engine (xdisp.c) was born in an era when there was no other option, and it is tuned to the millimeter for exactly that. And nobody had managed to slip a GPU underneath without rewriting half of Emacs... until recently.

So I decided to try. What began as a weekend experiment ended up being a complete display backend for macOS with Metal, a second backend for GNU/Linux with OpenGL, a video player inside the buffer, shader-based cursor effects, and a debate of more than a hundred messages on the Emacs developers' mailing list that ranged from cairo's performance to software freedom and the ethics of artificial intelligence.

This article exists because I feel like telling the story, and it might be useful for future implementations. At the end I leave the lessons I take away and a conclusion that is not the one I expected when I started.

A note of honesty up front: I built this project with the help of an LLM as a copilot, from start to finish. I say it here just as I said it in public when I was asked. I will come back to it, because it turned out to be the most important plot twist of the whole journey.

Phase 1: the architecture decision

Anyone's first instinct would be to open the macOS code, the Cocoa backend (nsterm.m), and start replacing CoreGraphics calls with Metal calls. It is the most direct path. And it is exactly what I decided not to do.

The problem with that approach is that it ties you to one platform. If I write "Emacs with Metal", I have an Emacs for Mac and nothing else. I needed to write a display-backend abstraction that would let me have one driver per platform. So I sketched a three-layer architecture on a Post-it:

flowchart TD
    X["Redisplay engine<br/>(xdisp.c, untouched)"]:::core --> P["src/gfxterm.c<br/>Neutral drawing policy (plain C)"]:::policy
    P --> D["src/gfxdrv.h<br/>Driver interface (~25 operations)"]:::iface
    D --> M["src/mtlterm.m (macOS)<br/>Metal driver"]:::mtl
    D --> G["src/glterm.c (GNU/Linux, X11)<br/>OpenGL ES / EGL driver"]:::gl

    classDef core fill:#37474F,stroke:#263238,stroke-width:2px,color:#fff
    classDef policy fill:#00897B,stroke:#00695C,stroke-width:2px,color:#fff
    classDef iface fill:#7CB342,stroke:#558B2F,stroke-width:2px,color:#fff
    classDef mtl fill:#8E24AA,stroke:#6A1B9A,stroke-width:2px,color:#fff
    classDef gl fill:#D32F2F,stroke:#B71C1C,stroke-width:2px,color:#fff

The idea is that all the drawing logic (how a glyph string is composed, where the wavy underline goes, how an image is clipped to the window, how scrolling works) lives in a plain-C file, without a single platform-specific line. And that each platform only has to implement a small contract: about 25 primitive operations of the kind "upload this texture", "draw this quad", "present the frame". That contract is gfxdrv.h. The first driver would be Metal, in mtlterm.m.
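
To make the layering concrete, here is a minimal sketch in Python rather than the project's C. It models only the three operations named above ("upload this texture", "draw this quad", "present the frame"); the class names, method signatures and the `draw_image_row` policy function are all hypothetical and are not taken from gfxdrv.h or gfxterm.c.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical stand-in for the driver contract (3 of the ~25 operations)."""

    @abstractmethod
    def upload_texture(self, width: int, height: int, pixels: bytes) -> int:
        """Upload pixel data and return a texture handle."""

    @abstractmethod
    def draw_quad(self, texture: int, x: int, y: int, width: int, height: int) -> None:
        """Draw one textured rectangle into the current frame."""

    @abstractmethod
    def present(self) -> None:
        """Show the finished frame on screen."""


class RecordingDriver(Driver):
    """Fake driver that records calls instead of talking to a GPU."""

    def __init__(self) -> None:
        self.calls = []
        self._next_texture = 1

    def upload_texture(self, width, height, pixels):
        handle = self._next_texture
        self._next_texture += 1
        self.calls.append(("upload_texture", handle, width, height))
        return handle

    def draw_quad(self, texture, x, y, width, height):
        self.calls.append(("draw_quad", texture, x, y, width, height))

    def present(self):
        self.calls.append(("present",))


def draw_image_row(driver: Driver, images, x: int, y: int) -> None:
    """Platform-neutral policy: lay images out left to right, then present once.

    `images` is a list of (width, height, pixels) tuples. This function only
    talks to the Driver interface, so any driver can sit underneath it.
    """
    for width, height, pixels in images:
        texture = driver.upload_texture(width, height, pixels)
        driver.draw_quad(texture, x, y, width, height)
        x += width
    driver.present()
```

The point of the sketch is the direction of the dependency: the policy function knows nothing about Metal or OpenGL, and a new platform only has to provide another `Driver` subclass.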

The golden rule, the one I imposed on myself and never broke once: xdisp.c is not touched. The redisplay engine computes the glyph matrices exactly as always; I only hook into the drawing interface that already exists. If the experiment went wrong, Emacs was still Emacs.

In hindsight, this was the best decision of the whole project!

Phase 2: the Metal backend and the tyranny of the pixel

With the architecture clear, I dove into Metal. The technical plan was that of any modern text renderer:

Rasterize each glyph just once, via CoreText, into a grayscale texture (a glyph atlas in R8 format).
Draw the text as textured quads that sample that atlas.
Upload images (PNG, JPEG, SVG, GIF) as textures.
Composite the whole frame on the GPU, in a persistent texture, and present it.
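
The first two steps of that plan can be sketched as a small cache. This is a Python model under stated assumptions, not the Metal code: the atlas is a plain byte array with one coverage byte per pixel (mirroring the R8 idea), the packing strategy is a simple shelf packer chosen for brevity, and `fake_rasterize` is a hypothetical stand-in for the CoreText call.

```python
class GlyphAtlas:
    """Single-channel atlas: one byte of coverage per pixel."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        self.pixels = bytearray(width * height)
        self.slots = {}            # (char, size) -> (x, y, w, h) in the atlas
        self.rasterize_calls = 0
        # Shelf packer state: fill a row left to right, then open a new row.
        self._pen_x = 0
        self._pen_y = 0
        self._shelf_height = 0

    def lookup(self, char: str, size: int, rasterize):
        """Return the atlas rectangle for a glyph, rasterizing on first use only."""
        key = (char, size)
        if key in self.slots:
            return self.slots[key]
        w, h, coverage = rasterize(char, size)
        self.rasterize_calls += 1
        if w > self.width:
            raise ValueError("glyph wider than atlas")
        if self._pen_x + w > self.width:   # current shelf is full
            self._pen_x = 0
            self._pen_y += self._shelf_height
            self._shelf_height = 0
        if self._pen_y + h > self.height:
            raise MemoryError("atlas full")
        for row in range(h):               # copy the bitmap row by row
            start = (self._pen_y + row) * self.width + self._pen_x
            self.pixels[start:start + w] = coverage[row * w:(row + 1) * w]
        rect = (self._pen_x, self._pen_y, w, h)
        self.slots[key] = rect
        self._pen_x += w
        self._shelf_height = max(self._shelf_height, h)
        return rect


def fake_rasterize(char: str, size: int):
    """Hypothetical rasterizer: a solid size x size block, not real glyph shapes."""
    return size, size, bytes([255]) * (size * size)


def layout_quads(atlas: GlyphAtlas, text: str, size: int, x: int, y: int):
    """Turn a string into (atlas_rect, screen_x, screen_y) quads."""
    quads = []
    for char in text:
        rect = atlas.lookup(char, size, fake_rasterize)
        quads.append((rect, x, y))
        x += rect[2]
    return quads
```

Laying out a string with repeated characters calls the rasterizer once per distinct glyph; every later occurrence is only a dictionary lookup plus one more quad.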

On paper, two afternoons. In practice, weeks. The reason has a name: pixel parity.

My success criterion was not "it looks good". It was that the result be identical, pixel for pixel, to the original Cocoa backend. Same binary, with the GPU on and off, and the diff between the two captures had to be practically zero. I built a harness that launched the same Emacs twice, loaded an identical scenario, captured the screen on both and compared them with Python and PIL. The bar landed around 0.055% of differing pixels in the baseline, and anything that strayed from there was a bug to hunt down.
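
The comparison step of such a harness might look roughly like the sketch below. It is not the actual harness: the function names are hypothetical, the comparison works on raw RGB bytes so the core has no dependencies, and only the optional `load_rgb` helper uses PIL (Pillow). The 0.055% baseline comes from the figure above; the `slack` parameter is an assumption.

```python
BASELINE = 0.00055  # ~0.055% of differing pixels, the baseline quoted in the text


def differing_pixel_ratio(a: bytes, b: bytes, channels: int = 3) -> float:
    """Fraction of pixels that differ between two same-size captures."""
    if len(a) != len(b):
        raise ValueError("captures must have the same dimensions")
    if len(a) % channels:
        raise ValueError("buffer length is not a whole number of pixels")
    total = len(a) // channels
    if total == 0:
        return 0.0
    # A pixel counts as different if any of its channels differs.
    differing = sum(
        a[i:i + channels] != b[i:i + channels]
        for i in range(0, len(a), channels)
    )
    return differing / total


def check_parity(a: bytes, b: bytes, slack: float = 0.0) -> bool:
    """True when the two captures differ by no more than the baseline plus slack."""
    return differing_pixel_ratio(a, b) <= BASELINE + slack


def load_rgb(path: str) -> bytes:
    """Load a screen capture as raw RGB bytes (needs Pillow, imported lazily)."""
    from PIL import Image
    with Image.open(path) as image:
        return image.convert("RGB").tobytes()
```

Counting differing pixels rather than summing color distances matches the "percentage of differing pixels" metric described here; a stricter harness could also look at how far each pixel strays.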

That harness was relentless, and it surfaced a collection of details I had to look at under a magnifying glass:

The ink weight. CoreText and my shader applied antialiasing differently.
The relief colors (the 3D borders of buttons and the mode-line) were not coming out right.
There was an off-by-one in the vertical position of the glyphs.

Keep in mind that the two backends draw in completely different ways, both in approach and in architecture. That is what makes the bugs subtle and hard to detect.

Phase 3: the cursor that froze

Of all the bugs, the one that taught me the most was the cursor one.

I wanted animated cursor effects: a ring that expands when it jumps, a comet-like trail, that kind of visual candy the GPU does almost for free. I implemented them as a compositor layer on top of the frame, without touching the buffer content underneath. They worked perfectly... while I was typing. The moment I stopped touching the keyboard, the animation froze halfway. The culprit was Apple's synchronization mechanism, CADisplayLink: it dies at rest, because Emacs's event loop does not feed it when there is no user input. While I typed, the keyboard events pumped the run loop and everything ran fine; the moment I stopped, there was no one to move the clock.

The solution was to stop depending on the system and move everything continuous to a Lisp timer. Cursor, buffer cross-fade and video, everything advances from a single "pump" in Emacs Lisp that ticks periodically and tells the driver "advance everything you have and present at most once". Later I unified the three timers into one with auto-pacing (60 Hz when there is a fade, 30 Hz otherwise, and it shuts itself off when there is nothing to animate).
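
The pacing rule can be modeled in a few lines. The real pump is an Emacs Lisp timer; the sketch below is Python, and the `Pump` and `Animation` classes, their methods and the `is_fade` flag are hypothetical names used only to show the logic: one tick advances everything, presents at most once, and reports the next interval (1/60 s with a fade active, 1/30 s otherwise, `None` when there is nothing left to animate).

```python
class Animation:
    """Hypothetical animation that simply runs for a fixed duration."""

    def __init__(self, duration: float, is_fade: bool = False) -> None:
        self.remaining = duration
        self.is_fade = is_fade

    def advance(self, dt: float) -> bool:
        """Step the animation; return True while it still has work to do."""
        self.remaining -= dt
        return self.remaining > 0


class Pump:
    """Model of a single animation pump with auto-pacing."""

    def __init__(self, present) -> None:
        self.present = present   # callable that presents one frame
        self.animations = []

    def interval(self):
        """Seconds until the next tick, or None when the pump should stop."""
        if not self.animations:
            return None
        if any(a.is_fade for a in self.animations):
            return 1 / 60
        return 1 / 30

    def tick(self, dt: float):
        """Advance every animation, present at most once, return the next interval."""
        if self.animations:
            # Keep only the animations that are still running after this step.
            self.animations = [a for a in self.animations if a.advance(dt)]
            self.present()
        return self.interval()
```

A caller would reschedule its timer with whatever `tick` returns and cancel it on `None`, which is what lets the pump shut itself off when idle.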

Workload	Stock (X/cairo)	GPU (OpenGL)	Ratio
Line scroll	530 fps	487 fps	0.92x
Page scroll	297 fps	296 fps	1.00x
Full-frame redraw	247 fps	294 fps	1.19x
Typing	1857 fps	1311 fps	0.71x
Image scroll	1359 fps	1239 fps	0.91x

Workload	cairo (CPU)	GPU	Speedup
Line scroll	117 fps	240 fps	2.05x
Page scroll	102 fps	124 fps	1.22x
Full-frame redraw	66 fps	121 fps	1.84x
Typing	238 fps	1766 fps	7.4x
Image scroll	115 fps	1328 fps	11.5x