QBE – Compiler Backend – 1.3

Original link: https://c9x.me/compile/release/qbe-1.3.html

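QBE 1.3 is the project's most significant release since 1.0, bringing notable performance gains, architectural improvements, and broader platform support.

**Highlights:**

* **Performance:** With new optimization passes (including GVN/GCM and loop optimization), QBE now reaches more than 63% of the performance of commercial compilers on the Coremark benchmark, and shows a 33% improvement on the Hare test suite.
* **Smarter instruction selection:** The new metaprogramming tool `mgen` compiles "lispy" intermediate language (IL) patterns into idiomatic C code. This replaces hand-written matching logic with a more powerful and more extensible approach to instruction selection.
* **Platforms and linking:** QBE now officially supports the Windows ABI through the `-t amd64_win` flag. In addition, a new `DYNCONST` ("dynamic constant") flag lets the compiler access globals indirectly, so it can produce position-independent code (shared objects).

This release reflects a collaborative effort, with contributions and ideas from Roland Paterson-Jones, Scott Graham, and Michael Forney, modernizing the infrastructure while keeping QBE's core philosophy.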

Original article

QBE 1.3 - Release notes

qbe-1.3

QBE 1.3 took a while to cook, but it is the most significant release since 1.0, with around 7k new lines of code and 1.5k deleted ones. In addition to the usual bug fixes, QBE gained a new and original IL matching algorithm, Roland Paterson-Jones contributed new optimizations, Scott Graham added support for the Windows ABI, and I implemented a plan suggested by Michael Forney to have QBE produce position-independent code (as in shared objects). QBE is teamwork, and I am happy to thank all the contributors to this release. In the rest of these notes, we take a closer look at some of the meat of the release.

Faster

Every once in a while, the QBE fly stings an outstanding programmer. This time, Roland was the victim! Roland suggested we look at the coremark benchmark to give us a concrete but simple playground to optimize. Initial measurements with qbe-1.2 revealed that we were far behind our “70% of gcc -O2” goal, closer to 40%. We decided to address this for the 1.3 release.

An early inspection of profiling data revealed that the performance gap boiled down in large part to how two functions are treated: ee_isdigit and crcu8. Interestingly, these functions are not really idiomatic C. An idiomatic isdigit is typically inlined textually, uses && instead of &, and skips the superfluous ternary operator, none of which ee_isdigit does; as for CRC, it is best implemented using a pre-computed table, whereas crcu8 processes one bit at a time. This observation was a bit disappointing because, aside from QBE's lack of inlining, it did not point us to a general source of overhead that would apply to other compilation workloads. On the other hand, it is expected that CPU-bound code spends most of its time in compact code sections.
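To make the contrast concrete, here is a sketch in C. The functions ee_isdigit and crcu8 below paraphrase, from my reading of the coremark sources, the shape of the benchmark's versions (coremark uses its own ee_u8/ee_u16 typedefs, replaced here by <stdint.h> types). The names isdigit_idiomatic, crc_table, crc_init_table, crcu8_table and crc16_buf are hypothetical; the table-driven CRC is the conventional alternative, not code from coremark or QBE. By this reading, crcu8 computes the reflected CRC-16 with polynomial 0xA001 (CRC-16/ARC).

```c
#include <stddef.h>
#include <stdint.h>

/* Paraphrase of coremark's ee_isdigit: bitwise & on the comparisons,
   a redundant ternary, and an out-of-line call. */
uint8_t ee_isdigit(uint8_t c) {
    uint8_t retval;
    retval = ((c >= '0') & (c <= '9')) ? 1 : 0;
    return retval;
}

/* Hypothetical idiomatic version: short-circuit &&, no ternary,
   and inline so the call disappears. */
static inline int isdigit_idiomatic(unsigned char c) {
    return c >= '0' && c <= '9';
}

/* Paraphrase of coremark's crcu8: one bit per iteration, with
   data-dependent branches on every bit. */
uint16_t crcu8(uint8_t data, uint16_t crc) {
    uint8_t i, x16, carry;
    for (i = 0; i < 8; i++) {
        x16 = (uint8_t)((data & 1) ^ ((uint8_t)crc & 1));
        data >>= 1;
        if (x16 == 1) {
            crc ^= 0x4002;
            carry = 1;
        } else {
            carry = 0;
        }
        crc >>= 1;
        if (carry)
            crc |= 0x8000;
        else
            crc &= 0x7fff;
    }
    return crc;
}

/* Hypothetical table-driven alternative: precompute the effect of
   eight bit steps for every byte value once. */
static uint16_t crc_table[256];

void crc_init_table(void) {
    for (int b = 0; b < 256; b++) {
        uint16_t crc = (uint16_t)b;
        for (int i = 0; i < 8; i++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
                            : (uint16_t)(crc >> 1);
        crc_table[b] = crc;
    }
}

/* One table lookup per byte instead of eight branchy steps. */
uint16_t crcu8_table(uint8_t data, uint16_t crc) {
    return (uint16_t)((crc >> 8) ^ crc_table[(crc ^ data) & 0xFF]);
}

/* CRC over a buffer using the table (hypothetical helper). */
uint16_t crc16_buf(const uint8_t *p, size_t n, uint16_t crc) {
    while (n--)
        crc = crcu8_table(*p++, crc);
    return crc;
}
```

Both CRC functions produce the same value for every input; call crc_init_table() once before using crcu8_table or crc16_buf.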

Nonetheless, we implemented numerous optimizations (GVN/GCM, loop optimization, if-elimination, CFG simplification, …) that we could try on both coremark and more realistic workloads like the Hare test suite. In the end, we decided to keep only a subset of vetted passes, and QBE now scores more than 63% of the performance of commercial compilers on vanilla coremark. Notably, we excluded inlining from the optimization set, postponing the work of reconciling it with QBE's streaming, per-function compilation model. Modifying the coremark benchmark to inline the ee_isdigit function and use a simpler branch-free implementation of crcu8 lets QBE reach its 70% goal. The new optimizations should also benefit Hare users: I measured a 33% improvement on the Hare test suite against qbe-1.2 (1.7s vs 2.6s).
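The notes do not show the modified benchmark code. Purely as a hypothetical illustration, here is one way a branch-free CRC step could be written: the low bit of the running CRC is turned into an all-zeros or all-ones mask, so the polynomial is XORed in unconditionally instead of behind a branch. The names crcu8_reference and crcu8_branchfree are made up for this sketch, and it assumes, as I read coremark's crcu8, that the target is the reflected CRC-16 with polynomial 0xA001.

```c
#include <stdint.h>

/* Reference: the usual bit-at-a-time reflected CRC-16 (poly 0xA001),
   with a data-dependent branch on every bit. */
uint16_t crcu8_reference(uint8_t data, uint16_t crc) {
    crc ^= data;
    for (int i = 0; i < 8; i++) {
        if (crc & 1)
            crc = (uint16_t)((crc >> 1) ^ 0xA001);
        else
            crc >>= 1;
    }
    return crc;
}

/* Hypothetical branch-free variant: -(crc & 1) is 0 or -1, which
   converts to a 0x0000 or 0xFFFF mask selecting the polynomial. */
static inline uint16_t crcu8_branchfree(uint8_t data, uint16_t crc) {
    crc ^= data;
    for (int i = 0; i < 8; i++) {
        uint16_t mask = (uint16_t)-(crc & 1);
        crc = (uint16_t)((crc >> 1) ^ (0xA001 & mask));
    }
    return crc;
}
```

Whether the masked form is actually faster depends on the target; the point is that even a compiler without sophisticated branch optimizations gets straight-line code with no hard-to-predict branches.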

Smarter

Since its early days, QBE used a bottom-up tree-numbering algorithm inspired by Ken Thompson's Plan9 C compiler (see 5.5. Addressability). The algorithm is fairly generic but has some subtleties in dealing elegantly with associativity and commutativity of arithmetic operators. It has been a long-standing goal of mine to implement a metaprogramming solution to this problem. QBE 1.3 sees the realization of this goal.
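As a rough sketch of bottom-up numbering in general, and not QBE's or the Plan9 compiler's actual code, the C below gives each node of a tiny expression tree a number computed only from its operator and its children's numbers. Here the numbers classify how a subtree could fit an amd64-style address base + index*scale + disp. All types, enum names, and rules are hypothetical. Commutativity is handled by sorting the two child numbers before the rule lookup, so each rule is written only once.

```c
#include <stddef.h>

/* Operators of a toy expression tree (hypothetical, not QBE's IR). */
enum op { OCON, OTMP, OADD, OMUL };

/* Numbers assigned bottom-up. The enum order is arbitrary but fixed;
   it is used to put commutative operands in a canonical order. */
enum num {
    NOTHER,  /* fits no addressing form */
    NCON,    /* constant usable as displacement */
    NSCALE,  /* constant 1, 2, 4 or 8: usable as scale or displacement */
    NTMP,    /* temporary, i.e. a register */
    NIDX,    /* tmp * scale */
    NBD,     /* tmp + constant */
    NBI      /* tmp + tmp, or tmp + tmp * scale */
};

struct node {
    enum op op;
    long val;           /* constant value when op == OCON */
    struct node *l, *r; /* operands of OADD and OMUL */
    enum num n;         /* filled in by number() */
};

/* Rule table: both binary operators here commute, so swap the
   operand numbers into ascending order first. */
static enum num rule(enum op op, enum num a, enum num b) {
    if (a > b) { enum num t = a; a = b; b = t; }
    switch (op) {
    case OMUL:
        if (a == NSCALE && b == NTMP) return NIDX;
        break;
    case OADD:
        if ((a == NCON || a == NSCALE) && b == NTMP) return NBD;
        if (a == NTMP && (b == NTMP || b == NIDX)) return NBI;
        break;
    default:
        break;
    }
    return NOTHER;
}

/* Number the children first, then the node itself. */
enum num number(struct node *t) {
    switch (t->op) {
    case OCON:
        t->n = (t->val == 1 || t->val == 2 || t->val == 4 || t->val == 8)
                   ? NSCALE : NCON;
        break;
    case OTMP:
        t->n = NTMP;
        break;
    default:
        t->n = rule(t->op, number(t->l), number(t->r));
        break;
    }
    return t->n;
}
```

Sorting handles commutativity, but not associativity: with these rules, (t + 8) + t*4 is numbered NOTHER even though it fits base + index*scale + disp. Recognizing it needs extra numbers or rules that reassociate, which is the kind of subtlety mentioned above.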

A new OCaml tool called mgen is used to compile lispy IL patterns into idiomatic C code that matches them. The mgen tool looks for special comment blocks containing IL patterns and inlines the matching C code right below these blocks. The generated C code is designed to look idiomatic in qbe and works similarly to the hand-written logic used before 1.3. See the isel.c file for the current use of mgen.

In more detail, instruction DAGs are matched by following a numbering approach like in Ken Thompson's compiler. Then, mgen associates each number with a bitset indicating which of the toplevel user patterns are matched by the current IL node (temporary); the most suitable pattern can then be selected by handwritten logic. Patterns can include variables which can be collected by running a matcher program. These programs are also generated by mgen in a simple bytecode language which the runmatch() function can interpret.
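To illustrate how these pieces fit together, and not mgen's actual output (whose format the notes do not show), the C sketch below hard-codes what such a generator might emit for two toy patterns: a table from node numbers to bitsets of matched toplevel patterns, a short matcher program in a made-up bytecode, and an interpreter for it. The names matches, run_matcher, select_pattern, prog_bin and the M_* opcodes are hypothetical, and the numbers are assumed to have been assigned by an earlier bottom-up pass. Only the division of labor (numbering, bitset, handwritten choice, bytecode to collect variables) follows the description above.

```c
#include <stddef.h>

/* Minimal expression node (hypothetical; not QBE's Ins/Ref types). */
struct expr {
    int num;            /* number assigned by a prior bottom-up pass */
    long val;           /* constant value, if any */
    struct expr *l, *r; /* operands */
};

/* Toy node numbers and toplevel patterns. */
enum { N_OTHER, N_TMP, N_CON, N_ADDCON, N_ADDTMP, N_COUNT };
enum {
    PAT_ADD = 1u << 0,  /* (add x y)       */
    PAT_ADDI = 1u << 1  /* (add x (con c)) */
};

/* Per number: bitset of the toplevel patterns a node with that number matches. */
static const unsigned matches[N_COUNT] = {
    [N_ADDCON] = PAT_ADD | PAT_ADDI,
    [N_ADDTMP] = PAT_ADD,
};

/* Made-up matcher bytecode: move around the tree and bind variables. */
enum mop { M_LEFT, M_RIGHT, M_UP, M_BIND, M_END };
struct minsn { enum mop op; int arg; };

/* Both toy patterns bind the left operand to vars[0], the right to vars[1]. */
static const struct minsn prog_bin[] = {
    {M_LEFT, 0},  {M_BIND, 0}, {M_UP, 0},
    {M_RIGHT, 0}, {M_BIND, 1}, {M_UP, 0},
    {M_END, 0},
};

/* Interpret a matcher program; the stack remembers the parents we
   descended from (no overflow check in this sketch). */
void run_matcher(const struct minsn *pc, struct expr *root, struct expr **vars) {
    struct expr *stack[16], *cur = root;
    int sp = 0;
    for (;; pc++) {
        switch (pc->op) {
        case M_LEFT:  stack[sp++] = cur; cur = cur->l; break;
        case M_RIGHT: stack[sp++] = cur; cur = cur->r; break;
        case M_UP:    cur = stack[--sp]; break;
        case M_BIND:  vars[pc->arg] = cur; break;
        case M_END:   return;
        }
    }
}

/* Handwritten selection: prefer the most specific matching pattern,
   then collect its variables. Returns the chosen pattern bit, or 0. */
unsigned select_pattern(struct expr *e, struct expr **vars) {
    unsigned m = matches[e->num];
    unsigned pick = (m & PAT_ADDI) ? PAT_ADDI : (m & PAT_ADD) ? PAT_ADD : 0;
    if (pick)
        run_matcher(prog_bin, e, vars);
    return pick;
}
```

A node numbered N_ADDCON matches both patterns at once, and the handwritten logic picks the more specific (add x (con c)), which could then become an add-immediate instruction.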

I expect that in the future mgen will be used to simplify instruction selection in more backends, and maybe even to recognize IL patterns such as bit rotations in optimization passes.

Nicer

For the 1.3 release, QBE also stung Scott Graham. Scott generously upstreamed his implementation of the Windows ABI, originally found in a fun derivative work. The assembly generated by QBE remains in AT&T syntax and is best assembled with the MinGW assembler, although I have not tried it myself on Windows. Compiling for Windows is now as simple as passing -t amd64_win to QBE.

Last but not least, QBE improved its support for position-independent code and is now able to link smoothly with, and even produce, shared objects on most targets. The main blocker to date has been the lack of support for indirect access to globals (e.g., the global offset table on ELF). This is now possible at the level of IL through the support of a new extern “dynamic constant” flag (DYNCONST in the IL spec). For example, to access a variable dlvar from a dynamically-linked library, one would use

function w $load() {
@start
	%v =w load extern $dlvar
	ret %v
}

And, in case you are wondering, we use the oxymoron “dynamic constant” to refer to address symbols that are constant throughout execution but can only be known at runtime (dynamic), because they are allocated by either the runtime or the dynamic linker rather than by the regular linking phase of compilation.
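To picture what the indirection involves, the C sketch below simulates it by hand rather than using QBE or a real dynamic linker; library_dlvar, got_dlvar and fake_dynamic_link are hypothetical names. The slot got_dlvar plays the role of a GOT entry: the address it holds never changes while the program runs, but it is only filled in at load time, so the code reads the slot first and the variable second.

```c
#include <stdint.h>

/* Stand-in for a variable defined in a shared library: its address is
   only known once the library has been loaded. */
static int32_t library_dlvar = 42;

/* Stand-in for a GOT entry: a pointer-sized slot that the dynamic
   linker fills with dlvar's runtime address. */
static int32_t *got_dlvar;

/* Plays the role of the dynamic linker applying a relocation at load time. */
void fake_dynamic_link(void) {
    got_dlvar = &library_dlvar;
}

/* Counterpart of the IL function $load above: first load the address
   from the slot, then load the value it points to. */
int32_t load(void) {
    return *got_dlvar;
}
```

On ELF systems, a C compiler building position-independent code typically emits this two-load pattern automatically for extern variables, with the dynamic linker filling the real GOT slots.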

Thanks for reading this far and happy hacking.
