Why xor eax, eax?

Original link: https://xania.org/202512/01-xor-eax-eax

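## The surprising prevalence of XOR in x86 assembly

Surprisingly, the `xor` instruction ranks among the most frequently executed instructions on a typical x86 Linux system. This is not due to encryption or graphics routines, but to a neat compiler optimisation.

Compilers often use `xor eax, eax` as a compact, two-byte way to set the `eax` register to zero, instead of the more explicit five-byte `mov eax, 0`. This saves code space and improves instruction-cache efficiency.

In addition, modern x86 CPUs *recognise* this "zeroing idiom" and can optimise it further, effectively removing the `xor` from the execution pipeline entirely, so that it takes zero execution cycles.

Using the 32-bit `eax` register also conveniently zeroes the upper 32 bits of the 64-bit `rax` register, which matters when returning a long value.

Though seemingly obscure, this technique shows how compilers and CPUs work together to squeeze maximum performance out of code, highlighting the power of seemingly small optimisations.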

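A Hacker News discussion centres on the assembly instruction "xor eax, eax", used to zero the EAX register. A key point is that on x86-64, writing to an 'e' register (such as eax) automatically zeroes the upper 32 bits, making this XOR an efficient way to set the whole register to zero.

The discussion sparked interest in resources for learning 64-bit x86 assembly, particularly guides aimed at people already familiar with the 32-bit version. One amusing anecdote concerned a competitive Quake 3 player nicknamed "xor eax, eax".

Finally, one user reported that Chrome on Android crashed when viewing the linked page, while others did not see the same problem, which points to a possible browser compatibility issue.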
Original article

Written by me, proof-read by an LLM.
Details at end.

In one of my talks on assembly, I show a list of the 20 most executed instructions on an average x86 Linux desktop. All the usual culprits are there, mov, add, lea, sub, jmp, call and so on, but the surprise interloper is xor - “eXclusive OR”. In my 6502 hacking days, the presence of an exclusive OR was a sure-fire indicator you’d either found the encryption part of the code, or some kind of sprite routine. It’s surprising, then, that a Linux machine just minding its own business would be executing so many.

That is, until you remember that compilers love to emit a xor when setting a register to zero:
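A minimal sketch of the kind of function that produces this, assuming a plain C function compiled for x86-64 (the name `return_zero` is illustrative and not taken from the original post). At `-O2`, GCC is expected to compile it to `xor eax, eax` followed by `ret`:

```c
/* Hypothetical example: a function that does nothing but return zero.
   An int return value travels in eax on x86-64. */
int return_zero(void) {
    return 0;
}
```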

We know that exclusive-OR-ing anything with itself generates zero, but why does the compiler emit this sequence? Is it just showing off?

In the example above, I’ve compiled with -O2 and enabled Compiler Explorer’s “Compile to binary object” so you can view the machine code that the CPU sees, specifically:

31 c0           xor eax, eax
c3              ret

If you change GCC’s optimisation level down to -O1 you’ll see:

b8 00 00 00 00  mov eax, 0x0
c3              ret

The much clearer, more intention-revealing mov eax, 0 to set the EAX register to zero takes up five bytes, compared to the two of the exclusive OR. By using a slightly more obscure instruction, we save three bytes every time we need to set a register to zero, which is a pretty common operation. Saving bytes makes the program smaller, and makes more efficient use of the instruction cache.
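The saving can be checked directly from the two encodings. This is only a sketch: the byte values are copied from the listings above, while the array and function names are made up for illustration.

```c
#include <stddef.h>

/* Machine-code bytes copied from the two listings above. */
static const unsigned char xor_eax_eax[] = { 0x31, 0xc0 };                   /* xor eax, eax */
static const unsigned char mov_eax_0[]   = { 0xb8, 0x00, 0x00, 0x00, 0x00 }; /* mov eax, 0x0 */

/* Bytes saved each time the xor form replaces the mov form. */
size_t bytes_saved_per_zeroing(void) {
    return sizeof mov_eax_0 - sizeof xor_eax_eax;   /* 5 - 2 */
}
```

Calling `bytes_saved_per_zeroing()` returns 3, the three bytes mentioned above.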

It gets better though! Since this is a very common operation, x86 CPUs spot this “zeroing idiom” early in the pipeline and can specifically optimise around it: the out-of-order tracking system knows that the value of “eax” (or whichever register is being zeroed) does not depend on the previous value of eax, so it can allocate a fresh, dependency-free zero register renamer slot. And, having done that, it removes the operation from the execution queue - that is, the xor takes zero execution cycles! It’s essentially optimised out by the CPU!

You may wonder why you see xor eax, eax but never xor rax, rax (the 64-bit version), even when returning a long:
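A sketch of such a function, assuming x86-64 Linux, where `long` is 64 bits wide and is returned in rax (the name `return_zero_long` is hypothetical). The compiler is still expected to emit `xor eax, eax` here, not `xor rax, rax`:

```c
/* Hypothetical example: the same idea, but returning a 64-bit long.
   On x86-64 Linux the result is returned in rax. */
long return_zero_long(void) {
    return 0;
}
```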

In this case, even though rax is needed to hold the full 64-bit long result, by writing to eax, we get a nice effect: Unlike other partial register writes, when writing to an e register like eax, the architecture zeros the top 32 bits for free. So xor eax, eax sets all 64 bits to zero.
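The difference between a 32-bit write and a narrower partial write can be modelled in portable C. This is only a model of the register semantics, not code that touches real registers, and the function names are invented for illustration:

```c
#include <stdint.h>

/* Model of writing a 32-bit value to an e register (e.g. eax):
   the upper 32 bits of the 64-bit register are zeroed. */
uint64_t write_32bit(uint64_t old_reg, uint32_t value) {
    (void)old_reg;              /* the old contents do not survive */
    return (uint64_t)value;
}

/* Model of writing a 16-bit value (e.g. ax):
   the upper 48 bits of the register are preserved. */
uint64_t write_16bit(uint64_t old_reg, uint16_t value) {
    return (old_reg & ~(uint64_t)0xFFFF) | value;
}
```

Starting from a register full of ones, `write_32bit` with a value of 0 leaves all 64 bits clear, whereas `write_16bit` with 0 clears only the low 16 bits.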

Interestingly, when zeroing the “extended” numbered registers (like r8), GCC still uses the d (double width, i.e. 32-bit) variant:
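One way to get a compiler to zero r8 is to pass zero as the fifth integer argument of a call. The sketch below assumes the System V AMD64 calling convention used on Linux, where that argument is passed in r8. All names are hypothetical, and the exact output depends on the compiler version and flags. With GCC at `-O2`, the zero is expected to appear as `45 31 c0  xor r8d, r8d`:

```c
/* Hypothetical callee type: five integer arguments. Under the System V
   AMD64 calling convention the fifth one is passed in r8. */
typedef long (*five_arg_fn)(long, long, long, long, long);

/* Calls through a pointer so the compiler cannot see the callee and has
   to load the argument registers, including a zero for the fifth. */
long call_with_zero_fifth(five_arg_fn callee) {
    return callee(1, 2, 3, 4, 0);
}

/* A simple callee so the sketch can be used on its own. */
long sum_five(long a, long b, long c, long d, long e) {
    return a + b + c + d + e;
}
```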

Note how it’s xor r8d, r8d (the 32-bit variant) even though, with the REX prefix (here 45), it would be the same number of bytes to xor r8, r8 at full width. It probably makes something easier in the compilers, as clang does this too.
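The equal byte count follows from how the REX prefix is laid out. The layout below comes from the x86-64 instruction encoding rules and is not spelled out in the original post; the helper name `rex_prefix` is hypothetical.

```c
#include <stdint.h>

/* Build a REX prefix byte with the bit layout 0100WRXB.
   W = 64-bit operand size, R extends ModRM.reg, X extends SIB.index,
   B extends ModRM.rm (or SIB.base). */
uint8_t rex_prefix(int w, int r, int x, int b) {
    return (uint8_t)(0x40 | (w << 3) | (r << 2) | (x << 1) | b);
}
```

Here `rex_prefix(0, 1, 0, 1)` gives 0x45, the prefix for `xor r8d, r8d`, and `rex_prefix(1, 1, 0, 1)` gives 0x4d, the prefix for `xor r8, r8`. Either way the prefix is a single byte, so both forms are three bytes long.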

xor eax, eax saves you code space and execution time! Thanks compilers!

See the video that accompanies this post.


This post is day 1 of Advent of Compiler Optimisations 2025, a 25-day series exploring how compilers transform our code.

This post was written by a human (Matt Godbolt) and reviewed and proof-read by LLMs and humans.

Support Compiler Explorer on Patreon or GitHub, or by buying CE products in the Compiler Explorer Shop.
