RISC-V Conditional Moves

Original link: https://www.corsix.org/content/riscv-conditional-moves

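The AArch64 instruction set has a powerful `csel` instruction for efficient conditional operations; it directly evaluates expressions such as `rd = cond ? rs1 : rs2`. RISC-V has no direct equivalent. The base instruction set (`slt`) and the `Zbb` and `Zicond` extensions provide limited conditional functionality, but a general conditional select instruction is still missing by design. The RISC-V designers deliberately omitted conditional moves in favor of short forward branches, with the expectation that out-of-order cores can *fuse* such branches into conditional moves to improve performance. The author argues that this fusion is problematic: the RISC-V memory consistency model defines a broad "control dependency" that extends from a branch to *all* later stores in program order. Fusing a branch into a conditional move removes that dependency and can lead to incorrect behavior. To stay compliant, the fused instruction has to *retain* some branch-like properties, which complicates implementations and calls the benefit of fusion into question. In short, simply replacing a branch with a conditional move is not semantically equivalent on RISC-V.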

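## RISC-V Instruction Fusion and Conditional Moves: Summary

An article about RISC-V conditional moves prompted a Hacker News discussion that focused on the complexity of *instruction fusion*. Fusion merges several instructions into a single internal operation (µop) to improve performance, but it raises concerns about preserving memory consistency across cores. The central question was whether aggressive fusion, such as converting a conditional branch into branchless code, is valid without introducing memory barriers (which enforce the correct ordering of operations). Experts noted that a core performing fusion has a limited view and must guarantee the memory ordering observed by other cores, whereas a compiler optimizing at the program level does not necessarily have to worry about this. Although RISC-V's Zicond extension *can* add conditional moves, the discussion stressed that the problem is broader and applies to any instruction set architecture (ISA) with a strong memory model (such as x86 and potentially ARM64). For now, true instruction fusion is largely hypothetical in commercially available RISC-V processors, although upcoming cores from Ventana and Tenstorrent are expected to implement it. The optimization SiFive currently uses *looks* similar but is not full µop fusion.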

Original text

I'm a big fan of aarch64's csel family of instructions. A single instruction can evaluate rd = cond ? rs1 : f(rs2), where cond is any condition code and f is any of f₀(x) = x or f₁(x) = x+1 or f₂(x) = ~x or f₃(x) = -x. Want to convert a condition to a boolean? Use f₁ with rs1 == rs2 == x0. Want to convert a condition to a mask? Use f₂ with rs1 == rs2 == x0. Want to compute an absolute value? Use f₃ with rs1 == rs2. It is pleasing that the composition of f₁ and f₂ is f₃. I could continue espousing, but hopefully you get the idea.
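To make the csel family concrete, here is a minimal Python sketch of the rd = cond ? rs1 : f(rs2) pattern on 64-bit values. The names `cond_select`, `to_bool`, `to_mask`, and `absolute` are hypothetical helpers invented for this illustration, not aarch64 mnemonics, and the condition is passed as a plain Python boolean rather than a condition code.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers as unsigned Python ints

def to_signed(x):
    """Interpret a 64-bit register value as a two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

# The four f variants named in the text.
F = {
    "f0": lambda x: x & MASK64,        # x
    "f1": lambda x: (x + 1) & MASK64,  # x + 1
    "f2": lambda x: ~x & MASK64,       # ~x
    "f3": lambda x: -x & MASK64,       # -x
}

def cond_select(cond, rs1, rs2, f):
    """rd = cond ? rs1 : f(rs2)"""
    return rs1 & MASK64 if cond else F[f](rs2)

def to_bool(cond):
    # f1 with both sources zero: the "else" arm yields 0 + 1 = 1,
    # so the condition is inverted to get 1 when cond holds.
    return cond_select(not cond, 0, 0, "f1")

def to_mask(cond):
    # f2 with both sources zero: the "else" arm yields ~0 (all ones).
    return cond_select(not cond, 0, 0, "f2")

def absolute(x):
    # f3 with rs1 == rs2: keep x when non-negative, otherwise negate it.
    return cond_select(to_signed(x) >= 0, x, x, "f3")
```

For example, `to_mask(True)` gives an all-ones value and `absolute` applied to the 64-bit encoding of -5 gives 5. Applying f₂ and then f₁ computes ~x + 1, which is two's-complement negation, hence f₃.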

RISC-V is the hot new thing, but it lacks a direct equivalent to csel. Some cases of converting conditions to booleans are possible with the slt family of instructions in the base instruction set. Beyond that, a few special cases are implemented by instruction set extensions: Zbb adds min and max instructions which are a particular pattern of compare and select, and Zicond adds czero.eqz and czero.nez which again are particular patterns of compare and select. But the general case? Considered and rejected, as per this direct quote from The RISC-V Instruction Set Manual Volume I Version 20250508:
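As a rough illustration of the special cases that are available, the following Python sketch models `slt`/`sltu` and the Zbb `min` instruction on 64-bit values. The function names are hypothetical; `snez` and `seqz` mirror the standard assembler pseudoinstructions, which expand to `sltu rd, x0, rs` and `sltiu rd, rs, 1` respectively.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def signed(x):
    """Interpret an XLEN-bit register value as two's complement."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x

def slt(rs1, rs2):
    """Set-less-than, signed: rd = (rs1 < rs2) ? 1 : 0."""
    return int(signed(rs1) < signed(rs2))

def sltu(rs1, rs2):
    """Set-less-than, unsigned."""
    return int((rs1 & MASK) < (rs2 & MASK))

def snez(rs):
    """rd = (rs != 0), modeled as sltu rd, x0, rs."""
    return sltu(0, rs)

def seqz(rs):
    """rd = (rs == 0), modeled as sltiu rd, rs, 1."""
    return sltu(rs, 1)

def zbb_min(rs1, rs2):
    """Zbb min: a fixed compare-and-select pattern (signed minimum)."""
    return rs1 & MASK if signed(rs1) < signed(rs2) else rs2 & MASK
```

Each of these bakes the comparison and the selected values into one fixed pattern, which is why none of them amounts to a general rd = cond ? rs1 : rs2.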

> We considered but did not include conditional moves or predicated instructions, which can effectively replace unpredictable short forward branches. Conditional moves are the simpler of the two, but are difficult to use ...

That quote hints at short forward branches being the recommended alternative. It doesn't quite go as far as to say that out-of-order cores are encouraged to perform macro fusion in the frontend to convert short forward branches back into conditional moves (when possible), but it is commonly taken to mean this, and some SiFive cores implement exactly this fusion.

Continuing to quote from The RISC-V Instruction Set Manual Volume I Version 20250508, the introductory text motivating Zicond also mentions fusion:

> Using these [Zicond] instructions, branchless sequences can be implemented (typically in two-instruction sequences) without the need for instruction fusion, special provisions during the decoding of architectural instructions, or other microarchitectural provisions.
>
> One of the shortcomings of RISC-V, compared to competing instruction set architectures, is the absence of conditional operations to support branchless code-generation: this includes conditional arithmetic, conditional select and conditional move operations. The design principles of RISC-V (e.g. the absence of an instruction-format that supports 3 source registers and an output register) make it unlikely that direct equivalents of the competing instructions will be introduced.

The design principles mentioned in passing mean that czero.eqz has slightly odd semantics. Assuming rd ≠ rs2, the intent is that these two instruction sequences compute the same thing:

| Base instruction set | With Zicond |
| --- | --- |
| `mv rd, x0`<br>`beq rs2, x0, skip_next`<br>`mv rd, rs1`<br>`skip_next:` | `czero.eqz rd, rs1, rs2` |
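The value-level equivalence of the two sequences can be checked with a small Python model. This is only a sketch of the computed results: `branchy`, `czero_eqz`, `czero_nez`, and `select` are hypothetical function names, and the three-instruction `select` composition (`czero.eqz`, `czero.nez`, `or`) is included to show how a general conditional select might be assembled from the two Zicond instructions.

```python
def branchy(rs1, rs2):
    """The base-ISA sequence from the table, one statement per instruction."""
    rd = 0            # mv rd, x0
    if rs2 != 0:      # beq rs2, x0, skip_next (skips the move when rs2 == 0)
        rd = rs1      # mv rd, rs1
    return rd         # skip_next:

def czero_eqz(rs1, rs2):
    """czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1"""
    return 0 if rs2 == 0 else rs1

def czero_nez(rs1, rs2):
    """czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1"""
    return 0 if rs2 != 0 else rs1

def select(cond, a, b):
    """rd = cond ? a : b, built from two czero instructions and an OR."""
    t0 = czero_eqz(a, cond)   # a if cond is non-zero, else 0
    t1 = czero_nez(b, cond)   # b if cond is zero, else 0
    return t0 | t1            # exactly one of t0, t1 can be non-zero
```

Both `branchy` and `czero_eqz` return the same value for every input pair, which is the sense in which the two columns "compute the same thing". The model says nothing about memory ordering.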

The whole premise of fusion is predicated on the idea that it is valid for a core to transform code similar to the branchy code on the left into code similar to the branch-free code on the right. I wish to cast doubt on this validity: it is true that the two instruction sequences compute the same thing, but details of the RISC-V memory consistency model mean that the two sequences are very much not equivalent, and therefore a core cannot blindly turn one into the other.

To see why, consider this example, again from The RISC-V Instruction Set Manual Volume I Version 20250508:

> Control dependencies behave differently from address and data dependencies in the sense that a control dependency always extends to all instructions following the original target in program order.
>
>       lw x1, 0(x2)
>       bne x1, x0, next
>     next:
>       sw x3, 0(x4)
>
> Even though both branch outcomes have the same target, there is still a control dependency from the memory operation generated by the first instruction in this snippet to the memory operation generated by the last instruction. This definition of control dependency is subtly stronger than what might be seen in other contexts (e.g., C++), but it conforms with standard definitions of control dependencies in the literature.

The general point highlighted by this example is that every branch (or indirect jump) instruction imposes a syntactic control dependency on every store instruction anywhere after it in program order. If a branch is converted to a conditional move, there is no longer a syntactic control dependency. There can instead be an address or data dependency, but this only applies to stores which use the result of the conditional move, whereas the syntactic control dependency applied to all stores. In other words, not equivalent.
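The difference can be sketched with a deliberately simplified Python model of syntactic dependencies. It assumes the program list is the executed instruction stream, tracks only address, data, and control dependencies from the first load, and ignores every other rule of the memory model. The tuple encoding `(op, dst, srcs)` and the function name are hypothetical.

```python
def stores_ordered_after_first_load(program):
    """Return indices of stores with a syntactic dependency on the load at index 0.

    Each instruction is (op, dst, srcs) with op in {"load", "alu", "branch", "store"}.
    Only the first instruction may be a load in this simplified model.
    """
    op, dst, _ = program[0]
    if op != "load":
        raise ValueError("program must start with the load of interest")
    tainted = {dst} - {"x0"}   # registers carrying a dependency from the load
    control = False            # set once a dependent branch has been seen
    ordered = []
    for i, (op, dst, srcs) in enumerate(program[1:], start=1):
        depends = any(s in tainted for s in srcs)
        if op == "branch":
            control = control or depends      # applies to every later store
        elif op == "store":
            if control or depends:            # control, address, or data dependency
                ordered.append(i)
        elif op == "alu":
            if depends and dst != "x0":
                tainted.add(dst)              # dependency carried to dst
            else:
                tainted.discard(dst)          # dst overwritten with clean value
        else:
            raise ValueError(f"unsupported op: {op}")
    return ordered

branchy = [
    ("load",   "x1", ("x2",)),        # lw  x1, 0(x2)
    ("alu",    "x5", ("x0",)),        # mv  x5, x0
    ("branch", None, ("x1", "x0")),   # beq x1, x0, skip
    ("alu",    "x5", ("x6",)),        # mv  x5, x6
    ("store",  None, ("x5", "x7")),   # sw  x5, 0(x7)
    ("store",  None, ("x3", "x4")),   # sw  x3, 0(x4)
]

fused = [
    ("load",  "x1", ("x2",)),         # lw        x1, 0(x2)
    ("alu",   "x5", ("x6", "x1")),    # czero.eqz x5, x6, x1
    ("store", None, ("x5", "x7")),    # sw        x5, 0(x7)
    ("store", None, ("x3", "x4")),    # sw        x3, 0(x4)
]
```

Under these assumptions, the branchy program orders both stores after the load, while the fused program orders only the store of `x5`; the unrelated `sw x3, 0(x4)` loses its dependency on the load.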

TLDR: If RISC-V cores want to perform fusion of short forward branches into conditional moves (to mitigate the lack of conditional moves in the instruction set), the resultant fused instruction needs to retain some branch-like properties to avoid violating the memory model.
