WASM 并非完全的栈式机器。

WASM 并非完全的栈式机器。
WASM is not quite a stack machine

原始链接: https://purplesyringa.moe/blog/wasm-is-not-quite-a-stack-machine/

## Wasm：并非传统的栈式机器尽管通常被描述为一种栈式语言，WebAssembly (Wasm) 从根本上不同于 Forth 等传统的基于栈的语言。Wasm *使用*栈进行求值，但缺乏真·栈式机器中关键的栈操作指令（如 `dup` 和 `swap`）。传统的栈式机器通过隐式引用栈中值的程序顺序来操作。然而，Wasm 主要依赖于直接求值表达式，限制了栈的重新排列。这迫使编译器使用变量——有效地将 Wasm 转换为寄存器机器——来执行即使是适度复杂的运算，例如重用值或优化表达式（例如，`x * x` 需要临时变量）。 Wasm 指令集反映了这一点：大量的生产者/消费者和操作，但最少的栈操作。这与 JVM 形成对比，JVM 是一种具有完整栈改变指令集的栈式机器。最终，Wasm 可以更好地理解为一种能够处理复合表达式的寄存器机器，这些表达式以逆波兰表示法等格式编码。虽然基于栈的求值对于初始实现简单性很有用，但它并不能定义 Wasm 的核心语义。使用传统栈式 VM 的经验并不能完全转化为 Wasm 优化策略。

Hacker News 新闻 | 过去 | 评论 | 提问 | 展示 | 招聘 | 提交登录 WASM 并非完全的栈式机器 (purplesyringa.moe) 7 分，由 signa11 发表于 3 小时前 | 隐藏 | 过去 | 收藏 | 1 条评论帮助 stevefan1999 17 分钟前 [–] 我正在尝试实现一个 WASM 到 C 的编译器，并且由于这种非完全栈式行为，我可以保证它总是会构建一个表达式，而我不需要丢弃或重置栈值！一切都保持在该函数内，这非常整洁，我认为这也是 WAT（文本格式）如此整洁的原因之一，你可以用 S-表达式来表示它。回复考虑申请 YC 2026 夏季批次！申请截止至 5 月 4 日指南 | 常见问题 | 列表 | API | 安全 | 法律 | 申请 YC | 联系搜索：

原文

April 27, 2026 Lobsters

Everyone knows Wasm is a stack machine. Wikipedia says so, the official Wasm design specification says so, you get it. I thought so too.

That is, until I started writing Wasm code – not compiling for Wasm, but writing the instructions by hand. And I found out that there exists a major difference between Wasm and all other stack-based languages, that makes this claim misleading.

Register vs stack

Let’s back up a bit. What is a stack machine, even?

Say you write a program in a high-level language, and at some point you want to calculate 2 * 3 + 5 * 7. Low-level languages don’t have a notion of compound expressions: they can only perform one operation at a time. So you need to do two multiplications, save their results, and then perform addition.

Many low-level languages, like x86 assembly, would represent these steps as follows:

a = 2
b = 3
c = a * b
d = 5
e = 7
f = d * e
g = c + f

This is called a register machine. You have variables (called registers), which can be used to store both persisted values and temporary results, and each instruction has form var1 = var2 op var3.

Other languages, like Forth or Hex Casting, use a stack for this purpose. The stack can store a sequence of values in an ordered manner, so that already computed subexpressions can lie around while you’re working on other parts. In a stack-based language, the same calculation would look like:

push(2)
push(3)
mul() – pops the last two values from the stack and pushes their product
push(5)
push(7)
mul()
add()

Note that there’s a similarity between the two programs: they have the same number of steps, and the corresponding steps perform the same operation. The major difference is that with a stack machine, the values operated upon are implicitly encoded in the program order, while the register machine always encodes indices.

Rearrangement

We know always-shrinking lossless compression doesn’t exist, though, so what expression power is lost by making indices implicit? For simple expressions, not much. But when values are reused, the difference becomes clear.

Say you’re a compiler, and you’re asked to compile this program:

x = 1 + 2 + 3 + 4
y = x * x * x

With a register machine, you can do:

(calculate x as usual)
tmp = x * x
y = tmp * x

A stack machine as described above, however, does not offer a way to refer to the same value twice: mul always multiplies two values on different positions in the stack. To enable this calculation, real stack machines introduce stack manipulation operations in addition to pure calculation. The one we’re looking for is called dup, and it duplicates the value on top of the stack:

(calculate x as usual)
dup() – the stack now contains x, x
dup() – the stack now contains x, x, x
mul() – the stack now contains x, x*x
mul() – the stack now contains x*(x*x)

You might notice that the register machine calculated (x*x)*x, while the stack machine calculated x*(x*x). These two are the same thing for multiplication, but may be different for other operations. To fix this, we also need to introduce swap, which, as the name implies, swaps the two values on top of the stack:

(calculate x as usual)
dup() – the stack now contains x, x
dup() – the stack now contains x, x, x
mul() – the stack now contains x, x*x
swap() – the stack now contains x*x, x
mul() – the stack now contains (x*x)*x

In practice, more operations are usually used to facilitate computation: over (copy second-last value to the top), 2dup (duplicate two values), drop (pop last value), rot (move third-last value to the top), etc.

From this perspective, stack machines can be seen as decoupling operations from indices they operate on. Whereas register machines always encode indices and pay a higher price when they’re redundant, stack machines encode them on an as-needed basis, but at the cost of a higher instruction count. If I wanted to be fancy, I’d say stack machines implement entropy-encoded compression for register machines.

Wasm

If you look at JVM, a well-known stack machine Wikipedia compares WebAssembly to, you’ll find basically this exact list of bytecode instructions:

Value producers and consumers: iaload, iastore, iconst.
Unary and binary operations: d2f, iadd.
Stack manipulation instructions: dup, dup_x1 (aka over), pop (aka drop), swap.

JVM is not a pure stack machine: there are also instructions for accessing local variables, like iload and istore. But it’s possible to write powerful JVM programs without their use, and javac mostly only uses them for variables explicitly created by the Java programmer.

Now let’s look at the Wasm instruction set:

Value producers and consumers: i32.load, i32.store, i32.const.
Unary and binary operations: f32.demote_f64, i32.add.
Stack manipulation instructions: drop, uhh, ???.

Well, now isn’t that interesting? Wasm has plenty of instructions that receive arguments and place return values on the stack, but almost no instructions that can rearrange it – and, as far as I can tell, drop only exists because otherwise you wouldn’t have a way to ignore a function output.

Pretty much the only thing pure Wasm can do is evaluate simple expressions exactly as written in source code. An optimizing compiler can’t perform common subexpression elimination or optimize expr^2 to expr * expr without introducing new variables. The moment you need anything non-trivial, you have to reach for variables – and thus end up with a register machine, the “stack machine” illusion falling apart.

Semantics

In my opinion, the right way to look at Wasm is as a register machine with operations generalized to compound expressions.

In binary Wasm, the expressions are encoded in Reverse Polish notation, which can be evaluated with a stack, but this is just an encoding. In textual Wasm, for example, they are instead represented in a LISP-like notation – not any less or more efficient. One can imagine a world where binary Wasm used prefix notation as well, with little impact; if I had to guess, postfix notation was preferred to simplify non-optimizing interpreters, or perhaps the experience with stack-based VMs was a tie-breaker.

This perspective is further confirmed by the fact that, until Wasm got the multi-value extension, control flow blocks pretty much couldn’t interact with the stack: values pushed onto the stack before if could not be accessed within the if body, and the if body could only return one value, so if was effectively just a ternary, and even values with a single consumer had to go through locals.

Conclusion

Does it really matter? Pretty much any machine can be converted to SSA, at which point the input format is not a consideration; and I suppose the simplicity of stack-based implementation was a good thing for Wasm adoption. But I think it’s fair to highlight that experience with stack-based VMs doesn’t translate well to Wasm, since it’s not quite a stack machine.

Soon after writing this post, I found this awesome post covering the same problem from a different, optimization-focused angle. Give it a read as well!