Port React Compiler to Rust

Original link: https://github.com/react/react/pull/36173

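To handle unmodeled Babel statement kinds that previously made deserialization fail, the AST gains an `Unknown(UnknownStatement)` variant. Unmodeled syntax is now preserved verbatim instead of failing the whole file, which matches how the TS reference compiler treats such files.

Key technical changes:

* **Robust deserialization:** A hand-written `serde` implementation dispatches modeled tags through a `KnownStatement` helper generated by the `known_statements!` macro, so a malformed modeled node still raises a precise error and only genuinely unknown tags fall back to the `Unknown` variant.
* **Integrity and safety:** A scoped mutator that rejects mutations stripping `type` keeps the raw node and the cached position data from desyncing. The "no-catch-all" policy is amended to document `Statement` as the deliberate exception.
* **Codegen discrimination:** Explicit discrimination routes expression and pattern nodes correctly, preventing a "silent orphan" regression in which expression nodes would be emitted as raw statements.
* **Performance:** Each statement is materialized as a `serde_json::Value` before typed parsing, which improves error granularity without changing the asymptotic cost.

The changes are verified by unit tests (including malformed and edge cases), a lowering integration test, and Babel end-to-end runs on the three affected fixtures.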

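The React team recently merged a large port of the React Compiler, 120,000 lines of code, from TypeScript/Babel to Rust. The move set off a heated debate on Hacker News about large-scale rewrites, the use of large language models (LLMs), and software maintainability.

Supporters argue that the rewrite is a practical necessity for performance, using Rust's speed and memory safety to improve build times and efficiency. Many developers see LLMs in this kind of migration as a "force multiplier" that lets a team move a complex codebase quickly while handing tedious boilerplate and borrow-checker details to the AI.

Critics, by contrast, are seriously concerned about the "cognitive debt" created by such a fast, AI-assisted transition. Skeptics worry that a codebase that outgrows human understanding becomes an unmaintainable black box, especially when the original architectural intent is lost along the way. The discussion also touched on the broader industry shift from interpreted languages to native binary tooling, with some comparing the current Rust migrations to similar moves toward AOT compilation in the Java and .NET ecosystems. In the end, the debate reflects a growing tension between the pursuit of high-performance tooling and the risk of giving up human-readable, maintainable code.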

Original text
Babel can emit statement kinds the typed AST does not model (the
todo-ts-* fixtures pin three TS module-interop forms). Deserialization
previously failed the whole file on the first such node, while the TS
reference compiles the file and leaves the statement alone.

Statement gains a final #[serde(untagged)] Unknown(UnknownStatement)
variant carrying the complete raw node. Deserialization is hand-written
and dispatches modeled `type` tags through a KnownStatement helper so a
malformed modeled node still errors with its precise field-level
message instead of degrading to Unknown; only genuinely unmodeled tags
take the catch-all. The TS reference reaches its equivalent default
case only via assertExhaustive (Babel's closed types), so it crashes;
here unmodeled syntax is reachable by construction and degrades
instead: top-level statements are preserved verbatim through
re-serialization, and function-body occurrences record the standard
UnsupportedSyntax bailout with an UnsupportedNode instruction carrying
the raw node. A known_statements! macro is the single source for the
dispatch enum, its From mapping, and the tag list, so those three
cannot drift; a variant added to Statement but not the macro is the one
remaining silent gap, documented on the variant.
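
The dispatch described above can be sketched in a few lines. This is a
minimal, dependency-free illustration, not the code in this PR: RawNode
(a string map) stands in for serde_json::Value, and KnownKind,
KNOWN_TAGS, known_kind, and parse_statement are hypothetical names. The
point is the shape: one macro list produces the dispatch enum and the
tag list together, a modeled tag that fails to parse is an error, and
only an unmodeled tag takes the catch-all.

```rust
use std::collections::BTreeMap;

// Assumption: a flat string map stands in for serde_json::Value.
type RawNode = BTreeMap<String, String>;

// Hypothetical miniature of a known_statements!-style macro: a single
// list generates the dispatch enum, the tag list, and the tag lookup,
// so the three cannot drift apart.
macro_rules! known_statements {
    ($($variant:ident),* $(,)?) => {
        #[derive(Debug, Clone, Copy, PartialEq)]
        enum KnownKind { $($variant),* }

        const KNOWN_TAGS: &[&str] = &[$(stringify!($variant)),*];

        fn known_kind(tag: &str) -> Option<KnownKind> {
            $(if tag == stringify!($variant) {
                return Some(KnownKind::$variant);
            })*
            None
        }
    };
}

known_statements!(ReturnStatement, EmptyStatement);

#[derive(Debug, PartialEq)]
enum Statement {
    Return { argument: String },
    Empty,
    // Catch-all: carries the complete raw node.
    Unknown(RawNode),
}

fn parse_statement(raw: RawNode) -> Result<Statement, String> {
    let tag = raw
        .get("type")
        .cloned()
        .ok_or_else(|| "missing field `type`".to_string())?;
    match known_kind(&tag) {
        // Modeled tag: parse strictly, so a malformed node is an error
        // rather than a silent downgrade to Unknown.
        Some(KnownKind::ReturnStatement) => {
            let argument = raw
                .get("argument")
                .cloned()
                .ok_or_else(|| "ReturnStatement: missing field `argument`".to_string())?;
            Ok(Statement::Return { argument })
        }
        Some(KnownKind::EmptyStatement) => Ok(Statement::Empty),
        // Unmodeled tag: preserve the node verbatim.
        None => Ok(Statement::Unknown(raw)),
    }
}
```

In this sketch, a node tagged ReturnStatement without an argument field
yields Err, while a node with any tag outside the macro list comes back
as Statement::Unknown holding the untouched map.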

UnknownStatement caches BaseNode for position helpers; the scoped
with_raw_mut mutator refreshes the cache and rejects mutations that
strip `type`, so the two views cannot desync. Program-level analyses
treat Unknown explicitly: the gating reference-before-declaration scan
walks the raw node for identifier references (an `export = X` does
reference X), and the prefilter and return-analysis arms are
deliberately inert. SWC/OXC reverse converters emit a deliberate
runtime tripwire (a throw in generated code) for the arms that are
unreachable until the SWC forward conversion stops rewriting these
statements to EmptyStatement in the next slice.
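
The cache-plus-scoped-mutator idea can be illustrated as follows. This
is a sketch under stated assumptions, not this PR's implementation: the
raw node is modeled as a string map rather than a serde_json::Value,
BaseNode holds only start/end offsets, and the choice to mutate a clone
and return Err on rejection is hypothetical (the source only says the
mutator refreshes the cache and rejects mutations that strip `type`).

```rust
use std::collections::BTreeMap;

// Assumption: a flat string map stands in for the raw JSON node.
type RawNode = BTreeMap<String, String>;

// Cached view used by position helpers (fields are illustrative).
#[derive(Debug, Clone, PartialEq)]
struct BaseNode {
    start: Option<u32>,
    end: Option<u32>,
}

fn read_offset(raw: &RawNode, key: &str) -> Option<u32> {
    raw.get(key)?.parse().ok()
}

impl BaseNode {
    fn from_raw(raw: &RawNode) -> Self {
        BaseNode {
            start: read_offset(raw, "start"),
            end: read_offset(raw, "end"),
        }
    }
}

struct UnknownStatement {
    raw: RawNode,
    base: BaseNode,
}

impl UnknownStatement {
    fn new(raw: RawNode) -> Result<Self, String> {
        if !raw.contains_key("type") {
            return Err("unknown statement must carry `type`".to_string());
        }
        let base = BaseNode::from_raw(&raw);
        Ok(UnknownStatement { raw, base })
    }

    fn raw(&self) -> &RawNode {
        &self.raw
    }

    fn base(&self) -> &BaseNode {
        &self.base
    }

    // The only mutable access to the raw node: the closure edits a
    // candidate copy, which is committed (with a refreshed cache) only
    // if it still carries `type`.
    fn with_raw_mut(&mut self, edit: impl FnOnce(&mut RawNode)) -> Result<(), String> {
        let mut candidate = self.raw.clone();
        edit(&mut candidate);
        if !candidate.contains_key("type") {
            return Err("mutation removed `type`; change rejected".to_string());
        }
        self.base = BaseNode::from_raw(&candidate);
        self.raw = candidate;
        Ok(())
    }
}
```

Because the raw field is reachable for writing only through
with_raw_mut, every committed edit passes through the cache refresh,
and a rejected edit leaves both views as they were.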

Deserialization now materializes a serde_json::Value per statement
before typed parsing. The cost is one move-based tree rebuild per
nesting level at a one-time boundary; the previous derive also buffered
every node through serde's internal Content to read the tag, so the
delta is allocation shape, not asymptotics.

Verified: ast unit tests including malformed/edge cases, a lowering
integration test pinning the function-body bailout, round_trip green on
the three fixtures, scoped and full Babel e2e green on all three with
events parity, cargo test --workspace green. The scope-resolution half
of test-babel-ast.sh is green on this stack's base and remains red
corpus-wide on the pr-36173 tip, whose node-ID migration removed
position-based keying while babel-ast-to-json.mjs still emits
offset-based scope JSON; that generator gap needs its own fix before
this stack rebases onto the tip. rust-port-0001-babel-ast.md's
no-catch-all policy is amended to document Statement as the deliberate
exception.

Port adaptation for this branch's UnsupportedNode codegen fix
(0957b55), which discriminated statement-vs-expression
original_node by attempting a Statement deserialization. With the
tolerant deserializer that attempt succeeds for every tagged object,
which would silently emit expression nodes as raw statements and
orphan their lvalue temporaries — regressing the ~10 fixtures that
commit fixed. The codegen site now discriminates explicitly
(codegen_unsupported_original_node): modeled statement tags parse
typed and a parse failure is an invariant, not a degrade; tags that
parse as Expression or PatternLike (both strict enums, no catch-all)
flow through expression codegen unchanged, preserving the lvalue
binding and the pattern placeholder fallback; only genuinely unmodeled
tags — producible solely by the unknown-statement lowering bailout,
i.e. from statement position — degrade to Statement::Unknown and are
emitted verbatim, matching TS codegen's 'return node'.
is_known_statement_type is now exposed (pub) from the
known_statements! macro for this, and unit tests pin the
dispatch (modeled statement tag, malformed modeled tag, expression
tag, pattern tag, unknown tag).
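
The routing order at the codegen site can be sketched as a small
classifier. This is an illustration only: the tag tables, the Route
enum, and route_original_node are hypothetical, the typed_parse_ok flag
stands in for an actual typed Statement parse, and the real code
decides the expression and pattern cases by parsing into the strict
Expression and PatternLike enums rather than by looking up tag lists.
Only the name is_known_statement_type comes from the text above; its
body here is a stand-in.

```rust
#[derive(Debug, PartialEq)]
enum Route {
    // Modeled statement that parsed into the typed AST.
    TypedStatement,
    // Expression node: goes through expression codegen, keeping its
    // lvalue binding.
    ExpressionCodegen,
    // Pattern node: takes the pattern placeholder fallback.
    PatternFallback,
    // Genuinely unmodeled tag: emitted verbatim as Statement::Unknown.
    VerbatimStatement,
}

// Hypothetical tag tables for illustration.
const STATEMENT_TAGS: &[&str] = &["ReturnStatement", "ExpressionStatement"];
const EXPRESSION_TAGS: &[&str] = &["CallExpression", "BinaryExpression"];
const PATTERN_TAGS: &[&str] = &["ObjectPattern", "ArrayPattern"];

// Stand-in body; in the PR this predicate is generated by the macro.
fn is_known_statement_type(tag: &str) -> bool {
    STATEMENT_TAGS.contains(&tag)
}

fn route_original_node(tag: &str, typed_parse_ok: bool) -> Result<Route, String> {
    // 1. Modeled statement tags must parse typed; a failure is an
    //    invariant violation, never a downgrade to Unknown.
    if is_known_statement_type(tag) {
        return if typed_parse_ok {
            Ok(Route::TypedStatement)
        } else {
            Err(format!("invariant: modeled statement `{}` failed to parse", tag))
        };
    }
    // 2. Expression and pattern tags are checked before the catch-all,
    //    so they are never emitted as raw statements.
    if EXPRESSION_TAGS.contains(&tag) {
        return Ok(Route::ExpressionCodegen);
    }
    if PATTERN_TAGS.contains(&tag) {
        return Ok(Route::PatternFallback);
    }
    // 3. Everything else can only have come from statement position.
    Ok(Route::VerbatimStatement)
}
```

The ordering is the design point: checking the tolerant Statement path
last is what keeps an expression tag from being accepted as a statement
and orphaning its temporary.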