Turbopack: Building faster by building less

Original link: https://nextjs.org/blog/turbopack-incremental-computation

## Turbopack: Fast iteration through incremental computation

Turbopack is the new default bundler for Next.js, and it prioritizes speed through a sophisticated incremental computation and caching system. Unlike many bundlers, Turbopack embraces the complexity of incremental builds, tracking changes and recomputing only what is necessary, which enables near-instant builds and fast React Fast Refresh even for large applications. Its architecture builds on over a decade of research and on lessons learned from projects such as webpack, using "value cells" to automatically track dependencies at a fine granularity. This avoids the errors inherent in manually defined dependency graphs and allows precise recomputation when something changes. Turbopack uses a detailed dependency graph plus an "aggregation graph" to query build information efficiently. A recent update shipped a stable, on-by-default file system cache, so the cache persists across restarts and development speeds up significantly. This approach tackles the classic challenges of caching, namely overhead and potential performance pitfalls, to deliver a faster, more responsive development experience.

## Turbopack Summary

Turbopack is a new build tool from the creators of Next.js, designed to dramatically speed up builds through fine-grained dependency tracking and a Rust implementation. While technically promising, the discussion on Hacker News focused on its potential impact on the JavaScript ecosystem. A major concern is ecosystem fragmentation: unlike Vite, which integrates existing tools, Turbopack's reliance on Rust for plugins raises the barrier to entry for developers unfamiliar with the language, and many commenters feel the JS community is already consolidating around Vite. Turbopack is currently tightly coupled to Next.js, which makes broader adoption harder. Some see it as a fix for older Next.js projects still on webpack, while others question whether Next.js's market share justifies a standalone, potentially "walled garden" build tool. There is also debate over whether Bun can achieve similar results.

## Original article

Edit. Save. Refresh. Wait… Wait… Wait…

Compiling code usually means waiting, but Turbopack makes iteration loops fast with caching and incremental computation. Not every modern bundler uses an incremental approach, and that’s with good reason. Incremental computation can introduce significant complexity and opportunities for bugs. Caches require extra tracking and copies of data, adding both CPU and memory overhead. When applied poorly, caching can actually make performance worse.

Despite all of this, we took on these challenges because we knew that an incremental architecture would be critical to Turbopack’s success. Turbopack is the new default bundler for Next.js, a framework that is used to build some of the largest web applications in the world. We needed to enable instant builds and a fast as-you-type interactive React Fast Refresh experience, even for the largest and most challenging workloads. Our incremental architecture is core to achieving this.

Turbopack’s architecture was built ground-up with caching in mind. Its incremental design is based on over a decade of research. We built on first-hand experience from challenges in implementing caching in webpack and drew inspiration from Salsa (which powers Rust-Analyzer and Ruff), Parcel, the Rust compiler’s query system, Adapton, and many others.

Turbopack achieves a fine-grained cache by automatically tracking how internal functions are called and what values they depend on. When something changes we know how to recompute the results with minimal work.

Background: Manual incremental computation

Many build systems include explicit dependency graphs that must be manually populated when evaluating build rules. Explicitly declaring your dependency graph can theoretically give optimal results, but in practice it leaves room for errors.

The difficulty of specifying an explicit dependency graph means that usually caching is done at a coarse file-level granularity. This granularity does have some benefits: fewer incremental results means less data to cache, which might be worth it if you have limited disk space or memory.

An example of such an architecture is GNU Make, where output targets and prerequisites are manually configured and represented as files. Systems like GNU Make miss caching opportunities due to their coarse granularity: they do not understand and cannot cache internal data structures within the compiler.

Function-level fine-grained automatic incremental computation

In Turbopack, the relationship between input files and resulting build artifacts isn’t straightforward. Bundlers employ whole-program analysis for dead code elimination ("tree shaking") and clustering of common dependencies in the module graph. Consequently, the build artifacts (JavaScript files shared across multiple application routes) form complex many-to-many relationships with input files.
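To make that many-to-many relationship concrete, here is a minimal sketch (with hypothetical file and chunk names, not Turbopack's internal representation) of mapping an edited module back to every output chunk that contains it:

```rust
use std::collections::HashMap;

// One shared module can land in chunks used by several routes, so a single
// file change can invalidate multiple build artifacts at once.
fn chunks_for(module: &str, chunk_contents: &HashMap<&str, Vec<&str>>) -> Vec<String> {
    let mut out: Vec<String> = chunk_contents
        .iter()
        .filter(|(_, modules)| modules.contains(&module))
        .map(|(chunk, _)| chunk.to_string())
        .collect();
    out.sort(); // HashMap iteration order is unspecified
    out
}

fn main() {
    let chunk_contents = HashMap::from([
        ("route-a.js", vec!["a.ts", "shared.ts"]),
        ("route-b.js", vec!["b.ts", "shared.ts"]),
        ("vendor.js", vec!["react.ts"]),
    ]);
    // Editing shared.ts invalidates both route chunks, but not vendor.js.
    assert_eq!(
        chunks_for("shared.ts", &chunk_contents),
        vec!["route-a.js", "route-b.js"]
    );
}
```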

Turbopack uses a very fine-grained caching architecture. Because manually declaring and adding dependencies to a graph is prone to human errors, Turbopack needs an automated solution that can scale.

Tracking the compilation graph with value cells

To facilitate automatic caching and dependency tracking, Turbopack introduces the concept of “value cells” (Vc<…>). Each value cell represents a fine-grained piece of execution state, like a cell in a spreadsheet. When a cell is read, the currently executing function is recorded as a dependent of that cell. This is similar to how signals work in frameworks like SolidJS.

By not marking cells as dependencies until they are read, Turbopack achieves finer-grained caching than a traditional top-down memoization approach would provide. For example, an argument might be an object or mapping containing many value cells. Instead of recomputing the tracked function whenever any part of that object or mapping changes, Turbopack only recomputes it when a cell the function has actually read changes.
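The read-time dependency recording described above can be sketched in a few dozen lines of Rust. This is a toy model with invented names (`Store`, `track`, `dirty_tasks`), not Turbopack's actual `Vc<…>` API:

```rust
use std::collections::{HashMap, HashSet};

// Minimal signal-style dependency tracking: reading a cell while a tracked
// function runs records that function as a dependent of the cell.
#[derive(Default)]
struct Store {
    cells: HashMap<&'static str, i64>,                         // cell id -> value
    dependents: HashMap<&'static str, HashSet<&'static str>>,  // cell -> tasks
    current_task: Option<&'static str>,                        // task being tracked
}

impl Store {
    fn set(&mut self, cell: &'static str, value: i64) {
        self.cells.insert(cell, value);
    }

    // Reading a cell registers the currently executing task as a dependent.
    fn read(&mut self, cell: &'static str) -> i64 {
        if let Some(task) = self.current_task {
            self.dependents.entry(cell).or_default().insert(task);
        }
        self.cells[cell]
    }

    // Run `f` as a tracked task: any cell it reads becomes a dependency.
    fn track<F: FnOnce(&mut Store) -> i64>(&mut self, task: &'static str, f: F) -> i64 {
        let prev = self.current_task.replace(task);
        let out = f(&mut *self);
        self.current_task = prev;
        out
    }

    // Which tasks must be re-run when `cell` changes?
    fn dirty_tasks(&self, cell: &str) -> Vec<&'static str> {
        self.dependents
            .get(cell)
            .map(|s| s.iter().copied().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut s = Store::default();
    s.set("a", 1);
    s.set("b", 2);
    s.track("sum", |s| s.read("a") + s.read("b"));
    s.track("double_a", |s| s.read("a") * 2);
    // "double_a" never read cell "b", so changing "b" only dirties "sum".
    assert_eq!(s.dirty_tasks("b"), vec!["sum"]);
}
```

Because dependencies are recorded at read time, `double_a` never becomes a dependent of `b`, which is exactly the finer granularity the paragraph above describes.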

Value cells represent nearly everything inside of Turbopack, such as a file on disk, an abstract syntax tree (AST), metadata about imports and exports of modules, or clustering information used for chunking and bundling.

Marking dirty and propagating changes

When Turbopack executes functions for the first time, it builds a graph of functions and the value cells they create or depend on. The graph’s roots are the requested outputs (bundled assets), and the leaves are source code files. There are intermediate representations in the middle, such as ASTs, metadata, partially-transformed modules, or chunking information.

When our file system watcher finds source code that has changed, it marks all functions that read the file’s value cell as “dirty” and queues them for re-computation.

The dirtied function reading the file might parse or transform the JavaScript module and produce a new intermediate representation. Recomputing the function updates the cells containing changed intermediate representations, which may mark more functions as dirty. Cell updates are skipped if the cell contents are equal. This propagation bubbles up the graph until all affected functions have been recomputed.
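A minimal sketch of this propagation, with hypothetical function names and integer stand-ins for cell contents, might look like the following. Note the equality check that stops propagation when a recomputed value is unchanged:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Edges point from a producing function to the functions that read its
// output cell. Recomputation stops early when a recomputed value is equal
// to the cached one, so downstream functions stay clean.
struct Graph {
    readers: HashMap<&'static str, Vec<&'static str>>, // fn -> downstream fns
    values: HashMap<&'static str, i64>,                // fn -> cached output
}

impl Graph {
    // Recompute starting from `changed`; `compute` produces each function's
    // new output. Returns the set of functions that were re-run.
    fn propagate(
        &mut self,
        changed: &'static str,
        compute: impl Fn(&str, &HashMap<&'static str, i64>) -> i64,
    ) -> HashSet<&'static str> {
        let mut dirty = VecDeque::from([changed]);
        let mut recomputed = HashSet::new();
        while let Some(f) = dirty.pop_front() {
            if !recomputed.insert(f) {
                continue;
            }
            let new = compute(f, &self.values);
            let old = self.values.insert(f, new);
            // Equal output: the cell update is skipped, downstream stays clean.
            if old == Some(new) {
                continue;
            }
            for &reader in self.readers.get(f).into_iter().flatten() {
                dirty.push_back(reader);
            }
        }
        recomputed
    }
}

fn main() {
    let mut g = Graph {
        readers: HashMap::from([("parse", vec!["transform"]), ("transform", vec!["bundle"])]),
        values: HashMap::from([("parse", 1), ("transform", 10), ("bundle", 100)]),
    };
    // The edit changes the parse output, but the transform output happens to
    // be identical to before, so bundling is never re-run.
    let rerun = g.propagate("parse", |f, vals| match f {
        "parse" => 2,      // new AST
        "transform" => 10, // same output as the cached value
        _ => vals[f],
    });
    assert!(rerun.contains("parse") && rerun.contains("transform"));
    assert!(!rerun.contains("bundle"));
}
```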

As an additional optimization, execution is "demand-driven," meaning the system defers re-execution of dirty functions until they become part of an "active query". In development, an active query could be a currently open webpage with hot reloading enabled. In builds, this is a request for the full production app.
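Demand-driven scheduling can be sketched as a walk from a query root that collects only the dirty work reachable from that root; everything else stays deferred. The names here are invented for illustration:

```rust
use std::collections::{HashMap, HashSet};

// Dirty functions are only recomputed when an active query (e.g. an open
// route in dev, or a production build request) actually reaches them.
struct Scheduler {
    deps: HashMap<&'static str, Vec<&'static str>>, // fn -> fns it reads
    dirty: HashSet<&'static str>,
}

impl Scheduler {
    // Collect the dirty functions reachable from `root`; dirty work outside
    // the active query is left alone until something demands it.
    fn demand(&self, root: &'static str) -> Vec<&'static str> {
        let mut stack = vec![root];
        let mut seen = HashSet::new();
        let mut to_run = Vec::new();
        while let Some(f) = stack.pop() {
            if !seen.insert(f) {
                continue;
            }
            if self.dirty.contains(f) {
                to_run.push(f);
            }
            stack.extend(self.deps.get(f).into_iter().flatten().copied());
        }
        to_run
    }
}

fn main() {
    let s = Scheduler {
        deps: HashMap::from([("route_a", vec!["page_a"]), ("route_b", vec!["page_b"])]),
        dirty: HashSet::from(["page_a", "page_b"]),
    };
    // Only the open route's dirty work is scheduled; route_b's is deferred.
    assert_eq!(s.demand("route_a"), vec!["page_a"]);
}
```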

Aggregation graphs

While most operations, like the dirty propagation algorithm, only require information about adjacent edges and neighboring nodes in the graph, some operations need to query information about more significant portions of the dependency graph:

  • Finding all dirty nodes when a sub-graph becomes part of a new active query, so we can schedule re-computation.
  • Collecting errors, warnings, or lints of a sub-graph.
  • Waiting for computation of a sub-graph to finish.

Because we maintain a fine-grained cache, the graph can contain hundreds of thousands or even millions of intermediate results. Visiting significant portions of this graph would be expensive.

To make these queries efficient, Turbopack uses an additional data structure on top of the dependency graph, called the “aggregation graph”.

When we build or update the dependency graph, we maintain parallel nodes in the aggregation graph that summarize part of the dependency graph. Some frequently accessed information, like emitted errors or warnings, is attached to the aggregation nodes.

This aggregation graph has multiple layers of resolution, with higher aggregation layers referencing more functions in each node, decreasing the resolution, and reducing the number of nodes that must be traversed when collecting information.

Every potential active query (e.g. an application entrypoint or route) represents a root in the aggregation graph. At the final aggregation graph layer, each root represents the information for itself and all of the children in the original dependency graph. Adding roots to the dependency graph can require a reorganization of the aggregation graph, but that’s an infrequent operation.
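As a rough illustration of why aggregation helps, here is a toy two-level version: each aggregation node caches a summary (a dirty count and collected warnings) for a group of fine-grained nodes, so a query reads a handful of summaries instead of visiting every leaf. All names are hypothetical:

```rust
use std::collections::HashMap;

// Each aggregation node summarizes a group of fine-grained functions, so
// sub-graph queries cost O(aggregation nodes), not O(leaves).
struct AggNode {
    dirty_count: usize,
    warnings: Vec<String>,
}

struct Aggregation {
    groups: HashMap<&'static str, AggNode>,          // agg node id -> summary
    roots: HashMap<&'static str, Vec<&'static str>>, // query root -> agg nodes
}

impl Aggregation {
    // Bubble a leaf update into its aggregation node's cached summary.
    fn on_leaf_dirtied(&mut self, group: &'static str, warning: Option<String>) {
        let node = self.groups.get_mut(group).expect("known group");
        node.dirty_count += 1;
        node.warnings.extend(warning);
    }

    // "Does this query have dirty work?" answered from summaries alone.
    fn has_dirty(&self, root: &'static str) -> bool {
        self.roots[root].iter().any(|g| self.groups[g].dirty_count > 0)
    }

    // Collect warnings for a root without walking the fine-grained graph.
    fn warnings(&self, root: &'static str) -> Vec<&str> {
        self.roots[root]
            .iter()
            .flat_map(|g| self.groups[g].warnings.iter().map(String::as_str))
            .collect()
    }
}

fn main() {
    let mut agg = Aggregation {
        groups: HashMap::from([
            ("pages", AggNode { dirty_count: 0, warnings: vec![] }),
            ("components", AggNode { dirty_count: 0, warnings: vec![] }),
        ]),
        roots: HashMap::from([("app", vec!["pages", "components"])]),
    };
    assert!(!agg.has_dirty("app"));
    agg.on_leaf_dirtied("components", Some("unused import".to_string()));
    assert!(agg.has_dirty("app"));
    assert_eq!(agg.warnings("app"), vec!["unused import"]);
}
```

A real multi-layer version would stack this structure, with each higher layer summarizing groups of the layer below to reduce resolution further.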

File system caching

Until our recent Next.js 16.1 release, all of these caches were stored only in memory.

In this new release, we shipped file system caching for next dev as stable and on-by-default. This cache allows us to persist the dependency graph, the aggregation graph, and all of the intermediate results stored in value cells to disk. When next dev is restarted, it can quickly resume from this warm cache.

File system caching came with its own set of challenges, and it took us over a year of dedicated work to meet our own high performance and quality bar. We'll dive into that in an upcoming engineering blog post.

Feedback and Community

Share your feedback and help shape the future of Next.js:
