How devtools map minified JS code back to your TypeScript source code

Original link: https://www.polarsignals.com/blog/posts/2025/11/04/javascript-source-maps-internals

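## JavaScript Source Maps: Summary

Source maps bridge the gap between optimized production JavaScript (usually minified and bundled) and the original, developer-friendly source code. They let the browser display and debug code with its original variable names and formatting even while running the minified file, which is why an error in `bundle.min.js` can be traced precisely to a location in `src/index.ts`. The build involves three key stages: transpilation (TypeScript to JavaScript), bundling, and minification, and source maps preserve the link to the original code throughout. The maps are JSON files (`.js.map`) containing information about the original sources, the variable names, and, most importantly, the *mappings*: compressed data describing the correspondence between generated and original code. This compression uses **VLQ (Variable Length Quantity) encoding**, which represents position differences efficiently with Base64 characters. Rather than absolute coordinates, the VLQ-encoded mappings store relative changes, which minimizes file size. The `mappings` string consists of segments separated by commas and semicolons (semicolons act as line breaks) that encode positions in the generated file and the source, optionally with the original variable name. Understanding source maps unlocks powerful debugging capabilities and is becoming increasingly important in performance profiling tools.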

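This Hacker News discussion revolves around how developer tools translate minified JavaScript back to the original TypeScript source. The key is **source maps**, which do not store the absolute positions of code elements in the minified file. Instead they use **relative positioning**: they store the *differences* between segments (deltas such as +7 or +15) rather than large column numbers. This significantly reduces the size of the source map data. The debate centers on the best term for these differences: some suggest "offset" instead of "position", but the original author clarifies that "offset" implies a distance from a starting point, whereas source map values are relative to the *previous* segment. The post links to a more detailed explanation on polarsignals.com.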

Original text

Source maps are the main piece in the jigsaw puzzle of mapping symbols and locations from "built" JavaScript files back to the original source code. When you debug minified JavaScript in your browser's DevTools and see the original source with proper variable names and formatting, you're witnessing source maps in action.

For example, when your browser encounters an error at bundle.min.js:1:27698, the source map translates this to src/index.ts:73:16, revealing exactly where the issue occurred in your original TypeScript code:

Production Bundle

bundle.min.js:1:27698

Original Source Location

src/index.ts:73:16

But how does this actually work under the hood? In this post, we'll take a deep dive into the internals of source maps, exploring their format, encoding mechanisms, and how devtools use them to bridge the gap between production code and developer-friendly sources.

The TypeScript Build Pipeline

Modern JavaScript builds typically involve three main stages:

Transpilation: TypeScript → JavaScript
Bundling: Combining modules into a single file
Minification: Compressing code for production

At each stage, source maps preserve the connection back to the original code.

Stage 0: Source TS files

The original TypeScript source files with full type annotations.

Source Files

export function add(a: number, b: number): number {
  return a + b;
}

import { add } from './add';

export function computeFibonacci(n: number): number {
  if (n <= 1) return n;
  return add(computeFibonacci(n - 1), computeFibonacci(n - 2));
}

import { computeFibonacci } from './fibonacci';

const result = computeFibonacci(10);
console.log(`Fibonacci(10) = ${result}`);

No source map at this stage

The Source Map File Format

Source maps use JSON format, typically with a .js.map extension. Let's examine a source map structure from our add.js.map file:

{
  "version": 3,
  "file": "add.js",
  "sourceRoot": "",
  "sources": ["add.ts"],
  "names": ["add", "a", "b"],
  "mappings": "AAAA,OAAO,SAAS,IAAI,CAAC,EAAE;EACrB,OAAO,IAAI;AACb"
}

Fields Breakdown:

version: Indicates the source map version (currently always 3).
file: The generated file name this source map corresponds to.
sourceRoot: Optional prefix for all source URLs. Useful when sources are hosted elsewhere.
sources: Array of original source file paths from which the generated file was built.
sourcesContent: Optional array containing the actual source code. This allows DevTools to display sources even if the original files aren't accessible. Usually disabled in production builds.
names: Array of original identifiers (variable names, function names, etc.) that appear in the source. Referenced by the mappings.
mappings: The compressed mapping data. This is the heart of the source map and uses VLQ encoding. More on this below.
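
As a rough illustration of the fields above, the structure can be written down as a TypeScript type and checked when a map is loaded. This is a minimal sketch: the `SourceMapV3` interface, the `parseSourceMap` helper, and the choice of which fields are marked optional are assumptions made for this example, not part of any particular library.

```typescript
// Shape of a version 3 source map, following the fields listed above.
// Which fields are optional here is an assumption for this sketch.
interface SourceMapV3 {
  version: 3;
  file?: string;
  sourceRoot?: string;
  sources: string[];
  sourcesContent?: (string | null)[];
  names: string[];
  mappings: string;
}

// The add.js.map example from this post, inlined as a string.
const rawAddMap = `{
  "version": 3,
  "file": "add.js",
  "sourceRoot": "",
  "sources": ["add.ts"],
  "names": ["add", "a", "b"],
  "mappings": "AAAA,OAAO,SAAS,IAAI,CAAC,EAAE;EACrB,OAAO,IAAI;AACb"
}`;

// Hypothetical helper: parse the JSON and run a minimal sanity check.
function parseSourceMap(json: string): SourceMapV3 {
  const parsed = JSON.parse(json);
  if (parsed.version !== 3 || typeof parsed.mappings !== "string") {
    throw new Error("not a version 3 source map");
  }
  return parsed as SourceMapV3;
}

const addMap = parseSourceMap(rawAddMap);
console.log(addMap.file, addMap.sources, addMap.names);
```

A real loader would validate more than this; the point is only that everything except `mappings` is plain, readable JSON.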

Understanding the Mappings: VLQ Encoding

The mappings field is where the real magic happens. It contains the actual position mappings between every token in the generated JavaScript file and its corresponding location in the original source files.

Essentially, it answers the question: "For this character at line X, column Y in the minified file, where was it originally located?"

This mapping data tracks:

The file path and name of the original source file
The exact line and column in the source file
The original variable/function name (if renamed during minification)
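
To make that question concrete, here is a minimal sketch of the lookup a tool might perform once the mappings have been decoded into absolute positions. The `MappingEntry` shape, the `originalPositionFor` name, and the sample entries are assumptions for illustration, not the API of any specific library. Lines and columns are 0-based here, and a production tool would likely use a binary search instead of this linear scan.

```typescript
// One decoded mapping with absolute (not delta) positions. Hypothetical shape.
interface MappingEntry {
  genLine: number;
  genCol: number;
  source: string;
  srcLine: number;
  srcCol: number;
  name?: string;
}

// Find the mapping on the given generated line whose column is the closest
// one at or before the requested column.
function originalPositionFor(
  entries: MappingEntry[],
  line: number,
  column: number
): MappingEntry | undefined {
  let best: MappingEntry | undefined;
  for (const entry of entries) {
    if (entry.genLine !== line || entry.genCol > column) continue;
    if (best === undefined || entry.genCol > best.genCol) best = entry;
  }
  return best;
}

// Sample entries, made up for this example.
const sampleEntries: MappingEntry[] = [
  { genLine: 0, genCol: 0, source: "add.ts", srcLine: 0, srcCol: 0 },
  { genLine: 0, genCol: 7, source: "add.ts", srcLine: 0, srcCol: 7 },
  { genLine: 0, genCol: 16, source: "add.ts", srcLine: 0, srcCol: 16, name: "add" },
  { genLine: 1, genCol: 2, source: "add.ts", srcLine: 1, srcCol: 2 },
  { genLine: 1, genCol: 9, source: "add.ts", srcLine: 1, srcCol: 9 },
];

// Column 11 on generated line 1 falls inside the token that starts at column 9.
console.log(originalPositionFor(sampleEntries, 1, 11));
```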

But instead of storing this as a massive JSON array of positions, which would be larger than the minified code itself, source maps use a highly compressed format. Here's what the encoded string looks like:

"AAAA,OAAO,SAAS,IAAI,CAAC,EAAE;EACrB,OAAO,IAAI;AACb"

To keep file sizes manageable, mappings use Variable Length Quantity (VLQ) encoding with Base64 characters. Let's break this down.

The Mapping Structure

The mappings string is a series of segments separated by commas and semicolons:

"segment,segment,segment;segment,segment;segment"

We'll see the significance of commas and semicolons shortly, but first, what is a "segment"?

Each segment represents a mapping from a position in the generated file to a position in the source file. Segments come in three flavors:

1 value: This referenced column doesn't map to any source (e.g., webpack-generated code)
[generatedColumn]
4 values: This is the most common case, mapping a position in the generated file to a position in the source file:
[generatedColumn, sourceFileIndex, sourceLine, sourceColumn]
5 values: Same as 4, plus a reference to the original name of the variable/function:
[generatedColumn, sourceFileIndex, sourceLine, sourceColumn, nameIndex]

The fifth value is typically present only when the segment corresponds to a named identifier, for example a variable or function that was renamed during minification.
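
The three flavors can be modeled as a small tagged union. This is a sketch under assumptions: the `MappingSegment` type and the `toSegment` helper are hypothetical names, and the numbers passed in are the raw decoded fields of one segment, which are still deltas at this point.

```typescript
// A decoded segment, distinguished by how many fields it carried.
type MappingSegment =
  | { kind: "unmapped"; generatedColumn: number }
  | {
      kind: "mapped";
      generatedColumn: number;
      sourceFileIndex: number;
      sourceLine: number;
      sourceColumn: number;
      nameIndex?: number; // only present for 5-value segments
    };

// Interpret the decoded fields of one segment based on its length.
function toSegment(fields: number[]): MappingSegment {
  switch (fields.length) {
    case 1:
      return { kind: "unmapped", generatedColumn: fields[0] };
    case 4:
    case 5: {
      const segment: MappingSegment = {
        kind: "mapped",
        generatedColumn: fields[0],
        sourceFileIndex: fields[1],
        sourceLine: fields[2],
        sourceColumn: fields[3],
      };
      if (fields.length === 5) segment.nameIndex = fields[4];
      return segment;
    }
    default:
      throw new Error("a segment must have 1, 4, or 5 fields, got " + fields.length);
  }
}

console.log(toSegment([7, 0, 0, 7]));
console.log(toSegment([3]));
```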

But wait, notice that segments only contain the column in the generated file, not the line number. How does the decoder know which line a segment belongs to?

The answer lies in the structure: semicolons act as line breaks. The position of segments between semicolons determines their line number in the generated file.

"segment,segment,segment;segment,segment;segment"

Line 0: segment,segment,segment
Line 1: segment,segment
Line 2: segment

This is why empty lines in the generated file still need semicolons: they maintain the line count even when a line has no mappings.
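
The line and segment structure can be sketched with two string splits. The `splitMappings` helper is a hypothetical name used for this example; it only separates the still-encoded segments and does not decode them.

```typescript
// Split a mappings string into generated lines (by ";") and then into
// encoded segments (by ","). An empty line yields an empty array.
function splitMappings(mappings: string): string[][] {
  return mappings
    .split(";")
    .map((line) => (line === "" ? [] : line.split(",")));
}

const encodedLines = splitMappings(
  "AAAA,OAAO,SAAS,IAAI,CAAC,EAAE;EACrB,OAAO,IAAI;AACb"
);
console.log(encodedLines.length); // 3 generated lines
console.log(encodedLines[1]); // [ 'EACrB', 'OAAO', 'IAAI' ]

// Two consecutive semicolons mark a generated line with no mappings.
console.log(splitMappings("AAAA;;AACA").map((line) => line.length)); // [ 1, 0, 1 ]
```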

Let's see how this works with a real example:

Original mapping string from add.js.map:

AAAA,OAAO,SAAS,IAAI,CAAC,EAAE;EACrB,OAAO,IAAI;AACb

Line 0 (before first semicolon):

AAAA → [0, 0, 0, 0] → generated:0:0 → sources[0]:0:0
OAAO → [7, 0, 0, 7] → generated:0:7 → sources[0]:0:7
SAAS → [9, 0, 0, 9] → generated:0:16 → sources[0]:0:16
IAAI → [4, 0, 0, 4] → generated:0:20 → sources[0]:0:20
CAAC → [1, 0, 0, 1] → generated:0:21 → sources[0]:0:21
EAAE → [2, 0, 0, 2] → generated:0:23 → sources[0]:0:23

Line 1 (after first semicolon):

EACrB → [2, 0, 1, -21] → generated:1:2 → sources[0]:1:2
OAAO → [7, 0, 0, 7] → generated:1:9 → sources[0]:1:9
IAAI → [4, 0, 0, 4] → generated:1:13 → sources[0]:1:13

Line 2 (after second semicolon):

AACb → [0, 0, 1, -13] → generated:2:0 → sources[0]:2:0

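Notice that the decoded values are relative: each value is the difference from the corresponding value in the previous segment, not an absolute coordinate. The generated column restarts from zero on every new line, while the source file index, source line, source column, and name index carry over across lines, which is why the first segment of line 1 has a source column delta of -21. This is crucial: instead of encoding large column numbers like 27698 in minified files, source maps store only small deltas like +7 or +15, making the encoded strings much more compact.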
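
Turning those deltas back into absolute positions is a running sum. The sketch below assumes the segments have already been decoded into number arrays; the `ResolvedMapping` type and `resolveDeltas` helper are hypothetical names, and the input is the decoded form of the `mappings` string from the add.js.map JSON shown earlier.

```typescript
// Absolute positions for one mapping (0-based lines and columns).
type ResolvedMapping = {
  genLine: number;
  genCol: number;
  source: number;
  srcLine: number;
  srcCol: number;
};

// lines[i] holds the decoded segments of generated line i.
function resolveDeltas(lines: number[][][]): ResolvedMapping[] {
  const resolved: ResolvedMapping[] = [];
  // These three accumulate across the whole file.
  let source = 0;
  let srcLine = 0;
  let srcCol = 0;
  lines.forEach((segments, genLine) => {
    let genCol = 0; // the generated column restarts on every line
    for (const segment of segments) {
      genCol += segment[0];
      if (segment.length < 4) continue; // 1-value segment: no source position
      source += segment[1];
      srcLine += segment[2];
      srcCol += segment[3];
      resolved.push({ genLine, genCol, source, srcLine, srcCol });
    }
  });
  return resolved;
}

const decodedAddMap = [
  [[0, 0, 0, 0], [7, 0, 0, 7], [9, 0, 0, 9], [4, 0, 0, 4], [1, 0, 0, 1], [2, 0, 0, 2]],
  [[2, 0, 1, -21], [7, 0, 0, 7], [4, 0, 0, 4]],
  [[0, 0, 1, -13]],
];

const resolvedPositions = resolveDeltas(decodedAddMap);
// First segment of generated line 1: source column 23 - 21 = 2.
console.log(resolvedPositions[6]);
```

The name index is ignored here to keep the sketch short; it would be accumulated across lines in the same way as the source fields.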

Now that we understand the mapping structure, let's see how these numbers actually get transformed into the Base64 alphabet characters we see in the mappings string.

How VLQ Encoding Works

VLQ (Variable Length Quantity) encoding is an efficient way to represent numbers using as few bytes as possible. It's perfect for source maps because most position differences are small numbers.

The encoding process has three main steps:

1. Encode the sign bit

Since we need to handle both positive and negative differences (code can move backward), VLQ uses the least significant bit (LSB) to encode the sign:

Positive number: LSB = 0
Negative number: LSB = 1

Examples:
 5 → binary: 101 → with sign bit: 1010 (LSB=0 for positive)
-5 → binary: 101 → with sign bit: 1011 (LSB=1 for negative)
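
In code, the sign step is a shift and an OR. A minimal sketch, with `toVlqSigned` and `fromVlqSigned` as hypothetical helper names, assuming values small enough for 32-bit bitwise operations:

```typescript
// Move the sign into the least significant bit: magnitude << 1, plus 1 if negative.
function toVlqSigned(value: number): number {
  return value < 0 ? (-value << 1) | 1 : value << 1;
}

// Reverse the step: the lowest bit is the sign, the rest is the magnitude.
function fromVlqSigned(vlq: number): number {
  const magnitude = vlq >>> 1;
  return (vlq & 1) === 1 ? -magnitude : magnitude;
}

console.log(toVlqSigned(5).toString(2)); // 1010
console.log(toVlqSigned(-5).toString(2)); // 1011
console.log(fromVlqSigned(0b1011)); // -5
```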

2. Split into 5-bit groups

Each Base64 character can represent 6 bits, but we need 1 bit as a "continuation" flag to indicate if more characters follow. This leaves 5 bits for data:

[continuation bit][5 data bits]
       ↑              ↑
   1 = more coming    actual value bits
   0 = last character
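
A sketch of the splitting step follows. It assumes the input already has its sign bit in place (step 1), and it emits the least significant 5-bit group first, which is the order the source map format uses. The `toSextets` name is hypothetical.

```typescript
// Split a sign-encoded value into 6-bit units: 5 data bits each, with the
// top bit (0b100000) set on every unit except the last one.
function toSextets(vlq: number): number[] {
  const sextets: number[] = [];
  let remaining = vlq;
  do {
    let digit = remaining & 0b11111; // lowest 5 data bits
    remaining >>>= 5;
    if (remaining > 0) digit |= 0b100000; // continuation: more groups follow
    sextets.push(digit);
  } while (remaining > 0);
  return sextets;
}

console.log(toSextets(14)); // [ 14 ]  (7 with its sign bit fits in one group)
console.log(toSextets(43)); // [ 43, 1 ]  (-21 with its sign bit needs two groups)
```

The second result corresponds to the two characters `rB` at the end of the `EACrB` segment in the example above.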

3. Convert to Base64

Map each 6-bit value to a Base64 character:

A=0, B=1, C=2... Z=25, a=26, b=27... z=51, 0=52, 1=53... 9=61, +=62, /=63
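
The alphabet lookup is a plain index into a 64-character string. A minimal sketch, with `BASE64_CHARS`, `sextetToChar`, and `charToSextet` as hypothetical names:

```typescript
// The standard Base64 alphabet, indexed by 6-bit value.
const BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function sextetToChar(value: number): string {
  if (value < 0 || value > 63 || Math.floor(value) !== value) {
    throw new RangeError("expected an integer in 0..63, got " + value);
  }
  return BASE64_CHARS.charAt(value);
}

function charToSextet(ch: string): number {
  const value = BASE64_CHARS.indexOf(ch);
  if (ch.length !== 1 || value === -1) {
    throw new RangeError("not a Base64 character: " + ch);
  }
  return value;
}

console.log(sextetToChar(14)); // O
console.log(charToSextet("r")); // 43
```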

Example

Let's go through the steps to encode the number 7:

Step 1: Convert to binary

7→111

7 in binary representation

Step 2: Add sign bit

111→1110

Shift left (111 << 1) and add 0 for positive. LSB = 0 indicates positive number

Step 3: Add continuation bit

1110→001110

Prepend 0 (no continuation) since it fits in 5 bits. First bit = 0 means this is the last character

Step 4: Convert to Base64

001110→O

001110 in binary = 14 in decimal. Base64[14] = O

That's why in our mapping example, the value 7 is encoded as 'O'!
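
Putting the steps together, here is a minimal sketch of an encoder for a single value and a decoder for a whole segment. The `encodeVlq` and `decodeSegment` names are hypothetical, and the code assumes values small enough for 32-bit bitwise operations, which holds for typical position deltas.

```typescript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode one signed integer as Base64 VLQ characters.
function encodeVlq(value: number): string {
  // Step 1: move the sign into the least significant bit.
  let vlq = value < 0 ? (-value << 1) | 1 : value << 1;
  let out = "";
  do {
    // Step 2: take 5 data bits, lowest group first.
    let digit = vlq & 31;
    vlq >>>= 5;
    if (vlq > 0) digit |= 32; // continuation bit
    // Step 3: map the 6-bit unit to a Base64 character.
    out += B64.charAt(digit);
  } while (vlq > 0);
  return out;
}

// Decode every value in one segment such as "EACrB".
function decodeSegment(segment: string): number[] {
  const values: number[] = [];
  let accumulated = 0;
  let shift = 0;
  for (let i = 0; i < segment.length; i++) {
    const digit = B64.indexOf(segment.charAt(i));
    if (digit === -1) throw new Error("invalid Base64 character in segment");
    accumulated |= (digit & 31) << shift;
    if ((digit & 32) !== 0) {
      shift += 5; // more groups follow for this value
    } else {
      const magnitude = accumulated >>> 1;
      values.push((accumulated & 1) === 1 ? -magnitude : magnitude);
      accumulated = 0;
      shift = 0;
    }
  }
  if (shift !== 0) throw new Error("segment ends in the middle of a value");
  return values;
}

console.log(encodeVlq(7)); // O
console.log(encodeVlq(-21)); // rB
console.log(decodeSegment("EACrB")); // [ 2, 0, 1, -21 ]
```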

Conclusion

I hope this deep dive into JavaScript source maps has shed light on how they function under the hood and adds to your appreciation for the amount of position data they efficiently encode.

P.S. Stay tuned: source maps support is coming to parca-agent and Polar Signals Cloud, bringing the same debugging magic to your performance profiling workflow!