Time-Travel Debugging: Replaying Production Bugs Locally

Original link: https://lackofimagination.org/2026/02/time-travel-debugging-replaying-production-bugs-locally/

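## Debugging Production Issues with Time Travel

Ever chased a bug that shows up only in production but runs fine locally? This article proposes a solution that uses a JavaScript "Effect System" to make debugging deterministic. Instead of producing side effects directly (such as database calls), the code *describes* them as "Command" objects. An interpreter then executes these commands, creating a traceable pipeline.

This approach allows every interaction (database reads, API calls) to be recorded as a detailed execution trace. Crucially, because the core logic stays pure and free of side effects, the trace can be *replayed* locally without mocking or touching live services.

The example shows a seemingly successful checkout flow that fails because the payment gateway rejects a $0.00 charge after a 100% promo code is applied. The generated trace pinpoints the problem, and a simple "time travel" function can replay the trace to reproduce the error locally.

The system can be implemented in under 100 lines of code. It provides a powerful debugging tool, improves security by redacting sensitive information, and turns debugging from guesswork into observation.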


Original article

We’ve all had that sinking feeling. There are multiple crash reports from production. We have the exact input parameters that caused the failures. We have the stack traces. Yet, when we run the code locally, it works perfectly.

We know where it broke, but we can’t see why. Was it a race condition? Did a database read return stale data that has since been overwritten? To find the cause, we have to mentally reconstruct the state of the world as it existed milliseconds before the crash. Welcome to debugging hell.

If we could simply rewind time and watch the code execute exactly as it did for those failed requests, life would be a lot easier.

In "Testing Side Effects Without the Side Effects," we explored a JavaScript Effect System where business logic doesn’t execute actions directly. Instead, it returns a description of what it intends to do in the form of a simple Command object. For example:

const validatePromo = (cartContents) => {
    // Define the side effect, but don't run it yet
    const cmdValidatePromo = () => db.checkPromo(cartContents);

    // Define what happens with the result
    const next = (promo) =>
        (promo.isValid ? Success({...cartContents, promo}) : Failure('Invalid promo'));

    return Command(cmdValidatePromo, next);
};

Calling validatePromo returns a plain object like this:

{
    type: 'Command',
    cmd: [Function: cmdValidatePromo], // The command waiting to be executed
    next: [Function: next]             // What to do after the command finishes
}
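To make the object above concrete, here is a minimal sketch of the three constructors. The field names `type`, `cmd`, and `next` come from the output shown; the `value` and `error` fields on Success and Failure are assumptions inferred from how the replay function later in this article reads them, and the `db` stub is purely hypothetical.

```javascript
// Assumed shapes: Success carries `value`, Failure carries `error`,
// Command carries the deferred side effect and its continuation.
const Success = (value) => ({ type: 'Success', value });
const Failure = (error) => ({ type: 'Failure', error });
const Command = (cmd, next) => ({ type: 'Command', cmd, next });

// Hypothetical stand-in for the real database client.
const db = {
    checkPromo: async (cart) => ({ isValid: cart.promoCode === 'FREE_YEAR_VIP' }),
};

const validatePromo = (cartContents) => {
    const cmdValidatePromo = () => db.checkPromo(cartContents);
    const next = (promo) =>
        (promo.isValid ? Success({ ...cartContents, promo }) : Failure('Invalid promo'));
    return Command(cmdValidatePromo, next);
};

// Building the command touches nothing; the database is not called here.
const step = validatePromo({ cartId: 'cart_abc123', promoCode: 'FREE_YEAR_VIP' });
console.log(step.type, step.cmd.name); // prints: Command cmdValidatePromo

// The continuation is a pure function, so it can be fed any result by hand.
console.log(step.next({ isValid: false })); // prints: { type: 'Failure', error: 'Invalid promo' }
```

Note that the command's name comes from the variable the arrow function is assigned to, which is what lets a trace refer to each step by name.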

We often compose multiple commands in a pipeline to make the most of our Effect System:

const checkoutFlow = (cartSummary) =>
    effectPipe(
        fetchCart,
        validatePromo,
        (cartContents) => chargeCreditCard(cartSummary, cartContents)
    )(cartSummary);

Our effect pipeline handles the Success and Failure cases automatically. If a function returns Success, the subsequent function in line will be called. In the case of a Failure, the pipeline terminates.
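One way such a pipeline could work is sketched below. This is an illustrative guess, not the pure-effect implementation: the helper `chain` and the sample steps `requirePositive`, `double`, and `track` are hypothetical names introduced only for this example.

```javascript
const Success = (value) => ({ type: 'Success', value });
const Failure = (error) => ({ type: 'Failure', error });
const Command = (cmd, next) => ({ type: 'Command', cmd, next });

// Attach `fn` so that it runs only after `effect` has succeeded.
const chain = (effect, fn) => {
    switch (effect.type) {
        case 'Success':
            return fn(effect.value);
        case 'Failure':
            return effect; // short-circuit: later steps never run
        case 'Command':
            // Keep the same command, but extend its continuation.
            return Command(effect.cmd, (result) => chain(effect.next(result), fn));
        default:
            throw new Error(`Unknown effect type: ${effect.type}`);
    }
};

const effectPipe = (...fns) => (input) => fns.reduce(chain, Success(input));

// Hypothetical pure steps, used only to show the control flow.
const requirePositive = (n) => (n > 0 ? Success(n) : Failure('must be positive'));
const double = (n) => Success(n * 2);
const calls = [];
const track = (n) => { calls.push(n); return Success(n); };

const ok = effectPipe(requirePositive, double, track)(21);
const bad = effectPipe(requirePositive, double, track)(-1);
console.log(ok);    // prints: { type: 'Success', value: 42 }
console.log(bad);   // prints: { type: 'Failure', error: 'must be positive' }
console.log(calls); // prints: [ 42 ]
```

Because `chain` rewraps a Command around its original `cmd` function, the command keeps its name no matter how many steps are piped after it.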

The series of Command objects generated by the pipeline is then run by an interpreter using runEffect(checkoutFlow(cartSummary)). Because our business logic consists of pure functions that interact with the world only through data, we can record those interactions simply by adding a few hooks for services like OpenTelemetry. And if we can record them, we can replay them deterministically. Best of all, there’s no need to mock a single database or external service.
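As a rough sketch of how an interpreter with a recording hook might look, consider the following. The `onStep` parameter, the `recordRun` helper, and the one-step `fetchCart` workflow are assumptions made for illustration; the real runEffect may differ, and a hook like this could just as well forward events to a tracing service instead of an array. Error handling for commands that throw is left out.

```javascript
const Success = (value) => ({ type: 'Success', value });
const Command = (cmd, next) => ({ type: 'Command', cmd, next });

// Interpreter: the only place where side effects actually run.
async function runEffect(effect, onStep = () => {}) {
    let current = effect;
    while (current.type === 'Command') {
        const result = await current.cmd();
        // Report the command's name and result before continuing.
        onStep({ command: current.cmd.name || 'anonymous', result });
        current = current.next(result);
    }
    return current; // a Success or Failure
}

// Hypothetical one-step workflow with a fake asynchronous read.
const fetchCart = ({ cartId }) => {
    const cmdFetchCart = async () => ({ cartId, totalAmount: '120.00' });
    return Command(cmdFetchCart, (cart) => Success(cart));
};

// Collect the hook events into a trace log shaped like the one in this article.
async function recordRun(flowName, workflowFn, initialInput) {
    const trace = [];
    const outcome = await runEffect(workflowFn(initialInput), (event) => trace.push(event));
    return { outcome, traceLog: { flowName, initialInput, trace } };
}

recordRun('checkout', fetchCart, { cartId: 'cart_abc123' })
    .then(({ traceLog }) => console.log(JSON.stringify(traceLog, null, 2)));
```

The business logic never sees the hook, so recording can be switched on or off without changing a single workflow function.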

When a crash happens, we don’t just get an error message. We get a crash log containing the initial input and the execution trace complete with all outputs.

In the following simplified trace generated by our workflow, a customer uses a 100% off promo code. The business logic calculates the total as $0.00 and attempts to pass it to the payment gateway, but the payment gateway rejects the API call because the minimum charge amount is $0.50, causing a 500 Internal Server Error:

const traceLog = {
  "flowName": "checkout",
  "initialInput": {
    "userId": "some_user_id",
    "cartId": "cart_abc123",
    "promoCode": "FREE_YEAR_VIP"
  },
  "trace": [
    {
      "command": "cmdFetchCart",
      "result": {
        "cartId": "cart_abc123",
        "items": ["annual_subscription"],
        "totalAmount": "120.00"
      }
    },
    {
      "command": "cmdValidatePromo",
      "result": {
        "cartId": "cart_abc123",
        "items": ["annual_subscription"],
        "totalAmount": "120.00",
        "isValid": true,
        "discountType": "%",
        "discountValue": 100
      }
    },
    {
      "command": "cmdChargeCreditCard",
      "result": {
        "error": {
          "code": "invalid_amount",
          "message": "Amount must be non-zero."
        }
      }
    }
  ]
};

Compared to the cryptic stack traces common in imperative code, this execution trace makes the source of the error immediately obvious.

We can even go ahead and write a quick time-travel function like the one below to replay any execution trace locally, complete with built-in support for detecting time paradoxes!

function timeTravel(workflowFn, traceLog) {
    const { initialInput, trace, flowName } = traceLog;
    const format = (v) => JSON.stringify(v, null, 2);

    let currentStep = workflowFn(initialInput);
    let traceIndex = 0;
    console.log(`Replay started with initial input: ${format(initialInput)}`);
    while (true) {
        const stepName = currentStep.type === 'Command' ? currentStep.cmd.name || 'anonymous' : currentStep.type;

        if (currentStep.type === 'Success' || currentStep.type === 'Failure') {
            console.log(`Replay Finished with state: ${currentStep.type}`);
            console.log(
                currentStep.type === 'Failure'
                    ? `Error: ${format(currentStep.error)}`
                    : `Result: ${format(currentStep.value)}`
            );
            break;
        }

        if (currentStep.type === 'Command') {
            const recordedEvent = trace[traceIndex];
            if (!recordedEvent) {
                throw new Error(`Trace ended prematurely at step ${traceIndex}. Workflow expected command: ${stepName}`);
            }
            if (recordedEvent.command !== stepName) {
                throw new Error(
                    `Time paradox detected! Workflow asked for '${stepName}', but trace recorded '${recordedEvent.command}'`
                );
            }
            console.log(`Step ${++traceIndex}: ${recordedEvent.command} returned ${format(recordedEvent.result)}`);
            currentStep = currentStep.next(recordedEvent.result);
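        } else {
            // Guard against malformed steps so the loop cannot spin forever
            throw new Error(`Unexpected step type: ${stepName}`);
        }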
    }
}

When we run timeTravel(checkoutFlow, traceLog), it exercises our actual checkout workflow and produces the following output. With that, we’ve successfully replayed a production execution trace locally, all without touching any database or external service:

Replay started with initial input: {
  "userId": "some_user_id",
  "cartId": "cart_abc123",
  "promoCode": "FREE_YEAR_VIP"
}
Step 1: cmdFetchCart returned {
  "cartId": "cart_abc123",
  "items": [
    "annual_subscription"
  ],
  "totalAmount": "120.00"
}
Step 2: cmdValidatePromo returned {
  "cartId": "cart_abc123",
  "items": [
    "annual_subscription"
  ],
  "totalAmount": "120.00",
  "isValid": true,
  "discountType": "%",
  "discountValue": 100
}
Step 3: cmdChargeCreditCard returned {
  "error": {
    "code": "invalid_amount",
    "message": "Amount must be non-zero."
  }
}
Replay Finished with state: Failure
Error: {
  "code": "invalid_amount",
  "message": "Amount must be non-zero."
}
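For readers who want to try a replay end to end without the library, here is a compact, self-contained sketch. The article does not show the bodies of fetchCart or chargeCreditCard, so the versions below are hypothetical stand-ins; only the command names and the recorded results are taken from the trace above. The `replay` function is a quieter variant of timeTravel that returns the final state instead of logging it, and every command throws if executed, which demonstrates that replay never runs a side effect.

```javascript
const Success = (value) => ({ type: 'Success', value });
const Failure = (error) => ({ type: 'Failure', error });
const Command = (cmd, next) => ({ type: 'Command', cmd, next });

const chain = (effect, fn) => {
    if (effect.type === 'Success') return fn(effect.value);
    if (effect.type === 'Failure') return effect;
    return Command(effect.cmd, (result) => chain(effect.next(result), fn));
};
const effectPipe = (...fns) => (input) => fns.reduce(chain, Success(input));

// Hypothetical step bodies. Each command throws to prove it is never executed.
const fetchCart = (cartSummary) => {
    const cmdFetchCart = () => { throw new Error('side effect ran during replay'); };
    return Command(cmdFetchCart, (cart) => Success(cart));
};
const validatePromo = (cart) => {
    const cmdValidatePromo = () => { throw new Error('side effect ran during replay'); };
    return Command(cmdValidatePromo, (promo) =>
        (promo.isValid ? Success(promo) : Failure('Invalid promo')));
};
const chargeCreditCard = (cartSummary, cart) => {
    const cmdChargeCreditCard = () => { throw new Error('side effect ran during replay'); };
    return Command(cmdChargeCreditCard, (res) =>
        (res.error ? Failure(res.error) : Success(res)));
};
const checkoutFlow = (cartSummary) =>
    effectPipe(fetchCart, validatePromo, (cart) => chargeCreditCard(cartSummary, cart))(cartSummary);

// Feed recorded results to the workflow instead of running its commands.
function replay(workflowFn, { initialInput, trace }) {
    let step = workflowFn(initialInput);
    let index = 0;
    while (step.type === 'Command') {
        const event = trace[index++];
        if (!event) throw new Error(`Trace ended prematurely at ${step.cmd.name}`);
        if (event.command !== step.cmd.name) {
            throw new Error(`Time paradox: expected ${step.cmd.name}, got ${event.command}`);
        }
        step = step.next(event.result);
    }
    return step;
}

const traceLog = {
    flowName: 'checkout',
    initialInput: { userId: 'some_user_id', cartId: 'cart_abc123', promoCode: 'FREE_YEAR_VIP' },
    trace: [
        { command: 'cmdFetchCart', result: { cartId: 'cart_abc123', items: ['annual_subscription'], totalAmount: '120.00' } },
        { command: 'cmdValidatePromo', result: { cartId: 'cart_abc123', items: ['annual_subscription'], totalAmount: '120.00', isValid: true, discountType: '%', discountValue: 100 } },
        { command: 'cmdChargeCreditCard', result: { error: { code: 'invalid_amount', message: 'Amount must be non-zero.' } } },
    ],
};

const outcome = replay(checkoutFlow, traceLog);
console.log(outcome.type, outcome.error.code); // prints: Failure invalid_amount
```

Since `replay` returns a value rather than printing, a captured trace can also serve as a fixture in an ordinary unit test.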

Time-travel debugging might sound like a complex feature reserved for heavy-duty enterprise tools, but it fundamentally comes down to architectural design; it takes less than 100 lines of code to implement, and that figure includes our Effect System.

Because every interaction passes through runEffect, we can easily implement a redaction layer to scrub personally identifiable information, like credit card numbers or emails, before they ever hit the trace log.
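A redaction layer of this kind could be as small as the sketch below. The key list, the `[REDACTED]` placeholder, and the `recordStep` helper are all hypothetical choices for illustration, and the function assumes JSON-like data (plain objects, arrays, and primitives).

```javascript
// Hypothetical list of field names that must never reach the trace log.
const SENSITIVE_KEYS = new Set(['cardNumber', 'cvv', 'email']);

// Recursively copy a JSON-like value, masking sensitive fields.
function redact(value) {
    if (Array.isArray(value)) return value.map(redact);
    if (value !== null && typeof value === 'object') {
        return Object.fromEntries(
            Object.entries(value).map(([key, v]) =>
                [key, SENSITIVE_KEYS.has(key) ? '[REDACTED]' : redact(v)])
        );
    }
    return value;
}

// Scrub each result at the moment it is recorded.
const recordStep = (trace, command, result) =>
    trace.push({ command, result: redact(result) });

const trace = [];
const original = {
    userId: 'some_user_id',
    email: 'user@example.com',
    cards: [{ cardNumber: '4242424242424242', brand: 'visa' }],
};
recordStep(trace, 'cmdFetchUser', original);
console.log(JSON.stringify(trace[0], null, 2));
```

One caveat with this design: if the business logic branches on a field that has been redacted, a replay may take a different path than production did, so the list of masked keys is worth choosing with the workflow in mind.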

By pushing side effects to the edges and keeping our core logic pure, we gain a deterministic and secure execution trace. As a result, debugging shifts from guessing what might have happened to watching exactly what did happen, all without compromising user privacy.


GitHub Repository: pure-effect

