Postgres transactions are a distributed systems superpower

Original link: https://www.dbos.dev/blog/co-locating-workflow-state-with-your-data

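The author argues that, to get durable workflows, developers should drop external workflow engines and host workflow state directly in the application's Postgres database. Traditional distributed workflows struggle with partial failures, which forces developers to build complex application-level mechanisms for idempotency and atomicity. When workflow metadata and application data live in the same database, native ACID transactions solve these problems. Co-location lets developers wrap a database update and a workflow checkpoint in a single transaction, which gives transactional steps exactly-once execution semantics without manual bookkeeping or idempotency checks. It also simplifies atomic operations, such as updating a record while triggering a downstream task: a Postgres function that enqueues a workflow inside the same transaction replaces the hand-built transactional outbox. Ultimately, by using Postgres as the workflow engine, developers avoid the operational burden of maintaining separate infrastructure and reconciliation jobs, and can build more reliable, more consistent distributed systems.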

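The discussion centers on the **transactional outbox pattern**, a strategy for achieving atomicity between a database and a message queue. The core challenge is that a database and a message queue cannot be updated atomically together. The outbox pattern solves this by writing the database update and the pending message in the same local database transaction, so a message is queued only if the database update succeeds.

**Key points of the debate:**

* **Atomicity and idempotency:** The pattern turns an atomicity problem into an idempotency problem. Because delivery is at-least-once, message consumers must be idempotent to handle duplicate events.
* **Centralization:** Critics argue that using the database as a message queue (a "distributed monolith") sidesteps real distributed-systems complexity by centralizing the data source. This simplifies data consistency but can create performance bottlenecks and tight coupling.
* **Trade-offs:** Supporters see it as a pragmatic compromise. It gives strong consistency for critical state changes while pushing side effects into asynchronous, retryable background processes. In the end it is often a choice between architectural purity and practical operational reliability.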

Original text

A few weeks ago, we wrote that you should “just use Postgres” for durable workflows.

That post generated a lot of discussion, but also a misunderstanding. We didn't just mean you should use a workflow engine that stores state in Postgres. We meant your workflow system can, and often should, live inside the same Postgres database as your application.

At first glance, this doesn’t sound like a good idea. Shouldn’t those concerns be separated? Shouldn’t workflow state live in one database and application data in another?

Maybe not.

In distributed systems, co-location is a superpower. When workflow metadata and application data live in the same Postgres database, they can be updated in the same database transaction. That rules out partial failures between the two: either both updates commit or neither does. This makes it far easier to build workflows that correctly handle all edge cases.

In this post, we'll explain why that's possible, and how transactions can simplify tough problems like idempotency and atomicity.

Idempotency with Transactional Steps

One fundamental challenge in distributed systems is idempotency, especially for operations that modify database state.

Durable workflows achieve fault tolerance by checkpointing the result of each step after it completes. If a workflow is interrupted, it resumes from its last checkpointed step instead of starting from the beginning. However, a workflow may be interrupted after completing a step but before recording its checkpoint. When it recovers, it has no record that the step already ran and will execute it again.

As a result, durable workflows alone do not solve the idempotency problem. Workflow engines typically require steps to be idempotent so they can safely be retried without duplicate side effects. For example, consider a step that credits (adds money to) a bank account. This is not an idempotent operation: if a step adds $100 to an account, fails before its checkpoint is recorded, reruns, and adds $100 again, then a total of $200 is added to the account, which is not correct.
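To make the failure window concrete, here is a small sketch of the double-credit scenario. It uses Python's built-in sqlite3 module as a stand-in for Postgres so that it runs anywhere; the `accounts` and `checkpoints` tables, the `Crash` exception, and the function names are all hypothetical and not part of any real workflow engine.

```python
import sqlite3


class Crash(Exception):
    """Simulates the process dying between a step and its checkpoint."""


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE checkpoints ("
        "workflow_id TEXT, step TEXT, PRIMARY KEY (workflow_id, step))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
    conn.commit()
    return conn


def run_credit_step(conn, workflow_id, crash_before_checkpoint=False):
    # On recovery, the step is skipped only if a checkpoint exists.
    done = conn.execute(
        "SELECT 1 FROM checkpoints WHERE workflow_id = ? AND step = 'credit'",
        (workflow_id,),
    ).fetchone()
    if done:
        return
    # Transaction 1: the step's side effect commits on its own.
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 'alice'")
    conn.commit()
    if crash_before_checkpoint:
        raise Crash()
    # Transaction 2: the checkpoint is written separately, after the step.
    conn.execute("INSERT INTO checkpoints VALUES (?, 'credit')", (workflow_id,))
    conn.commit()


def demo():
    conn = setup()
    try:
        run_credit_step(conn, "wf-1", crash_before_checkpoint=True)
    except Crash:
        pass  # the workflow is interrupted here
    run_credit_step(conn, "wf-1")  # recovery finds no checkpoint and reruns the step
    return conn.execute("SELECT balance FROM accounts WHERE id = 'alice'").fetchone()[0]
```

Calling `demo()` returns 200 rather than the intended 100, because the credit committed before the crash but its checkpoint did not.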

The most common solution is to add application-level bookkeeping to guard against this. For example, you can add an additional applied_payments table to keep track of which payments have been applied, update it transactionally, and check against it to make sure you never credit an account twice:
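A minimal sketch of this bookkeeping, assuming an `applied_payments` table keyed by a payment ID. Python's built-in sqlite3 module stands in for Postgres here, and the `accounts` schema and the `credit_once` and `balance` helpers are hypothetical names chosen for illustration.

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE applied_payments (payment_id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
    conn.commit()
    return conn


def credit_once(conn, payment_id, account_id, amount):
    """Apply the credit unless payment_id was already applied.

    Returns True if the credit was applied, False if it was a duplicate.
    """
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            # The primary key rejects a payment_id that was recorded before.
            conn.execute(
                "INSERT INTO applied_payments (payment_id) VALUES (?)",
                (payment_id,),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balance(conn, account_id):
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]
```

A retried step calls `credit_once` with the same `payment_id`, so the second call returns False and leaves the balance unchanged. The cost is an extra table and extra logic in every step that modifies state.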

When workflow state and application data are co-located in the same Postgres database, we can eliminate much of this complexity. Instead of checkpointing a step after its database transaction commits, a co-located workflow engine can write the step checkpoint and perform the database update in the same transaction.

To do this, the workflow executes the step using a database transaction provided by the workflow engine. The step performs its database updates, the workflow engine records the checkpoint, and the whole transaction commits atomically:
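The idea can be sketched as follows. This is not the API of any particular workflow engine: `run_transactional_step`, the `workflow_checkpoints` table, and the `Crash` exception are hypothetical, and Python's built-in sqlite3 module stands in for Postgres so the example is self-contained.

```python
import sqlite3


class Crash(Exception):
    """Simulates a failure that happens before the transaction commits."""


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE workflow_checkpoints ("
        "workflow_id TEXT, step_name TEXT, PRIMARY KEY (workflow_id, step_name))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
    conn.commit()
    return conn


def run_transactional_step(conn, workflow_id, step_name, step_fn, crash_before_commit=False):
    """Run step_fn and record its checkpoint in one transaction.

    Returns True if the step ran, False if it was already checkpointed.
    """
    done = conn.execute(
        "SELECT 1 FROM workflow_checkpoints WHERE workflow_id = ? AND step_name = ?",
        (workflow_id, step_name),
    ).fetchone()
    if done:
        return False
    with conn:  # commits both writes together, or rolls both back
        step_fn(conn)  # the application's database update
        conn.execute(
            "INSERT INTO workflow_checkpoints (workflow_id, step_name) VALUES (?, ?)",
            (workflow_id, step_name),
        )
        if crash_before_commit:
            raise Crash()
    return True


def credit_alice(conn):
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 'alice'")


def balance(conn):
    return conn.execute("SELECT balance FROM accounts WHERE id = 'alice'").fetchone()[0]
```

If the simulated crash fires, both the credit and the checkpoint are rolled back, so rerunning the step credits the account once. After a successful commit, any further call sees the checkpoint and skips the step.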

By making the database update and checkpoint write part of the same transaction, the workflow engine can provide exactly-once execution semantics for transactional steps:

If the transaction commits, both the database update and the checkpoint are durably recorded, guaranteeing the step will never run again.
If any failure occurs before commit, the entire transaction is rolled back, including both the database update and the checkpoint. When the workflow recovers, it safely re-executes the step from the beginning.

This eliminates the window in which a database update can succeed without a corresponding checkpoint. As a result, transactional steps no longer need application-level idempotency logic or bookkeeping tables. The database operation either happens exactly once and is checkpointed, or it does not happen at all.

Atomicity with a Transactional Workflow Outbox

Another classic challenge in distributed systems is reliably performing updates in multiple systems, for example, updating a database record and sending a notification to another system. This is trickier than it sounds because the operations need to be atomic: either both happen or neither does, even if there are failures (such as process crashes or network glitches) while performing them.

For example, whenever a customer submits a new order, we may also want to start a workflow that sends the order to a warehouse for fulfillment. Without atomicity, the database and the downstream system may become inconsistent. The order might be submitted without a warehouse being notified, or a warehouse might be notified about an order that was never committed.

The most common solution to this problem is the transactional outbox. The idea is to add a new “outbox” table to the database. When we need to perform an atomic update, we run a single database transaction that both:

Updates the database record
Writes a message to the “outbox” table

A separate background process then polls the outbox table and delivers the messages in it to the target system.

Here’s an example of what that might look like:
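The following is a minimal sketch of the pattern rather than a production implementation. It uses Python's built-in sqlite3 module as a stand-in for Postgres, and the `orders` and `outbox` schemas, `submit_order`, `poll_outbox`, and the `deliver` callback are hypothetical names.

```python
import json
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, item TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "delivered INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def submit_order(conn, order_id, item):
    # One transaction: the order row and its outbox message commit together.
    with conn:
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"order_id": order_id, "item": item}),),
        )


def poll_outbox(conn, deliver):
    """Deliver every pending message once per poll; returns how many were sent.

    A crash between deliver() and the UPDATE causes a redelivery on the next
    poll, so delivery is at-least-once and the receiver must tolerate duplicates.
    """
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    for message_id, payload in rows:
        deliver(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE id = ?", (message_id,))
    return len(rows)
```

In a real deployment `poll_outbox` would run in a loop in a separate process, and `deliver` would call the warehouse system or publish to a message broker.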

Performing the database record update and writing the message to the “outbox” table in one transaction guarantees atomicity: either both records are written or neither is. Once a message is written to the outbox, it can be delivered asynchronously, even if failures occur after the transaction commits.

The transactional outbox is widely used, but it introduces additional operational complexity. You need infrastructure to poll the outbox, deliver messages, handle retries, and monitor failures. If the workflow engine is a separate system, it can drift out of sync with the database. In practice, resolving discrepancies requires additional infrastructure such as reconciliation jobs to detect database records that were updated without sending notifications to downstream systems.

By leveraging database-backed workflows and co-locating workflow state with application data, we can simplify this pattern. Instead of manually maintaining an outbox table and a separate polling process, we use a Postgres user-defined function (UDF) to enqueue a workflow in the same database transaction as the application update:
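As a rough illustration of the principle, the sketch below models the enqueue call as a Python helper that inserts a workflow row using the caller's open transaction. It is not the real `enqueue_workflow` UDF, whose signature is not shown here: the helper's parameters, the `workflow_queue` schema, the status values, and `dequeue_and_run` are assumptions, and Python's built-in sqlite3 module stands in for Postgres.

```python
import json
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, item TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE workflow_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "name TEXT NOT NULL, queue TEXT NOT NULL, input TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'ENQUEUED')"
    )
    return conn


def enqueue_workflow(conn, name, queue, workflow_input):
    # Hypothetical stand-in for the UDF. It does not commit, so the row
    # becomes visible only if the caller's transaction commits.
    conn.execute(
        "INSERT INTO workflow_queue (name, queue, input) VALUES (?, ?, ?)",
        (name, queue, json.dumps(workflow_input)),
    )


def submit_order(conn, order_id, item, fail_before_commit=False):
    with conn:  # the order and the enqueued workflow commit or roll back together
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        enqueue_workflow(conn, "fulfill_order", "warehouse", {"order_id": order_id})
        if fail_before_commit:
            raise RuntimeError("simulated failure before commit")


def dequeue_and_run(conn, queue, handlers):
    """Run the oldest enqueued workflow on a queue; returns its result or None."""
    row = conn.execute(
        "SELECT id, name, input FROM workflow_queue "
        "WHERE queue = ? AND status = 'ENQUEUED' ORDER BY id LIMIT 1",
        (queue,),
    ).fetchone()
    if row is None:
        return None
    workflow_id, name, raw_input = row
    result = handlers[name](json.loads(raw_input))
    with conn:
        conn.execute("UPDATE workflow_queue SET status = 'SUCCESS' WHERE id = ?", (workflow_id,))
    return result
```

This single-worker sketch ignores concurrency. With several workers on Postgres, a dequeue query would typically claim rows with `SELECT ... FOR UPDATE SKIP LOCKED` so that two workers never pick up the same workflow.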

This follows the same principles as the transactional outbox. The workflow is represented by a database row containing its name, queue, and input. The enqueue_workflow UDF creates this row in the same transaction as the user database update, guaranteeing atomicity: either the update completes and the workflow is enqueued, or neither happens. Then, a worker dequeues and executes the workflow asynchronously, reliably performing the required operations.

Learn More

If you like building scalable, reliable systems, we’d love to hear from you. At DBOS, our goal is to make Postgres-backed durable execution as simple and performant as possible. Check it out: