pg_durable:Microsoft 开源数据库内持久化执行引擎
pg_durable: Microsoft open sources in-database durable execution

原始链接: https://github.com/microsoft/pg_durable

**pg_durable** 是一个 PostgreSQL 扩展,支持在数据库内部直接进行具有容错能力的持久化后台任务执行。它通过将工作流逻辑完全整合进 SQL,消除了对外部编排工具(如 Airflow 或 Temporal)、消息队列或自定义状态表的依赖。 **核心特性:** * **持久化执行:** 工作流被定义为一组 SQL 步骤图,可自动检查进度。如果发生崩溃或故障,执行将从最后一个成功步骤恢复,而不是从头重新开始。 * **原生 SQL 支持:** 工作流使用可组合的 SQL DSL(例如 `df.start()` 和运算符链)定义,非常适合数据管道、AI 嵌入工作流以及自动化维护手册。 * **零基础设施:** 由于它作为原生 PostgreSQL 扩展(使用 `pgrx` 构建)运行,因此无需 Redis 或外部工作集群等额外服务。 * **运维可见性:** 进度、状态和变量存储在标准的 PostgreSQL 表中,可直接利用现有的数据库身份验证、行级安全性(RLS)和备份模型。 通过将计算与数据保持紧密结合,pg_durable 简化了后端架构,降低了复杂性,并确保了复杂的后台任务与它们所处理的数据一样可靠且可审计。目前该项目处于 PostgreSQL 17 和 18 的预览阶段。

微软开源了 **pg_durable**,这是一个用于在 PostgreSQL 数据库内部实现持久化执行的工具。该项目旨在直接在数据库内管理工作流,这在 Hacker News 上引发了关于这种方法与传统外部编排工具孰优孰劣的热烈讨论。 批评者表示质疑,询问为什么人们会选择将复杂的控制逻辑和工作流管道存储在数据库中(使用基于 SQL 的语法),而不是将其保留在版本控制的应用程序代码中。许多用户将该项目与 Apache Airflow 等成熟的调度器或 Temporal 等工作流引擎进行了比较,指出 pg_durable 的“数据库内”理念似乎偏离了现代开发的最佳实践。 一些评论者指出,这种趋势反映了向“Postgres 原生”队列和工作流发展的潮流,并提到了 DBOS 和 pgQue 等类似项目。虽然一些开发者认为这个概念很有趣,但其他人仍对其价值持怀疑态度,认为这是一种针对外部 DAG 调度器已经解决的问题所提出的过度复杂的解决方案。该项目的贡献者也加入了讨论,欢迎社区反馈以及潜在的贡献,例如集成 pgmq 等现有工具。
相关文章

原文

Long-running, fault-tolerant SQL functions for teams that already keep their state in Postgres and want to stop stitching together cron jobs, workers, queues, and status tables to make background work reliable. Define the workflow in SQL, let pg_durable checkpoint each step, and resume after crashes, restarts, or failed steps.

Durable execution is now a standard industry pattern, and pg_durable brings it inside Postgres with no extra service infrastructure required. Part of our mission to bring compute close to data.

  • Backend and data engineers who want workflows to live next to the data they touch.
  • DBAs and SREs automating runbooks that must survive restarts and be auditable in SQL.
  • Teams building data or AI pipelines that need durable execution per row, document, or batch.

A pg_durable function is a graph of SQL steps that PostgreSQL executes and checkpoints as it goes. If the database crashes, restarts, or a step fails, execution resumes from the last durable checkpoint instead of making you reconstruct state by hand.

Workloads this is useful for

  • Vector embedding pipelines: chunk, call an embedding API, and upsert into pgvector.
  • Ingest pipelines: stage, deduplicate, transform, and publish large batches.
  • Scheduled maintenance: detect bloat, notify, wait for approval, then run the next action.
  • Fan-out aggregation: run independent queries in parallel, then join the results.
  • External API workflows: enrichment, classification, and webhook-style calls from SQL.

What you're probably doing today instead

  • pg_cron plus a jobs table, status columns, retry counters, and a polling worker.
  • An external orchestrator such as Airflow, Temporal, Step Functions, or Argo calling back into Postgres.
  • A queue plus workers plus a separate state table to coordinate retries and partial completion.
  • A plpgsql procedure that works until a crash or long-running transaction forces you to start over.
  • A restart in the middle of a long job means rerunning work that already succeeded.
  • One failed row or one failed API call turns into manual cleanup and uncertain replay.
  • Long transactions hold locks, grow WAL, and make batch jobs fragile at larger scale.
  • Parallel work in the app tier creates more places for partial-failure bugs and drift.
  • The workflow logic ends up spread across SQL, workers, queues, dashboards, and status tables.

What changes in your architecture

  • The workflow definition moves into SQL and starts with df.start(...).
  • Retry state, progress tracking, and checkpointing move into Postgres instead of bespoke app code.
  • Some app-tier workers, queue consumers, or scheduler glue can disappear entirely.
  • Operational visibility comes from Postgres tables such as df.instances, using the same auth and backup model as your data.
  • The job is already a single INSERT ... SELECT or one ordinary SQL statement.
  • You need sub-millisecond synchronous request handling rather than durable background execution.
  • You cannot install extensions or run a background worker in your Postgres environment.
  • The workflow mostly lives outside Postgres and spans many heterogeneous systems.
  • You need arbitrary application logic that does not map cleanly to SQL steps, branching, loops, or HTTP calls.
  1. Define a workflow in SQL using composable operators such as ~> and |=>.
  2. Start it with df.start() and get back an instance ID.
  3. Let the runtime execute each step durably with checkpointing between steps.
  4. Query status and results from PostgreSQL while the workflow runs or after it completes.

The model is intentionally SQL-shaped. If a step needs arbitrary code, a non-HTTP SDK, or rich in-memory control flow, you may need to wrap that logic in a SQL function, expose it behind an HTTP endpoint for df.http(), or use a general-purpose orchestrator for that part of the system.

  • Durable — Function state persists to PostgreSQL. Survives crashes, restarts, and failovers.
  • SQL-native — Define functions in SQL using composable operators.
  • Database-aware — First-class primitives for scheduling, conditions, and parallel execution.
  • Zero infrastructure — Runs as a PostgreSQL extension. No Redis, no Temporal, no external services.
-- A durable function that processes data in steps
SELECT df.start(
    'SELECT id FROM documents WHERE processed = false LIMIT 100' |=> 'batch'
    ~> 'UPDATE documents SET processed = true WHERE id = ANY($batch)'
);

Tagged releases publish Debian packages for PostgreSQL 17 and 18 on amd64 from the GitHub release assets. Packages are named pg-durable-postgresql-<PG major>_<pg_durable version>-1_<arch>.deb and install the extension library, control file, and SQL upgrade files into the matching PostgreSQL installation directories.

After installing a package, add pg_durable to shared_preload_libraries, restart PostgreSQL, and create the extension in the configured pg_durable database:

CREATE EXTENSION pg_durable;

The default pg_durable database is postgres; see User Guide for background worker configuration and privilege setup.

Release assets also include source archives for building from source.

  • PostgreSQL 17 or 18
  • Rust (nightly)
  • cargo-pgrx 0.16.1

The main branch prebuild installs PostgreSQL 17, builds pg_durable, and prepares a local cluster under ~/.pgrx with the extension ready. PostgreSQL is not left running, so start it when you begin working.

# Start PostgreSQL
./scripts/pg-start.sh

# Connect
~/.pgrx/17.*/pgrx-install/bin/psql -h localhost -p 28817 -d postgres

On a branch without a ready prebuild, run pg-start.sh — it will build and install the extension on first run (expect a few minutes):

A VS Code Dev Container (.devcontainer/) provides Rust, cargo-pgrx, and PostgreSQL 17 pre-installed. For a bare local machine, install the toolchain first by following the steps in .devcontainer/onCreateCommand.sh.

# Build, initialize PostgreSQL, and install the extension
# This takes a while - go do something else
./scripts/pg-start.sh

# Connect to the local pgrx PostgreSQL instance
~/.pgrx/17.*/pgrx-install/bin/psql -h localhost -p 28817 -d postgres

pg-start.sh bootstraps new local data directories with a postgres superuser and also creates a matching superuser role for the current OS user, so default local psql usage continues to work. Use -U postgres if you want to force the canonical bootstrap role explicitly.

# Build and test
./scripts/test-e2e-docker.sh --rebuild

# Optional: Deploy to ACR (for custom PG17 image with pg_durable baked-in)
./scripts/deploy-acr.sh

CREATE EXTENSION pg_durable does not grant any privileges to PUBLIC. After installing the extension, the admin must explicitly grant access to application roles. Row-level security (RLS) ensures each user can only see and manage their own durable function instances and nodes.

Grant privileges to an application role:

-- Grant to specific roles after CREATE EXTENSION
SELECT df.grant_usage('app_role');

Alternatively, create an indirection role and grant membership to application roles:

-- Create a shared role for pg_durable access
CREATE ROLE pg_durable_user NOLOGIN;
SELECT df.grant_usage('pg_durable_user');

-- Grant membership to application roles
GRANT pg_durable_user TO app_backend, etl_service;

See the User Guide — Privilege Grants section for the full list of individual grants, revoking access, and hardening upgraded installs.

Note: GRANT EXECUTE ON ALL FUNCTIONS only applies to functions that exist when the grant runs. After upgrading pg_durable with ALTER EXTENSION pg_durable UPDATE, re-run df.grant_usage('role') (or re-issue the manual grants) so new functions are accessible.

Key points:

  • The background worker role (pg_durable.worker_role GUC, default: azuresu) must be a superuser — it bypasses RLS to manage all users' instances
  • Users get SELECT + INSERT on df.instances / df.nodes, column-level UPDATE (status, updated_at) on instances for df.cancel()
  • Identity column (submitted_by) cannot be modified by users
  • df.vars uses per-user scoping — each user has their own variable namespace via an owner column and RLS. Superusers bypass RLS but DSL functions still scope to the calling user via explicit filters. Avoid storing secrets in plain text

All pull requests must pass the following checks before merging:

  1. Format Checkcargo fmt --check
  2. Clippy & Testscargo clippy, unit tests (cargo pgrx test pg17), pg_regress tests, and E2E tests

The CI workflow is defined in .github/workflows/ci.yml. It uses pgrx to download and manage PostgreSQL.

pg_durable has two test suites:

pg_regress Tests (Standard PostgreSQL Regression Tests)

Fast, deterministic tests for core DSL functionality using PostgreSQL's standard testing framework. Test SQL lives in sql/, expected output in expected/, and PGXS is configured in the root Makefile.

make test-regress          # full reset + run
make installcheck          # run only (PostgreSQL must already be running)

E2E Tests (Comprehensive Scenario Tests)

Complex local integration tests with pgrx PostgreSQL:

./scripts/test-e2e-local.sh                                                  # All local SQL E2E tests, including special restart/config phases
./scripts/test-e2e-local.sh 04_parallel                                      # Specific test
./scripts/test-e2e-local.sh --default-build-phases                            # Only the default-build phase group

See tests/e2e/ for details.

  • User Guide — Complete usage guide with examples
  • MVP Guide — Implementation details and internals
  • Examples — Example conventions and smoke-check guidance

pg_durable is a PostgreSQL extension (built with pgrx) — everything runs inside the PostgreSQL server, no external services. The extension exposes a SQL DSL for building function graphs and registers a background worker that executes them durably on top of two lower-level Rust libraries:

  • duroxide — a durable task framework providing the orchestration runtime (deterministic replay, checkpoints, sub-orchestrations, timers).
  • duroxide-pg — a PostgreSQL-backed state provider for duroxide. It persists runtime state (instances, history, work queues) in a dedicated duroxide.* schema owned by the extension.
┌────────────────────────────────────────────────────────────────────┐
│                             PostgreSQL                             │
│                                                                    │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                 pg_durable extension (pgrx)                  │  │
│  │                                                              │  │
│  │  SQL DSL     'sql' |=> 'name' ~> 'sql2'                      │  │
│  │              df.if() | df.join() | df.loop()                 │  │
│  │                                                              │  │
│  │  Background worker (hosts the duroxide runtime in-process)   │  │
│  │  ┌────────────────────────────────────────────────────────┐  │  │
│  │  │  duroxide        (orchestration runtime)               │  │  │
│  │  │  ┌──────────────────────────────────────────────────┐  │  │  │
│  │  │  │  duroxide-pg   (PostgreSQL state provider)       │  │  │  │
│  │  │  └──────────────────────────────────────────────────┘  │  │  │
│  │  └────────────────────────────────────────────────────────┘  │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                                                    │
│  Schemas                                                           │
│    df.*         DSL graphs (nodes, instances, vars)                │
│    duroxide.*   runtime state (owned by duroxide-pg)               │
└────────────────────────────────────────────────────────────────────┘

If you'd rather author durable functions in Rust, Python, or Node while still persisting state in PostgreSQL, you can use duroxide and duroxide-pg directly from your host language — pg_durable is what you'd build on top of that pair when you'd prefer authoring in SQL.

Preview - This project is currently in preview.

Use GitHub Issues for bug reports and feature requests. Do not report security vulnerabilities through public GitHub issues; follow the instructions in SECURITY.md instead.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact [email protected] with questions or comments.

Microsoft takes the security of our software products and services seriously. Please do not report security vulnerabilities through public GitHub issues. See SECURITY.md for security reporting instructions.

pg_durable does not send telemetry to Microsoft.

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third-party policies.

PostgreSQL License

联系我们 contact @ memedata.com