Show HN: Streambed – Stream Postgres to Iceberg on S3, Supports Postgres Wire

Original link: https://github.com/viggy28/streambed

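Streambed is a lightweight CDC (change data capture) engine that replicates Postgres WAL changes into an Apache Iceberg data lake on S3. It lets you offload analytical queries from your production database without changing your application and without running ETL pipelines or Spark infrastructure.

**Key features:**

* **Seamless replication:** Connects as a logical replication subscriber and streams inserts, updates, and deletes into Parquet files.
* **Native Iceberg support:** Automatically commits metadata to Iceberg, so the data can be read by any compatible query engine.
* **Built-in query server:** Includes an embedded DuckDB query server that speaks the Postgres wire protocol, so you can run analytical queries directly with `psql` or any standard Postgres client.
* **Simplified operations:** Supports consistent-snapshot backfill (resync) and table management commands, with no external dependencies beyond S3 and Postgres.

Streambed offers a "no ETL" approach to real-time analytics: developers query production data as they would in Postgres, while relying on S3 for scalable storage and on the Iceberg table format. Requires Go 1.22+.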

Hacker News submission by vira28 (github.com/viggy28): 5 points, 49 minutes ago.
Original text

Postgres-to-Iceberg CDC engine. Offload analytical queries from your production database without changing your application.

streambed streams WAL changes via logical replication, writes Parquet files to S3, and commits Iceberg metadata. Query the result with any Iceberg-compatible engine -- or use the built-in query server, which speaks the Postgres wire protocol so you can connect with psql.

[Demo: the same analytical query on pgbench (1M accounts, 500K history rows), with Postgres on the left and Streambed on the right.]

No ETL. No Spark. Just Postgres + S3.

# Start Postgres + MinIO locally
docker compose up -d

# Build
go build -o streambed ./cmd/streambed

# Start syncing + query server on :5433
./streambed sync \
  --source-url="postgres://postgres:test@localhost:5432/postgres" \
  --s3-bucket="streambed" \
  --s3-endpoint="http://localhost:9000" \
  --s3-prefix="test" \
  --query-addr=:5433

# Query your Postgres tables via Iceberg
psql -h localhost -p 5433 -U postgres -d postgres

Run streambed sync --help for all configuration options. Every flag can also be set through an environment variable with the STREAMBED_ prefix (e.g. STREAMBED_SOURCE_URL).
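The only flag-to-variable pair the README documents is --source-url and STREAMBED_SOURCE_URL. The Go sketch below guesses the naming rule from that one example: strip the leading dashes, upper-case the name, and turn the remaining dashes into underscores. The helper envName is hypothetical and is not part of Streambed, so treat every name other than STREAMBED_SOURCE_URL as unconfirmed and check it against streambed sync --help.

```go
package main

import (
	"fmt"
	"strings"
)

// envName maps a flag such as "--source-url" to "STREAMBED_SOURCE_URL".
// ASSUMPTION: the rule (upper-case, dashes become underscores) is inferred
// from the single example in the README; it is not a documented contract.
func envName(flag string) string {
	name := strings.TrimLeft(flag, "-")
	name = strings.ReplaceAll(name, "-", "_")
	return "STREAMBED_" + strings.ToUpper(name)
}

func main() {
	// Flags taken from the quick-start command above.
	flags := []string{"--source-url", "--s3-bucket", "--s3-endpoint", "--s3-prefix", "--query-addr"}
	for _, f := range flags {
		fmt.Printf("%-14s -> %s\n", f, envName(f))
	}
}
```

If the guessed rule holds, the quick-start flags could be moved into a container or systemd environment and the daemon started with a bare streambed sync.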

Architecture

Postgres WAL ──▶ Decode ──▶ Buffer ──▶ Parquet ──▶ S3 ──▶ Iceberg Commit
                                                              │
                                                    DuckDB ◀──┘ (query server)

Streambed connects to Postgres as a logical replication subscriber. It decodes WAL messages (inserts, updates, deletes), buffers rows per table, and periodically flushes them as Parquet files to S3 with Iceberg metadata commits. Updates and deletes use copy-on-write merging against existing Parquet data.
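To make the buffer-then-merge idea concrete, here is a minimal in-memory sketch in Go. It is an illustration under stated assumptions, not Streambed's implementation: the real engine works on Parquet files and Iceberg snapshots, whereas this sketch uses plain maps, a single integer key standing in for the primary key, and hypothetical names (Change, Buffer, MergeCopyOnWrite).

```go
package main

import (
	"fmt"
	"sort"
)

// Op is the kind of row change decoded from the WAL.
type Op int

const (
	Insert Op = iota
	Update
	Delete
)

// Change is one decoded row change. Key stands in for the primary key;
// Row is the new row image and is nil for deletes.
type Change struct {
	Op  Op
	Key int64
	Row map[string]any
}

// Buffer groups pending changes by table name until the next flush.
type Buffer struct {
	pending map[string][]Change
}

func NewBuffer() *Buffer { return &Buffer{pending: make(map[string][]Change)} }

// Add appends a change to the table's pending list, preserving WAL order.
func (b *Buffer) Add(table string, c Change) {
	b.pending[table] = append(b.pending[table], c)
}

// Drain returns the pending changes for a table and clears them.
func (b *Buffer) Drain(table string) []Change {
	cs := b.pending[table]
	delete(b.pending, table)
	return cs
}

// MergeCopyOnWrite applies changes in order to a copy of the existing rows
// and returns the new row set. The existing map is left untouched, which
// mirrors writing a new data file instead of editing the old one in place.
func MergeCopyOnWrite(existing map[int64]map[string]any, changes []Change) map[int64]map[string]any {
	out := make(map[int64]map[string]any, len(existing))
	for k, v := range existing {
		out[k] = v
	}
	for _, c := range changes {
		switch c.Op {
		case Insert, Update:
			out[c.Key] = c.Row
		case Delete:
			delete(out, c.Key)
		}
	}
	return out
}

func main() {
	existing := map[int64]map[string]any{
		1: {"name": "alice"},
		2: {"name": "bob"},
	}

	buf := NewBuffer()
	buf.Add("public.users", Change{Op: Update, Key: 1, Row: map[string]any{"name": "alicia"}})
	buf.Add("public.users", Change{Op: Delete, Key: 2})
	buf.Add("public.users", Change{Op: Insert, Key: 3, Row: map[string]any{"name": "carol"}})

	// A flush drains the table's buffer and merges it into a new row set.
	merged := MergeCopyOnWrite(existing, buf.Drain("public.users"))

	keys := make([]int64, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	for _, k := range keys {
		fmt.Println(k, merged[k]["name"])
	}
}
```

Running it prints rows 1 (alicia) and 3 (carol), while the existing map still holds alice and bob. That is the property copy-on-write is meant to provide: readers of the previous version keep seeing a consistent result until the new version is committed.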

A query server exposes Iceberg tables over the Postgres wire protocol using embedded DuckDB, so you can query with psql or any Postgres client.

Command                                  What it does
streambed sync                           Main daemon. Streams WAL, writes Iceberg, optionally serves queries.
streambed resync --table=public.users    One-shot backfill via COPY under a consistent snapshot.
streambed query                          Standalone query server (no sync). Points at existing Iceberg tables.
streambed cleanup --table=public.users   Deletes S3 objects and state for a table. Useful before resync.

Requires Go 1.22+ and CGO (for go-duckdb and go-sqlite3).

# Build
go build -o streambed ./cmd/streambed

# Unit tests
go test ./internal/... ./config/...

# Integration tests (requires Docker)
./scripts/test-integration.sh

Integration tests use the integration build tag and run against Postgres (port 5434) and MinIO (port 9002) from test/integration/docker-compose.yml.
