Show HN: Rocky – Rust SQL engine with branches, replay, column lineage

Original link: https://github.com/rocky-data/rocky

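Rocky is a Rust-based control plane that aims to bring trust and safety to data warehouse pipelines (compatible with Databricks and Snowflake). It focuses on data quality and reliability, preventing silent data corruption through features such as **compile-time safety**, **column-level lineage**, and **automatic schema drift detection and recovery**.

Key features include **data contracts** enforced *before* data is written, **risk-free experiments** using named branches, and **AI-driven model generation** with a built-in validation loop. Rocky provides a CLI and integrates with tools such as Dagster and VS Code.

It is designed for local development on DuckDB (the playground needs no credentials) and ships with comprehensive documentation and examples. Rocky is free and open source, and aims to improve the development and maintenance of data pipelines through strong error detection and clear data lineage. You can find it on GitHub: [https://github.com/rocky-data/rocky](https://github.com/rocky-data/rocky).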

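## Rocky: a Rust-based control plane for data warehouses

Hugo Correia has released Rocky, a new open-source Rust engine for managing data warehouse pipelines (currently supporting Databricks, Snowflake, BigQuery, and DuckDB). Unlike the warehouse itself, Rocky focuses on owning the *pipeline graph*: the dependencies, types, lineage, and governance that are often missing from existing stacks.

Key features include **branches and replay**, which give warehouse data a Git-like workflow; **column-level lineage** tracked directly by the compiler; and **governance** tooling such as column classification, masking policies, and audit trails. Rocky also offers **cost attribution** and **compile-time portability** checks across warehouse dialects.

It integrates with tools such as Dagster for orchestration and is not intended to replace existing data-loading tools such as Fivetran. The project recently shipped comprehensive governance features in v1.16.0 and v1.17.4 and is seeking feedback, especially on its trust-system framing and governance surface.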

Original text

Rocky


The trust system for your data. A Rust-based control plane for warehouse pipelines: branches, replay, column-level lineage, compile-time safety, per-model cost attribution. Keep Databricks or Snowflake. Bring Rocky for the DAG.

Rocky quickstart — create a project, compile, and run 3 models in under 15s

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.ps1 | iex

rocky playground my-first-project
cd my-first-project
rocky compile && rocky test && rocky run

No credentials needed — the playground runs end-to-end on local DuckDB.

Each demo below is a self-contained POC in examples/playground/pocs/: cd in, run ./run.sh, and reproduce locally.

Detects schema drift the moment it happens

A source column type changes upstream. On the next run, Rocky diffs source vs. target, drops the target, and recreates it. No silent data corruption, no dbt-style quiet divergence.

rocky run detects source type change and recreates the target

POC — 02-performance/06-schema-drift-recover
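The diff-then-recreate idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Rocky's implementation: the `DriftReport` class, the `diff_schemas` function, and the "any type change forces a recreate" policy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DriftReport:
    changed: dict   # column -> (target_type, source_type)
    added: list     # columns present only in the source
    removed: list   # columns present only in the target

    @property
    def needs_recreate(self) -> bool:
        # Hypothetical policy: any type change triggers drop-and-recreate.
        return bool(self.changed)


def diff_schemas(source: dict, target: dict) -> DriftReport:
    """Compare two {column: type} mappings and report how they differ."""
    shared = set(source) & set(target)
    changed = {
        col: (target[col], source[col])
        for col in sorted(shared)
        if source[col] != target[col]
    }
    added = sorted(set(source) - set(target))
    removed = sorted(set(target) - set(source))
    return DriftReport(changed, added, removed)


# The upstream source changed `amount` from INTEGER to DECIMAL(18,2).
source = {"id": "BIGINT", "amount": "DECIMAL(18,2)"}
target = {"id": "BIGINT", "amount": "INTEGER"}
report = diff_schemas(source, target)
if report.needs_recreate:
    print("drift detected:", report.changed)
```

Comparing schemas on every run, rather than trusting the last known state, is what makes the divergence visible as soon as the next run starts.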

Enforces data contracts at compile time

Missing required columns, protected columns being removed, or unsafe type changes surface as diagnostic codes (E010, E013) before a single row is written.

rocky compile flags E010 and E013 contract violations on broken_metrics

POC — 01-quality/01-data-contracts-strict
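A minimal sketch of what a pre-write contract check might look like, assuming a contract is just three sets of rules. The `Contract` class, the `check_contract` function, and the violation labels are hypothetical; they are not Rocky's contract format, and the labels are deliberately not mapped to the E010 or E013 codes, whose exact meanings are not given here.

```python
from dataclasses import dataclass, field


@dataclass
class Contract:
    required: set = field(default_factory=set)    # columns that must exist
    protected: set = field(default_factory=set)   # columns that may not be removed
    types: dict = field(default_factory=dict)     # column -> expected type


def check_contract(contract, previous_columns, proposed_schema):
    """Return (kind, column) violations; an empty list means safe to write."""
    proposed = set(proposed_schema)
    violations = []
    for col in sorted(contract.required - proposed):
        violations.append(("missing_required", col))
    for col in sorted((contract.protected & set(previous_columns)) - proposed):
        violations.append(("protected_removed", col))
    for col, expected in sorted(contract.types.items()):
        actual = proposed_schema.get(col)
        # Simplification: any deviation from the expected type counts as unsafe.
        if actual is not None and actual != expected:
            violations.append(("type_mismatch", col))
    return violations


contract = Contract(
    required={"order_id", "total"},
    protected={"customer_id"},
    types={"total": "DECIMAL"},
)
proposed = {"order_id": "BIGINT", "total": "VARCHAR"}
violations = check_contract(contract, {"order_id", "customer_id", "total"}, proposed)
if violations:
    print("refusing to write:", violations)
```

The point of the design is ordering: the check needs only schemas, so it can run before any data moves.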

Named branches for risk-free experiments

Create a branch, run against it in an isolated schema, inspect, then drop or promote. Column-level lineage shows the downstream blast radius before you ship.

rocky branch create, run on branch, and trace column lineage downstream

POC — 00-foundations/06-branches-replay-lineage
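The create, run, inspect, then drop-or-promote cycle can be imitated with Python's standard-library `sqlite3`, using an attached in-memory database as a stand-in for the isolated schema. This is a conceptual sketch only: it does not use Rocky or a real warehouse, and the `experiment` schema name and the helper functions are made up for the example.

```python
import sqlite3

# isolation_level=None keeps sqlite3 in autocommit mode, so ATTACH/DETACH
# never run inside an implicit transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 5.5)])
conn.execute("CREATE TABLE fct_revenue AS SELECT SUM(amount) AS total FROM orders")

# "Create a branch": an attached in-memory database plays the isolated schema.
conn.execute("ATTACH DATABASE ':memory:' AS experiment")

# "Run on the branch": build the changed model there; main stays untouched.
conn.execute(
    "CREATE TABLE experiment.fct_revenue AS "
    "SELECT SUM(amount) AS total FROM main.orders WHERE id = 1"
)


def total(schema):
    # fetchall() exhausts the cursor so no statement stays open.
    return conn.execute(f"SELECT total FROM {schema}.fct_revenue").fetchall()[0][0]


def promote(table):
    # Swap the branch result into main inside one explicit transaction.
    conn.execute("BEGIN")
    conn.execute(f"DROP TABLE main.{table}")
    conn.execute(f"CREATE TABLE main.{table} AS SELECT * FROM experiment.{table}")
    conn.execute("COMMIT")


def drop_branch():
    conn.execute("DETACH DATABASE experiment")


print("main:", total("main"), "branch:", total("experiment"))
```

After inspecting both totals, calling `promote("fct_revenue")` copies the branch result into main, while `drop_branch()` alone discards the experiment without touching main.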

Column-level lineage, not table-level

Trace a single column from a downstream fact back through its aggregations, all the way to the seed. Blast-radius analysis without reading every model.

rocky lineage --column traces fct_revenue.total back to seeds.orders.amount

POC — 06-developer-experience/01-lineage-column-level
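Column-level lineage is, at heart, a graph whose nodes are columns rather than tables. The sketch below walks such a graph in both directions. The edge map is invented for illustration: only `fct_revenue.total` and `seeds.orders.amount` come from the demo above, and the intermediate model names are hypothetical.

```python
from collections import deque

# Hypothetical edges: each output column maps to the columns it reads from.
LINEAGE = {
    "fct_revenue.total": ["int_orders_daily.amount_sum"],
    "int_orders_daily.amount_sum": ["stg_orders.amount"],
    "stg_orders.amount": ["seeds.orders.amount"],
    "stg_orders.order_id": ["seeds.orders.id"],
}


def trace_upstream(column, lineage):
    """Return every column that `column` depends on, nearest first (BFS)."""
    seen, order = {column}, []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def blast_radius(column, lineage):
    """Return every column affected by a change to `column`."""
    # Invert the edges, then reuse the same traversal.
    inverted = {}
    for child, parents in lineage.items():
        for parent in parents:
            inverted.setdefault(parent, []).append(child)
    return trace_upstream(column, inverted)


print(trace_upstream("fct_revenue.total", LINEAGE))
print(blast_radius("seeds.orders.amount", LINEAGE))
```

Note that `stg_orders.order_id` never appears in either result: a table-level graph would have flagged all of `stg_orders`, while the column-level walk touches only the columns actually on the path.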

AI model generation with a compile-validate loop

Describe what you want in plain English. Rocky generates a Rocky DSL model, compiles it, and retries on parse failure — the Attempts: 2 line shows the loop catching a first-pass error invisibly.

rocky ai generates a .rocky model from natural language intent, Attempts: 2

POC — 03-ai/01-model-generation
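The generate, compile, retry loop can be sketched generically. Everything named here is an assumption for illustration: `generate_with_validation`, the stub generator, and the stub compiler are not Rocky APIs, and the stub emits SQL-like text rather than real Rocky DSL.

```python
from dataclasses import dataclass


@dataclass
class GenerationResult:
    source: str
    attempts: int


def generate_with_validation(intent, generate, compile_model, max_attempts=3):
    """Call `generate` until `compile_model` accepts the output or attempts run out."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        source = generate(intent, feedback)
        error = compile_model(source)   # None means the model compiled
        if error is None:
            return GenerationResult(source, attempt)
        feedback = error                # pass the compiler error to the next try
    raise RuntimeError(f"no valid model after {max_attempts} attempts: {feedback}")


def fake_generate(intent, feedback):
    # Stand-in for a model call: the first draft is broken, the retry is fixed.
    return "select total from orders" if feedback else "selct total from orders"


def fake_compile(source):
    # Stand-in for a compiler: accept only text that starts with "select ".
    return None if source.startswith("select ") else "parse error: expected 'select'"


result = generate_with_validation("revenue by day", fake_generate, fake_compile)
print(f"Attempts: {result.attempts}")  # prints "Attempts: 2"
```

Feeding the compiler error back into the next generation call is the design choice that makes the retry more than a blind re-roll.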

| Path | Artifact | Language | Description |
|---|---|---|---|
| engine/ | rocky CLI binary | Rust | Core SQL transformation engine (20-crate Cargo workspace) |
| integrations/dagster/ | dagster-rocky PyPI wheel | Python | Dagster resource and component wrapping the Rocky CLI |
| editors/vscode/ | Rocky VSIX | TypeScript | VS Code extension: LSP client + commands for AI features |
| examples/playground/ | (config only) | TOML / SQL | Self-contained DuckDB sample pipeline used for smoke tests and benchmarks |

Each subproject has its own README with detailed usage. The engine/README.md is the canonical product reference for the Rocky CLI.

git clone https://github.com/rocky-data/rocky.git
cd rocky
just build       # builds engine + dagster wheel + vscode extension
just test        # runs all test suites
just lint        # cargo clippy/fmt + ruff + eslint

The just command runner is optional; you can also build each subproject directly. See CONTRIBUTING.md for per-subproject build commands.

Each artifact is released independently using a tag-namespaced scheme:

  • engine-v* → Rocky CLI binary (cross-compiled, on GitHub Releases)
  • dagster-v* → dagster-rocky wheel
  • vscode-v* → Rocky VSIX

See CONTRIBUTING.md for the full release flow.
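As a rough illustration of the tag-namespaced scheme, a release script could route a tag to its artifact by prefix. The `artifact_for_tag` helper and the version strings are hypothetical; the actual release automation is described in CONTRIBUTING.md and may work differently.

```python
# Prefix -> artifact, taken from the list above.
TAG_PREFIXES = {
    "engine-v": "Rocky CLI binary",
    "dagster-v": "dagster-rocky wheel",
    "vscode-v": "Rocky VSIX",
}


def artifact_for_tag(tag):
    """Return (artifact, version) for a namespaced tag, or None if no prefix matches."""
    for prefix, artifact in TAG_PREFIXES.items():
        if tag.startswith(prefix) and len(tag) > len(prefix):
            return artifact, tag[len(prefix):]
    return None


# Example tags with made-up version numbers.
print(artifact_for_tag("engine-v1.2.3"))
print(artifact_for_tag("v1.2.3"))
```

Because each prefix is distinct, one repository can release its three artifacts independently without their version numbers colliding.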

Full documentation: rocky-data.dev — concepts, guides, CLI reference, Dagster integration, adapter SDK.

See CONTRIBUTING.md. Before opening a PR, please read the cross-project change guidance — schema and DSL changes must update consumers atomically.

Rocky is free and open source. If it saves your team time, consider sponsoring the project so development can continue.

License: Apache 2.0
