Show HN: Moose – OSS framework to build analytical back ends with ClickHouse

Original link: https://docs.fiveonefour.com/moose

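Moose simplifies building analytical backends in TypeScript or Python. It removes the complexity of integrating tools such as Kafka, ClickHouse, and Airflow by providing a single framework in which your code defines both the application logic and the data infrastructure. Moose automatically manages database schema updates, API validation, and message formats, which prevents inconsistencies and reduces development time. A single model definition is used everywhere, eliminating repeated updates across code, database, and API. With Moose, one command starts a local, fully integrated data stack (ClickHouse, Redpanda, and Temporal). It supports hot reloading, so developers immediately see changes reflected across the whole pipeline. Moose is modular, letting users choose and configure the specific components they need. It is well suited to user-facing analytics, data warehouses, data migrations, event streaming applications, and ETL workloads.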

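Moose, an open-source framework intended to simplify building analytical backends on ClickHouse, is drawing discussion on Hacker News. Its founder, Callicles, highlights Moose's ability to streamline data workflows from ingestion to API delivery, so that teams can focus on data insights rather than complex tool integration. A key feature is the local development experience, which allows immediate testing with real data and keeps behavior consistent with production. Users asked about production use cases, and it was disclosed that Moose already backs F45 Training's LionHeart heart-rate tracking system through Boreal, a hosted Moose offering. F45's head of engineering confirmed that it scaled successfully. Although Moose currently focuses on ClickHouse, there is interest in extending support to other OLAP providers such as TimescaleDB. Early users praise its simplicity and its ability to cut development time significantly, and some call it a clever abstraction for simplifying data pipelines.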

Original text

Moose

bash -i <(curl -fsSL https://fiveonefour.com/install.sh) moose

What is Moose?

Moose lets you develop analytical backends in pure TypeScript or Python code like this:

import { Key, OlapTable, Stream, IngestApi, ConsumptionApi } from "@514labs/moose-lib";
 
interface DataModel {
  primaryKey: Key<string>;
  name: string;
}
// Create a ClickHouse table
export const clickhouseTable = new OlapTable<DataModel>("TableName");
 
// Create a Redpanda streaming topic
export const redpandaTopic = new Stream<DataModel>("TopicName", {
  destination: clickhouseTable,
});
 
// Create an ingest API endpoint
export const ingestApi = new IngestApi<DataModel>("post-api-route", {
  destination: redpandaTopic,
});
 
// Create consumption API endpoint
interface QueryParams {
  limit?: number;
}
export const consumptionApi = new ConsumptionApi<QueryParams, DataModel>("get-api-route", {
  async handler({limit = 10}: QueryParams, {client, sql}) {
    const result = await client.query.execute(sql`SELECT * FROM ${clickhouseTable} LIMIT ${limit}`);
    return await result.json();
  }
});

from moose_lib import Key, OlapTable, Stream, StreamConfig, IngestApi, IngestApiConfig, ConsumptionApi
from pydantic import BaseModel
 
class DataModel(BaseModel):
    primary_key: Key[str]
    name: str
 
# Create a ClickHouse table
clickhouse_table = OlapTable[DataModel]("TableName")
 
# Create a Redpanda streaming topic
redpanda_topic = Stream[DataModel]("TopicName", StreamConfig(
    destination=clickhouse_table,
))
 
# Create an ingest API endpoint
ingest_api = IngestApi[DataModel]("post-api-route", IngestApiConfig(
    destination=redpanda_topic,
))
 
# Create a consumption API endpoint
class QueryParams(BaseModel):
    limit: int = 10
 
def handler(client, params: QueryParams):
    return client.query.execute("SELECT * FROM {table: Identifier} LIMIT {limit: Int32}", {
        "table": clickhouse_table.name,
        "limit": params.limit,
    })
 
consumption_api = ConsumptionApi[QueryParams, DataModel]("get-api-route", query_function=handler)
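
Both handlers above pass `limit` to ClickHouse as a query parameter instead of concatenating it into the SQL string. The sketch below shows one way a tagged template could produce that separation. It is an illustration only, not the `sql` helper from `@514labs/moose-lib`: the names `sqlSketch`, `ParameterizedQuery`, and `clickhouseTypeOf`, and the `p0`, `p1` parameter names are assumptions. The `{name:Type}` placeholder form is the ClickHouse query-parameter syntax that the Python example uses.

```typescript
// Hypothetical sketch: NOT the `sql` helper shipped with Moose.
interface ParameterizedQuery {
  text: string; // SQL text with {pN:Type} placeholders
  params: Record<string, number | string>; // values sent separately
}

// Map a JavaScript value to a ClickHouse type name (deliberately minimal).
function clickhouseTypeOf(value: number | string): string {
  if (typeof value === "number") {
    return Number.isInteger(value) ? "Int32" : "Float64";
  }
  return "String";
}

// Tagged template: each interpolated value becomes a named placeholder.
function sqlSketch(
  strings: TemplateStringsArray,
  ...values: Array<number | string>
): ParameterizedQuery {
  const params: Record<string, number | string> = {};
  let text = strings[0];
  values.forEach((value, i) => {
    const paramName = `p${i}`;
    params[paramName] = value;
    text += `{${paramName}:${clickhouseTypeOf(value)}}` + strings[i + 1];
  });
  return { text, params };
}

const limit = 10;
const query = sqlSketch`SELECT * FROM TableName LIMIT ${limit}`;
// query.text   is "SELECT * FROM TableName LIMIT {p0:Int32}"
// query.params is { p0: 10 }
```

Because the value never becomes part of the SQL text, the database can type-check it, which is the usual defense against injection through request parameters.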

Core Capabilities

Why Moose Exists

Building Analytical Backends With Today's Tooling is Slow

Tool fragmentation

More time spent integrating Kafka, ClickHouse, Postgres, dbt, Airflow, and a dozen other services instead of building your actual application

Schema drift everywhere

Your TypeScript or Python models, database schemas, API validation, and message formats all diverge over time

Painful development workflow

No local testing, long deployment cycles, and constant context switching

SQL-only processing

Having to use SQL for everything when you'd rather use languages you're already comfortable with

The DIY Approach

What if you need to add a simple string field to your data model?

Update your TS/Python Code Model

Update your Database Schema

Update your Runtime Validation

Update your transformations & queries

Can you see the problem? This process repeats for every change.

You also have to test that everything is working together in a safe, isolated dev environment, which is even more painful with so many moving parts.
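
To make the drift problem concrete, here is a minimal sketch of a hand-maintained setup in which one artifact was not updated after a field was added. Everything in it (`UserEvent`, `tableColumns`, `validatedFields`, `missingFrom`) is hypothetical and exists only for illustration.

```typescript
// Hypothetical DIY setup: every artifact below is maintained by hand.
interface UserEvent {
  userId: string;
  name: string;
  plan: string; // newly added field
}

// The model's fields, listed again by hand because TypeScript
// interfaces do not exist at runtime.
const modelFields: Array<keyof UserEvent> = ["userId", "name", "plan"];

// Hand-written ClickHouse column list (the new field was forgotten here).
const tableColumns = ["userId", "name"];

// Hand-written runtime validation field list (updated correctly).
const validatedFields = ["userId", "name", "plan"];

// Report model fields that are missing from another artifact.
function missingFrom(artifact: string[], model: string[]): string[] {
  return model.filter((field) => !artifact.includes(field));
}

const tableDrift = missingFrom(tableColumns, modelFields);
const validationDrift = missingFrom(validatedFields, modelFields);
// tableDrift is ["plan"]; validationDrift is []
```

Nothing in the language forces these lists to agree, so a check like `missingFrom` has to be written and run separately for every pair of artifacts.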

What Moose Does

With Moose, your TypeScript and Python code is the single source of truth for both your data application logic AND your data infrastructure:

// Define your model ONCE
import { Key, IngestPipeline } from "@514labs/moose-lib";
 
interface ExampleModel {
  primaryKey: Key<string>;
  name: string;
  nested: {
    isActive: boolean;
    value: number;
    internalName: string;
  }[];
  createdAt: Date;
}
 
// And use it EVERYWHERE - ONE line to wire everything up
export const examplePipeline = new IngestPipeline<ExampleModel>("example", {
  ingest: true,    // Creates API endpoint with validation
  stream: true,    // Creates properly structured Redpanda topic
  table: true      // Creates ClickHouse database table
});

You get end-to-end infrastructure for your data pipeline that is:

Defined purely in TypeScript or Python code

Completely type safe and validated

Able to catch errors at dev time, not at runtime
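
The single-source-of-truth idea can be sketched without any framework: one schema object drives both the table definition and the runtime validator, so the two cannot disagree. This is an illustration of the concept under stated assumptions, not a description of how Moose derives its infrastructure. `FieldType`, `exampleSchema`, `toCreateTable`, `validate`, and the `score` field are hypothetical.

```typescript
// Hypothetical sketch: NOT Moose's implementation.
type FieldType = "String" | "Float64" | "Bool";

// The only place where fields and their types are declared.
const exampleSchema: Record<string, FieldType> = {
  primaryKey: "String",
  name: "String",
  score: "Float64",
};

// Derive a ClickHouse CREATE TABLE statement from the schema.
function toCreateTable(
  table: string,
  schema: Record<string, FieldType>,
  orderBy: string
): string {
  const columns = Object.keys(schema)
    .map((field) => `${field} ${schema[field]}`)
    .join(", ");
  return `CREATE TABLE ${table} (${columns}) ENGINE = MergeTree ORDER BY ${orderBy}`;
}

// Derive a validator from the same schema; returns a list of problems.
function validate(
  record: Record<string, unknown>,
  schema: Record<string, FieldType>
): string[] {
  const expected: Record<FieldType, string> = {
    String: "string",
    Float64: "number",
    Bool: "boolean",
  };
  return Object.keys(schema)
    .filter((field) => typeof record[field] !== expected[schema[field]])
    .map((field) => `${field}: expected ${schema[field]}`);
}

const ddl = toCreateTable("example", exampleSchema, "primaryKey");
const problems = validate(
  { primaryKey: "a1", name: "Ada", score: "high" },
  exampleSchema
);
// problems is ["score: expected Float64"]
```

Adding a field to `exampleSchema` changes the generated DDL and the validator together, which is the property the list above describes.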

No More Context Switching

“Does my Database table use snake_case or camelCase?”

“Did I add the new field to both the model AND the table?”

“Which database field was nullable again?”

Local Dev in Seconds

Local Dev Benefits

One-command startup

Launch your entire data infrastructure locally with a single command

Zero configuration

All components come pre-configured and fully integrated - no setup required

Production parity

Use the same technologies and logic that will run in production

Real-time feedback

See your changes reflected instantly throughout the stack

Moose comes with a pre-configured and fully integrated data stack that runs entirely on your laptop. Spin it up with one command:

All the infrastructure is automatically spun up:

⡏ Starting local infrastructure
  Successfully started containers
     Validated clickhousedb-1 docker container
     Validated redpanda-1 docker container
  Successfully validated red panda cluster
     Validated temporal docker container
  Successfully ran local infrastructure

Common workflow scenarios

Need to add or change a model?

Moose hot reloads it to your local infrastructure when you save

Added a new field?

It's instantly available in your API, streams, and database

Need to test your pipeline?

Send sample data to your local ingest API and see the data flow through
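
As a rough sketch of the last scenario, the helper below assembles a JSON POST request for a local ingest endpoint. The base URL `http://localhost:4000` and the route `ingest/example` are assumptions: the route mirrors the `ingest/<name>` pattern visible in the hot-reload log later on this page, and the port should be checked against your own local setup. `buildIngestRequest` and `IngestRequest` are hypothetical names.

```typescript
// Hypothetical helper: builds the pieces of an HTTP request for a local
// ingest endpoint. The base URL and route used below are assumptions.
interface IngestRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildIngestRequest(
  baseUrl: string,
  route: string,
  record: object
): IngestRequest {
  // Trim a trailing slash from the base and a leading slash from the route.
  const url = baseUrl.replace(/\/$/, "") + "/" + route.replace(/^\//, "");
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(record),
    },
  };
}

// A sample record shaped like ExampleModel from the section above.
const sample = {
  primaryKey: "evt-1",
  name: "first event",
  nested: [{ isActive: true, value: 1, internalName: "inner" }],
  createdAt: new Date(0).toISOString(),
};

const ingestRequest = buildIngestRequest(
  "http://localhost:4000/",
  "ingest/example",
  sample
);
// ingestRequest.url is "http://localhost:4000/ingest/example"
```

Passing `ingestRequest.url` and `ingestRequest.init` to `fetch` (available in browsers and Node 18+) sends the record. A record that does not match the model should be rejected by the endpoint's validation.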

Hot Reloading Dev Workflow in Action

Make a change to your model:

// Add a field to your model
interface ExampleModel {
  primaryKey: Key<string>;
  name: string;
  nested: {
    isActive: boolean;
    value: number;
    internalName: string;
  }[];
  createdAt: Date;
  status: string; // New field
}

Hit save and you'll see your changes hot reloaded to your local infrastructure:

⢹ Processing Infrastructure changes from file watcher
             ~ Topic: orders - Version: 0.0 - Retention Period: 604800s - Partition Count: 1
             ~ Table orders with column changes: [Added(Column { name: "status", data_type: String, required: true, unique: false, primary_key: false, default: None })] and order by changes: OrderByChange { before: [], after: [] }
             ~ Topic to Table Sync Process: orders_0_0 -> orders
             ~ API Endpoint: orders - Version: 0.0 - Path: ingest/orders - Method: POST - Format: Some(Json)

That's it! No additional steps needed.

Your API now validates this field

Your database schema is updated

Your streams carry the new field
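
The `Added(Column { ... })` entry in the log suggests that the old and new table definitions are compared column by column. A minimal sketch of such a diff is shown here. It is a guess at the general technique, not Moose's internal planner, and `Column`, `ColumnChange`, and `diffColumns` are hypothetical names.

```typescript
// Hypothetical sketch of a column diff; NOT Moose's internal planner.
interface Column {
  name: string;
  dataType: string;
  required: boolean;
}

type ColumnChange =
  | { kind: "Added"; column: Column }
  | { kind: "Removed"; column: Column }
  | { kind: "Updated"; before: Column; after: Column };

function diffColumns(before: Column[], after: Column[]): ColumnChange[] {
  const changes: ColumnChange[] = [];
  const beforeByName = new Map(
    before.map((c): [string, Column] => [c.name, c])
  );
  const afterNames = new Set(after.map((c) => c.name));

  // Columns present now: either new or possibly modified.
  for (const column of after) {
    const previous = beforeByName.get(column.name);
    if (previous === undefined) {
      changes.push({ kind: "Added", column });
    } else if (
      previous.dataType !== column.dataType ||
      previous.required !== column.required
    ) {
      changes.push({ kind: "Updated", before: previous, after: column });
    }
  }
  // Columns that existed before but are gone now.
  for (const column of before) {
    if (!afterNames.has(column.name)) {
      changes.push({ kind: "Removed", column });
    }
  }
  return changes;
}

const oldColumns: Column[] = [
  { name: "primaryKey", dataType: "String", required: true },
  { name: "name", dataType: "String", required: true },
];
const newColumns: Column[] = oldColumns.concat([
  { name: "status", dataType: "String", required: true },
]);
const detected = diffColumns(oldColumns, newColumns);
// detected has one entry: kind "Added" for the "status" column
```

Each change in the result would then map to one schema operation, for example an `ALTER TABLE ... ADD COLUMN` for an added column.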

Modularity

Moose is designed to be modular and configurable. You can pick and choose which components you need and configure them to your liking.

Default Moose Stack:

ClickHouse

OLAP database (always enabled)

Redpanda

Kafka-compatible event streaming platform (can be disabled)

Temporal

Workflow orchestration (can be disabled)

Planned Extensions:

Snowflake, Databricks, and BigQuery

Cloud-native data warehouses

Kafka, Kinesis, and Pulsar

Additional event streaming platforms

Let us know if you’d like to see support for specific platforms in your stack.

What Can You Build with Moose?

Moose is ideal for a wide range of data-intensive applications, from real-time analytics to complex data pipelines:

User-Facing Analytics

Embed leaderboards, charts, metrics, and other real-time features in your web or mobile apps

BI and Data Warehouses

Collect disparate data sources into an analytical database, produce custom reports, and more

Data Migrations

One-time migration of data from legacy systems to a modern data backend

Event Streaming

Real-time processing of events from Kafka, Redpanda, or other event streaming platforms

ETL Workloads

Repeated batch jobs to collect data from different sources and load them into an analytics environment