Show HN: Ktx – Open-source executable context layer for data agents

Original link: https://github.com/Kaelio/ktx

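**ktx** is a self-improving context layer designed to help AI agents query data warehouses accurately. Unlike general-purpose agents, which struggle with inconsistent metrics, or traditional semantic layers, which need constant manual upkeep, ktx automates the process by connecting raw data, business knowledge, and large language models (LLMs).

**Core features:**

* **Knowledge ingestion:** Automatically organizes documentation from wikis and Notion and flags contradictions in it.
* **Semantic mapping:** Samples database schemas to detect joinable columns and resolve SQL pitfalls such as fan and chasm traps.
* **Agent integration:** Provides a CLI and an MCP (Model Context Protocol) server, giving agents one unified, searchable interface over semantic metrics and company context.
* **Privacy first:** ktx runs locally, so schemas and data stay on your machine; only the necessary context is sent to the LLM provider you configure.

It supports major databases (PostgreSQL, Snowflake, BigQuery, and others) and integrates with tools such as dbt and Looker. It suits developers using agents like Claude Code or Cursor who need reliable, canonical SQL rather than hallucinated metric logic.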

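**Ktx** is a new open-source "executable context layer" designed to address the reliability problems inherent in AI data agents. Agents are good at generating SQL, but they often fail because of stale data, join fanouts, and complex attribution logic.

Traditional approaches, such as handing the agent wiki documents or using a rigid traditional semantic layer, tend to work poorly. Ktx decouples business context from query execution through a two-part architecture, which improves accuracy:

1. **Contextual knowledge:** Ingests unstructured data from internal documents (such as Notion) as Markdown.
2. **Structured definitions:** Uses YAML files to define tables, metrics, dimensions, and relationship metadata.

Instead of writing raw SQL, agents query Ktx for specific metrics. The Ktx planner does the heavy lifting: it chooses the best join path, catches common SQL errors such as fanouts, and compiles the final warehouse query from the structured and unstructured data provided.

Ktx is licensed under Apache 2.0, supports major data warehouses (Snowflake, BigQuery, Postgres), and integrates with existing modeling tools such as dbt and LookML. Developers can install it via npm or add it as an agent skill to improve performance on production-grade data tasks.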

Original text



ktx is a self-improving context layer that teaches agents how to query your warehouse accurately - from approved metric definitions, joinable columns, and business knowledge it builds and maintains for you.

Note

Run ktx with your own LLM API keys or a Claude Pro/Max subscription. No extra usage billing from ktx.

[Figure: ktx ingestion flow from source systems through validation to wiki and semantic-layer outputs]

General-purpose agents struggle on data tasks. They re-explore your warehouse on every question, invent their own metric logic, and return numbers that don't match approved definitions.

Traditional semantic layers don't fix this. They demand constant manual upkeep and don't absorb the rest of your company's knowledge.

ktx does both, automatically:

  • Learns from company knowledge. Ingests wiki content, organizes it, removes duplicates, and flags contradictions for human review.
  • Maps the data stack. Samples tables, captures metadata and usage patterns, detects joinable columns, and annotates sources so agents write better queries.
  • Builds a semantic layer. Combines raw tables and high-level metrics through a join graph that automatically resolves chasm and fan traps, so agents fetch metrics declaratively instead of rewriting canonical SQL each time.
  • Serves agents at execution. Exposes CLI and MCP tools with combined full-text and semantic search across wiki and semantic-layer entities.
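
To make the fan trap mentioned above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders and order_items tables and their values are hypothetical and not part of ktx. The pre-aggregation shown is one common way to avoid double counting; it is not necessarily how ktx's join graph resolves the trap.

```python
import sqlite3

# Hypothetical schema: one order has many line items (one-to-many).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
INSERT INTO order_items VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 1, 'c'), (4, 2, 'd');
""")

# Fan trap: joining to the "many" side repeats each order row once per item,
# so the order-level amount is summed several times.
naive = conn.execute("""
SELECT SUM(o.amount), COUNT(i.item_id)
FROM orders o
JOIN order_items i ON i.order_id = o.order_id
""").fetchone()

# One common fix: aggregate each table at its own grain, then combine.
safe = conn.execute("""
WITH revenue AS (SELECT SUM(amount) AS total FROM orders),
     items AS (SELECT COUNT(*) AS n FROM order_items)
SELECT revenue.total, items.n
FROM revenue CROSS JOIN items
""").fetchone()

print("naive:", naive)  # revenue inflated by the join
print("safe:", safe)    # revenue at the correct grain
```

True revenue here is 150.0, but the naive join reports 350.0 because order 1 is counted three times. A chasm trap is the same effect when two independent one-to-many tables are joined through a shared parent.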
[Table: capability comparison across three columns: General-purpose agent, Traditional semantic layer, ktx. Rows, with the cell text that survives:]

  • Builds warehouse context automatically
  • Detects joinable columns + resolves fan/chasm traps: Manual
  • Approved, reusable metric definitions
  • Absorbs wiki / Notion / team knowledge
  • Flags contradictions across sources
  • Ships CLI + MCP for agent execution: Partial
  • Read-only by design: n/a, n/a

Use ktx if you:

  • Want agents like Claude Code, Codex, Cursor, or OpenCode to query your warehouse with approved metric definitions
  • Have business knowledge scattered across dbt, Looker, Metabase, Notion, and team wikis
  • Need agents to reuse canonical SQL instead of inventing it on every prompt

Skip ktx if you:

  • Don't have a SQL warehouse - ktx sits on top of one
  • Only need one ad-hoc query - psql or a notebook will do

Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and SQLite. Integrates with dbt, MetricFlow, LookML, Looker, Metabase, and Notion.

npm install -g @kaelio/ktx
ktx setup
ktx status

ktx setup creates or resumes a local ktx project, configures providers and connections, builds context, and installs agent integration.

Example ktx status after setup:

ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
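
If you want to gate a script on readiness, the status text can be checked mechanically. The sketch below parses the sample output shown above rather than calling ktx. It assumes the real output keeps the same "label: value" shape and that any value not starting with "yes" means "not ready", which the example does not confirm. parse_status and not_ready are hypothetical helper names.

```python
# Sample copied from the example status output above.
STATUS_SAMPLE = """\
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
"""


def parse_status(text):
    """Split each 'label: value' line on its first colon."""
    checks = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            checks[key.strip()] = value.strip()
    return checks


def not_ready(checks):
    """Return labels whose value does not start with 'yes' (assumed format)."""
    return [
        key
        for key, value in checks.items()
        if key != "ktx project" and not value.startswith("yes")
    ]


checks = parse_status(STATUS_SAMPLE)
print(not_ready(checks) or "all checks passed")
```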

Tip

Already using an agent? Ask Claude Code, Codex, Cursor, or OpenCode from your project directory:

Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill to install
and configure ktx in this project.

Important

If ktx status prints ktx mcp start --project-dir ..., run it before opening your agent client.

Command                    Purpose
ktx setup                  Create, resume, or update a ktx project
ktx status                 Check project readiness
ktx ingest                 Build context for every configured connection
ktx sl "revenue"           Search semantic sources
ktx wiki "refund policy"   Search local wiki pages
ktx mcp start              Start the MCP server for agent clients

See the CLI Reference for every command, flag, and option.

my-project/
├── ktx.yaml                         # Project configuration
├── semantic-layer/<connection-id>/  # YAML semantic sources
├── wiki/global/                     # Shared business context
├── wiki/user/<user-id>/             # User-scoped notes
├── raw-sources/<connection-id>/     # Ingest artifacts and reports
└── .ktx/                            # Local state and secrets, git-ignored

Commit ktx.yaml, semantic-layer/, and wiki/. Keep .ktx/ local.

Project resolution defaults to KTX_PROJECT_DIR, then the nearest ktx.yaml, then the current directory. Pass --project-dir <path> when scripting.
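
The resolution order above can be sketched in a few lines of Python. This is an illustration of the described precedence, not ktx's actual implementation: resolve_project_dir is a hypothetical name, and "nearest ktx.yaml" is assumed here to mean the closest one found by walking up from the current directory.

```python
import os
import tempfile
from pathlib import Path


def resolve_project_dir(cwd, env=None):
    """Mimic the documented order: env var, nearest ktx.yaml, then cwd."""
    env = os.environ if env is None else env

    # 1. An explicit KTX_PROJECT_DIR wins.
    override = env.get("KTX_PROJECT_DIR")
    if override:
        return Path(override)

    # 2. Otherwise look for ktx.yaml in cwd and each parent (assumption).
    cwd = Path(cwd).resolve()
    for candidate in (cwd, *cwd.parents):
        if (candidate / "ktx.yaml").is_file():
            return candidate

    # 3. Fall back to the current directory.
    return cwd


# Small demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    nested = root / "models" / "marts"
    nested.mkdir(parents=True)
    (root / "ktx.yaml").write_text("# placeholder\n")

    found_matches_root = resolve_project_dir(nested, env={}) == root
    overridden = resolve_project_dir(nested, env={"KTX_PROJECT_DIR": "/srv/analytics"})

print(found_matches_root, overridden)
```

Because the result depends on where a command happens to run, scripts are better off passing --project-dir explicitly, as noted above.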

  • Does ktx send my schema or query results to a hosted service? No. ktx runs locally. The only data leaving your machine is what you send to the LLM provider you configured.
  • Which LLM backends are supported? Anthropic API, Google Vertex AI, AI Gateway, and the local Claude Code session through the Claude Agent SDK. See LLM configuration.
  • How is ktx different from a dbt or MetricFlow semantic layer? ktx ingests those layers and combines them with raw-table introspection and wiki content. Agents get one searchable surface instead of three disconnected ones - and ktx flags contradictions across sources.
  • Does ktx need a running server? There is no hosted service. The local MCP daemon runs on demand via ktx mcp start when an agent client needs it.
  • Is my warehouse safe? Yes. Connections are read-only - ktx never writes to your database.
  • Slack — ask questions, share what you're building, and chat with maintainers.
  • GitHub Issues — report bugs and request features.
  • Contributing — set up the repo, run tests, and open a PR.
git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
pnpm run build
pnpm run check

ktx is a pnpm + uv workspace:

Path                          Purpose
packages/cli                  TypeScript CLI and published npm package source
packages/cli/src/context      Core context engine
packages/cli/src/llm          LLM and embedding providers
packages/cli/src/connectors   Database scan connectors
python/ktx-sl                 Semantic-layer query planning
python/ktx-daemon             Portable compute service

Local development CLI:

pnpm run setup:dev
pnpm run link:dev
ktx-dev --help

Useful checks:

pnpm run type-check
pnpm run test
pnpm run dead-code
uv run pytest -q

ktx collects anonymous usage telemetry from interactive CLI runs to improve setup, command reliability, and data-agent workflows. No file paths, hostnames, SQL, schema names, error messages, or argv are recorded. See Telemetry for the event catalog and opt-out options.

ktx is licensed under the Apache License, Version 2.0. See LICENSE.
