Show HN: Gambit, an open-source agent harness for building reliable AI agents

Original link: https://github.com/bolt-foundry/gambit

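## Gambit: Building Reliable LLM Workflows

Gambit is a tool for building robust LLM applications by breaking complex tasks into smaller, manageable "decks". Each deck defines clear inputs, outputs, and guardrails, which makes behavior more predictable and easier to test. Unlike the traditional long-prompt approach, Gambit encourages modular design and mixes LLM calls with ordinary compute tasks.

Key features include local execution, streaming traces, and a built-in debug UI for troubleshooting, so you no longer depend on provider logs. Workflows can be defined in plain Markdown or TypeScript, with tools such as Zod validating inputs and outputs.

Gambit provides a CLI for running decks (via `npx` or `deno`), a REPL for interactive testing, and a debug server with a visual interface. It supports tracing and state persistence for detailed analysis. The goal is to make LLM application development feel more like traditional software engineering, with better observability and control. It aims to cut costs and reduce hallucinations by giving the model only the information it needs at each step.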

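## Gambit: An Open-Source Agent Harness

Gambit is a new open-source "agent harness" intended to simplify building reliable AI agents. Unlike traditional agent frameworks, which lean heavily on compute steps *between* LLM calls, Gambit puts LLM-driven logic first and folds compute steps *into* the flow.

Agents are defined in Markdown or TypeScript, which allows modularity and type-safe interfaces between agents ("decks"). A key feature is automated evaluation through "graders": agents designed to assess conversations and ensure quality, including catching problems such as PII leaks. Gambit also supports defining test agents that generate synthetic data.

The creators built Gambit after struggling with existing tools while developing an LLM-based video editor, focusing on improving inference time and LLM quality. They envision Gambit enabling truly open-source agents, rapid bot prototyping (using the likes of Claude Code), and a platform more extensible than current "UI-driven" projects.

Demo video: [https://youtu.be/J_hQ2L_yy60](https://youtu.be/J_hQ2L_yy60)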

Original text

Gambit helps you build reliable LLM workflows by composing small, typed “decks”
with clear inputs/outputs and guardrails. Run decks locally, stream traces, and
debug with a built-in UI.

Requirements: Node.js 18+ and OPENROUTER_API_KEY (set OPENROUTER_BASE_URL if
you proxy OpenRouter-style APIs).

Run the CLI directly with npx (no install):

export OPENROUTER_API_KEY=...
npx @bolt-foundry/gambit init

This downloads example files (the hello decks plus the examples/ gallery) and sets environment variables.

Run an example in the terminal (repl):

npx @bolt-foundry/gambit repl gambit/hello.deck.md

This example just says "hello" and repeats your message back to you.

Run an example in the browser (serve):

npx @bolt-foundry/gambit serve gambit/hello.deck.md
open http://localhost:8000/debug

Common pain points in LLM workflows:

  • Most teams wire one long prompt to several tools and hope the model routes
    correctly.
  • Context often arrives as a single giant fetch or RAG blob, so costs climb and
    hallucinations slip in.
  • Inputs/outputs are rarely typed, which makes orchestration brittle and hard to
    test offline.
  • Debugging leans on provider logs instead of local traces, so reproducing
    failures is slow.

Gambit's approach:

  • Treat each step as a small deck with explicit inputs/outputs and guardrails;
    model calls are just one kind of action.
  • Mix LLM and compute tasks interchangeably inside the same deck tree.
  • Feed models only what they need per step; inject references and cards instead
    of dumping every document.
  • Keep orchestration logic local and testable; run decks offline with
    predictable traces.
  • Ship with built-in observability (streaming, REPL, debug UI) so debugging
    feels like regular software, not guesswork.
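
The idea of small steps with explicit inputs/outputs and guardrails can be illustrated without any framework. The sketch below is plain TypeScript and does not use Gambit's API: `Step`, `runStep`, `parseTextInput`, and `wordCount` are hypothetical names invented for this illustration. It only shows why validating input at the boundary makes a step easy to test offline.

```typescript
// Hypothetical sketch: none of these names come from Gambit.
// A step declares how to validate its input and how to compute its output.
type Step<I, O> = {
  label: string;
  validateInput: (value: unknown) => I; // guardrail: throws on bad input
  run: (input: I) => O;
};

// Validate first, then run. Callers can pass untrusted data safely.
function runStep<I, O>(step: Step<I, O>, raw: unknown): O {
  const input = step.validateInput(raw);
  return step.run(input);
}

// Accept only an object with a string `text` field.
function parseTextInput(value: unknown): { text: string } {
  if (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { text?: unknown }).text === "string"
  ) {
    return { text: (value as { text: string }).text };
  }
  throw new Error("expected { text: string }");
}

// A pure compute step: no model call, so it behaves the same on every run.
const wordCount: Step<{ text: string }, { words: number }> = {
  label: "word_count",
  validateInput: parseTextInput,
  run: ({ text }) => ({
    words: text.trim().split(/\s+/).filter(Boolean).length,
  }),
};
```

Because the step is a plain value with a validator attached, a unit test can call `runStep(wordCount, ...)` with good and bad inputs and never touch a model or a network.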

Use the CLI to run decks locally, stream output, and capture traces/state.

Run with npx (no install):

npx @bolt-foundry/gambit <command>

Run a deck once:

npx @bolt-foundry/gambit run <deck> --init <json|string> --message <json|string>
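
The `<json|string>` placeholder suggests that a value may be given either as JSON or as a plain string. How Gambit resolves the two is not stated here, so the sketch below is an assumption rather than the CLI's documented behavior: it shows one common convention (try JSON first, fall back to the raw text) using a hypothetical helper, `parseJsonOrString`. It also shows why a JSON string needs its own double quotes inside the shell's single quotes, as in `'"Gambit"'`.

```typescript
// Hypothetical helper, not part of Gambit: interpret a CLI argument as JSON
// when it parses, otherwise keep it as the raw string.
function parseJsonOrString(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return raw;
  }
}

// After the shell strips the single quotes, '"Gambit"' arrives as "Gambit"
// (with the double quotes), which is valid JSON for a string.
const asString = parseJsonOrString('"Gambit"');

// An object literal parses to an object.
const asObject = parseJsonOrString('{"text":"ping"}');

// Bare words are not valid JSON, so they fall through unchanged.
const asRaw = parseJsonOrString("hello");

// Caveat of this convention: bare numbers parse as numbers, not strings.
const asNumber = parseJsonOrString("42");
```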

Drop into a REPL (streams by default):

npx @bolt-foundry/gambit repl <deck>

Run a persona against a root deck (test bot):

npx @bolt-foundry/gambit test-bot <root-deck> --test-deck <persona-deck>

Grade a saved session:

npx @bolt-foundry/gambit grade <grader-deck> --state <file>

Start the Debug UI server:

npx @bolt-foundry/gambit serve <deck> --port 8000

Tracing and state:

--trace <file> for JSONL traces
--verbose to print events
--state <file> to persist a session
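
A JSONL trace holds one JSON value per line, so it can be inspected with a few lines of code. The sketch below is a minimal reader under stated assumptions: the shape of Gambit's trace events is not described here, so events are treated as opaque values, and the `type` field in the sample data is made up for illustration. `parseJsonl` and `countByField` are hypothetical helper names.

```typescript
// Parse JSONL text: one JSON value per non-empty line.
function parseJsonl(text: string): unknown[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));
}

// Count events by the string value of a caller-chosen field.
// Events that are not objects are skipped; a missing or non-string
// field is counted under "(missing)".
function countByField(events: unknown[], field: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const event of events) {
    if (typeof event !== "object" || event === null) continue;
    const value = (event as Record<string, unknown>)[field];
    const key = typeof value === "string" ? value : "(missing)";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Made-up sample; real trace events may use different fields.
const sampleTrace = '{"type":"a"}\n{"type":"b"}\n\n{"type":"a"}\n';
const sampleCounts = countByField(parseJsonl(sampleTrace), "type");
```

To read a real file, pass its contents (for example from `Deno.readTextFile` or Node's `fs.readFileSync`) to `parseJsonl`.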

The simulator is the local Debug UI that streams runs and renders traces.

Start it:

npx @bolt-foundry/gambit serve <deck> --port 8000

Then open:

http://localhost:8000/debug

It also serves:

http://localhost:8000/test-bot
http://localhost:8000/calibrate

The Debug UI shows transcript lanes plus a trace/tools feed. If the deck has an
inputSchema, the UI renders a schema-driven form with defaults and a raw JSON
tab. Local-first state is stored under .gambit/ (sessions, traces, notes).

Use the library when you want TypeScript decks/cards or custom compute steps.

Import the helpers from JSR:

import { defineDeck, defineCard } from "jsr:@bolt-foundry/gambit";

Define inputSchema/outputSchema with Zod to validate IO, and implement
run/execute for compute decks. To call a child deck from code, use
ctx.spawnAndWait({ path, input }). Emit structured trace events with
ctx.log(...).
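
To make the shape of such a compute step concrete, here is a self-contained sketch that runs against a fake context instead of the real library. Only `ctx.input`, `ctx.spawnAndWait({ path, input })`, and `ctx.log(...)` come from the text; everything else is an assumption made for illustration: that `spawnAndWait` returns a Promise of the child's output, the `SketchCtx` interface, the `./child.deck.ts` path, and the `{ iso }` output shape.

```typescript
// Stand-in for the context object, limited to the members named in the text.
// Assumption: spawnAndWait resolves to the child's output.
interface SketchCtx<I> {
  input: I;
  spawnAndWait(args: { path: string; input: unknown }): Promise<unknown>;
  log(event: unknown): void;
}

// A parent step that logs, calls a child, checks the result, and replies.
async function runParent(
  ctx: SketchCtx<{ text: string }>,
): Promise<{ reply: string }> {
  ctx.log({ step: "calling child", path: "./child.deck.ts" });
  const out = await ctx.spawnAndWait({ path: "./child.deck.ts", input: {} });
  // Do not trust the child's output blindly; check the field we need.
  if (
    typeof out !== "object" ||
    out === null ||
    typeof (out as { iso?: unknown }).iso !== "string"
  ) {
    throw new Error("child returned an unexpected shape");
  }
  return { reply: `${ctx.input.text} @ ${(out as { iso: string }).iso}` };
}

// A fake context for offline tests: children are plain functions keyed by path.
function makeFakeCtx(
  text: string,
  children: Record<string, (input: unknown) => unknown>,
  logs: unknown[],
): SketchCtx<{ text: string }> {
  return {
    input: { text },
    spawnAndWait: async ({ path, input }) => {
      const child = children[path];
      if (!child) throw new Error(`no fake child registered for ${path}`);
      return child(input);
    },
    log: (event) => {
      logs.push(event);
    },
  };
}
```

Writing the step against a narrow interface like this keeps the orchestration logic testable without a model or network, which is the property the library section is after.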


Minimal Markdown deck (model-powered): hello_world.deck.md

+++
label = "hello_world"

[modelParams]
model = "openai/gpt-4o-mini"
temperature = 0
+++

You are a concise assistant. Greet the user and echo the input.

Run it:

npx @bolt-foundry/gambit run ./hello_world.deck.md --init '"Gambit"' --stream

Compute deck in TypeScript (no model call): echo.deck.ts

// echo.deck.ts
import { defineDeck } from "jsr:@bolt-foundry/gambit";
import { z } from "zod";

export default defineDeck({
  label: "echo",
  inputSchema: z.object({ text: z.string() }),
  outputSchema: z.object({ text: z.string(), length: z.number() }),
  run(ctx) {
    return { text: ctx.input.text, length: ctx.input.text.length };
  },
});

Run it:

npx @bolt-foundry/gambit run ./echo.deck.ts --init '{"text":"ping"}'
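
Because a compute deck makes no model call, its logic can be checked as an ordinary function. The sketch below lifts the body of `run` from echo.deck.ts into a standalone function; `echoRun`, `EchoInput`, and `EchoOutput` are hypothetical names, and the sketch does not go through Gambit or Zod, so it says nothing about how the CLI prints the result.

```typescript
// Plain types mirroring the deck's inputSchema and outputSchema.
type EchoInput = { text: string };
type EchoOutput = { text: string; length: number };

// Same logic as the run() body in echo.deck.ts, without the ctx wrapper.
function echoRun(input: EchoInput): EchoOutput {
  return { text: input.text, length: input.text.length };
}

// For the input used in the command above, the function returns
// { text: "ping", length: 4 }.
const pingResult = echoRun({ text: "ping" });

// Note: String.prototype.length counts UTF-16 code units, so a single
// emoji outside the Basic Multilingual Plane reports a length of 2.
const emojiResult = echoRun({ text: "👋" });
```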

Deck with a child action (calls a TypeScript tool): agent_with_time.deck.md

+++
label = "agent_with_time"
modelParams = { model = "openai/gpt-4o-mini", temperature = 0 }
[[actionDecks]]
name = "get_time"
path = "./get_time.deck.ts"
description = "Return the current ISO timestamp."
+++

A tiny agent that calls get_time, then replies with the timestamp and the input.

And the child action: get_time.deck.ts

// get_time.deck.ts
import { defineDeck } from "jsr:@bolt-foundry/gambit";
import { z } from "zod";

export default defineDeck({
  label: "get_time",
  inputSchema: z.object({}), // no args
  outputSchema: z.object({ iso: z.string() }),
  run() {
    return { iso: new Date().toISOString() };
  },
});

Run it:

npx @bolt-foundry/gambit run ./agent_with_time.deck.md --init '"hello"' --stream

If you prefer Deno, use the Deno commands below.

Quickstart:

export OPENROUTER_API_KEY=...
deno run -A jsr:@bolt-foundry/gambit/cli init

Run a deck:

deno run -A jsr:@bolt-foundry/gambit/cli run <deck> --init <json|string> --message <json|string>

Start the Debug UI:

deno run -A jsr:@bolt-foundry/gambit/cli serve <deck> --port 8000