```Show HN: 在 Claude、Codex 和 Cursor 中直接实现智能模型路由```
Show HN: Smart model routing directly in Claude, Codex and Cursor

原始链接: https://github.com/workweave/router

Weave 是一个通用的开源路由层,充当您的 AI 开发工具(如 Claude Code、Cursor、Codex 和 opencode)与任何大模型(LLM)提供商之间的智能中介。 **核心功能:** * **智能路由:** 使用基于“Avengers-Pro 2”衍生的评分机制,为每个请求选择最优模型。 * **通用兼容性:** 原生支持 Anthropic、OpenAI 和 Gemini API,并通过 OpenRouter 全面支持开源模型(Llama、DeepSeek、Mistral 等)。 * **安全隐私:** 遵循“自带密钥”(BYOK)模式;提供商密钥会加密存储在您的本地机器上。 * **可观测性:** 内置 OTLP 追踪功能,兼容 Weave 仪表盘、Honeycomb、Datadog 和 Grafana。 * **零门槛设置:** 使用 `npx` 安装程序,只需一条命令即可将您常用的工具指向 Weave 路由器。简单设置下无需 Docker 或 Postgres。 无论是在本地运行还是自托管,Weave 都提供了一个统一的接口,用于模型切换、基于 Token 的速率限制和推测性分发,使您无需重新配置 IDE 或 CLI 工具即可切换后端。通过 CLI 命令或直接 API 调用,即可轻松开启或关闭路由功能。

Weave 推出了一款旨在优化 Claude Code 和 Cursor 等 AI 编程智能体成本的模型路由器。该路由器作为用户与大模型提供商之间的智能中间件,能够评估每一项请求并动态选择最合适的模型:对于简单任务使用速度更快、成本更低的替代模型(如 DeepSeek、GLM),对于复杂推理任务则使用前沿模型(如 Opus、GPT)。 其路由逻辑由基于数千条智能体追踪记录训练的强化学习模型驱动,旨在优先确保任务的成功完成。据 Weave 内部使用报告显示,在不牺牲性能的前提下,Token 成本降低了 40%。该工具支持以 Elastic License 2.0 协议进行自托管,或通过其托管服务 weaverouter.com 使用。 Hacker News 上的讨论对此持怀疑态度,用户质疑其对缓存效率的影响、不同模型间切换的可靠性,以及该功能与 Cursor “自动”模式等现有原生解决方案有何区别。
相关文章

原文

Point Claude Code, Codex, Cursor, or your own app at localhost:8080. The router:

  • 🎯 Routes per request. A cluster scorer derived from Avengers-Pro 2 picks the right model from your enabled providers, every turn.
  • 🔌 Speaks everyone's API. Anthropic Messages, OpenAI Chat Completions, Gemini native. Streaming, tools, vision, the works.
  • 🧠 Knows OSS too. DeepSeek, Kimi, GLM, Qwen, Llama, Mistral via OpenRouter (or any OpenAI-compatible endpoint).
  • 🔒 BYOK by default. Provider keys stay on your box, encrypted at rest.
  • 📊 Observable. OTLP traces out of the box. See your dashboard in the Weave dashboard (http://localhost:8080/ui/dashboard) or drop in Honeycomb, Datadog, Grafana, whatever.

The fastest way: point Claude Code, Codex, or opencode at the hosted Weave Router with one command. No clone, no Docker, no Postgres.

That's it. The installer asks which tool (Claude Code, Codex, or opencode), walks you through scope (user vs. project), grabs a router key, and wires the right config file. Other flavors:

npx @workweave/router --claude              # skip the picker, Claude Code
npx @workweave/router --codex               # skip the picker, OpenAI Codex CLI
npx @workweave/router --opencode            # skip the picker, opencode
npx @workweave/router --scope project       # per-repo, commits settings.json (or .codex/ / opencode.json)
npx @workweave/router --local               # self-hosted localhost:8080
npx @workweave/router --base-url https://router.acme.internal
npx @workweave/[email protected]                 # pin a version

Requires Node ≥ 18 (Claude Code and opencode paths also need jq). Full flag reference: install/npm/README.md.

Or: self-host the whole stack

If you want the router (and dashboard) running on your own box:

# 1. Drop a provider key in. OpenRouter is the recommended baseline.
echo "OPENROUTER_API_KEY=sk-or-v1-..." >> .env.local

# 2. Boot Postgres + router on :8080 and seed an rk_ key.
make full-setup

The router is up at http://localhost:8080, the dashboard at http://localhost:8080/ui/ (password: admin), and your rk_... key prints in the logs.

# Call it like Anthropic
curl -sS http://localhost:8080/v1/messages \
  -H "Authorization: Bearer rk_..." \
  -d '{"model":"claude-sonnet-4-5","max_tokens":256,
       "messages":[{"role":"user","content":"hi"}]}'

# ...or like OpenAI
curl -sS http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer rk_..." \
  -d '{"model":"gpt-4o-mini",
       "messages":[{"role":"user","content":"hi"}]}'

# Peek at the routing decision without proxying
curl -sS http://localhost:8080/v1/route -H "Authorization: Bearer rk_..." -d '...'

Claude Code. Run make install-cc to wire Claude Code at the local self-hosted router (it's also invoked automatically at the end of make full-setup). For the hosted router, use npx @workweave/router above.

Codex (OpenAI CLI). npx @workweave/router --codex patches ~/.codex/config.toml (or <repo>/.codex/config.toml with --scope project) with a managed [model_providers.weave] block and sets model_provider = "weave". Codex's existing OPENAI_API_KEY flows through to api.openai.com for the plan-based passthrough; the router key rides in an X-Weave-Router-Key HTTP header. Re-install and --uninstall --codex rewrite/remove only the managed block, leaving the rest of your Codex config untouched.

opencode. npx @workweave/router --opencode merges a provider.weave entry into ~/.config/opencode/opencode.json (or <repo>/opencode.json with --scope project). It uses opencode's bundled @ai-sdk/anthropic provider pointed at the router's /v1 endpoint — the router speaks the Anthropic Messages API natively, so opencode works unmodified. The router key and identity headers ride alongside the provider config; re-install rewrites only the managed block and --uninstall --opencode strips it.

Cursor (early beta, performance may not be the best). Settings → Models → Override OpenAI Base URLhttp://localhost:8080/v1, paste rk_... as the API key.

Switching on/off. After installing, npx @workweave/router off --claude (or --codex / --opencode) routes that client straight to its provider again without discarding the router config; on flips it back, and status reports which way it's pointing. Claude Code also gets /router-off, /router-on, and /router-status slash commands. Cursor toggles via the same Settings → Models override above. See install/README.md.

Two keys, don't mix them up:

  • sk-or-... / sk-ant-... / sk-... = your upstream provider key. Lives in .env.local.
  • rk_... = your router key. Clients send this as a Bearer token.
Endpoint Format
POST /v1/messages Anthropic Messages, routed
POST /v1/chat/completions OpenAI Chat Completions, routed
POST /v1beta/models/:action Gemini generateContent, routed
POST /v1/route Returns the decision, no upstream call
GET /v1/models  ·  POST /v1/messages/count_tokens Anthropic passthrough
GET /health  ·  GET /validate liveness + key check
  • 📐 Configuration reference: every env var, BYOK encryption, OTel knobs, cluster routing.
  • 🛠️ Contributing: layering rules, hot-reload dev, migrations, tests, the whole engineering loop.
  • 🏗️ Architecture: package layout, import contracts, recipes for adding endpoints / providers / strategies.
  • Token-aware rate limiting (Redis sliding window per installation)
  • Sub-installations for tenant hierarchies
  • Speculative dispatch + hedging for tail latency

联系我们 contact @ memedata.com