Show HN: Context Mode – 315 KB of MCP output becomes 5.4 KB in Claude Code

Original link: https://github.com/mksglu/claude-context-mode

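## Context Mode: Extending Claude's Context Window

Claude Code's use of MCP (Model Context Protocol) tools quickly consumes its 200K context window: large volumes of raw data (snapshots, issue lists, logs) degrade performance after as little as 30 minutes. **Context Mode** addresses this by compressing tool outputs, following the success of Cloudflare's approach to compressing tool *definitions*. It can be installed as a plugin that intercepts large outputs and processes them in a sandbox. Instead of sending raw data to Claude, it passes along only summaries or relevant excerpts, reducing context usage by up to **98%**. Key features:

* **Sandboxed execution:** Code runs in an isolated process, and only stdout is sent to Claude.
* **Intent-driven filtering:** For outputs over 5 KB, Context Mode indexes the full output and returns only the parts that match the stated intent.
* **Efficient indexing and search:** SQLite FTS5 with BM25 ranking provides fast, relevant retrieval.
* **Encouraged batching:** Progressive throttling nudges users to combine queries for further efficiency.

Tests show substantial savings: 20 GitHub issues (59 KB) become 1.1 KB, and 315 KB of output shrinks to just 5.4 KB, which significantly extends session length. Context Mode also includes automatic routing for subagents, optimizing tool use with no manual configuration.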

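## Claude Code Context Mode: Summary

A new tool, "Claude Context Mode" (github.com/mksglu), sharply reduces the amount of data sent to Claude Code, extending its usable context window. Raw data from tools such as Playwright and GitHub issues consumes a lot of context (for example, a Playwright snapshot is 56 KB). The tool acts as a server that processes outputs in a sandbox and returns *summaries* instead of full data, shrinking 315 KB to just 5.4 KB. It supports 10 languages, uses SQLite for efficient search, and supports batch execution. This extends session time before performance degrades from roughly 30 minutes to roughly 3 hours. The project is MIT-licensed and installs with a single command. The developer is seeking feedback, especially from users who have hit Claude Code's context limits. It may also work with other MCP clients such as OpenCode.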

Original text

The other half of the context problem.

Every MCP tool call in Claude Code dumps raw data into your 200K context window. A Playwright snapshot costs 56 KB. Twenty GitHub issues cost 59 KB. One access log — 45 KB. After 30 minutes, 40% of your context is gone.

Inspired by Cloudflare's Code Mode — which compresses tool definitions from millions of tokens into ~1,000 — we asked: what about the other direction?

Context Mode is an MCP server that sits between Claude Code and these outputs. 315 KB becomes 5.4 KB. 98% reduction.

claude mcp add context-mode -- npx -y context-mode

Restart Claude Code. Done.

Plugin install (includes auto-routing skill + subagent hook)

/plugin marketplace add mksglu/claude-context-mode
/plugin install context-mode@claude-context-mode

Installs the MCP server + a skill that automatically routes large outputs through Context Mode + a PreToolUse hook that injects context-mode routing into subagent prompts. No prompting needed.

Local development

claude --plugin-dir ./path/to/context-mode

MCP has become the standard way for AI agents to use external tools. But there is a tension at its core: every tool interaction fills the context window from both sides — definitions on the way in, raw output on the way out.

With 81+ tools active, 143K tokens (72%) get consumed before your first message. And then the tools start returning data. A single Playwright snapshot burns 56 KB. A gh issue list dumps 59 KB. Run a test suite, read a log file, fetch documentation — each response eats into what remains.

Code Mode showed that tool definitions can be compressed by 99.9%. Context Mode applies the same principle to tool outputs — processing them in sandboxes so only summaries reach the model.

| Tool | What it does | Context saved |
| --- | --- | --- |
| `batch_execute` | Run multiple commands + search multiple queries in ONE call. | 986 KB → 62 KB |
| `execute` | Run code in 10 languages. Only stdout enters context. | 56 KB → 299 B |
| `execute_file` | Process files in sandbox. Raw content never leaves. | 45 KB → 155 B |
| `index` | Chunk markdown into FTS5 with BM25 ranking. | 60 KB → 40 B |
| `search` | Query indexed content with multiple queries in one call. | On-demand retrieval |
| `fetch_and_index` | Fetch URL, convert to markdown, index. | 60 KB → 40 B |
| `stats` | Session token tracking with per-tool breakdown. | - |

Each execute call spawns an isolated subprocess with its own process boundary. Scripts can't access each other's memory or state. The subprocess runs your code, captures stdout, and only that stdout enters the conversation context. The raw data — log files, API responses, snapshots — never leaves the sandbox.
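The subprocess pattern described above can be sketched in a few lines. This is a minimal illustration rather than Context Mode's actual implementation: the function name `run_isolated` and the timeout are assumptions, and only Python is handled here.

```python
import subprocess
import sys

def run_isolated(code: str, timeout_s: float = 30.0) -> str:
    """Run Python source in a fresh interpreter and return only its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # the child's streams are captured, not shared
        text=True,
        timeout=timeout_s,
    )
    if result.returncode != 0:
        # Report a short tail of stderr instead of the full dump.
        return f"exit {result.returncode}: {result.stderr.strip()[-200:]}"
    return result.stdout

# The child builds roughly 50 KB of fake log lines but prints a one-line summary.
script = """
lines = [f"GET /item/{i} 200" for i in range(3000)]
print(f"{len(lines)} requests, all status 200")
"""
summary = run_isolated(script)
print(summary.strip())
```

The raw log lines exist only inside the child process; the caller sees just the summary string.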

Ten language runtimes are available: JavaScript, TypeScript, Python, Shell, Ruby, Go, Rust, PHP, Perl, R. Bun is auto-detected for 3-5x faster JS/TS execution.
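Auto-detection of this kind usually comes down to checking whether a binary is on `PATH`. The sketch below assumes that approach; `pick_js_runtime` is a hypothetical helper, and how Context Mode really detects Bun is not shown in this text.

```python
import shutil
from typing import Callable, Optional

def pick_js_runtime(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Prefer Bun when it is on PATH, otherwise fall back to Node.js."""
    return "bun" if which("bun") else "node"

print(pick_js_runtime())
```

Passing the lookup function as a parameter keeps the choice easy to test on machines with or without Bun installed.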

Authenticated CLIs work through credential passthrough — gh, aws, gcloud, kubectl, docker inherit environment variables and config paths without exposing them to the conversation.
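The idea behind credential passthrough can be illustrated as follows: the child process inherits the parent's environment, so a CLI can authenticate, while only the child's stdout is returned. This is a sketch under assumptions: `run_with_inherited_env` and the `DEMO_TOKEN` variable are hypothetical, and a one-line Python script stands in for a real CLI such as `gh`.

```python
import os
import subprocess
import sys

def run_with_inherited_env(argv, extra_env=None, timeout_s=30.0):
    """Run a command with the parent's environment; return only its stdout."""
    env = dict(os.environ)  # the child sees the same variables as the parent
    env.update(extra_env or {})
    result = subprocess.run(
        argv, env=env, capture_output=True, text=True, timeout=timeout_s
    )
    return result.stdout

# Stand-in for an authenticated CLI: it reads the token but prints only a status.
child = "import os; print('authenticated:', bool(os.environ.get('DEMO_TOKEN')))"
out = run_with_inherited_env([sys.executable, "-c", child], {"DEMO_TOKEN": "s3cret"})
print(out.strip())
```

The secret is available to the child but never appears in the text handed back to the caller, unless the child itself prints it.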

When output exceeds 5 KB and an intent is provided, Context Mode switches to intent-driven filtering: it indexes the full output into the knowledge base, searches for sections matching your intent, and returns only the relevant matches with a vocabulary of searchable terms for follow-up queries.
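A simplified version of this decision is sketched below. It is an approximation under stated assumptions: plain term counting replaces the FTS5/BM25 index, sections are split on blank lines, 1 KB is taken as 1024 bytes, and `filter_output` with its return shape is hypothetical.

```python
import re
from collections import Counter
from typing import Optional

THRESHOLD_BYTES = 5 * 1024  # the 5 KB cutoff from the text (1 KB = 1024 assumed)

def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())

def filter_output(output: str, intent: Optional[str], top_k: int = 3) -> dict:
    """Pass small outputs through; otherwise keep only sections matching the intent."""
    if not intent or len(output.encode("utf-8")) <= THRESHOLD_BYTES:
        return {"mode": "raw", "sections": [output], "vocabulary": []}
    sections = [s for s in output.split("\n\n") if s.strip()]
    wanted = set(tokenize(intent))
    scored = [(sum(t in wanted for t in tokenize(s)), i) for i, s in enumerate(sections)]
    hits = sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))[:top_k]
    # Frequent longer terms double as hints for follow-up searches.
    vocab = Counter(t for t in tokenize(output) if len(t) > 3)
    return {
        "mode": "filtered",
        "sections": [sections[i] for _, i in hits],
        "vocabulary": [term for term, _ in vocab.most_common(5)],
    }

noise = "\n\n".join(f"request {i} completed with status ok" for i in range(300))
output = noise + "\n\nauth error: JWT token expired for user 42\n\n" + noise
result = filter_output(output, intent="JWT token error")
print(result["mode"], result["sections"])
```

Here about 22 KB of output is reduced to the single section that mentions the intent terms.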

How the Knowledge Base Works

The index tool chunks markdown content by headings while keeping code blocks intact, then stores the chunks in a SQLite FTS5 (Full-Text Search 5) virtual table. Search uses BM25 ranking, a probabilistic relevance algorithm that scores documents based on term frequency, inverse document frequency, and document length normalization. Porter stemming is applied at index time so that inflected forms such as "running" and "runs" reduce to the same stem ("run"); because the Porter algorithm only strips suffixes, irregular forms such as "ran" are not merged with them.
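To make the two steps concrete, here is a pure-Python sketch of heading-based chunking and BM25 scoring. It is not the tool's code: the real index lives in SQLite FTS5, no stemming is applied here, and the function names and the parameters `k1=1.2`, `b=0.75` (common textbook defaults) are assumptions.

```python
import math
import re
from collections import Counter

def chunk_markdown(text: str) -> list:
    """Split at heading lines, but never inside a fenced code block."""
    chunks, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.lstrip().startswith("`" * 3):
            in_fence = not in_fence
        if re.match(r"#{1,6} ", line) and not in_fence and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, docs: list, k1: float = 1.2, b: float = 0.75) -> list:
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))  # document frequency
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

FENCE = "`" * 3  # built this way so the example contains no literal fence
doc = "\n".join([
    "# Install", "Run the installer.",
    "# Auth", "Tokens are JWT. Refresh the token hourly.",
    FENCE + "sh", "# not a heading", "export TOKEN=abc", FENCE,
    "# Logs", "Logs rotate daily.",
])
chunks = chunk_markdown(doc)
scores = bm25_scores("jwt token", chunks)
best = chunks[scores.index(max(scores))]
print(best.splitlines()[0])
```

The `# not a heading` line sits inside a fence, so it stays with the Auth chunk instead of starting a new one.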

When you call search, it returns relevant content snippets focused around matching query terms — not full documents, not approximations, the actual indexed content with smart extraction around what you're looking for. fetch_and_index extends this to URLs: fetch, convert HTML to markdown, chunk, index. The raw page never enters context.

Search results use intelligent extraction instead of truncation. Instead of returning the first N characters (which might miss the important part), Context Mode finds where your query terms appear in the content and returns windows around those matches. If your query is "authentication JWT token", you get the paragraphs where those terms actually appear — not an arbitrary prefix.
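Window extraction of this kind might look like the sketch below. The window radius, the merging of overlapping windows, and the name `extract_windows` are illustrative assumptions, not details taken from the project.

```python
import re

def extract_windows(text: str, query: str, radius: int = 60) -> list:
    """Return merged character windows around each query-term match."""
    terms = [re.escape(t) for t in query.split()]
    if not terms:
        return []
    pattern = re.compile("|".join(terms), re.IGNORECASE)
    spans = []
    for m in pattern.finditer(text):
        start = max(0, m.start() - radius)
        end = min(len(text), m.end() + radius)
        if spans and start <= spans[-1][1]:
            spans[-1][1] = max(spans[-1][1], end)  # overlaps: extend previous window
        else:
            spans.append([start, end])
    return [text[s:e] for s, e in spans]

text = "intro " * 200 + "Authentication uses a JWT token signed with RS256." + " filler" * 200
windows = extract_windows(text, "authentication JWT token", radius=20)
print(windows)
```

A plain prefix such as `text[:100]` would contain only the word "intro" repeated, while the window lands on the sentence that mentions the query terms.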

Progressive Search Throttling

The search tool includes progressive throttling to prevent context flooding from excessive individual calls:

Calls 1-3: Normal results (2 per query)
Calls 4-8: Reduced results (1 per query) + warning
Calls 9+: Blocked — redirects to batch_execute

This encourages batching queries via search(queries: ["q1", "q2", "q3"]) or batch_execute instead of making dozens of individual calls.
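The tiers above map directly onto a small function. The thresholds come from the list; the function name, return shape, and message wording are hypothetical.

```python
def throttle(call_number: int) -> dict:
    """Map the nth search call in a session to its result budget."""
    if call_number < 1:
        raise ValueError("call_number starts at 1")
    if call_number <= 3:
        return {"results_per_query": 2, "note": None}
    if call_number <= 8:
        return {"results_per_query": 1, "note": "warning: consider batching queries"}
    return {"results_per_query": 0, "note": "blocked: use batch_execute"}

for n in (1, 4, 9):
    print(n, throttle(n))
```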

The stats tool tracks context consumption in real-time. Useful for debugging context usage during long sessions.

| Metric | Value |
| --- | --- |
| Session uptime | 2.6 min |
| Tool calls | 5 |
| Bytes returned to context | 62.0 KB (~15.9k tokens) |
| Bytes indexed (stayed in sandbox) | 140.5 KB |
| Context savings ratio | 2.3x (56% reduction) |

| Tool | Calls | Context used |
| --- | --- | --- |
| batch_execute | 4 | 58.2 KB |
| search | 1 | 3.8 KB |

When installed as a plugin, Context Mode includes a PreToolUse hook that automatically injects routing instructions into subagent (Task tool) prompts. Subagents learn to use batch_execute as their primary tool and search(queries: [...]) for follow-ups — without any manual configuration.

Measured across real-world scenarios:

* Playwright snapshot: 56.2 KB raw → 299 B context (99% saved)
* GitHub Issues (20): 58.9 KB raw → 1.1 KB context (98% saved)
* Access log (500 requests): 45.1 KB raw → 155 B context (100% saved)
* Context7 React docs: 5.9 KB raw → 261 B context (96% saved)
* Analytics CSV (500 rows): 85.5 KB raw → 222 B context (100% saved)
* Git log (153 commits): 11.6 KB raw → 107 B context (99% saved)
* Test output (30 suites): 6.0 KB raw → 337 B context (95% saved)
* Repo research (subagent): 986 KB raw → 62 KB context (94% saved, 5 calls vs 37)
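The percentages can be reproduced from the byte counts, which also shows why some rows read "100% saved": they are rounded to the nearest whole percent. The snippet assumes 1 KB = 1024 bytes; using 1000 instead gives the same rounded values for these rows.

```python
KB = 1024  # assumption; the source does not say which convention it uses

def percent_saved(raw_bytes: float, context_bytes: float) -> float:
    """Share of the raw output that never reached the context window."""
    return 100.0 * (1.0 - context_bytes / raw_bytes)

rows = {
    "Playwright snapshot": (56.2 * KB, 299),
    "GitHub Issues (20)": (58.9 * KB, 1.1 * KB),
    "Access log (500 requests)": (45.1 * KB, 155),
}
for name, (raw, ctx) in rows.items():
    print(f"{name}: {percent_saved(raw, ctx):.2f}% saved")
```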

Over a full session: 315 KB of raw output becomes 5.4 KB. Session time before slowdown goes from ~30 minutes to ~3 hours. Context remaining after 45 minutes: 99% instead of 60%.

Full benchmark data with 21 scenarios →

These prompts work out of the box. Run /context-mode stats after each to see the savings.

Deep repo research — 5 calls, 62 KB context (raw: 986 KB, 94% saved)

Research https://github.com/modelcontextprotocol/servers — architecture, tech stack,
top contributors, open issues, and recent activity. Then run /context-mode stats.

Git history analysis — 1 call, 5.6 KB context

Clone https://github.com/facebook/react and analyze the last 500 commits:
top contributors, commit frequency by month, and most changed files.
Then run /context-mode stats.

Web scraping — 1 call, 3.2 KB context

Fetch the Hacker News front page, extract all posts with titles, scores,
and domains. Group by domain. Then run /context-mode stats.

Documentation search — 2 calls, 1.8 KB context

Fetch the React useEffect docs, index them, and find the cleanup pattern
with code examples. Then run /context-mode stats.

Node.js 18+
Claude Code with MCP support
Optional: Bun (auto-detected, 3-5x faster JS/TS)

git clone https://github.com/mksglu/claude-context-mode.git
cd claude-context-mode && npm install
npm test              # run tests
npm run test:all      # full suite

MIT