Stop Burning Your Context Window – How We Cut MCP Output by 98% in Claude Code

Original link: https://mksg.lu/blog/context-mode

## Claude Context Mode: Extending AI Session Length

When Claude Code uses MCP tools, it often fills its 200K context window quickly: a Playwright snapshot takes 56 KB, and 20 GitHub issues take 59 KB. This limits session length, with 40% of context lost after just 30 minutes. **Context Mode** solves this by acting as a server between Claude and tool outputs, drastically shrinking the data, from 315 KB down to just 5.4 KB (a 98% reduction).

It achieves this with a secure **sandbox** that executes tool calls in isolated processes. Only the *output* (stdout) is passed to Claude, preventing large raw data (such as logs or API responses) from bloating the context. Ten language runtimes are supported, including JavaScript/TypeScript accelerated via Bun.

A built-in **knowledge base** indexes markdown and web content using BM25 search and returns exact code blocks, not summaries, without sending raw page content into the context.

Testing across real-world scenarios shows significant reductions in output size (for example, a 56 KB snapshot shrinks to 299 B). This extends usable session time from about 30 minutes to about 3 hours, retaining 99% of context after 45 minutes. Context Mode is easy to install as a plugin or directly via MCP and requires no changes to existing workflows.

## Claude Code Context Window Optimization

A developer (mksglu) details how they cut Claude Code's MCP output by 98% using a system called "Context Mode," described in the GitHub repository ([https://github.com/mksglu/claude-context-mode](https://github.com/mksglu/claude-context-mode)).

The core idea is to isolate tool calls in subprocesses and feed *only* their standard output into the 200K context window, avoiding the context bloat caused by raw data dumps. It uses SQLite FTS5 with BM25 ranking for efficient search and retrieval of relevant information.

Key improvements include automatically upgrading Bash subagents and pre-filtering information rather than relying on the LLM to sift through the "noise." The system handles credential passthrough via a curated allowlist of environment variables, preserving security, with no persistent state between tool calls.

Commenters highlighted how important this approach is for managing complex, multi-step workflows, where accumulated tool output quickly exceeds context limits and forces suboptimal decisions such as manual summarization or truncation.

## Original Article

Every MCP tool call in Claude Code dumps raw data into your 200K context window. A Playwright snapshot costs 56 KB. Twenty GitHub issues cost 59 KB. One access log — 45 KB. After 30 minutes, 40% of your context is gone.

Context Mode is an MCP server that sits between Claude Code and these outputs. 315 KB becomes 5.4 KB. 98% reduction.

The Problem

MCP became the standard way for AI agents to use external tools. But there's a tension at its core: every tool interaction fills the context window from both sides — definitions on the way in, raw output on the way out.

With 81+ tools active, 143K tokens (72%) get consumed before your first message. Then the tools start returning data. A single Playwright snapshot burns 56 KB. A gh issue list dumps 59 KB. Run a test suite, read a log file, fetch documentation — each response eats into what remains.

Cloudflare showed that tool definitions can be compressed by 99.9% with Code Mode. We asked: what about the other direction?

How the Sandbox Works

Each execute call spawns an isolated subprocess with its own process boundary. Scripts can't access each other's memory or state. The subprocess runs your code, captures stdout, and only that stdout enters the conversation context. The raw data — log files, API responses, snapshots — never leaves the sandbox.
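The subprocess boundary can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the server's actual implementation (which supports ten runtimes); the name `execute_sandboxed` is hypothetical.

```python
import subprocess
import sys

def execute_sandboxed(code: str, timeout: int = 30) -> str:
    # Spawn an isolated subprocess; its memory and state die with it.
    # Only the captured stdout crosses the boundary back to the caller.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # raw output is buffered here, never streamed onward
        text=True,
        timeout=timeout,
    )
    return result.stdout

# A script filters a large payload inside the sandbox and prints only a summary:
summary = execute_sandboxed(
    "lines = ['GET /a 200'] * 500\n"
    "errors = [l for l in lines if ' 500 ' in l]\n"
    "print(f'{len(lines)} requests, {len(errors)} errors')"
)
print(summary.strip())  # only this short line would enter the conversation
```

The 500 synthetic log lines exist only inside the child process; the parent sees one short string.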

Ten language runtimes are available: JavaScript, TypeScript, Python, Shell, Ruby, Go, Rust, PHP, Perl, R. Bun is auto-detected for 3-5x faster JS/TS execution.
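Runtime auto-detection can be as simple as a `PATH` lookup. A hypothetical sketch (the helper name and fallback are illustrative, not the server's API):

```python
import shutil

def pick_js_runtime() -> str:
    # Prefer Bun for JS/TS when it is on PATH; fall back to Node otherwise.
    return "bun" if shutil.which("bun") else "node"

print(pick_js_runtime())
```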

Authenticated CLIs (gh, aws, gcloud, kubectl, docker) work through credential passthrough — the subprocess inherits environment variables and config paths without exposing them to the conversation.
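A curated environment allowlist is one way to implement such passthrough. A sketch, assuming a hypothetical allowlist (the real server's list may differ):

```python
import os

# Illustrative allowlist; the actual passthrough set belongs to the server.
ENV_ALLOWLIST = {"PATH", "HOME", "GITHUB_TOKEN", "AWS_PROFILE", "KUBECONFIG"}

def passthrough_env() -> dict:
    # Only allowlisted variables are inherited, so credentials reach the CLI
    # inside the sandbox without ever appearing in the conversation.
    return {k: v for k, v in os.environ.items() if k in ENV_ALLOWLIST}

# e.g. subprocess.run(["gh", "issue", "list"], env=passthrough_env(), ...)
```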

How the Knowledge Base Works

The index tool chunks markdown content by headings while keeping code blocks intact, then stores them in a SQLite FTS5 (Full-Text Search 5) virtual table. Search uses BM25 ranking — a probabilistic relevance algorithm that scores documents based on term frequency, inverse document frequency, and document length normalization. Porter stemming is applied at index time so "running", "runs", and "ran" match the same stem.
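The FTS5 setup described above can be reproduced with Python's built-in `sqlite3` module. A small sketch (schema and sample chunks are illustrative, not the server's actual tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table with the Porter stemmer applied at index time.
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(heading, body, tokenize='porter')")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("Install > Plugin", "Run the marketplace command to install the plugin."),
        ("Sandbox > Runtimes", "Bun is detected for faster JS execution."),
    ],
)
# bm25() scores matches; in SQLite's convention, lower (more negative) is better.
rows = conn.execute(
    "SELECT heading FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks)",
    ("running",),
).fetchall()
print(rows)  # 'running' stems to 'run', matching the chunk containing 'Run'
```

Because of Porter stemming, the query "running" hits the chunk containing "Run" even though the surface forms differ.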

When you call search, it returns exact code blocks with their heading hierarchy — not summaries, not approximations, the actual indexed content. fetch_and_index extends this to URLs: fetch, convert HTML to markdown, chunk, index. The raw page never enters context.

The Numbers

Validated across 11 real-world scenarios — test triage, TypeScript error diagnosis, git diff review, dependency audit, API response processing, CSV analytics. All under 1 KB output each.

  • Playwright snapshot: 56 KB → 299 B
  • GitHub issues (20): 59 KB → 1.1 KB
  • Access log (500 requests): 45 KB → 155 B
  • Analytics CSV (500 rows): 85 KB → 222 B
  • Git log (153 commits): 11.6 KB → 107 B
  • Repo research (subagent): 986 KB → 62 KB (5 calls vs 37)

Over a full session: 315 KB of raw output becomes 5.4 KB. Session time before slowdown goes from ~30 minutes to ~3 hours. Context remaining after 45 minutes: 99% instead of 60%.

Install

Two ways. Plugin Marketplace gives you auto-routing hooks and slash commands:

/plugin marketplace add mksglu/claude-context-mode
/plugin install context-mode@claude-context-mode

Or MCP-only if you just want the tools:

claude mcp add context-mode -- npx -y context-mode

Restart Claude Code. Done.

What Actually Changes

You don't change how you work. Context Mode includes a PreToolUse hook that automatically routes tool outputs through the sandbox. Subagents learn to use batch_execute as their primary tool. Bash subagents get upgraded to general-purpose so they can access MCP tools.
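Claude Code hooks of this kind are declared in `settings.json`. A sketch of what such auto-routing wiring could look like; the matcher and the command name here are illustrative, not the plugin's actual hook:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "route-through-context-mode.sh" }
        ]
      }
    ]
  }
}
```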

The practical difference: your context window stops filling up. Sessions that used to hit the wall at 30 minutes now run for 3 hours. The same 200K tokens, used more carefully.

Why We Built This

I run the MCP Directory & Hub, which serves 100K+ daily requests, so I see every MCP server that ships. The pattern was clear: everyone builds tools that dump raw data into context. Nobody was solving the output side.

Cloudflare's Code Mode blog post crystallized it. They compressed tool definitions. We compress tool outputs. Same principle, other direction.

Built it for my own Claude Code sessions first. Noticed I could work 6x longer before context degradation. Open-sourced it.

Open source. MIT. github.com/mksglu/claude-context-mode


Mert Köseoğlu, Senior Software Engineer, AI consultant. x.com/mksglu · linkedin.com/in/mksglu · mksg.lu
