Don't trust large context windows

Original link: https://garrit.xyz/posts/2026-05-06-dont-trust-large-context-windows

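An LLM's context window is often a misleading marketing metric. Vendors advertise capacities of up to 2 million tokens, but the author puts the onset of "context rot" at roughly 100k tokens, and studies such as RULER and Chroma's report on context rot show that effective context is a fraction of the advertised number. Past that point the model loses focus and forgets details. Modern coding agents burn through this "smart zone" quickly and regularly push users into the unreliable "dumb zone". Automated fixes such as session summaries help, but they are reactive and tend to degrade information. To keep performance high, the author advocates a "breadcrumb approach": instead of relying on a long, cluttered context window, treat context as a limited budget. Break the project into small, structured artifacts (for example PRDs, explicit plans, and modular skills) and pass them between sessions. Offloading information from the live context into clear, well-organized documents keeps the agent from wading through irrelevant history and keeps it within its high-performance range. In short, effective agent management means moving from long-lived, monolithic sessions to modular, artifact-driven workflows.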

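The linked article, "Don't trust large context windows", sparked a discussion on Hacker News about how to mitigate "context rot" and maintain AI performance. Commenters argued that relying on huge context windows often leads to drift or poor results. They proposed several strategies:

* **Modularity:** Commenters advocate short, discrete sessions instead of one long thread. One user acts as a product manager and asks the AI to write a product requirements document (PRD) for each feature, which keeps a structured reference available without overloading the model.
* **The "transposed" loop:** Another approach breaks a complex task into smaller, state-advancing steps and generates the prompt for each increment dynamically from structured data.
* **Context management:** Commenters suggest clearing the chat history often ("resetting the seed") and compressing information into smaller, overlapping chunks, so the model stays focused and out of the "dumb zone" of a large context window.

Overall, the consensus is that treating AI interactions as modular, targeted tasks, rather than stuffing everything into a single huge context, gives more reliable and more consistent output.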
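The "transposed" loop from the discussion can be sketched roughly as follows. This is a minimal illustration, not any commenter's actual code: the `Step` structure, the `build_prompt` helper, and the example task are all hypothetical, and the model call is replaced by a placeholder string.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One small, state-advancing unit of work (hypothetical structure)."""
    name: str
    goal: str
    inputs: list[str]   # keys of the shared state this step needs
    output_key: str     # key under which the step's result is stored


def build_prompt(step: Step, state: dict[str, str]) -> str:
    """Render a fresh, self-contained prompt for a single step.

    Only the state entries the step declares as inputs are included,
    so earlier conversation history never enters the prompt.
    """
    context = "\n".join(
        f"- {key}: {state[key]}" for key in step.inputs if key in state
    )
    return (
        f"Task: {step.name}\n"
        f"Goal: {step.goal}\n"
        f"Relevant state:\n{context or '- (none)'}\n"
    )


steps = [
    Step("design-schema", "Propose a table schema for user accounts",
         inputs=["requirements"], output_key="schema"),
    Step("write-migration", "Write the SQL migration for the schema",
         inputs=["schema"], output_key="migration"),
]
state = {"requirements": "Users have an email and a display name."}

prompts = []
for step in steps:
    prompt = build_prompt(step, state)
    prompts.append(prompt)
    # A real loop would send `prompt` to a model in a new session here.
    # This placeholder stands in for the model's answer.
    state[step.output_key] = f"<output of {step.name}>"
```

Each prompt is built from the structured state alone, so the second step sees the schema but not the original requirements or any chat history.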

Original text

I recently watched a video that put a name on something I'd been feeling. The author splits an LLM's context window into two zones. There's the smart zone, where the model is sharp, and the dumb zone, where attention drops off and the model starts forgetting what you told it five minutes ago. The cutoff sits somewhere around 100k tokens. It doesn't matter how big the advertised context window is.

This matters because coding agents will happily walk you straight into the dumb zone. A modern agent burns through tokens fast. A few file reads, a long debug session, a sprawling test run, and you're at 100k before lunch. Meanwhile vendors keep advertising windows of 200k, 1M, even 2M, as if those numbers represented a usable working set. They don't. Studies like RULER and Chroma's report on context rot show that effective context is a fraction of the advertised number, and that performance degrades gradually as you fill the window.

Large context windows are mostly a marketing number. The architectures behind them work, but they paper over a problem the underlying attention mechanism doesn't really solve. The number on the box gets bigger every release. The usable part doesn't keep up.

Modern agents are getting smart about this. Tools like Claude Code now auto-compact: when the session gets long, the agent summarizes the history and starts fresh. That helps. But auto-compaction kicks in after you've already spent time in the dumb zone, and the summary is itself produced by a model that's already degraded. Better than nothing, but I'd rather avoid the situation altogether.

What I do is open a new session and pass it a spec I wrote myself. That's a much higher-signal handoff than any automated summary, because I get to decide what matters going forward. It's the breadcrumb approach applied to agents. Leave an artifact that the next session, or the next person, can pick up cleanly.
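One way such a hand-written spec might be structured is sketched below. The `HandoffSpec` class, its field names, and the example content are assumptions made for illustration; the post does not prescribe a format.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffSpec:
    """A hand-written summary passed to the next session (hypothetical layout)."""
    goal: str
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the spec as a short Markdown document."""
        def section(title: str, items: list[str]) -> str:
            body = "\n".join(f"- {item}" for item in items) or "- (none)"
            return f"## {title}\n{body}\n"

        return "\n".join([
            f"# Goal\n{self.goal}\n",
            section("Decisions so far", self.decisions),
            section("Open questions", self.open_questions),
            section("Next steps", self.next_steps),
        ])


spec = HandoffSpec(
    goal="Add rate limiting to the public API",
    decisions=["Use a token bucket per API key"],
    next_steps=["Write the middleware", "Add integration tests"],
)
handoff_text = spec.to_markdown()
```

The resulting text can be saved to a file and pasted into, or attached to, the fresh session. The point is that a person chooses what goes into each section.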

You can take this further. Projects like obra/superpowers and mattpocock/skills structure entire agent workflows around small, named artifacts: PRDs, plans, skills, sub-agent handoffs. Each one is a way to keep the working session in the smart zone by deliberately moving information out of the session into something the next session can read.

So I treat my context window like a budget. I assume only the first chunk is really working for me, and everything I can move out of the live session and into a written artifact is one less thing for attention to fight over.
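The budget idea can be made concrete with a small tracker. This is only a sketch under stated assumptions: the 100k limit is the rough cutoff mentioned above, the four-characters-per-token estimate is a crude heuristic rather than a real tokenizer, and `ContextBudget` and its 20% reserve are hypothetical names and values.

```python
from dataclasses import dataclass, field

# Both constants are assumptions for illustration only.
SMART_ZONE_TOKENS = 100_000  # rough cutoff mentioned in the post
CHARS_PER_TOKEN = 4          # crude heuristic; real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


@dataclass
class ContextBudget:
    limit: int = SMART_ZONE_TOKENS
    spent: int = 0
    log: list[tuple[str, int]] = field(default_factory=list)

    def charge(self, label: str, text: str) -> int:
        """Record text that entered the session and return its estimated cost."""
        tokens = estimate_tokens(text)
        self.spent += tokens
        self.log.append((label, tokens))
        return tokens

    @property
    def remaining(self) -> int:
        return max(0, self.limit - self.spent)

    def should_hand_off(self, reserve: float = 0.2) -> bool:
        """True once less than `reserve` of the budget is left."""
        return self.remaining < self.limit * reserve


budget = ContextBudget()
budget.charge("file reads", "x" * 120_000)       # about 30k tokens
budget.charge("debug session", "x" * 160_000)    # about 40k tokens
print(budget.spent, budget.should_hand_off())    # 70000 False
budget.charge("test run output", "x" * 80_000)   # about 20k tokens
print(budget.spent, budget.should_hand_off())    # 90000 True
```

When `should_hand_off()` turns true, that is the cue to write the artifact and open a new session before the live one drifts into the dumb zone.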
