Claude Code's compaction discards data that's still on disk

Original link: https://github.com/anthropics/claude-code/issues/26771

## Claude auto-compaction data loss: summary

Claude currently loses data during auto-compaction. Although the full conversation transcript is preserved, the compacted summary discards user-provided data (such as code, markup, or logs) and keeps only a reference to it. As a result, Claude "forgets" key details, hallucinates information, or asks the user to re-paste data, wasting tokens and interrupting the workflow. Numerous user reports (8+ open issues) highlight this as a persistent problem.

The core issue is that the lossy compaction process has no link back to the original lossless transcript. The proposed solution is to add **indexed transcript references** to the compacted summary. These references (e.g. `[transcript:lines 847-1023]`) would let Claude retrieve exactly the data it needs from the transcript, only when it needs it, minimizing token usage while preserving context.

This is a low-cost solution: it uses files that already exist and only requires changes to the summarizer. A future phase could extend it to cross-session recovery. Existing workarounds are ineffective, and community solutions are external compensation for this core platform limitation. Fixing this is a **high-priority** feature that affects productivity, especially for tasks involving large amounts of user-provided input.

## Claude Code's compaction problem and workarounds

Users of Anthropic's Claude Code are losing data to its "compaction" feature, which is designed to manage the context limit of long conversations. Compaction summarizes earlier conversation content, but the process is *lossy*: important details such as file paths, earlier decisions, and even code snippets can be forgotten.

The core problem is that Claude cannot predict which information will matter later. One user lost 8K of DOM markup after compaction, blocking iterative work. Although the original transcript is still saved locally, the summary contains no link for retrieving it.

Workarounds include manually persisting key information to external files ("memory files") so Claude can re-read them after compaction. Tools like `claude-log-cli` offer efficient log search and can attempt to recover lost prompts, though some users report this is unreliable. The developer proposes adding line-range annotations to the summary so that only the needed sections are recovered, avoiding token overhead.

Original issue

Preflight Checklist

Problem Statement

When auto-compaction triggers mid-task, the summarization process irreversibly loses user-provided data that Claude is actively working with. The compacted summary retains a reference to the data (e.g., "user provided DOM markup") but discards the data itself — even though the full transcript is still on disk at ~/.claude/projects/{project-path}/.

Concrete example: I paste ~8,000 characters of DOM markup and ask Claude to refactor it. The session continues, auto-compaction fires, and the summary now says "User provided 8,200 characters of DOM markup for analysis." When I ask a specific question about the markup, Claude either hallucinates details, hedges, or asks me to re-paste what I provided minutes ago. The transcript on disk still has the original markup — Claude just has no way to know that, or where to find it.

This isn't an edge case. Any task involving substantial user-provided input (DOM markup, config files, log output, schema definitions) is at risk the moment the context window fills up. There are at least 8 open issues describing different symptoms of this same root cause:

#1534 — Memory Loss After Auto-compact (June 2025, closed — problem persists)
#3021 — Forgets to Refresh Memory After Compaction
#10960 — Repository Path Changes Forgotten After Compaction
#13919 — Skills context completely lost after auto-compaction (compiled its own list of 5+ related issues)
#14968 — Context compaction loses critical state
#19888 — Conversation compaction loses entire history
#21105 — Context truncation causes loss of conversation history
#23620 — Agent team lost when lead's context gets compacted

The underlying problem: compaction is a one-way lossy transformation with an available lossless source that isn't wired up. The summary and the transcript coexist on disk, but the summary contains no pointer back to the source material it compressed.

Proposed Solution

Phase 1: Within-session indexed transcript references

Tag compressed entries in the compacted summary with transcript line-range annotations so Claude can surgically recover only the specific content it needs, on demand.

Current behavior (lossy, no recovery):

[compacted] User provided 8,200 characters of DOM markup for analysis.

Proposed behavior (recoverable):

[compacted] User provided 8,200 characters of DOM markup for analysis. [transcript:lines 847-1023]

When Claude detects it needs the original data — because a question requires specifics the summary doesn't contain — it reads only lines 847–1023 from the transcript. Surgical recovery, minimal token cost, no wasted context on data that isn't needed.
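A minimal sketch of the recovery side, assuming the annotation format above and a line-oriented transcript file (the function name, regex, and paths are illustrative, not an existing Claude Code API):

```python
import re
from pathlib import Path

# Hypothetical annotation format from this proposal: "[transcript:lines 847-1023]"
ANNOTATION = re.compile(r"\[transcript:lines (\d+)-(\d+)\]")

def recover_from_summary(summary_line: str, transcript_path: Path) -> str | None:
    """If a compacted summary line carries a line-range annotation,
    read back only that range from the on-disk transcript."""
    match = ANNOTATION.search(summary_line)
    if match is None:
        return None  # no pointer in the summary, nothing to recover
    start, end = int(match.group(1)), int(match.group(2))
    with transcript_path.open(encoding="utf-8") as f:
        # Keep only lines start..end (1-indexed, inclusive) — surgical recovery;
        # the rest of the transcript never enters the context window.
        wanted = [line for i, line in enumerate(f, start=1) if start <= i <= end]
    return "".join(wanted)

# Example (illustrative path): resolve the pointer in the compacted entry above.
# recover_from_summary(
#     "[compacted] User provided 8,200 characters of DOM markup for analysis. [transcript:lines 847-1023]",
#     Path.home() / ".claude" / "projects" / "my-project" / "session.jsonl",
# )
```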

Why this works:

  • Zero standing overhead. The summary stays compact. Tokens are only spent at the moment of recovery, and only on the specific chunk needed.
  • Ground truth recovery. The transcript is the literal conversation — no interpretation layer, no embedding approximation.
  • Architecturally minimal. No new databases, no embedding models, no external processes. Just metadata pointers in summaries referencing files that already exist on disk.
  • Scales flat. Whether the session is 10 minutes or 10 hours, the cost model is identical — pay only at retrieval.

Implementation scope: A modification to the compaction summarizer — when compressing a block of content, emit the summary line plus a transcript line-range annotation. Then expose a recovery mechanism (likely a file read with offset) that Claude can invoke when it identifies a gap between what the summary references and what it can actually see.
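A rough sketch of the summarizer-side change, assuming the compactor already knows which transcript lines each compressed block covers (the data shape here is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class CompactedEntry:
    """One entry in the compacted summary (hypothetical shape)."""
    summary: str
    start_line: int  # first transcript line this entry compresses
    end_line: int    # last transcript line this entry compresses

def render_entry(entry: CompactedEntry) -> str:
    # Emit the summary text plus the line-range pointer proposed above,
    # so the lossy summary keeps a link back to its lossless source.
    return f"[compacted] {entry.summary} [transcript:lines {entry.start_line}-{entry.end_line}]"

print(render_entry(CompactedEntry(
    summary="User provided 8,200 characters of DOM markup for analysis.",
    start_line=847,
    end_line=1023,
)))
# -> [compacted] User provided 8,200 characters of DOM markup for analysis. [transcript:lines 847-1023]
```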

Phase 2 (future extension): Cross-session indexed lookup

The same mechanism generalizes across sessions. Transcripts from past sessions persist in ~/.claude/projects/. A cross-session reference would look like:

[compacted] User established auth pattern using JWT with refresh tokens. 
[session:2025-02-15/transcript:lines 312-487]

Suggested UX: Claude checks the current session transcript first (fast, local, cheap). If no match, Claude prompts the user: "I don't have that detail in the current session. Want me to search previous session transcripts?" User opts in → Claude searches across session transcripts using indexed references. This respects the user's token budget while providing a clean escalation path.
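A sketch of how a cross-session reference could be resolved, assuming the [session:DATE/transcript:lines X-Y] format above and that a past session's transcript can be located by date under ~/.claude/projects/ — the file-naming scheme and lookup are assumptions, not Claude Code's actual layout:

```python
import re
from pathlib import Path

# Hypothetical cross-session reference: "[session:2025-02-15/transcript:lines 312-487]"
CROSS_REF = re.compile(r"\[session:(\d{4}-\d{2}-\d{2})/transcript:lines (\d+)-(\d+)\]")

def resolve_cross_session(ref: str, project_dir: Path, user_opted_in: bool) -> str | None:
    match = CROSS_REF.search(ref)
    if match is None:
        return None
    if not user_opted_in:
        # Respect the token budget: per the suggested UX, prior sessions
        # are only searched after the user explicitly opts in.
        return None
    session_date, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    # Assumption: some index maps a session date to its transcript file.
    transcript = next(project_dir.glob(f"*{session_date}*.jsonl"), None)
    if transcript is None:
        return None
    with transcript.open(encoding="utf-8") as f:
        return "".join(line for i, line in enumerate(f, start=1) if start <= i <= end)
```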

Alternative Solutions

Workarounds I've tried:

  • MEMORY.md behavioral nudge — Adding a rule in ~/.claude/CLAUDE.md telling Claude to check the transcript when data feels incomplete (a sketch of such a rule follows this list). This works occasionally but depends on Claude being self-aware enough to recognize what it's lost, which it often can't. It's a behavioral hack, not an architectural fix.
  • Manual /compact with preservation instructions — e.g., /compact preserve all user-provided DOM markup. This is manual, not automated, and still produces lossy summaries.
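For illustration, the behavioral nudge from the first workaround is roughly this kind of rule added to ~/.claude/CLAUDE.md (the wording is illustrative, not a fix):

```markdown
## Context recovery
If a compacted summary mentions user-provided data (markup, logs, configs)
that you can no longer see verbatim, read the session transcript under
~/.claude/projects/ before answering, instead of guessing or asking for a re-paste.
```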

Existing proposals that address this differently:

  • #17428 — File-backed summaries that write compacted content to new files for later retrieval. This proposal achieves the same goal with less machinery: the transcript already exists on disk, so rather than duplicating data into new files, annotate the summary with line-range pointers to the existing transcript. Same recoverability, zero additional storage.
  • #15923 — Pre-compaction hook to archive transcripts before compaction. Useful for logging but doesn't solve the recovery problem — Claude still can't find what it lost.
  • #21190 — Include last N interactions verbatim in the compact summary. Helps with recent context but doesn't scale to mid-session data and increases summary size.
  • #6549 — Manual continuation prompts before compaction. Effective but entirely manual and user-dependent.

Community workarounds: The MCP memory ecosystem (embedding stores, knowledge graphs, SQLite-backed recall) has dozens of projects working around this gap externally. These are all compensating for one missing capability: the compacted summary doesn't know where its source material lives. The indexed transcript reference approach solves it natively, at the platform layer, with zero additional infrastructure.

Priority

High - Significant impact on productivity

Feature Category

Performance and speed

Use Case Example

  1. I'm building a Chrome extension and paste ~8,000 characters of DOM markup from a target page into Claude Code, asking it to analyze the structure and help me write content scripts to interact with specific elements.
  2. Claude reads the markup, identifies key selectors, and starts generating code. We iterate over several rounds — adjusting selectors, handling edge cases, testing approaches.
  3. Around 40 minutes in, auto-compaction triggers. The session continues seamlessly — or so it appears.
  4. I ask Claude to target a specific div with a nested data-attribute that was in the original markup. Claude can't recall the exact structure. The compacted summary says "user provided DOM markup for analysis" but the actual markup is gone from context.
  5. Claude guesses at the selector, gets it wrong, and I have to re-paste the markup — losing the continuity of the session and burning tokens on data Claude already had.

With indexed transcript references:

At step 4, Claude sees [transcript:lines 847-1023] in the compacted summary, reads just those 177 lines from the transcript on disk, recovers the exact markup, and answers accurately. No re-paste, no guessing, no wasted tokens. The session continues as if compaction never happened.

Additional Context

Technical considerations:

  • The transcript already persists at ~/.claude/projects/{project-path}/ — no new storage mechanism needed. This proposal adds metadata to the compaction summary, not infrastructure.
  • Implementation scope is a modification to the compaction summarizer: emit summary lines with [transcript:lines X-Y] annotations, plus a recovery mechanism (file read with line offset) Claude can invoke on demand.
  • Token cost is zero until recovery is triggered, and surgical when it is — only the specific compressed chunk gets loaded, not the full transcript.

Related issues (demand evidence):

At least 8 open bugs trace back to this same root cause — compaction summaries with no back-reference to source:
#1534, #3021, #10960, #13919, #14968, #19888, #21105, #23620

Related feature requests (differentiation):

#17428, #15923, #21190, #6549 — each addressed under Alternative Solutions above.

Phase 2 extension (not the core ask, but worth noting): The same indexed mechanism generalizes across session transcripts for cross-session recovery. Default to current session lookup; prompt the user before searching prior sessions. Same architecture, wider scope.
