s06: Compact
A three-layer compression pipeline lets the agent work indefinitely by strategically forgetting old tool results, auto-summarizing when tokens exceed a threshold, and allowing manual compression on demand.
The Problem
The context window is finite. After enough tool calls, the messages array exceeds the model's context limit and the API call fails. Even before hitting the hard limit, performance degrades: the model becomes slower, less accurate, and starts ignoring earlier messages.
A 200,000 token context window sounds large, but a single read_file on
a 1000-line source file consumes ~4000 tokens. After reading 30 files and
running 20 bash commands, you are at 100,000+ tokens. The agent cannot
work on large codebases without some form of compression.
The three-layer pipeline addresses this with increasing aggressiveness: Layer 1 (micro-compact) silently replaces old tool results every turn. Layer 2 (auto-compact) triggers a full summarization when tokens exceed a threshold. Layer 3 (manual compact) lets the model trigger compression itself.
Teaching simplification: the token estimation here uses a rough characters/4 heuristic. Production systems use proper tokenizer libraries for accurate counts.
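As a minimal sketch of the characters/4 heuristic: the name estimate_tokens matches the call in the agent loop, but the body here is an assumption, since the source does not show it. Serializing with json.dumps is one reasonable way to get a character count. It is not necessarily how the original measures it.

```python
import json

def estimate_tokens(messages: list) -> int:
    """Rough token estimate: serialized character count divided by 4."""
    # default=str keeps non-JSON-serializable objects (e.g. SDK blocks) from raising.
    return len(json.dumps(messages, default=str)) // 4

small = [{"role": "user", "content": "hello"}]
large = [{"role": "user", "content": "x" * 16000}]  # stand-in for a ~1000-line file

print(estimate_tokens(small), estimate_tokens(large))
```

A 16,000-character payload lands at roughly 4,000 estimated tokens, which is consistent with the per-file figure quoted above.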
The Solution
Every turn:
+------------------+
| Tool call result |
+------------------+
        |
        v
[Layer 1: micro_compact]        (silent, every turn)
  Replace tool_results older than the last 3
  with "[Previous: used {tool_name}]"
        |
        v
[Check: tokens > 50000?]
   |               |
   no              yes
   |               |
   v               v
continue    [Layer 2: auto_compact]
              Save transcript to .transcripts/
              LLM summarizes conversation.
              Replace all messages with [summary].
                    |
                    v
            [Layer 3: compact tool]
              Model calls compact explicitly.
              Same summarization as auto_compact.
How It Works
- Layer 1 -- micro_compact: Before each LLM call, find all tool_result entries older than the last 3 and replace their content with a short placeholder. Results of 100 characters or fewer are left untouched.
def micro_compact(messages: list) -> list:
    # Collect every tool_result part, in order, across all user messages.
    tool_results = []
    for i, msg in enumerate(messages):
        if msg["role"] == "user" and isinstance(msg.get("content"), list):
            for j, part in enumerate(msg["content"]):
                if isinstance(part, dict) and part.get("type") == "tool_result":
                    tool_results.append((i, j, part))
    if len(tool_results) <= KEEP_RECENT:
        return messages
    # Everything except the last KEEP_RECENT results gets a placeholder.
    # Note: tool_name_map (tool_use_id -> tool name) is not defined in this excerpt.
    to_clear = tool_results[:-KEEP_RECENT]
    for _, _, part in to_clear:
        if len(part.get("content", "")) > 100:  # short results are left as-is
            tool_id = part.get("tool_use_id", "")
            tool_name = tool_name_map.get(tool_id, "unknown")
            part["content"] = f"[Previous: used {tool_name}]"
    return messages
- Layer 2 -- auto_compact: When estimated tokens exceed 50,000, save the full transcript and ask the LLM to summarize.
def auto_compact(messages: list) -> list:
    # Save the full transcript to disk before dropping it from context.
    TRANSCRIPT_DIR.mkdir(exist_ok=True)
    transcript_path = TRANSCRIPT_DIR / f"transcript_{int(time.time())}.jsonl"
    with open(transcript_path, "w") as f:
        for msg in messages:
            f.write(json.dumps(msg, default=str) + "\n")
    # Ask the LLM for a summary; only the first 80,000 characters
    # of the serialized history are sent.
    response = client.messages.create(
        model=MODEL,
        messages=[{"role": "user", "content":
                   "Summarize this conversation for continuity..."
                   + json.dumps(messages, default=str)[:80000]}],
        max_tokens=2000,
    )
    summary = response.content[0].text
    # Replace the whole history with a two-message stub carrying the summary.
    return [
        {"role": "user", "content": f"[Compressed]\n\n{summary}"},
        {"role": "assistant", "content": "Understood. Continuing."},
    ]
- Layer 3 -- manual compact: The compact tool triggers the same summarization on demand.
if manual_compact:
    messages[:] = auto_compact(messages)
- The agent loop integrates all three layers.
def agent_loop(messages: list):
    while True:
        micro_compact(messages)
        if estimate_tokens(messages) > THRESHOLD:
            messages[:] = auto_compact(messages)
        response = client.messages.create(...)
        # ... tool execution ...
        if manual_compact:
            messages[:] = auto_compact(messages)
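The micro_compact excerpt in Layer 1 looks up tool_name_map without showing how it is built. One plausible construction, sketched below, scans assistant messages for tool_use blocks and maps each block's id to its name. The helper name build_tool_name_map and the choice to accept both plain dicts and SDK-style objects are assumptions made for illustration, not the original implementation.

```python
from types import SimpleNamespace

def build_tool_name_map(messages: list) -> dict:
    """Map each tool_use id to its tool name, scanning assistant messages."""
    def field(block, key):
        # Blocks may be plain dicts or SDK-style objects; support both.
        if isinstance(block, dict):
            return block.get(key)
        return getattr(block, key, None)

    name_map = {}
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") != "assistant" or not isinstance(content, list):
            continue
        for block in content:
            if field(block, "type") == "tool_use":
                name_map[field(block, "id")] = field(block, "name")
    return name_map

messages = [
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t1", "name": "read_file", "input": {}},
        SimpleNamespace(type="tool_use", id="t2", name="bash", input={}),
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "..."},
    ]},
]
tool_name_map = build_tool_name_map(messages)
```

With a map like this in scope, the placeholder can name the tool that produced each cleared result instead of falling back to "unknown".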
Key Code
The three-layer pipeline (from agents/s06_context_compact.py,
lines 67-93 and 189-223):
THRESHOLD = 50000
KEEP_RECENT = 3

def micro_compact(messages):
    # Replace old tool results with placeholders
    ...

def auto_compact(messages):
    # Save transcript, LLM summarize, replace messages
    ...

def agent_loop(messages):
    while True:
        micro_compact(messages)                        # Layer 1
        if estimate_tokens(messages) > THRESHOLD:
            messages[:] = auto_compact(messages)       # Layer 2
        response = client.messages.create(...)
        # ...
        if manual_compact:
            messages[:] = auto_compact(messages)       # Layer 3
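The loop checks a manual_compact flag without showing where it is set. A hedged sketch of one way to derive it: declare a compact tool and scan the assistant's response blocks for a call to it. The COMPACT_TOOL description, the empty input schema, and the helper name wants_compact are all hypothetical. Only the tool name compact comes from the text.

```python
# Hypothetical tool definition, written in the Messages API tool format.
COMPACT_TOOL = {
    "name": "compact",
    "description": "Summarize the conversation so far to free up context.",
    "input_schema": {"type": "object", "properties": {}},
}

def wants_compact(content_blocks: list) -> bool:
    """Return True if any block in an assistant response calls the compact tool."""
    for block in content_blocks:
        if isinstance(block, dict):
            block_type, name = block.get("type"), block.get("name")
        else:  # SDK-style object with attributes
            block_type = getattr(block, "type", None)
            name = getattr(block, "name", None)
        if block_type == "tool_use" and name == COMPACT_TOOL["name"]:
            return True
    return False

response_content = [
    {"type": "text", "text": "Context is getting long; compacting."},
    {"type": "tool_use", "id": "t9", "name": "compact", "input": {}},
]
manual_compact = wants_compact(response_content)
```

Deriving the flag from the response keeps Layer 3 under the model's control: nothing happens unless the model emits the tool call.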
What Changed From s05
| Component | Before (s05) | After (s06) |
|---|---|---|
| Tools | 5 | 5 (base + compact) |
| Context mgmt | None | Three-layer compression |
| Micro-compact | None | Old results -> placeholders |
| Auto-compact | None | Token threshold trigger |
| Manual compact | None | compact tool |
| Transcripts | None | Saved to .transcripts/ |
Design Rationale
Context windows are finite, but agent sessions can run indefinitely. Three compression layers solve this at different granularities: micro-compact (replace old tool outputs), auto-compact (the LLM summarizes once the token estimate crosses the threshold), and manual compact (the model calls the compact tool on demand). The key insight is that forgetting is a feature, not a bug -- it enables unbounded sessions. Transcripts preserve the full history on disk so nothing is truly lost, just moved out of the active context. Each layer operates independently at its own granularity, from silent per-turn cleanup to full conversation reset.
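To illustrate the claim that nothing is truly lost, here is a small round-trip sketch. save_transcript mirrors the write step shown in auto_compact. load_transcript is a hypothetical helper, not something the source defines.

```python
import json
import tempfile
import time
from pathlib import Path

def save_transcript(messages: list, directory: Path) -> Path:
    """Write one JSON object per line, as auto_compact does."""
    directory.mkdir(exist_ok=True)
    path = directory / f"transcript_{int(time.time())}.jsonl"
    with open(path, "w") as f:
        for msg in messages:
            f.write(json.dumps(msg, default=str) + "\n")
    return path

def load_transcript(path: Path) -> list:
    """Read a .jsonl transcript back into a list of message dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

with tempfile.TemporaryDirectory() as tmp:
    original = [
        {"role": "user", "content": "Read main.py"},
        {"role": "assistant", "content": "Done."},
    ]
    path = save_transcript(original, Path(tmp) / ".transcripts")
    restored = load_transcript(path)
```

Two caveats follow from the write step. Anything serialized through default=str comes back as a plain string rather than its original object. File names have one-second resolution, so two compactions within the same second would write to the same path.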
Try It
cd learn-claude-code
python agents/s06_context_compact.py
Example prompts to try:
- Read every Python file in the agents/ directory one by one (watch micro-compact replace old results)
- Keep reading files until compression triggers automatically
- Use the compact tool to manually compress the conversation