Follow up PR #265: refine chapters, diagrams, and add S20 (#283)

* feat: s01-s14 docs quality overhaul — tool pipeline, single-agent, knowledge & resilience

Rewrite code.py and README (zh/en/ja) for s01-s14, each chapter building
incrementally on the previous. Key fixes across chapters:

- s01-s04: agent loop, tool dispatch, permission pipeline, hooks
- s05-s08: todo write, subagent, skill loading, context compact
- s09-s11: memory system, system prompt assembly, error recovery
- s12-s14: task graph, background tasks, cron scheduler

All chapters CC source-verified. Code inherits fixes forward (PROMPT_SECTIONS,
json.dumps cache, real-state context, can_start dep protection, etc.).

* feat: s15-s19 docs quality overhaul — multi-agent platform: teams, protocols, autonomy, worktree, MCP tools

Rewrite code.py and README (zh/en/ja) for s15-s19, the multi-agent platform
chapters. Each chapter inherits all previous fixes and adds one mechanism:

- s15: agent teams (TeamCreate, teammate threads, shared task list)
- s16: team protocols (plan approval, shutdown handshake, consume_inbox)
- s17: autonomous agents (idle polling, auto-claim, consume_lead_inbox)
- s18: worktree isolation (git worktree, bind_task, cwd switching, safety)
- s19: MCP tools (MCPClient, normalize_mcp_name, assemble_tool_pool, no cache)

All appendix source code references verified against CC source. Config priority
corrected: claude.ai < plugin < user < project < local.

* fix: 5 regressions across s05-s19 — glob safety, todo validation, memory extraction, protocol types, dep crash

- s05-s09: glob results now filter with is_relative_to(WORKDIR) (inherited from s02)
- s06-s08: todo_write validates content/status required fields (inherited from s05)
- s09: extract_memories uses pre-compression snapshot instead of compacted messages
- s16: submit_plan docstring clarifies protocol-only (not code-level gate)
- s17-s19: match_response restores type mismatch validation (from s16)
- s17-s19: claim_task deps list handles missing dep files without crashing

* fix: s12 Todo V2 logic reversal, s14/s15 cron range validation, s18/s19 worktree name validation

- s12 README (zh/en/ja): fix Todo V2 direction — interactive defaults to Task,
  non-interactive/SDK defaults to TodoWrite. Fix env var name to
  CLAUDE_CODE_ENABLE_TASKS (not TODO_V2).
- s14/s15: add _validate_cron_field with per-field range checks (minute 0-59,
  hour 0-23, dom 1-31, month 1-12, dow 0-6), step > 0, range lo <= hi.
  Replace old try/except validation that only caught exceptions.
- s18/s19: add validate_worktree_name() to remove_worktree and keep_worktree,
  not just create_worktree.

* fix: align s16-s19 teaching tool consistency

* fix pr265 chapter diagrams

* Add comprehensive s20 harness chapter

* Fix chapter smoke test regressions

* Clarify README tutorial track transition

---------

Co-authored-by: Haoran <bill-billion@outlook.com>
This commit is contained in:
gui-yue
2026-05-20 21:45:38 +08:00
committed by GitHub
parent c354cf7721
commit 1baf1aca5a
174 changed files with 35833 additions and 353 deletions


@@ -0,0 +1,293 @@
# s08: Context Compact — Context Will Fill Up, Have a Way to Make Room
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s20
> *"Context will fill up — have a way to make room"* — Four-layer compression pipeline: cheap first, expensive last.
>
> **Harness Layer**: Compression — clean memory, unlimited sessions.
---
## The Problem
The agent is running along, then freezes.
It has bash, read, write — all the capabilities it needs. But it read a 1000-line file (~4000 tokens), then read 30 more files, ran 20 commands. Every command's output, every file's contents, all pile up in the `messages` list.
The context window is finite. Once full, the API outright rejects the call: `prompt_too_long`.
Without compression, an agent simply cannot work on large projects.
---
## The Solution
![Compact Overview](images/compact-overview.en.svg)
The hook structure, skill loading, and sub-Agent from s07 are preserved, with some tools omitted to focus on compaction. The core change: insert three pre-processors (0 API calls) before each LLM call, trigger an LLM summary (1 API call) when tokens still exceed the threshold, and emergency-trim if the API throws an error.
Core design: cheap first, expensive last.
---
## How It Works
![Four-layer compression pipeline](images/compaction-layers.en.svg)
### L1: snip_compact — Trim Irrelevant Old Conversation
The agent ran 80 turns of conversation, accumulating 160 `messages`. The very first "help me create hello.py" is barely relevant to current work, yet it still occupies space.
Message count exceeds 50 → keep the first 3 (initial context) and the last 47 (current work), trim the middle:
```python
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages:
        return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
```
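As a quick check of the arithmetic, here is a minimal sketch that runs `snip_compact` on 160 synthetic messages. The message contents are made up for illustration; only the function itself comes from the chapter.

```python
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages:
        return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]

# 160 synthetic messages, alternating user/assistant (illustrative data only)
history = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
           for i in range(160)]
compacted = snip_compact(history)
print(len(compacted))            # 51
print(compacted[3]["content"])   # [snipped 110 messages from conversation middle]
```

Note that the placeholder adds one entry, so the result is `max_messages + 1` long: 3 head + 1 placeholder + 47 tail.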
Entire messages are trimmed, but `tool_result` content within remaining messages keeps accumulating — message #34 may still hold 30KB of old file contents. → L2.
### L2: micro_compact — Placeholder for Old Tool Results
![Old results placeholder](images/micro-compact.en.svg)
The agent read 10 files consecutively. The full contents of reads 1-7 are still sitting in context, no longer needed, but hogging large amounts of space.
Keep only the 3 most recent `tool_result` entries intact; replace older ones with a one-line placeholder:
```python
KEEP_RECENT_TOOL_RESULTS = 3

def micro_compact(messages):
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages
```
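The snippet above relies on `collect_tool_result_blocks`, which is not shown in this README. A minimal sketch of what such a helper might look like follows. Its body, the `(message_index, block_index, block)` tuple shape, and the synthetic test data are assumptions for illustration, not the chapter's actual implementation.

```python
KEEP_RECENT_TOOL_RESULTS = 3
PLACEHOLDER = "[Earlier tool result compacted. Re-run if needed.]"

def collect_tool_result_blocks(messages):
    """Hypothetical helper: return (message_index, block_index, block) in order."""
    found = []
    for mi, msg in enumerate(messages):
        content = msg.get("content")
        # tool_result blocks live in user messages whose content is a list
        if msg.get("role") != "user" or not isinstance(content, list):
            continue
        for bi, block in enumerate(content):
            if isinstance(block, dict) and block.get("type") == "tool_result":
                found.append((mi, bi, block))
    return found

def micro_compact(messages):
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(block.get("content", "")) > 120:
            block["content"] = PLACEHOLDER
    return messages

# Ten synthetic file reads, one tool_result per user message
msgs = [{"role": "user",
         "content": [{"type": "tool_result", "tool_use_id": f"t{i}",
                      "content": "x" * 1000}]}
        for i in range(10)]
micro_compact(msgs)
kept = [m["content"][0]["content"] for m in msgs]
print(sum(c == PLACEHOLDER for c in kept))   # 7
```

Results shorter than 120 characters are left alone, since replacing them with the placeholder would save little or nothing.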
Old results are cleared, but a single new result can be 500KB — one `cat` of a large file can max out the context. → L3.
### L3: tool_result_budget — Persist Large Results to Disk
![Large results to disk](images/layer1-budget.en.svg)
The model read 5 large files in one go; all `tool_result` blocks in the last user message total 500KB.
Sum the size of all `tool_result` blocks in the last user message. If over 200KB → sort by size, starting from the largest, persist to `.task_outputs/tool-results/`, keeping only a `<persisted-output>` marker + a 2000-character preview in context. The model sees the marker and knows the full content is on disk, re-reading it when needed.
```python
def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1]
    blocks = [(i, b) for i, b in enumerate(last["content"])
              if b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    if total <= max_bytes:
        return messages
    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
    for idx, block in ranked:
        if total <= max_bytes:
            break
        block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
        total = recalculate_total(blocks)
    return messages
```
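`persist_large_output` is called above but not shown here. One way it could work is sketched below, assuming the output directory and 2000-character preview described in the text. The file naming, the `out_dir` parameter, and the exact marker attributes (`path`, `chars`) are hypothetical and chosen only for illustration.

```python
from pathlib import Path
import tempfile

PREVIEW_CHARS = 2000

def persist_large_output(tool_use_id, content, out_dir=".task_outputs/tool-results"):
    """Hypothetical sketch: write full content to disk, return marker + preview."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{tool_use_id}.txt"
    path.write_text(content, encoding="utf-8")
    preview = content[:PREVIEW_CHARS]
    # Marker format is an assumption; only the <persisted-output> tag name
    # comes from the chapter text.
    return (f"<persisted-output path=\"{path}\" chars=\"{len(content)}\">\n"
            f"{preview}\n</persisted-output>")

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / ".task_outputs" / "tool-results"
    big = "line\n" * 100_000            # 500,000 characters
    marker = persist_large_output("toolu_demo", big, out_dir)
    saved = (out_dir / "toolu_demo.txt").read_text(encoding="utf-8")
    print(len(big), len(marker) < len(big), saved == big)
```

The full content survives on disk, while the context only carries the marker and a short preview that tells the model where to look.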
The first three layers are all plain-text / structural operations — 0 API calls — but they cannot "understand" conversation content. Context may still be too large. → L4.
### L4: compact_history — Full LLM Summary
![Full LLM summary](images/auto-compact.en.svg)
All three previous layers have run, but after 30 minutes of continuous work on a huge project, tokens still exceed the threshold.
Three-step process:
1. **Save transcript**: Write the full conversation to `.transcripts/` in JSONL format. The transcript preserves a recoverable record, but the model's active context only contains the summary. For the model's current reasoning, the details are no longer in context. The teaching code does not provide a transcript retrieval tool.
2. **LLM generates summary**: Send conversation history to the LLM, asking it to preserve key information: current goals, important findings, modified files, remaining work, user constraints, etc.
3. **Replace message list**: All old messages are replaced with a single summary. The teaching version only keeps the summary; the real Claude Code re-attaches some recent files, plans, agent/skill/tool context after compaction.
```python
def compact_history(messages):
    transcript_path = write_transcript(messages)   # Save full conversation first
    summary = summarize_history(messages)          # LLM generates summary
    return [{"role": "user",
             "content": f"[Compacted]\n\n{summary}"}]
```
**Circuit breaker**: After 3 consecutive failures, stop retrying to prevent an infinite loop wasting API calls.
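The circuit breaker is not shown in the snippet above. A minimal sketch of the idea follows, assuming a small wrapper class around the compaction call. `CompactBreaker` and its method names are hypothetical and do not come from the chapter's code.

```python
MAX_CONSECUTIVE_FAILURES = 3  # matches the "3 consecutive failures" rule above

class CompactBreaker:
    """Hypothetical helper: stop calling the summarizer after repeated failures."""

    def __init__(self, limit=MAX_CONSECUTIVE_FAILURES):
        self.limit = limit
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.limit

    def run(self, compact_fn, messages):
        if self.open:
            return messages              # breaker tripped: skip the LLM call
        try:
            result = compact_fn(messages)
        except Exception:
            self.failures += 1           # count consecutive failures
            return messages
        self.failures = 0                # any success resets the counter
        return result

calls = []

def always_fails(messages):
    calls.append(1)
    raise RuntimeError("summary failed")

breaker = CompactBreaker()
msgs = [{"role": "user", "content": "hi"}]
for _ in range(5):
    msgs = breaker.run(always_fails, msgs)
print(len(calls), breaker.open)   # 3 True
```

After the third failure the wrapper stops invoking the summarizer, so the remaining loop iterations cost no API calls.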
### Reactive: reactive_compact
Sometimes the API still returns `prompt_too_long` (413) — when context grows faster than compression triggers.
This triggers **reactive_compact**: more aggressive than compact_history, it retreats from the tail, trimming to an API-acceptable size with byte-level precision, keeping only the last 5 messages + summary.
```python
def reactive_compact(messages):
    transcript = write_transcript(messages)
    summary = summarize_history(messages)
    tail = messages[-5:]
    return [{"role": "user",
             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
```
Reactive compact has a retry limit (default 1). If it still fails, an exception is raised instead of looping forever. Full error recovery is deferred to s11.
### Putting It All Together
```python
def agent_loop(messages):
    reactive_retries = 0
    while True:
        # Three pre-processors (0 API calls)
        # Order: budget first, so large content is persisted before placeholders
        messages[:] = tool_result_budget(messages)   # L3: persist large results
        messages[:] = snip_compact(messages)         # L1: trim middle
        messages[:] = micro_compact(messages)        # L2: old result placeholders
        # Still too much? LLM summary (1 API call)
        if estimate_token_count(messages) > THRESHOLD:
            messages[:] = compact_history(messages)
        try:
            response = client.messages.create(...)
        except PromptTooLongError:
            if reactive_retries < MAX_REACTIVE_RETRIES:
                messages[:] = reactive_compact(messages)  # Emergency
                reactive_retries += 1
                continue
            raise  # retry limit exceeded, raise exception
        # ... tool execution ...
        # compact tool: when the model actively calls it, triggers compact_history
        if block.name == "compact":
            messages[:] = compact_history(messages)
            results.append({..., "content": "[Compacted. History summarized.]"})
            messages.append({"role": "user", "content": results})
            break  # end current turn, start fresh with compacted context
```
**The order must not be swapped.** L3 (budget) runs before L2 (micro) because micro replaces old large tool_results with one-line placeholders — budget must persist the full content before that happens. This is why CC source puts `applyToolResultBudget` first.
---
## Changes From s07
| Component | Before (s07) | After (s08) |
|-----------|-------------|-------------|
| Context management | None (context grows unbounded) | Four-layer compression pipeline + emergency |
| New functions | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |
| Tools | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |
| Loop | LLM call → tool execution | Three pre-processors before each turn + threshold-triggered compact_history |
| Design principle | — | Cheap first, expensive last |
---
## Try It
```sh
cd learn-claude-code
python s08_context_compact/code.py
```
Try these prompts:
1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (read multiple files consecutively, observe L2 compressing old results)
2. `Read every file in s08_context_compact/` (read a large amount of content at once, observe L3 persisting to disk)
3. Chat for 20+ turns, observe whether `[auto compact]` or `[reactive compact]` appears
What to watch for: After each tool execution, are old `tool_result` entries compressed? When tokens exceed the threshold after extended conversation, is summarization triggered automatically?
---
## What's Next
Context compression lets an agent run for a long time without crashing. But after each compression, the preferences and constraints the user told it are also lost. Can we let the agent selectively remember important things?
s09 Memory → three subsystems: choosing what to remember, extracting key information, consolidating and organizing. Across compressions, across sessions.
<details>
<summary>Deep Dive Into CC Source Code</summary>
> The following is based on analysis of CC source code `compact.ts`, `autoCompact.ts`, `microCompact.ts`, and `query.ts`.
### Execution Order Comparison
The teaching version labels layers L1/L2/L3/L4 for pedagogical clarity, but actual execution order does not match the numbering:
| Dimension | Teaching Version | Claude Code |
|-----------|-----------------|-------------|
| Execution order | budget → snip → micro → auto | budget → snip → micro → collapse → auto (`query.ts:379-468`) |
| snip_compact | Keep head 3 + tail 47 | CC only enables on main thread; implementation not in open-source repo (`HISTORY_SNIP` feature gate), but interface is visible: `snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`, also exposes `SnipTool` for model-initiated snipping. Teaching version's 3/47 are simplified parameters |
| micro_compact | Text placeholder replacement | Two paths: time-based clears content directly, cached uses API `cache_edits` (legacy path removed) |
| micro_compact whitelist | By position (most recent 3) | time-based triggers by time threshold; cached triggers by count (`microCompact.ts`) |
| tool_result_budget | 200,000 characters (~200KB) | 200,000 characters (`toolLimits.ts:49`) |
| compact_history threshold | Character count estimate | Precise tokens: `contextWindow - maxOutputTokens - 13_000` |
| Summary requirements | 5 categories of info | 9 sections + `<analysis>`/`<summary>` dual tags |
| Compression prompt | Simple prompt | Double-ended hard guardrails forbidding tool calls |
| PTL retry | Yes (simplified) | `truncateHeadForPTLRetry()` retreats by message groups (`compact.ts:243-290`) |
| Post-compaction recovery | None (teaching version only keeps summary) | Auto re-read recent files, plans, agent/skill/tool context |
| Circuit breaker | 3 times | 3 times (`autoCompact.ts:70`) |
| Reactive retry | 1 time | CC has more granular tiered retries |
### Execution Order Details
The real order in CC source `query.ts`:
1. `applyToolResultBudget` (L379): persist large results first, ensuring full content is saved
2. `snipCompact` (L403): trim middle messages
3. `microcompact` (L414): old result placeholders
4. `contextCollapse` (L441): independent context management system (not in teaching version)
5. `autoCompact` (L454): LLM full summary
The teaching version's budget → snip → micro order matches this. The teaching version does not have the contextCollapse mechanism.
### Full Constant Reference
| Constant | Value | Source File |
|----------|-------|-------------|
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
| Time-based micro_compact interval | 60 minutes | `timeBasedMCConfig.ts` |
| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |
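
As a worked example of the threshold formula `contextWindow - maxOutputTokens - 13_000` from the comparison table, here is a minimal sketch. The function name and the example window and output sizes are assumptions chosen for illustration; only the 13,000-token buffer comes from the table.

```python
AUTOCOMPACT_BUFFER_TOKENS = 13_000   # from the constant table

def autocompact_threshold(context_window, max_output_tokens,
                          buffer_tokens=AUTOCOMPACT_BUFFER_TOKENS):
    """contextWindow - maxOutputTokens - buffer, per the comparison table."""
    return context_window - max_output_tokens - buffer_tokens

# Assumed example values: 200,000-token window, 20,000 tokens reserved for output
print(autocompact_threshold(200_000, 20_000))   # 167000
```
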
### contextCollapse and sessionMemoryCompact
CC source code has two additional mechanisms not covered in this teaching version:
- **contextCollapse**: An independent context management system that, when enabled, suppresses proactive autocompact (`autoCompact.ts:215-222`), with collapse's commit/blocking flow taking over context management. Manual `/compact` and reactive fallback remain independent paths, unaffected by contextCollapse.
- **sessionMemoryCompact**: Before compact_history, CC first attempts a lightweight summary using existing session memory (covered in s09) without calling the LLM. This mechanism becomes clearer after learning s09.
### What Does the Compression Prompt Look Like?
CC's compression prompt has two hard requirements:
1. **Absolutely no tool calls**: It begins with `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.`, and appends another REMINDER at the end
2. **Analyze first, then summarize**: The model must first reason in an `<analysis>` tag, then output the formal summary in a `<summary>` tag. The analysis is stripped during formatting
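
The "analyze first, then summarize" step can be illustrated with a small post-processing sketch. This is not CC's formatting code; the function name and regex approach are assumptions that only show the idea of dropping the `<analysis>` block and keeping the `<summary>` body.

```python
import re

def format_compact_summary(raw):
    """Hypothetical sketch: drop <analysis>, keep the text inside <summary>."""
    without_analysis = re.sub(r"<analysis>.*?</analysis>", "", raw, flags=re.DOTALL)
    match = re.search(r"<summary>(.*?)</summary>", without_analysis, flags=re.DOTALL)
    # Fall back to the remaining text if the model omitted the <summary> tag
    return (match.group(1) if match else without_analysis).strip()

raw = ("<analysis>User wants X; files a.py touched.</analysis>\n"
       "<summary>Goal: X. Modified: a.py.</summary>")
print(format_compact_summary(raw))   # Goal: X. Modified: a.py.
```
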
### Teaching Version Simplifications Are Intentional
- micro_compact uses text placeholders → we don't have API-level `cache_edits` access
- Tokens estimated via character count → precise tokenizers are out of scope
- Post-compaction recovery omitted → teaching version only keeps summary, does not auto re-attach files
- Two auxiliary mechanisms not covered → they fall in the 10% detail category
The core design principle, cheap first, expensive last, is fully preserved.
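
For the character-count token estimate mentioned in the list above, a rough sketch might look like this. The 4-characters-per-token ratio and the use of `json.dumps` to measure content are assumptions, not values taken from the chapter's `code.py`.

```python
import json

CHARS_PER_TOKEN = 4  # assumed rough ratio; real tokenizers vary by language and content

def estimate_token_count(messages, chars_per_token=CHARS_PER_TOKEN):
    """Rough estimate: serialized character count divided by an assumed ratio."""
    total_chars = sum(len(json.dumps(m["content"], ensure_ascii=False))
                      for m in messages)
    return total_chars // chars_per_token

msgs = [{"role": "user", "content": "a" * 398}]
print(estimate_token_count(msgs))   # 100 (398 characters plus 2 JSON quotes, divided by 4)
```

An estimate like this is cheap enough to run before every LLM call, which is what the threshold check in the loop needs.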
</details>
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->


@@ -0,0 +1,293 @@
# s08: Context Compact — コンテキストはいつか満杯になる、場所を空ける方法が必要
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s20
> *"Context will fill up — have a way to make room"* — 4層圧縮戦略、安価なものを先に、高価なものを後に実行。
>
> **Harness レイヤー**: 圧縮 — クリーンな記憶、無限のセッション。
---
## 課題
Agent が動いている途中で、止まってしまう。
bash、read、write は揃っており、能力は十分。しかし 1000 行のファイル(~4000 token)を読み、さらに 30 のファイルを読み、20 のコマンドを実行したとします。各コマンドの出力、各ファイルの内容がすべて `messages` リストに蓄積されます。
コンテキストウィンドウには上限があります。満杯になると、API は即座に拒否します:`prompt_too_long`。
圧縮しなければ、Agent は大規模プロジェクトではまともに動けません。
---
## ソリューション
![Compact Overview](images/compact-overview.ja.svg)
s07 のフック構造、スキルロード、サブ Agent の骨格を維持し、圧縮に焦点を当てるため一部のツールは省略。コアの変更点:各 LLM 呼び出し前に 3 層のプリプロセッサ(0 API)を挿入し、token が閾値を超えた場合は LLM 要約(1 API)をトリガー、API エラー時には緊急トリムを実行。
コア設計:安価なものを先に、高価なものを後に。
---
## 仕組み
![4層圧縮パイプライン](images/compaction-layers.ja.svg)
### L1: snip_compact — 無関係な古い会話を切り捨て
Agent が 80 ラウンドの会話を実行し、`messages` が 160 件まで溜まった。先頭の「hello.py を作って」は現在の作業とほぼ無関係だが、スペースを占有し続けている。
メッセージ数が 50 を超えた場合 → 先頭 3 件(初期コンテキスト)と末尾 47 件(現在の作業)を保持し、中間を切り捨て:
```python
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages:
        return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
```
メッセージ全体は切り捨てたが、残ったメッセージ内の `tool_result` 内容はまだ蓄積され続けている。34 番目のメッセージに 30KB の古いファイル内容が残っているかもしれない。→ L2。
### L2: micro_compact — 古いツール結果をプレースホルダに置換
![古い結果のプレースホルダ](images/micro-compact.ja.svg)
Agent が連続して 10 個のファイルを読んだ。1〜7 回目の完全な内容はまだコンテキストに残っており、もう不要だが、大量のスペースを占有している。
直近 3 件の `tool_result` の完全な内容のみを保持し、それより古いものは 1 行のプレースホルダに置換:
```python
KEEP_RECENT_TOOL_RESULTS = 3

def micro_compact(messages):
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages
```
古い結果はクリーンアップされたが、1 件の新しい結果だけで 500KB の可能性がある。大きなファイルを `cat` するだけでコンテキストがいっぱいになる。→ L3。
### L3: tool_result_budget — 大きな結果をディスクに退避
![大きな結果のディスク退避](images/layer1-budget.ja.svg)
モデルが一度に 5 つの大きなファイルを読み、1 つの user メッセージ内の全 `tool_result` の合計が 500KB に達した。
最後の user メッセージ内のすべての `tool_result` の合計サイズを集計。200KB を超えた場合 → サイズ順にソートし、最大のものから順に `.task_outputs/tool-results/` に退避。コンテキストには `<persisted-output>` マーカー + 先頭 2000 文字のプレビューのみを残す。モデルはマーカーを見て完全な内容がディスク上にあることを認識し、必要に応じて再読み込みできる。
```python
def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1]
    blocks = [(i, b) for i, b in enumerate(last["content"])
              if b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    if total <= max_bytes:
        return messages
    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
    for idx, block in ranked:
        if total <= max_bytes:
            break
        block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
        total = recalculate_total(blocks)
    return messages
```
最初の 3 層はすべて純粋なテキスト/構造操作(0 API 呼び出し)だが、会話内容を「理解」することはできない。コンテキストがまだ大きすぎる可能性がある。→ L4。
### L4: compact_history — LLM 全量要約
![LLM 全量要約](images/auto-compact.ja.svg)
最初の 3 層がすべて実行されたが、超大規模プロジェクトで 30 分間連続作業すると、token がまだ閾値を超えている。
3 ステップのフロー:
1. **transcript を保存**:完全な会話を `.transcripts/` に JSONL 形式で書き出す。transcript は回復可能な記録として保存されるが、モデルのアクティブなコンテキストには要約しか残らない。モデルの現在の推論にとって、詳細はすでにコンテキストにない。教学コードは transcript 検索ツールを提供しない。
2. **LLM で要約を生成**:会話履歴を LLM に送り、現在の目標、重要な発見、変更済みファイル、残りの作業、ユーザーの制約などの重要な情報を保持するよう指示。
3. **メッセージリストを置換**:すべての古いメッセージが 1 件の要約に置き換えられる。教学版は要約のみを保持する。実際の Claude Code は compact 後に直近のファイル、計画、agent/skill/tool などのコンテキストを再付加する。
```python
def compact_history(messages):
    transcript_path = write_transcript(messages)   # 先に完全な会話を保存
    summary = summarize_history(messages)          # LLM で要約を生成
    return [{"role": "user",
             "content": f"[Compacted]\n\n{summary}"}]
```
**サーキットブレーカー**:連続 3 回失敗したらリトライを停止し、無限ループによる API 呼び出しの浪費を防止。
### 緊急: reactive_compact
API がまだ `prompt_too_long`(413)を返すことがある。コンテキストの増加速度が圧縮のトリガー速度を上回る場合だ。
この時 **reactive_compact** がトリガーされる:compact_history よりもさらに積極的で、末尾からバイト単位の精度で API が受け入れ可能なサイズまで切り詰め、最後の 5 件のメッセージ + 要約のみを保持。
```python
def reactive_compact(messages):
    transcript = write_transcript(messages)
    summary = summarize_history(messages)
    tail = messages[-5:]
    return [{"role": "user",
             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
```
reactive compact にはリトライ上限がある(デフォルト 1 回)。さらに失敗した場合は例外をスローし、無限ループしない。完全なエラー回復ロジックは s11 に委ねる。
### 合わせて実行
```python
def agent_loop(messages):
    reactive_retries = 0
    while True:
        # 3 つのプリプロセッサ(0 API 呼び出し)
        # 順序:budget を先に実行し、大きな内容をプレースホルダ化する前に退避
        messages[:] = tool_result_budget(messages)   # L3: 大きな結果を退避
        messages[:] = snip_compact(messages)         # L1: 中間を切り捨て
        messages[:] = micro_compact(messages)        # L2: 古い結果をプレースホルダに
        # まだ足りない?LLM 要約(1 API 呼び出し)
        if estimate_token_count(messages) > THRESHOLD:
            messages[:] = compact_history(messages)
        try:
            response = client.messages.create(...)
        except PromptTooLongError:
            if reactive_retries < MAX_REACTIVE_RETRIES:
                messages[:] = reactive_compact(messages)  # 緊急対応
                reactive_retries += 1
                continue
            raise  # リトライ上限超過、例外をスロー
        # ... ツール実行 ...
        # compact ツール:モデルが能動的に呼び出した場合、compact_history をトリガー
        if block.name == "compact":
            messages[:] = compact_history(messages)
            results.append({..., "content": "[Compacted. History summarized.]"})
            messages.append({"role": "user", "content": results})
            break  # 現在のターンを終了し、圧縮後のコンテキストで新しく開始
```
**順序は変えられない。** L3(budget)が L2(micro)の前に実行される理由:micro は古い大きな tool_result を 1 行のプレースホルダに置換するため、budget はその前に完全な内容を退避させる必要がある。CC ソースが `applyToolResultBudget` を最初に配置する理由も同じ。
---
## s07 からの変更点
| コンポーネント | 変更前 (s07) | 変更後 (s08) |
|------|-----------|-----------|
| コンテキスト管理 | なし(コンテキストが無限に膨張) | 4 層圧縮パイプライン + 緊急対応 |
| 新規関数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |
| ツール | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |
| ループ | LLM 呼び出し → ツール実行 | 各ラウンド前に 3 層プリプロセッサを実行 + 閾値で compact_history をトリガー |
| 設計原則 | — | 安価なものを先に、高価なものを後に |
---
## 試してみよう
```sh
cd learn-claude-code
python s08_context_compact/code.py
```
以下のプロンプトを試してみてください:
1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`(連続して複数のファイルを読み、L2 の古い結果圧縮を観察)
2. `Read every file in s08_context_compact/`(一度に大量の内容を読み込み、L3 のディスク退避を観察)
3. 20+ ラウンドの対話を繰り返し、`[auto compact]` または `[reactive compact]` が表示されるか観察
観察のポイント:ツール実行のたびに、古い tool_result は圧縮されているか?連続対話で token が閾値を超えたとき、要約が自動的にトリガーされたか?
---
## 次へ
コンテキスト圧縮により、Agent は長時間クラッシュせずに動けるようになった。しかし、圧縮のたびにユーザーが以前に伝えた好みや制約も一緒に失われてしまう。Agent が重要なことを選択的に記憶できるようにできないか?
s09 Memory → 3 つのサブシステム:何を記憶するかの選択、重要情報の抽出、整理と統合。圧縮を越え、セッションを越えて。
<details>
<summary>CC ソースコードの詳細</summary>
> 以下は CC ソースコード `compact.ts`、`autoCompact.ts`、`microCompact.ts`、`query.ts` の分析に基づく。
### 実行順序の対応
教学版は説明の便宜上 L1/L2/L3/L4 と番号を振っているが、実際の実行順序は番号と完全には一致しない:
| 項目 | 教学版 | Claude Code |
|------|--------|-------------|
| 実行順序 | budget → snip → micro → auto | budget → snip → micro → collapse → auto(`query.ts:379-468`) |
| snip_compact | 先頭 3 + 末尾 47 を保持 | CC はメインスレッドのみ有効;実装はオープンソースリポジトリにない(`HISTORY_SNIP` feature gate)が、インターフェースは確認可能:`snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`。`SnipTool` もモデルが能動的に呼び出し可能。教学版の 3/47 は簡略パラメータ |
| micro_compact | テキストプレースホルダで置換 | 2 つのパス:time-based は直接内容をクリア、cached は API の `cache_edits` を使用(legacy パスは削除済み) |
| micro_compact ホワイトリスト | 位置による(直近 3 件) | time-based は時間閾値でトリガー、cached はカウントでトリガー(`microCompact.ts`) |
| tool_result_budget | 200KB 文字 | 200,000 文字(`toolLimits.ts:49`) |
| compact_history 閾値 | 文字数で推定 | 精密な token 数:`contextWindow - maxOutputTokens - 13_000` |
| 要約の要求 | 5 種類の情報 | 9 つのセクション + `<analysis>`/`<summary>` デュアルタグ |
| 圧縮プロンプト | シンプルなプロンプト | 先頭と末尾に二重の安全ガードでツール呼び出しを禁止 |
| PTL retry | あり(簡略版) | `truncateHeadForPTLRetry()` がメッセージグループ単位でロールバック(`compact.ts:243-290`) |
| 圧縮後のリカバリ | なし(教学版は要約のみ保持) | 直近のファイル、計画、agent/skill/tool などの自動再付加 |
| サーキットブレーカー | 3 回 | 3 回(`autoCompact.ts:70`) |
| reactive リトライ | 1 回 | CC にはより精緻な段階別リトライがある |
### 実行順序の詳細
CC ソース `query.ts` での実際の順序:
1. `applyToolResultBudget`(L379):まず大きな結果を処理し、完全な内容を退避
2. `snipCompact`(L403):中間メッセージを切り捨て
3. `microcompact`(L414):古い結果のプレースホルダ化
4. `contextCollapse`(L441):独立したコンテキスト管理システム(教学版にはなし)
5. `autoCompact`(L454):LLM 全量要約
教学版の budget → snip → micro の順序はこれと一致する。教学版には contextCollapse メカニズムがない。
### 完全な定数リファレンス
| 定数 | 値 | ソースファイル |
|------|-----|--------|
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
| 時間ベース micro_compact 間隔 | 60 分 | `timeBasedMCConfig.ts` |
| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |
### contextCollapse と sessionMemoryCompact
CC ソースコードには、この教学版では展開していない 2 つのメカニズムが存在する:
- **contextCollapse**:独立したコンテキスト管理システム。有効時には proactive autocompact を抑制し(`autoCompact.ts:215-222`)、collapse の commit/blocking フローがコンテキスト管理を引き継ぐ。ただし manual `/compact` と reactive fallback は独立パスのままで、contextCollapse の影響を受けない。
- **sessionMemoryCompact**:compact_history の前に、CC は既存の session memory(s09 で解説)を使った軽量要約を先に試みる。LLM を呼び出さない。このメカニズムは s09 を学んだ後に振り返るとより理解しやすい。
### 圧縮プロンプトの中身
CC の圧縮プロンプトには 2 つの厳格な要件がある:
1. **ツール呼び出しの絶対禁止**:冒頭が `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` で、末尾にも再度 REMINDER がある
2. **先に分析してから要約**:モデルはまず `<analysis>` タグで思考を整理し、その後 `<summary>` タグで正式な要約を出力する。analysis はフォーマット時に除去される
### 教学版の簡略化は意図的
- micro_compact でテキストプレースホルダを使用 → API 層の `cache_edits` 権限がないため
- token を文字数で推定 → 精密な tokenizer は教学の対象外
- 圧縮後のリカバリを省略 → 教学版は要約のみを保持し、ファイルの自動再付加を行わない
- 2 つの補助メカニズムを展開しない → 10% の細部に属する
コア設計思想、安価なものを先に高価なものを後に、は完全に保持されている。
</details>
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->


@@ -0,0 +1,293 @@
# s08: Context Compact — 上下文总会满,要有办法腾地方
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s20
> *"上下文总会满, 要有办法腾地方"* — 四层压缩策略, 便宜的先跑贵的后跑。
>
> **Harness 层**: 压缩 — 干净的记忆, 无限的会话。
---
## 问题
Agent 跑着跑着,不动了。
手里有 bash、有 read、有 write,能力是够的。但它读了一个 1000 行的文件(~4000 token),又读了 30 个文件,跑了 20 条命令。每条命令的输出、每个文件的内容,全都堆在 `messages` 列表里。
上下文窗口是有限的。满了之后,API 直接拒绝:`prompt_too_long`。
不压缩,Agent 根本没法在大项目里干活。
---
## 解决方案
![Compact Overview](images/compact-overview.svg)
保留 s07 的 hook 结构、技能加载、子 Agent 等骨架,省略部分工具细节以聚焦压缩。核心变动:每轮 LLM 调用前插入三层预处理器(0 API),token 仍超阈值时触发 LLM 摘要(1 API),API 报错时应急裁剪。
核心设计:便宜的先跑,贵的后跑。
---
## 工作原理
![四层压缩管线](images/compaction-layers.svg)
### L1: snip_compact — 裁掉无关的旧对话
Agent 跑了 80 轮对话,`messages` 攒了 160 条。最前面的"帮我创建 hello.py"和当前工作几乎无关了,但全占着位置。
消息数超过 50 条 → 保留头部 3 条(初始上下文)和尾部 47 条(当前工作),中间裁掉:
```python
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages:
        return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
```
裁掉了整条消息,但剩下的消息里 `tool_result` 内容仍在累积——第 34 条消息里可能躺着 30KB 的旧文件内容。→ L2。
### L2: micro_compact — 旧工具结果占位
![旧结果占位](images/micro-compact.svg)
Agent 连续读了 10 个文件。第 1-7 次的完整内容还躺在上下文里,早就不需要了,但占着大量空间。
只保留最近 3 条 `tool_result` 的完整内容,更旧的替换为一行占位符:
```python
KEEP_RECENT_TOOL_RESULTS = 3

def micro_compact(messages):
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages
```
旧结果清掉了,但单条新结果可能就有 500KB——一个 `cat` 大文件的输出就能打满上下文。→ L3。
### L3: tool_result_budget — 大结果落盘
![大结果落盘](images/layer1-budget.svg)
模型一次读了 5 个大文件,单条 user 消息里所有 `tool_result` 加起来 500KB。
统计最后一条 user 消息里所有 `tool_result` 的总大小。超过 200KB → 按大小排序,从最大的开始落盘到 `.task_outputs/tool-results/`,上下文里只留 `<persisted-output>` 标记 + 前 2000 字符预览。模型看到标记后知道完整内容在磁盘上,需要时可以重新读。
```python
def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1]
    blocks = [(i, b) for i, b in enumerate(last["content"])
              if b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    if total <= max_bytes:
        return messages
    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
    for idx, block in ranked:
        if total <= max_bytes:
            break
        block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
        total = recalculate_total(blocks)
    return messages
```
前三层都是纯文本/结构操作(0 API 调用),但也无法"理解"对话内容。上下文可能仍然太大。→ L4。
### L4: compact_history — LLM 全量摘要
![LLM 全量摘要](images/auto-compact.svg)
前三层全跑完了,但在超大项目中连续工作 30 分钟后,token 仍然超过阈值。
三步流程:
1. **保存 transcript**:完整对话写入 `.transcripts/`,JSONL 格式。transcript 保留了可恢复记录,但模型的活跃上下文里只剩摘要。对模型当下推理来说,细节已经不在上下文中了。教学代码没有提供 transcript 检索工具。
2. **LLM 生成摘要**:把对话历史发给 LLM,要求保留当前目标、重要发现、已改文件、剩余工作、用户约束等关键信息。
3. **替换消息列表**:所有旧消息被替换为一条摘要。教学版只保留摘要;真实 Claude Code 会在 compact 后重新附加部分最近文件、计划、agent/skill/tool 等上下文。
```python
def compact_history(messages):
    transcript_path = write_transcript(messages)   # 先保存完整对话
    summary = summarize_history(messages)          # LLM 生成摘要
    return [{"role": "user",
             "content": f"[Compacted]\n\n{summary}"}]
```
**熔断器**:连续失败 3 次后停止重试,防止死循环浪费 API 调用。
### 应急: reactive_compact
有时候 API 还是返回 `prompt_too_long`(413),也就是上下文增长速度快于压缩触发速度的时候。
这时触发 **reactive_compact**:比 compact_history 更激进,从尾部回退,以字节级精度裁剪到 API 可接受的大小,只保留最后 5 条消息 + 摘要。
```python
def reactive_compact(messages):
    transcript = write_transcript(messages)
    summary = summarize_history(messages)
    tail = messages[-5:]
    return [{"role": "user",
             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
```
reactive compact 有重试上限(默认 1 次)。再失败就抛出异常,不无限循环。完整的错误恢复逻辑留给 s11。
### 合起来跑
```python
def agent_loop(messages):
    reactive_retries = 0
    while True:
        # 三个预处理器(0 API 调用)
        # 顺序:budget 先跑,确保大内容落盘后再做占位和裁剪
        messages[:] = tool_result_budget(messages)   # L3: 大结果落盘
        messages[:] = snip_compact(messages)         # L1: 裁中间
        messages[:] = micro_compact(messages)        # L2: 旧结果占位
        # 还不够?LLM 摘要(1 API 调用)
        if estimate_token_count(messages) > THRESHOLD:
            messages[:] = compact_history(messages)
        try:
            response = client.messages.create(...)
        except PromptTooLongError:
            if reactive_retries < MAX_REACTIVE_RETRIES:
                messages[:] = reactive_compact(messages)  # 应急
                reactive_retries += 1
                continue
            raise  # 超过重试上限,抛出异常
        # ... 工具执行 ...
        # compact 工具:模型主动调用时触发 compact_history
        if block.name == "compact":
            messages[:] = compact_history(messages)
            results.append({..., "content": "[Compacted. History summarized.]"})
            messages.append({"role": "user", "content": results})
            break  # 结束当前 turn,用压缩后的上下文开始新一轮
```
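**The order cannot be changed.** L3 (budget) runs before L2 (micro) because micro replaces old, large tool_results with a one-line placeholder, so budget must persist the full content to disk before that happens. This is also why the CC source puts `applyToolResultBudget` first.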
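To see why the order matters, here is a self-contained sketch with simplified stand-ins for the two layers. `budget` and `micro` below are toy versions with made-up thresholds, and `disk` is an in-memory dict standing in for files on disk; none of this is the code from `code.py`.

```python
PLACEHOLDER = "[Earlier tool result compacted. Re-run if needed.]"
disk: dict[str, str] = {}  # stands in for files under .task_outputs/


def budget(results: list[dict], threshold: int = 100) -> list[dict]:
    """Toy L3: persist any oversized result and leave a short marker."""
    for r in results:
        if len(r["content"]) > threshold:
            disk[r["tool_use_id"]] = r["content"]
            r["content"] = f"<persisted-output id={r['tool_use_id']}>"
    return results


def micro(results: list[dict], keep_recent: int = 1) -> list[dict]:
    """Toy L2: replace all but the newest results with a placeholder."""
    if len(results) <= keep_recent:
        return results
    for r in results[:-keep_recent]:
        r["content"] = PLACEHOLDER
    return results


def fresh() -> list[dict]:
    return [{"tool_use_id": "t1", "content": "A" * 500},
            {"tool_use_id": "t2", "content": "small"}]


# Correct order: the full 500-character output reaches disk first.
disk.clear()
micro(budget(fresh()))
saved_budget_first = dict(disk)

# Swapped order: micro overwrites t1 before budget ever sees it.
disk.clear()
budget(micro(fresh()))
saved_micro_first = dict(disk)

print(len(saved_budget_first.get("t1", "")), saved_micro_first)  # 500 {}
```

With the order swapped, nothing is persisted: by the time the toy `budget` runs, the large result has already been reduced to a short placeholder.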
---
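## Changes relative to s07
| Component | Before (s07) | After (s08) |
|------|-----------|-----------|
| Context management | None (context grows without bound) | Four-layer compaction pipeline + emergency path |
| New functions | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |
| Tools | bash, read, write, edit, glob, todo_write, task, load_skill (8) | 8 + compact (9) |
| Loop | LLM call → tool execution | Three preprocessors before every turn + threshold-triggered compact_history |
| Design principle | — | Cheap first, expensive last |
---
## Try it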
```sh
cd learn-claude-code
python s08_context_compact/code.py
```
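Try these prompts:
1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (read several files in a row and watch L2 compact the older results)
2. `Read every file in s08_context_compact/` (read a lot of content at once and watch L3 persist it to disk)
3. Chat for 20+ turns and watch for `[auto compact]` or `[reactive compact]`
What to watch: after each tool execution, are old tool_results compacted? Once tokens pass the threshold in a long conversation, is a summary triggered automatically?
---
## What's next
Context compaction lets the agent run for a long time without falling over. But every compaction also discards the preferences and constraints the user stated earlier. Can the agent selectively remember what matters?
s09 Memory → three subsystems: choosing what to remember, extracting key information, and consolidating it. Across compactions and across sessions.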
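<details>
<summary>Deep dive into the CC source</summary>
> The following is based on an analysis of the CC source files `compact.ts`, `autoCompact.ts`, `microCompact.ts`, and `query.ts`.
### Execution order comparison
The teaching version numbers the layers L1/L2/L3/L4 for ease of explanation, but the actual execution order does not follow the numbering exactly:
| Aspect | Teaching version | Claude Code |
|------|--------|-------------|
| Execution order | budget → snip → micro → auto | budget → snip → micro → collapse → auto (`query.ts:379-468`) |
| snip_compact | Keeps the first 3 + last 47 | Enabled only on the main thread in CC; the implementation is not in the open-source repo (`HISTORY_SNIP` feature gate), but the interface is visible: `snipCompactIfNeeded(messages)` returns `{ messages, tokensFreed, boundaryMessage? }`. A `SnipTool` is also exposed so the model can call it proactively. The teaching version's 3/47 split is a simplified parameter |
| micro_compact | Text placeholder replacement | Two paths: time-based clears content directly; cached goes through the API `cache_edits` (legacy path removed) |
| micro_compact allowlist | By position (most recent 3) | time-based triggers on a time threshold; cached triggers on a count (`microCompact.ts`) |
| tool_result_budget | 200KB of characters | 200,000 characters (`toolLimits.ts:49`) |
| compact_history threshold | Character-count estimate | Exact tokens: `contextWindow - maxOutputTokens - 13_000` |
| Summary requirements | 5 kinds of information | 9 sections + `<analysis>`/`<summary>` dual tags |
| Compaction prompt | Simple prompt | Tool calls forbidden at both the start and the end as a double safeguard |
| PTL retry | Yes (simplified) | `truncateHeadForPTLRetry()` backs off by message group (`compact.ts:243-290`) |
| Post-compaction restore | None (teaching version keeps only the summary) | Automatically re-reads recent files, plans, agent/skill/tool state, etc. |
| Circuit breaker | 3 times | 3 times (`autoCompact.ts:70`) |
| reactive retries | 1 | CC has finer-grained tiered retries |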
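The threshold row can be read as simple arithmetic. A sketch with illustrative numbers: the 13,000-token buffer is the `AUTOCOMPACT_BUFFER_TOKENS` value cited in this section, while the function names and the context-window and output-token values are assumptions chosen only for the example.

```python
AUTOCOMPACT_BUFFER_TOKENS = 13_000  # value cited for autoCompact.ts


def autocompact_threshold(context_window: int, max_output_tokens: int) -> int:
    """Token count above which a proactive summary would be triggered."""
    return context_window - max_output_tokens - AUTOCOMPACT_BUFFER_TOKENS


def should_autocompact(token_count: int, context_window: int,
                       max_output_tokens: int) -> bool:
    return token_count > autocompact_threshold(context_window, max_output_tokens)


# Assumed example values, not taken from CC: a 200K window, 20K reserved for output.
threshold = autocompact_threshold(200_000, 20_000)
print(threshold)  # 167000
```

The buffer leaves headroom so that the next request still fits even if the token count was slightly underestimated.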
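### Execution order in detail
The real order in CC's `query.ts`:
1. `applyToolResultBudget` (line 379): handle large results first so the full content is persisted
2. `snipCompact` (line 403): trim middle messages
3. `microcompact` (line 414): placeholders for old results
4. `contextCollapse` (line 441): a separate context-management system (not in the teaching version)
5. `autoCompact` (line 454): LLM full summary
The teaching version's budget → snip → micro order matches this. The teaching version has no contextCollapse mechanism.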
### Full constant reference
| Constant | Value | Source file |
|------|-----|--------|
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
| Time-based micro_compact interval | 60 minutes | `timeBasedMCConfig.ts` |
| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |
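### contextCollapse and sessionMemoryCompact
The CC source has two more mechanisms that this teaching version does not cover:
- **contextCollapse**: a separate context-management system. When enabled, it suppresses proactive autocompact (`autoCompact.ts:215-222`), and collapse's commit/blocking flow takes over context management. Manual `/compact` and the reactive fallback remain independent paths and are not affected by contextCollapse.
- **sessionMemoryCompact**: before compact_history, CC first tries a lightweight summary built from existing session memory (covered in s09) without calling the LLM. This mechanism will be clearer when revisited after s09.
### What does the compaction prompt look like?
CC's compaction prompt has two hard requirements:
1. **Tool calls are strictly forbidden**: it opens with `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` and ends with another REMINDER
2. **Analyze first, then summarize**: the model first works through its thinking inside an `<analysis>` tag, then writes the formal summary inside a `<summary>` tag. The analysis is stripped during formatting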
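The second requirement implies a post-processing step. A minimal sketch of how such stripping could work, assuming a hypothetical `format_summary` helper and regex-based tag handling (CC's actual formatting code is not reproduced here):

```python
import re

ANALYSIS_RE = re.compile(r"<analysis>.*?</analysis>", re.DOTALL)
SUMMARY_RE = re.compile(r"<summary>(.*?)</summary>", re.DOTALL)


def format_summary(raw: str) -> str:
    """Drop the <analysis> scratchpad and return only the <summary> body."""
    without_analysis = ANALYSIS_RE.sub("", raw)
    match = SUMMARY_RE.search(without_analysis)
    # Fall back to the remaining text if the model omitted the summary tags.
    return (match.group(1) if match else without_analysis).strip()


raw = "<analysis>thinking...</analysis>\n<summary>\nGoal: fix bug\n</summary>"
print(format_summary(raw))  # Goal: fix bug
```

Keeping the analysis out of the stored summary gives the model room to reason without paying for that reasoning in every later request.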
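### The teaching version's simplifications are deliberate
- micro_compact uses text placeholders → we have no access to the API-level `cache_edits`
- Tokens are estimated from character count → an exact tokenizer is out of scope for this tutorial
- Post-compaction restore is omitted → the teaching version keeps only the summary and does not re-attach files automatically
- The two auxiliary mechanisms are not covered → they belong to the last 10% of detail
The core design idea, cheap first and expensive last, is preserved in full.
</details>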
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

s08_context_compact/code.py Normal file

@@ -0,0 +1,469 @@
#!/usr/bin/env python3
"""
s08_context_compact.py - Context Compact

Four-layer compaction pipeline inserted before LLM calls:

    L1: snip_compact       — trim middle messages when count > 50
    L2: micro_compact      — replace old tool_results with placeholders
    L3: tool_result_budget — persist large results to disk
    L4: compact_history    — LLM full summary (1 API call)
    Emergency: reactive_compact — when API still returns prompt_too_long

    ┌──────────────────────────────────────────────────────────────┐
    │  messages[]                                                  │
    │     ↓                                                        │
    │  L3 budget ─→ L1 snip ─→ L2 micro ─→ [token > threshold?]    │
    │                                      ├─ No  → LLM            │
    │                                      └─ Yes → L4 summary     │
    │                                                  ↓           │
    │                                               LLM call       │
    │                                      [prompt_too_long?]      │
    │                                      └─ Yes → reactive       │
    └──────────────────────────────────────────────────────────────┘

Core principle: cheap first, expensive last.
Execution order matches CC source: budget → snip → micro → auto.

Builds on s07 (skill loading). Usage:

    python s08_context_compact/code.py

Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY in .env
"""
import os, subprocess, json, time
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"): os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
SKILLS_DIR = WORKDIR / "skills"
TRANSCRIPT_DIR = WORKDIR / ".transcripts"
TOOL_RESULTS_DIR = WORKDIR / ".task_outputs" / "tool-results"
TASKS_DIR = WORKDIR / ".tasks"; TASKS_DIR.mkdir(exist_ok=True)
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]
# s07: Skill catalog scan (inherited from s07)
def _parse_frontmatter(text: str) -> tuple[dict, str]:
    if not text.startswith("---"):
        return {}, text
    parts = text.split("---", 2)
    if len(parts) < 3:
        return {}, text
    meta = {}
    for line in parts[1].strip().splitlines():
        if ":" in line:
            k, v = line.split(":", 1)
            meta[k.strip()] = v.strip().strip('"').strip("'")
    return meta, parts[2].strip()

SKILL_REGISTRY: dict[str, dict] = {}

def _scan_skills():
    if not SKILLS_DIR.exists():
        return
    for d in sorted(SKILLS_DIR.iterdir()):
        if not d.is_dir():
            continue
        manifest = d / "SKILL.md"
        if manifest.exists():
            raw = manifest.read_text()
            meta, body = _parse_frontmatter(raw)
            name = meta.get("name", d.name)
            desc = meta.get("description", raw.split("\n")[0].lstrip("#").strip())
            SKILL_REGISTRY[name] = {"name": name, "description": desc, "content": raw}

_scan_skills()

def list_skills() -> str:
    if not SKILL_REGISTRY:
        return "(no skills found)"
    return "\n".join(f"- **{s['name']}**: {s['description']}" for s in SKILL_REGISTRY.values())

def load_skill(name: str) -> str:
    skill = SKILL_REGISTRY.get(name)
    if not skill:
        return f"Skill not found: {name}"
    return skill["content"]

# s08: SYSTEM includes skill catalog (inherited from s07 build_system)
def build_system() -> str:
    catalog = list_skills()
    return (
        f"You are a coding agent at {WORKDIR}. "
        f"Skills available:\n{catalog}\n"
        "Use load_skill to get full details when needed."
    )

SYSTEM = build_system()

# s08: subagent gets its own system prompt — no compact, no skill loading
SUB_SYSTEM = (
    f"You are a coding agent at {WORKDIR}. "
    "Complete the task you were given, then return a concise summary. "
    "Do not delegate further."
)
# ═══════════════════════════════════════════════════════════
# FROM s02-s07 (unchanged): Basic Tools
# ═══════════════════════════════════════════════════════════
def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR): raise ValueError(f"Path escapes workspace: {p}")
    return path

def run_bash(command: str) -> str:
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR, capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired: return "Error: Timeout (120s)"

def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines): lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e: return f"Error: {e}"

def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path); file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content); return f"Wrote {len(content)} bytes to {path}"
    except Exception as e: return f"Error: {e}"

def run_edit(path: str, old_text: str, new_text: str) -> str:
    try:
        file_path = safe_path(path)
        text = file_path.read_text()
        if old_text not in text: return f"Error: text not found in {path}"
        file_path.write_text(text.replace(old_text, new_text, 1))
        return f"Edited {path}"
    except Exception as e: return f"Error: {e}"

def run_glob(pattern: str) -> str:
    import glob as g
    try:
        results = []
        for match in g.glob(pattern, root_dir=WORKDIR):
            if (WORKDIR / match).resolve().is_relative_to(WORKDIR):
                results.append(match)
        return "\n".join(results) if results else "(no matches)"
    except Exception as e: return f"Error: {e}"

def run_todo_write(todos: list) -> str:
    for i, t in enumerate(todos):
        if "content" not in t or "status" not in t:
            return f"Error: todos[{i}] missing 'content' or 'status'"
        if t["status"] not in ("pending", "in_progress", "completed"):
            return f"Error: todos[{i}] has invalid status '{t['status']}'"
    tasks_file = TASKS_DIR / "current_todos.json"
    tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False))
    lines = ["\n\033[33m## Current Tasks\033[0m"]
    for t in todos:
        icon = {"pending": " ", "in_progress": "\033[36m▸\033[0m", "completed": "\033[32m✓\033[0m"}[t["status"]]
        lines.append(f" [{icon}] {t['content']}")
    print("\n".join(lines))
    return f"Updated {len(todos)} tasks"

def extract_text(content) -> str:
    if not isinstance(content, list): return str(content)
    return "\n".join(getattr(b, "text", "") for b in content if getattr(b, "type", None) == "text")
# ═══════════════════════════════════════════════════════════
# FROM s06-s07 (unchanged): Subagent
# ═══════════════════════════════════════════════════════════
SUB_TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
]
SUB_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write,
                "edit_file": run_edit, "glob": run_glob}

def spawn_subagent(task: str) -> str:
    print(f"\n\033[35m[Subagent spawned]\033[0m")
    messages = [{"role": "user", "content": task}]
    for _ in range(30):
        response = client.messages.create(model=MODEL, system=SUB_SYSTEM,
                                          messages=messages, tools=SUB_TOOLS, max_tokens=8000)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            break
        results = []
        for block in response.content:
            if block.type == "tool_use":
                blocked = trigger_hooks("PreToolUse", block)
                if blocked:
                    results.append({"type": "tool_result", "tool_use_id": block.id,
                                    "content": str(blocked)})
                    continue
                handler = SUB_HANDLERS.get(block.name)
                output = handler(**block.input) if handler else f"Unknown: {block.name}"
                trigger_hooks("PostToolUse", block, output)
                print(f" \033[90m[sub] {block.name}: {str(output)[:100]}\033[0m")
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})
    result = extract_text(messages[-1]["content"])
    if not result:
        for msg in reversed(messages):
            if msg["role"] == "assistant":
                result = extract_text(msg["content"])
                if result:
                    break
    if not result:
        result = "Subagent stopped after 30 turns without final answer."
    print(f"\033[35m[Subagent done]\033[0m")
    return result
# ═══════════════════════════════════════════════════════════
# NEW in s08: Four-Layer Compaction Pipeline
# ═══════════════════════════════════════════════════════════
CONTEXT_LIMIT = 50000
KEEP_RECENT = 3
PERSIST_THRESHOLD = 30000

def estimate_size(msgs): return len(str(msgs))

# L1: snipCompact — trim middle messages
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages: return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    return messages[:keep_head] + [{"role": "user", "content": f"[snipped {snipped} messages]"}] + messages[-keep_tail:]

# L2: microCompact — old result placeholders
def collect_tool_results(messages):
    blocks = []
    for mi, msg in enumerate(messages):
        if msg.get("role") != "user" or not isinstance(msg.get("content"), list): continue
        for bi, block in enumerate(msg["content"]):
            if isinstance(block, dict) and block.get("type") == "tool_result":
                blocks.append((mi, bi, block))
    return blocks

def micro_compact(messages):
    tool_results = collect_tool_results(messages)
    if len(tool_results) <= KEEP_RECENT: return messages
    for _, _, block in tool_results[:-KEEP_RECENT]:
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages

# L3: toolResultBudget — persist large results to disk
def persist_large_output(tool_use_id, output):
    if len(output) <= PERSIST_THRESHOLD: return output
    TOOL_RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    path = TOOL_RESULTS_DIR / f"{tool_use_id}.txt"
    if not path.exists(): path.write_text(output)
    return f"<persisted-output>\nFull output: {path}\nPreview:\n{output[:2000]}\n</persisted-output>"

def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1] if messages else None
    if not last or last.get("role") != "user" or not isinstance(last.get("content"), list): return messages
    blocks = [(i, b) for i, b in enumerate(last["content"]) if isinstance(b, dict) and b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    if total <= max_bytes: return messages
    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
    for _, block in ranked:
        if total <= max_bytes: break
        content = str(block.get("content", ""))
        if len(content) <= PERSIST_THRESHOLD: continue
        tid = block.get("tool_use_id", "unknown")
        block["content"] = persist_large_output(tid, content)
        total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    return messages

# L4: autoCompact — LLM full summary
def write_transcript(messages):
    TRANSCRIPT_DIR.mkdir(parents=True, exist_ok=True)
    path = TRANSCRIPT_DIR / f"transcript_{int(time.time())}.jsonl"
    with path.open("w") as f:
        for msg in messages: f.write(json.dumps(msg, default=str) + "\n")
    return path

def summarize_history(messages):
    conversation = json.dumps(messages, default=str)[:80000]
    prompt = ("Summarize this coding-agent conversation so work can continue.\n"
              "Preserve: 1. current goal, 2. key findings/decisions, 3. files read/changed, "
              "4. remaining work, 5. user constraints.\nBe compact but concrete.\n\n" + conversation)
    response = client.messages.create(model=MODEL, messages=[{"role": "user", "content": prompt}], max_tokens=2000)
    return "\n".join(
        getattr(block, "text", "")
        for block in response.content
        if getattr(block, "type", None) == "text").strip() or "(empty summary)"

def compact_history(messages):
    transcript_path = write_transcript(messages)
    print(f"[transcript saved: {transcript_path}]")
    summary = summarize_history(messages)
    return [{"role": "user", "content": f"[Compacted]\n\n{summary}"}]

# Emergency: reactiveCompact — on API error
def reactive_compact(messages):
    transcript = write_transcript(messages)
    summary = summarize_history(messages)
    return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *messages[-5:]]
# ═══════════════════════════════════════════════════════════
# FROM s07: Tool Definitions
# ═══════════════════════════════════════════════════════════
TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
    {"name": "todo_write", "description": "Create and manage a task list for your current coding session.",
     "input_schema": {"type": "object", "properties": {"todos": {"type": "array", "items": {"type": "object", "properties": {"content": {"type": "string"}, "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}}, "required": ["content", "status"]}}}, "required": ["todos"]}},
    {"name": "task", "description": "Launch a subagent to handle a complex subtask. Returns only the final conclusion.",
     "input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}},
    {"name": "load_skill", "description": "Load the full content of a skill by name.",
     "input_schema": {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}},
    # s08 change: new compact tool — triggers compact_history, not a no-op
    {"name": "compact", "description": "Summarize earlier conversation to free context space.",
     "input_schema": {"type": "object", "properties": {"focus": {"type": "string"}}}},
]
TOOL_HANDLERS = {
    "bash": run_bash, "read_file": run_read, "write_file": run_write,
    "edit_file": run_edit, "glob": run_glob, "todo_write": run_todo_write,
    # The task schema names its argument "description", so adapt the keyword
    # before calling spawn_subagent(task); passing it through directly would raise TypeError.
    "task": lambda description: spawn_subagent(description), "load_skill": load_skill,
}
# FROM s04 (unchanged): Hooks
HOOKS = {"PreToolUse": [], "PostToolUse": []}

def trigger_hooks(event, *args):
    for cb in HOOKS[event]:
        r = cb(*args)
        if r is not None: return r
    return None

DENY_LIST = ["rm -rf /", "sudo", "shutdown"]

def permission_hook(block):
    if block.name == "bash":
        for p in DENY_LIST:
            if p in block.input.get("command", ""): return "Permission denied"
    return None

def log_hook(block):
    print(f"\033[90m[HOOK] {block.name}\033[0m")
    return None

HOOKS["PreToolUse"].append(permission_hook)
HOOKS["PreToolUse"].append(log_hook)
# ═══════════════════════════════════════════════════════════
# agent_loop — s08 core: run compaction pipeline before LLM
# ═══════════════════════════════════════════════════════════
MAX_REACTIVE_RETRIES = 1  # retry limit for reactive compact

def agent_loop(messages: list):
    reactive_retries = 0
    while True:
        # s08 change: three preprocessors (0 API calls, cheap first)
        # Order matches CC source: budget → snip → micro
        messages[:] = tool_result_budget(messages)  # L3: persist large results first
        messages[:] = snip_compact(messages)        # L1: trim middle
        messages[:] = micro_compact(messages)       # L2: old result placeholders
        # s08 change: tokens still over threshold → LLM summary (1 API call)
        if estimate_size(messages) > CONTEXT_LIMIT:
            print("[auto compact]")
            messages[:] = compact_history(messages)
        try:
            response = client.messages.create(model=MODEL, system=SYSTEM, messages=messages, tools=TOOLS, max_tokens=8000)
            reactive_retries = 0  # reset on successful API call
        except Exception as e:
            if ("prompt_too_long" in str(e).lower() or "too many tokens" in str(e).lower()) and reactive_retries < MAX_REACTIVE_RETRIES:
                print("[reactive compact]")
                messages[:] = reactive_compact(messages)
                reactive_retries += 1
                continue
            raise
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use": return
        results = []
        for block in response.content:
            if block.type != "tool_use": continue
            print(f"\033[36m> {block.name}\033[0m")
            # s08: compact tool triggers compact_history, not a no-op string
            if block.name == "compact":
                messages[:] = compact_history(messages)
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": "[Compacted. Conversation history has been summarized.]"})
                messages.append({"role": "user", "content": results})
                break  # end current turn, start fresh with compacted context
            blocked = trigger_hooks("PreToolUse", block)
            if blocked:
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": str(blocked)})
                continue
            handler = TOOL_HANDLERS.get(block.name)
            output = handler(**block.input) if handler else f"Unknown: {block.name}"
            trigger_hooks("PostToolUse", block, output)
            print(str(output)[:200])
            results.append({"type": "tool_result", "tool_use_id": block.id, "content": str(output)})
        else:
            # normal path: no compact was called
            messages.append({"role": "user", "content": results})
            continue
        # compact was called: results already appended above
        continue
if __name__ == "__main__":
    print("s08: Context Compact — four-layer compaction pipeline")
    print("Type a question and press Enter to send. Type q to quit.\n")
    history = []
    while True:
        try: query = input("\033[36ms08 >> \033[0m")
        except (EOFError, KeyboardInterrupt): break
        if query.strip().lower() in ("q", "exit", ""): break
        history.append({"role": "user", "content": query})
        agent_loop(history)
        # print the text blocks of the final assistant message
        for block in history[-1]["content"]:
            if getattr(block, "type", None) == "text": print(block.text)
        print()


@@ -0,0 +1,72 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 400" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#991b1b"/><stop offset="100%" stop-color="#dc2626"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
</defs>
<rect width="720" height="400" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L4: autoCompact — LLM Full Summary</text>
<!-- Trigger Condition -->
<rect x="20" y="54" width="680" height="44" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">Trigger Condition</text>
<text x="140" y="70" fill="#991b1b" font-size="11">All three preprocessing layers have run, estimated tokens &gt; contextWindow - maxOutputTokens - 13_000.</text>
<text x="140" y="86" fill="#991b1b" font-size="10">Tries sessionMemoryCompact first (lightweight summary from existing memory), only calls LLM if insufficient.</text>
<!-- Steps -->
<rect x="20" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
<text x="120" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Step 1: Save transcript</text>
<text x="40" y="152" fill="#475569" font-size="10">Write full conversation to .transcripts/</text>
<text x="40" y="168" fill="#475569" font-size="10">JSONL format, one message per line</text>
<text x="40" y="184" fill="#475569" font-size="10">Filename: transcript_{timestamp}.jsonl</text>
<text x="40" y="200" fill="#94a3b8" font-size="9">No data lost, just moved out of active area</text>
<line x1="225" y1="161" x2="265" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
<rect x="270" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
<text x="370" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Step 2: LLM generates summary</text>
<text x="290" y="152" fill="#475569" font-size="10">Send conversation history to LLM</text>
<text x="290" y="166" fill="#475569" font-size="9">Summary must include 9 sections:</text>
<text x="290" y="180" fill="#94a3b8" font-size="8">request · concepts · files · errors · resolutions</text>
<text x="290" y="192" fill="#94a3b8" font-size="8">user messages · todos · current state · next steps</text>
<text x="290" y="206" fill="#94a3b8" font-size="9">Generated only once</text>
<line x1="475" y1="161" x2="515" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
<rect x="520" y="106" width="180" height="110" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
<text x="610" y="130" fill="#991b1b" font-size="12" font-weight="700" text-anchor="middle">Step 3: Replace message list</text>
<text x="540" y="152" fill="#991b1b" font-size="10">All old messages → 1 summary</text>
<text x="540" y="168" fill="#991b1b" font-size="10">Model continues from summary</text>
<text x="540" y="184" fill="#991b1b" font-size="10">Includes recently_read file list</text>
<text x="540" y="200" fill="#ef4444" font-size="9">⚠ This is an irreversible operation</text>
<!-- Before/After comparison -->
<rect x="20" y="234" width="320" height="94" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="180" y="256" fill="#64748b" font-size="11" font-weight="600" text-anchor="middle">Before messages</text>
<rect x="35" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="40" y="276" fill="#475569" font-size="8">user</text>
<rect x="92" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="97" y="276" fill="#475569" font-size="8">assistant</text>
<rect x="149" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="154" y="276" fill="#475569" font-size="8">user</text>
<rect x="206" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="211" y="276" fill="#475569" font-size="8">assistant</text>
<rect x="263" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="268" y="276" fill="#475569" font-size="8">user</text>
<text x="180" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~180 messages, occupying 62K tokens</text>
<line x1="345" y1="281" x2="375" y2="281" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
<rect x="380" y="234" width="320" height="94" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
<text x="540" y="256" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">After messages</text>
<rect x="395" y="264" width="290" height="32" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
<text x="540" y="276" fill="#991b1b" font-size="9" text-anchor="middle">[Compacted] Summary: goal → create hello.py ...</text>
<text x="540" y="290" fill="#991b1b" font-size="9" text-anchor="middle">Recent files: hello.py, README.md ...</text>
<text x="540" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~1 message, occupying 1K tokens</text>
<!-- Circuit breaker -->
<rect x="20" y="340" width="680" height="36" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="362" fill="#475569" font-size="11" font-weight="600">Circuit breaker:</text>
<text x="130" y="362" fill="#475569" font-size="10">3 consecutive autocompact failures → stop retrying. Prevents wasting API calls when context is unrecoverable.</text>
</svg>


@@ -0,0 +1,72 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 400" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#991b1b"/><stop offset="100%" stop-color="#dc2626"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
</defs>
<rect width="720" height="400" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L4: autoCompact — LLM 完全要約</text>
<!-- トリガー条件 -->
<rect x="20" y="54" width="680" height="44" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">トリガー条件</text>
<text x="115" y="70" fill="#991b1b" font-size="11">前 3 層の前処理を全て実行後、推定 token &gt; contextWindow - maxOutputTokens - 13_000。</text>
<text x="115" y="86" fill="#991b1b" font-size="10">まず sessionMemoryCompact を試行(既存のメモリで軽量要約)、不足時のみ LLM を呼び出し。</text>
<!-- ステップ -->
<rect x="20" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
<text x="120" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">ステップ 1：transcript 保存</text>
<text x="40" y="152" fill="#475569" font-size="10">完全な対話を .transcripts/ に書き込み</text>
<text x="40" y="168" fill="#475569" font-size="10">JSONL 形式、1 行 1 メッセージ</text>
<text x="40" y="184" fill="#475569" font-size="10">ファイル名：transcript_{timestamp}.jsonl</text>
<text x="40" y="200" fill="#94a3b8" font-size="9">情報は失われていない、アクティブ領域から移動のみ</text>
<line x1="225" y1="161" x2="265" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
<rect x="270" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
<text x="370" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">ステップ 2：LLM 要約生成</text>
<text x="290" y="152" fill="#475569" font-size="10">対話履歴を LLM に送信</text>
<text x="290" y="166" fill="#475569" font-size="9">要約は 9 つのセクションを含む:</text>
<text x="290" y="180" fill="#94a3b8" font-size="8">リクエスト・概念・ファイル・エラー・解決</text>
<text x="290" y="192" fill="#94a3b8" font-size="8">ユーザーメッセージ・TODO・現在・次ステップ</text>
<text x="290" y="206" fill="#94a3b8" font-size="9">1 回のみ生成</text>
<line x1="475" y1="161" x2="515" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
<rect x="520" y="106" width="180" height="110" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
<text x="610" y="130" fill="#991b1b" font-size="12" font-weight="700" text-anchor="middle">ステップ 3：メッセージリスト置換</text>
<text x="540" y="152" fill="#991b1b" font-size="10">全旧メッセージ → 1 件の要約に</text>
<text x="540" y="168" fill="#991b1b" font-size="10">モデルは要約から作業を継続</text>
<text x="540" y="184" fill="#991b1b" font-size="10">recently_read ファイルリストを付与</text>
<text x="540" y="200" fill="#ef4444" font-size="9">⚠ これは復元不可能な操作</text>
<!-- 圧縮前/後 比較 -->
<rect x="20" y="234" width="320" height="94" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="180" y="256" fill="#64748b" font-size="11" font-weight="600" text-anchor="middle">圧縮前 messages</text>
<rect x="35" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="40" y="276" fill="#475569" font-size="8">user</text>
<rect x="92" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="97" y="276" fill="#475569" font-size="8">assistant</text>
<rect x="149" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="154" y="276" fill="#475569" font-size="8">user</text>
<rect x="206" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="211" y="276" fill="#475569" font-size="8">assistant</text>
<rect x="263" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="268" y="276" fill="#475569" font-size="8">user</text>
<text x="180" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~180 messages, 62K tokens</text>
<line x1="345" y1="281" x2="375" y2="281" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
<rect x="380" y="234" width="320" height="94" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
<text x="540" y="256" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">messages after compaction</text>
<rect x="395" y="264" width="290" height="32" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
<text x="540" y="276" fill="#991b1b" font-size="9" text-anchor="middle">[Compacted] Summary: goal → create hello.py ...</text>
<text x="540" y="290" fill="#991b1b" font-size="9" text-anchor="middle">Recent files: hello.py, README.md ...</text>
<text x="540" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~1 message, 1K tokens</text>
<!-- Circuit breaker -->
<rect x="20" y="340" width="680" height="36" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="362" fill="#475569" font-size="11" font-weight="600">Circuit breaker:</text>
<text x="145" y="362" fill="#475569" font-size="10">autocompact fails 3 times in a row → stop retrying. Avoids wasting API calls when the context cannot be recovered.</text>
</svg>
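The circuit-breaker note at the bottom of the diagram can be sketched roughly as below. This is a minimal Python illustration, not the project's implementation: the class name `CompactCircuitBreaker`, its `run` method, and the `compact_fn` callback are hypothetical. Only the limit of 3 consecutive failures comes from the diagram, and resetting the counter on success is an assumption drawn from the word "consecutive".

```python
class CompactCircuitBreaker:
    """Stop retrying autocompact after too many consecutive failures (hypothetical helper)."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures      # the diagram states 3
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        # "Open" means the breaker has tripped and compaction is skipped.
        return self.consecutive_failures >= self.max_failures

    def run(self, compact_fn, messages):
        """Try to compact; on failure or when tripped, return messages unchanged."""
        if self.is_open:
            return messages                   # no API call is made
        try:
            compacted = compact_fn(messages)
        except Exception:
            self.consecutive_failures += 1
            return messages
        self.consecutive_failures = 0         # assumption: a success resets the count
        return compacted
```

Once the breaker is open, every further call returns immediately, so a context that cannot be compacted costs at most three failed API calls.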


@@ -0,0 +1,72 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 400" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#991b1b"/><stop offset="100%" stop-color="#dc2626"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
</defs>
<rect width="720" height="400" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L4: autoCompact (full LLM summary)</text>
<!-- Trigger condition -->
<rect x="20" y="54" width="680" height="44" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">Trigger</text>
<text x="105" y="70" fill="#991b1b" font-size="11">After the first three preprocessing layers run, estimated tokens &gt; contextWindow - maxOutputTokens - 13_000.</text>
<text x="105" y="86" fill="#991b1b" font-size="10">First try sessionMemoryCompact (a lightweight summary from existing memory); call the LLM only if that is not enough.</text>
<!-- Steps -->
<rect x="20" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
<text x="120" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Step 1: Save transcript</text>
<text x="40" y="152" fill="#475569" font-size="10">Full conversation written to .transcripts/</text>
<text x="40" y="168" fill="#475569" font-size="10">JSONL format, one message per line</text>
<text x="40" y="184" fill="#475569" font-size="10">File name: transcript_{timestamp}.jsonl</text>
<text x="40" y="200" fill="#94a3b8" font-size="9">Nothing is lost; it only leaves the active context</text>
<line x1="225" y1="161" x2="265" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
<rect x="270" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
<text x="370" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Step 2: LLM generates summary</text>
<text x="290" y="152" fill="#475569" font-size="10">Send the conversation history to the LLM</text>
<text x="290" y="166" fill="#475569" font-size="9">The summary must cover 9 sections:</text>
<text x="290" y="180" fill="#94a3b8" font-size="8">request · concepts · files · errors · fixes</text>
<text x="290" y="192" fill="#94a3b8" font-size="8">user messages · TODOs · current work · next step</text>
<text x="290" y="206" fill="#94a3b8" font-size="9">Generated only once</text>
<line x1="475" y1="161" x2="515" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
<rect x="520" y="106" width="180" height="110" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
<text x="610" y="130" fill="#991b1b" font-size="12" font-weight="700" text-anchor="middle">Step 3: Replace message list</text>
<text x="540" y="152" fill="#991b1b" font-size="10">All old messages → 1 summary</text>
<text x="540" y="168" fill="#991b1b" font-size="10">Model continues from the summary</text>
<text x="540" y="184" fill="#991b1b" font-size="10">Attaches recently_read file list</text>
<text x="540" y="200" fill="#ef4444" font-size="9">⚠ This operation cannot be undone</text>
<!-- Before/After comparison -->
<rect x="20" y="234" width="320" height="94" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="180" y="256" fill="#64748b" font-size="11" font-weight="600" text-anchor="middle">messages before compaction</text>
<rect x="35" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="40" y="276" fill="#475569" font-size="8">user</text>
<rect x="92" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="97" y="276" fill="#475569" font-size="8">assistant</text>
<rect x="149" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="154" y="276" fill="#475569" font-size="8">user</text>
<rect x="206" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="211" y="276" fill="#475569" font-size="8">assistant</text>
<rect x="263" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="268" y="276" fill="#475569" font-size="8">user</text>
<text x="180" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~180 messages, 62K tokens</text>
<line x1="345" y1="281" x2="375" y2="281" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
<rect x="380" y="234" width="320" height="94" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
<text x="540" y="256" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">messages after compaction</text>
<rect x="395" y="264" width="290" height="32" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
<text x="540" y="276" fill="#991b1b" font-size="9" text-anchor="middle">[Compacted] Summary: goal → create hello.py ...</text>
<text x="540" y="290" fill="#991b1b" font-size="9" text-anchor="middle">Recent files: hello.py, README.md ...</text>
<text x="540" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~1 message, 1K tokens</text>
<!-- Circuit breaker -->
<rect x="20" y="340" width="680" height="36" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="362" fill="#475569" font-size="11" font-weight="600">Breaker:</text>
<text x="95" y="362" fill="#475569" font-size="10">autocompact fails 3 times in a row → stop retrying. Avoids wasting API calls when the context cannot be recovered.</text>
</svg>
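The trigger condition and the three steps in this diagram can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the chapter's `code.py`: the function names `should_autocompact` and `save_transcript`, the `summarize` callback (standing in for the single LLM call), and the choice of a `user`-role message for the summary are all hypothetical. The 13_000 buffer, the `.transcripts/` directory, the JSONL layout, the `transcript_{timestamp}.jsonl` file name, and the `[Compacted]` prefix come from the diagram.

```python
import json
import time
from pathlib import Path

AUTOCOMPACT_BUFFER = 13_000  # headroom from the diagram's trigger condition


def should_autocompact(estimated_tokens: int, context_window: int,
                       max_output_tokens: int) -> bool:
    """True when estimated tokens > contextWindow - maxOutputTokens - 13_000."""
    return estimated_tokens > context_window - max_output_tokens - AUTOCOMPACT_BUFFER


def save_transcript(messages: list, directory) -> Path:
    """Step 1: write the full conversation as JSONL, one message per line."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"transcript_{int(time.time())}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for message in messages:
            f.write(json.dumps(message, ensure_ascii=False) + "\n")
    return path


def compact_history(messages: list, summarize, recently_read: list,
                    directory=".transcripts") -> list:
    """Steps 1-3: save the transcript, summarize once, replace the message list."""
    save_transcript(messages, directory)
    summary = summarize(messages)  # Step 2: hypothetical callback, one LLM call
    text = f"[Compacted] {summary}\nRecent files: {', '.join(recently_read)}"
    # Step 3: every old message is replaced by a single summary message.
    return [{"role": "user", "content": text}]
```

With a 200,000-token window and 8,000 output tokens, the threshold works out to 179,000 estimated tokens. The saved transcript is what backs the "nothing is lost" note in Step 1, even though the in-memory list itself cannot be restored after Step 3.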


@@ -0,0 +1,138 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 520" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>
<!-- Background -->
<rect width="820" height="520" fill="#fafbfc" rx="8"/>
<!-- Title -->
<rect x="0" y="0" width="820" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="820" height="8" fill="url(#header)"/>
<text x="410" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Context Compact — Compression Before LLM Call, Three Trigger Modes</text>
<!-- Labels -->
<text x="50" y="74" fill="#94a3b8" font-size="11" font-weight="600">s07 Preserved</text>
<text x="180" y="74" fill="#d97706" font-size="11" font-weight="600">s08 New</text>
<!-- ===== ① messages[] ===== -->
<rect x="40" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="90" y="155" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
<text x="90" y="172" fill="#64748b" font-size="9" text-anchor="middle">(s07 preserved)</text>
<!-- messages → pipeline entry -->
<line x1="140" y1="158" x2="168" y2="158" stroke="#d97706" stroke-width="2" marker-end="url(#arrow-amber)"/>
<!-- ===== ② Compression Pipeline ===== -->
<rect x="170" y="82" width="200" height="252" rx="10" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<text x="270" y="102" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">Compression Pipeline</text>
<!-- ── ① Every Turn Auto ── -->
<rect x="186" y="110" width="168" height="16" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="0.8"/>
<text x="270" y="122" fill="#92400e" font-size="8" font-weight="700" text-anchor="middle">① Every Turn · Unconditional · 0 API</text>
<rect x="186" y="130" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="146" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L3 tool_result_budget</text>
<rect x="186" y="158" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="174" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L1 snip_compact</text>
<rect x="186" y="186" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="202" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L2 micro_compact</text>
<!-- ↓ → ◇ -->
<line x1="270" y1="210" x2="270" y2="222" stroke="#555" stroke-width="1.2" marker-end="url(#arrow)"/>
<!-- ◇ Decision Diamond -->
<polygon points="270,226 300,244 270,262 240,244" fill="#f0f4ff" stroke="#ea580c" stroke-width="1.5"/>
<text x="270" y="247" fill="#9a3412" font-size="7" font-weight="600" text-anchor="middle">Over threshold?</text>
<!-- No: right annotation -->
<text x="306" y="240" fill="#16a34a" font-size="9" font-weight="700">No → Pass</text>
<text x="306" y="252" fill="#94a3b8" font-size="7">Straight to LLM</text>
<!-- Yes: below annotation -->
<text x="284" y="260" fill="#ea580c" font-size="8" font-weight="600">Yes↓</text>
<!-- ── ② Conditional Trigger ── -->
<rect x="186" y="268" width="168" height="16" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="0.8"/>
<text x="270" y="280" fill="#9a3412" font-size="8" font-weight="700" text-anchor="middle">② Conditional · Token Over Threshold · 1 API</text>
<rect x="186" y="288" width="168" height="24" rx="4" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
<text x="270" y="304" fill="#9a3412" font-size="10" font-weight="600" text-anchor="middle">L4 compact_history</text>
<!-- Pipeline exit → LLM -->
<line x1="370" y1="158" x2="438" y2="158" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
<!-- ===== ③ LLM ===== -->
<rect x="440" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="490" y="155" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
<text x="490" y="172" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>
<!-- LLM No → Return -->
<line x1="490" y1="184" x2="490" y2="278" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
<text x="502" y="262" fill="#16a34a" font-size="10" font-weight="600">No</text>
<rect x="435" y="280" width="110" height="26" rx="13" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="490" y="297" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return Result</text>
<!-- LLM Yes → TOOL_HANDLERS -->
<line x1="540" y1="158" x2="578" y2="158" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
<text x="554" y="150" fill="#64748b" font-size="10" font-weight="600">Yes</text>
<!-- ④ TOOL_HANDLERS -->
<rect x="580" y="126" width="130" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="645" y="150" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="645" y="166" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
<text x="645" y="180" fill="#64748b" font-size="9" text-anchor="middle">task · load_skill · ...</text>
<!-- LLM API error → emergency compact → retry next turn -->
<path d="M 535 184 L 570 216 L 580 228" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
<text x="552" y="204" fill="#991b1b" font-size="8" font-weight="600">API error</text>
<path d="M 665 266 L 665 340 L 160 340 L 160 142 L 186 142" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
<text x="530" y="328" fill="#991b1b" font-size="8" font-weight="600">retry to compression pipeline</text>
<!-- ===== ③ Emergency Trigger (after LLM API failure) ===== -->
<rect x="580" y="210" width="170" height="56" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
<text x="665" y="228" fill="#991b1b" font-size="9" font-weight="700" text-anchor="middle">③ Emergency Trigger</text>
<text x="665" y="242" fill="#991b1b" font-size="8" text-anchor="middle">API returns prompt_too_long</text>
<text x="665" y="256" fill="#991b1b" font-size="8" text-anchor="middle">→ reactive_compact → retry</text>
<!-- ===== Loop Back ===== -->
<path d="M 710 158 L 760 158 L 760 348 L 90 348 L 90 184" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="410" y="366" fill="#64748b" font-size="10" text-anchor="middle">Tool results appended to messages[] → next turn → compress again → LLM</text>
<!-- ===== Legend ===== -->
<rect x="50" y="390" width="720" height="116" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<rect x="70" y="404" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="94" y="414" fill="#334155" font-size="10">s07 Preserved: loop, hooks, skill loading, sub-agents</text>
<rect x="70" y="426" width="16" height="12" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="1"/>
<text x="94" y="436" fill="#334155" font-size="10">① Every Turn Auto: L3→L1→L2 run unconditionally before each LLM call, 0 API</text>
<rect x="70" y="448" width="16" height="12" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
<text x="94" y="458" fill="#334155" font-size="10">② Conditional: after L3/L1/L2, tokens still over threshold → compact_history, 1 API</text>
<rect x="70" y="470" width="16" height="12" rx="3" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="3,2"/>
<text x="94" y="480" fill="#334155" font-size="10">③ Emergency: API returns prompt_too_long → reactive_compact → retry</text>
<text x="70" y="498" fill="#94a3b8" font-size="9">Three modes with increasing cost: 0 API → 1 API → 1 API + more aggressive trimming</text>
</svg>
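The control flow drawn in this diagram (three unconditional layers, a threshold check, and an emergency retry) can be sketched as below. This is a rough Python outline under stated assumptions rather than the chapter's code: `run_turn`, the `PromptTooLong` exception, the `max_retries` limit, and passing every layer in as a callable are hypothetical choices made to keep the sketch self-contained. The layer order L3 → L1 → L2, the conditional `compact_history` step, and the `reactive_compact` retry path come from the diagram.

```python
class PromptTooLong(Exception):
    """Hypothetical stand-in for the API's prompt_too_long error."""


def run_turn(messages, call_llm, estimate_tokens, threshold,
             tool_result_budget, snip_compact, micro_compact,
             compact_history, reactive_compact, max_retries=1):
    """Compress, call the LLM, and retry through the pipeline on prompt_too_long."""
    retries = 0
    while True:
        # Mode 1: every turn, unconditional, no API call (L3 -> L1 -> L2).
        for layer in (tool_result_budget, snip_compact, micro_compact):
            messages = layer(messages)
        # Mode 2: conditional, one API call, only if still over the threshold.
        if estimate_tokens(messages) > threshold:
            messages = compact_history(messages)
        try:
            return messages, call_llm(messages)
        except PromptTooLong:
            # Mode 3: emergency trim, then re-enter the compression pipeline.
            if retries >= max_retries:
                raise
            retries += 1
            messages = reactive_compact(messages)
```

Capping the retries is an assumption added here so the sketch cannot loop forever; the diagram itself only shows a single retry arrow back to the pipeline.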


@@ -0,0 +1,138 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 520" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>
<!-- Background -->
<rect width="820" height="520" fill="#fafbfc" rx="8"/>
<!-- Title -->
<rect x="0" y="0" width="820" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="820" height="8" fill="url(#header)"/>
<text x="410" y="31" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Context Compact: Compression Before LLM Call, Three Trigger Modes</text>
<!-- Labels -->
<text x="50" y="74" fill="#94a3b8" font-size="11" font-weight="600">s07 Preserved</text>
<text x="180" y="74" fill="#d97706" font-size="11" font-weight="600">s08 New</text>
<!-- ===== ① messages[] ===== -->
<rect x="40" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="90" y="155" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
<text x="90" y="172" fill="#64748b" font-size="9" text-anchor="middle">(s07 preserved)</text>
<!-- messages → pipeline entry -->
<line x1="140" y1="158" x2="168" y2="158" stroke="#d97706" stroke-width="2" marker-end="url(#arrow-amber)"/>
<!-- ===== ② Compression Pipeline ===== -->
<rect x="170" y="82" width="200" height="252" rx="10" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<text x="270" y="102" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">Compression Pipeline</text>
<!-- ── ① Every Turn Auto ── -->
<rect x="186" y="110" width="168" height="16" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="0.8"/>
<text x="270" y="122" fill="#92400e" font-size="8" font-weight="700" text-anchor="middle">① Every Turn · Unconditional · 0 API</text>
<rect x="186" y="130" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="146" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L3 tool_result_budget</text>
<rect x="186" y="158" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="174" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L1 snip_compact</text>
<rect x="186" y="186" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="202" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L2 micro_compact</text>
<!-- ↓ → ◇ -->
<line x1="270" y1="210" x2="270" y2="222" stroke="#555" stroke-width="1.2" marker-end="url(#arrow)"/>
<!-- ◇ Decision Diamond -->
<polygon points="270,226 300,244 270,262 240,244" fill="#f0f4ff" stroke="#ea580c" stroke-width="1.5"/>
<text x="270" y="247" fill="#9a3412" font-size="7" font-weight="600" text-anchor="middle">Over threshold?</text>
<!-- No: right annotation -->
<text x="306" y="240" fill="#16a34a" font-size="9" font-weight="700">No → Pass</text>
<text x="306" y="252" fill="#94a3b8" font-size="7">Straight to LLM</text>
<!-- Yes: below annotation -->
<text x="284" y="260" fill="#ea580c" font-size="8" font-weight="600">Yes↓</text>
<!-- ── ② Conditional Trigger ── -->
<rect x="186" y="268" width="168" height="16" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="0.8"/>
<text x="270" y="280" fill="#9a3412" font-size="8" font-weight="700" text-anchor="middle">② Conditional · Token Over Threshold · 1 API</text>
<rect x="186" y="288" width="168" height="24" rx="4" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
<text x="270" y="304" fill="#9a3412" font-size="10" font-weight="600" text-anchor="middle">L4 compact_history</text>
<!-- Pipeline exit → LLM -->
<line x1="370" y1="158" x2="438" y2="158" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
<!-- ===== ③ LLM ===== -->
<rect x="440" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="490" y="155" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
<text x="490" y="172" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>
<!-- LLM No → Return -->
<line x1="490" y1="184" x2="490" y2="278" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
<text x="502" y="262" fill="#16a34a" font-size="10" font-weight="600">No</text>
<rect x="435" y="280" width="110" height="26" rx="13" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="490" y="297" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return Result</text>
<!-- LLM Yes → TOOL_HANDLERS -->
<line x1="540" y1="158" x2="578" y2="158" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
<text x="554" y="150" fill="#64748b" font-size="10" font-weight="600">Yes</text>
<!-- ④ TOOL_HANDLERS -->
<rect x="580" y="126" width="130" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="645" y="150" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="645" y="166" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
<text x="645" y="180" fill="#64748b" font-size="9" text-anchor="middle">task · load_skill · ...</text>
<!-- LLM API error → emergency compact → retry next turn -->
<path d="M 535 184 L 570 216 L 580 228" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
<text x="552" y="204" fill="#991b1b" font-size="8" font-weight="600">API error</text>
<path d="M 665 266 L 665 340 L 160 340 L 160 142 L 186 142" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
<text x="530" y="328" fill="#991b1b" font-size="8" font-weight="600">retry to compression pipeline</text>
<!-- ===== ③ Emergency Trigger (after LLM API failure) ===== -->
<rect x="580" y="210" width="170" height="56" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
<text x="665" y="228" fill="#991b1b" font-size="9" font-weight="700" text-anchor="middle">③ Emergency Trigger</text>
<text x="665" y="242" fill="#991b1b" font-size="8" text-anchor="middle">API returns prompt_too_long</text>
<text x="665" y="256" fill="#991b1b" font-size="8" text-anchor="middle">→ reactive_compact → retry</text>
<!-- ===== Loop Back ===== -->
<path d="M 710 158 L 760 158 L 760 348 L 90 348 L 90 184" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="410" y="366" fill="#64748b" font-size="10" text-anchor="middle">Tool results appended to messages[] → next turn → compress again → LLM</text>
<!-- ===== Legend ===== -->
<rect x="50" y="390" width="720" height="116" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<rect x="70" y="404" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="94" y="414" fill="#334155" font-size="10">s07 Preserved: loop, hooks, skill loading, sub-agents</text>
<rect x="70" y="426" width="16" height="12" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="1"/>
<text x="94" y="436" fill="#334155" font-size="10">① Every Turn Auto: L3→L1→L2 run unconditionally before each LLM call, 0 API</text>
<rect x="70" y="448" width="16" height="12" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
<text x="94" y="458" fill="#334155" font-size="10">② Conditional: after L3/L1/L2, tokens still over threshold → compact_history, 1 API</text>
<rect x="70" y="470" width="16" height="12" rx="3" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="3,2"/>
<text x="94" y="480" fill="#334155" font-size="10">③ Emergency: API returns prompt_too_long → reactive_compact → retry</text>
<text x="70" y="498" fill="#94a3b8" font-size="9">Three modes with increasing cost: 0 API → 1 API → 1 API + more aggressive trimming</text>
</svg>


@@ -0,0 +1,138 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 520" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>
<!-- Background -->
<rect width="820" height="520" fill="#fafbfc" rx="8"/>
<!-- Title -->
<rect x="0" y="0" width="820" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="820" height="8" fill="url(#header)"/>
<text x="410" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Context Compact: Compression Inserted Before LLM Call, Three Trigger Modes</text>
<!-- Labels -->
<text x="50" y="74" fill="#94a3b8" font-size="11" font-weight="600">s07 Preserved</text>
<text x="180" y="74" fill="#d97706" font-size="11" font-weight="600">s08 New</text>
<!-- ===== ① messages[] ===== -->
<rect x="40" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="90" y="155" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
<text x="90" y="172" fill="#64748b" font-size="9" text-anchor="middle">(s07 preserved)</text>
<!-- messages → pipeline entry -->
<line x1="140" y1="158" x2="168" y2="158" stroke="#d97706" stroke-width="2" marker-end="url(#arrow-amber)"/>
<!-- ===== ② Compression Pipeline (labels only inside, no path lines drawn) ===== -->
<rect x="170" y="82" width="200" height="252" rx="10" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<text x="270" y="102" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">Compression Pipeline</text>
<!-- ── ① Every Turn Auto ── -->
<rect x="186" y="110" width="168" height="16" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="0.8"/>
<text x="270" y="122" fill="#92400e" font-size="8" font-weight="700" text-anchor="middle">① Every Turn · Unconditional · 0 API</text>
<rect x="186" y="130" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="146" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L3 tool_result_budget</text>
<rect x="186" y="158" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="174" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L1 snip_compact</text>
<rect x="186" y="186" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="202" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L2 micro_compact</text>
<!-- ↓ → ◇ -->
<line x1="270" y1="210" x2="270" y2="222" stroke="#555" stroke-width="1.2" marker-end="url(#arrow)"/>
<!-- ◇ Decision Diamond (compact) -->
<polygon points="270,226 300,244 270,262 240,244" fill="#f0f4ff" stroke="#ea580c" stroke-width="1.5"/>
<text x="270" y="247" fill="#9a3412" font-size="7" font-weight="600" text-anchor="middle">Over threshold?</text>
<!-- No: right-side text annotation -->
<text x="306" y="240" fill="#16a34a" font-size="9" font-weight="700">No → Pass</text>
<text x="306" y="252" fill="#94a3b8" font-size="7">Straight to LLM</text>
<!-- Yes: text annotation below -->
<text x="284" y="260" fill="#ea580c" font-size="8" font-weight="600">Yes↓</text>
<!-- ── ② Conditional Trigger ── -->
<rect x="186" y="268" width="168" height="16" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="0.8"/>
<text x="270" y="280" fill="#9a3412" font-size="8" font-weight="700" text-anchor="middle">② Conditional · Token Over Threshold · 1 API</text>
<rect x="186" y="288" width="168" height="24" rx="4" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
<text x="270" y="304" fill="#9a3412" font-size="10" font-weight="600" text-anchor="middle">L4 compact_history</text>
<!-- Pipeline exit → LLM -->
<line x1="370" y1="158" x2="438" y2="158" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
<!-- ===== ③ LLM ===== -->
<rect x="440" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="490" y="155" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
<text x="490" y="172" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>
<!-- LLM No → Return -->
<line x1="490" y1="184" x2="490" y2="278" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
<text x="502" y="262" fill="#16a34a" font-size="10" font-weight="600">No</text>
<rect x="435" y="280" width="110" height="26" rx="13" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="490" y="297" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return Result</text>
<!-- LLM Yes → TOOL_HANDLERS -->
<line x1="540" y1="158" x2="578" y2="158" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
<text x="554" y="150" fill="#64748b" font-size="10" font-weight="600">Yes</text>
<!-- ④ TOOL_HANDLERS -->
<rect x="580" y="126" width="130" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="645" y="150" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="645" y="166" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
<text x="645" y="180" fill="#64748b" font-size="9" text-anchor="middle">task · load_skill · ...</text>
<!-- LLM API error → emergency compact → retry next turn -->
<path d="M 535 184 L 570 216 L 580 228" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
<text x="552" y="204" fill="#991b1b" font-size="8" font-weight="600">API error</text>
<path d="M 665 266 L 665 340 L 160 340 L 160 142 L 186 142" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
<text x="530" y="328" fill="#991b1b" font-size="8" font-weight="600">retry to compression pipeline</text>
<!-- ===== ③ Emergency Trigger (after the LLM API call fails) ===== -->
<rect x="580" y="210" width="170" height="56" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
<text x="665" y="228" fill="#991b1b" font-size="9" font-weight="700" text-anchor="middle">③ Emergency Trigger</text>
<text x="665" y="242" fill="#991b1b" font-size="8" text-anchor="middle">API returns prompt_too_long</text>
<text x="665" y="256" fill="#991b1b" font-size="8" text-anchor="middle">→ reactive_compact → retry</text>
<!-- ===== Loop Back (y=348 sits below the pipeline box bottom at y=334, so it never crosses the box) ===== -->
<path d="M 710 158 L 760 158 L 760 348 L 90 348 L 90 184" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="410" y="366" fill="#64748b" font-size="10" text-anchor="middle">Tool results appended to messages[] → next turn → compress again → LLM</text>
<!-- ===== 图例 ===== -->
<rect x="50" y="390" width="720" height="116" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<rect x="70" y="404" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="94" y="414" fill="#334155" font-size="10">s07 保留循环、hook、技能加载、子 Agent</text>
<rect x="70" y="426" width="16" height="12" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="1"/>
<text x="94" y="436" fill="#334155" font-size="10">① 每轮自动L3→L1→L2 在每次 LLM 调用前无条件执行0 API</text>
<rect x="70" y="448" width="16" height="12" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
<text x="94" y="458" fill="#334155" font-size="10">② 条件触发L3/L1/L2 跑完 token 仍超阈值 → compact_history1 API</text>
<rect x="70" y="470" width="16" height="12" rx="3" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="3,2"/>
<text x="94" y="480" fill="#334155" font-size="10">③ 异常触发:API 返回 prompt_too_long → reactive_compact → 重试</text>
<text x="70" y="498" fill="#94a3b8" font-size="9">三种模式的代价递增:0 API → 1 API → 1 API + 更激进的裁剪</text>
</svg>

@@ -0,0 +1,98 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 590" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<linearGradient id="pre" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#dbeafe"/><stop offset="100%" stop-color="#bfdbfe"/>
</linearGradient>
<linearGradient id="auto" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
</linearGradient>
<linearGradient id="emergency" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
</linearGradient>
<marker id="arrow-d" viewBox="0 0 10 10" refX="5" refY="10" markerWidth="6" markerHeight="6" orient="auto">
<path d="M 0 0 L 5 10 L 10 0 z" fill="#94a3b8"/>
</marker>
</defs>
<rect width="760" height="590" fill="#fafbfc" rx="8"/>
<!-- Title bar -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Context Compaction — Pre-processing Pipeline + Auto-compact + Emergency Fallback</text>
<!-- Design principles (left) -->
<rect x="20" y="62" width="220" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="130" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Design Principles</text>
<text x="130" y="100" fill="#475569" font-size="10" text-anchor="middle">Cheap operations first, expensive later</text>
<text x="130" y="116" fill="#475569" font-size="10" text-anchor="middle">Trim text before dropping messages</text>
<text x="130" y="132" fill="#475569" font-size="10" text-anchor="middle">Drop messages before calling LLM</text>
<!-- Cost escalation (right) -->
<rect x="530" y="62" width="210" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="635" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Increasing Cost</text>
<text x="635" y="104" fill="#475569" font-size="10" text-anchor="middle">Text ops → LLM summary → Emergency trim</text>
<text x="635" y="124" fill="#94a3b8" font-size="9" text-anchor="middle">0 API · 0 API · 0 API · 1 API · 1 API</text>
<!-- ===== Pre-processing pipeline title ===== -->
<rect x="20" y="146" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="163" fill="#64748b" font-size="11" font-weight="600">Pre-processing Pipeline (execution order: L3 → L1 → L2, before every LLM call, 0 API)</text>
<!-- L3: toolResultBudget -->
<rect x="80" y="180" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="200" fill="#1e40af" font-size="12" font-weight="600">L3</text>
<text x="135" y="200" fill="#1e40af" font-size="13" font-weight="700">toolResultBudget</text>
<text x="260" y="200" fill="#1e40af" font-size="11">tool_result total &gt; 200KB → spill largest item</text>
<text x="650" y="200" fill="#1e40af" font-size="10" text-anchor="end">keep full content</text>
<text x="135" y="218" fill="#2563eb" font-size="9">Trigger: every turn, before microCompact can replace full content</text>
<!-- Arrow L3→L1 -->
<line x1="380" y1="226" x2="380" y2="238" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
<!-- L1: snipCompact -->
<rect x="80" y="240" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="260" fill="#1e40af" font-size="12" font-weight="600">L1</text>
<text x="135" y="260" fill="#1e40af" font-size="13" font-weight="700">snipCompact</text>
<text x="260" y="260" fill="#1e40af" font-size="11">messages &gt; 50 → trim middle</text>
<text x="650" y="260" fill="#1e40af" font-size="10" text-anchor="end">keep head/tail</text>
<text x="135" y="278" fill="#2563eb" font-size="9">Trigger: message count exceeds threshold</text>
<!-- Arrow L1→L2 -->
<line x1="380" y1="286" x2="380" y2="298" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
<!-- L2: microCompact -->
<rect x="80" y="300" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="320" fill="#1e40af" font-size="12" font-weight="600">L2</text>
<text x="135" y="320" fill="#1e40af" font-size="13" font-weight="700">microCompact</text>
<text x="260" y="320" fill="#1e40af" font-size="11">old tool_result → placeholder (keep latest 3)</text>
<text x="650" y="320" fill="#1e40af" font-size="10" text-anchor="end">compact old</text>
<text x="135" y="338" fill="#2563eb" font-size="9">Trigger: every turn automatically; tutorial uses text placeholder</text>
<!-- ===== Auto-compact title ===== -->
<rect x="20" y="358" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="70" y="375" fill="#64748b" font-size="11" font-weight="600">Auto-compact Decision (triggered when pre-processing is insufficient, 1 API call)</text>
<!-- L4: autoCompact -->
<rect x="80" y="390" width="600" height="58" rx="7" fill="url(#auto)" stroke="#dc2626" stroke-width="2"/>
<text x="100" y="412" fill="#991b1b" font-size="12" font-weight="600">L4</text>
<text x="135" y="412" fill="#991b1b" font-size="13" font-weight="700">autoCompact</text>
<text x="260" y="412" fill="#991b1b" font-size="11">tokens over threshold → LLM summary</text>
<text x="650" y="412" fill="#991b1b" font-size="10" text-anchor="end">1 API call</text>
<text x="135" y="428" fill="#dc2626" font-size="9">Threshold: contextWindow - maxOutputTokens - 13,000 · Try sessionMemoryCompact first, then LLM</text>
<text x="135" y="442" fill="#dc2626" font-size="9">Circuit breaker: stop retrying after 3 consecutive failures</text>
<!-- ===== Emergency fallback title ===== -->
<rect x="20" y="460" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="477" fill="#64748b" font-size="11" font-weight="600">Emergency Fallback (triggered when API still returns prompt_too_long)</text>
<!-- Emergency: reactiveCompact -->
<rect x="80" y="492" width="600" height="62" rx="7" fill="url(#emergency)" stroke="#c2410c" stroke-width="1.5"/>
<text x="100" y="512" fill="#9a3412" font-size="12" font-weight="600">Emrg</text>
<text x="135" y="512" fill="#9a3412" font-size="13" font-weight="700">reactiveCompact</text>
<text x="135" y="528" fill="#9a3412" font-size="10">API returns 413 / prompt_too_long → byte-level trim</text>
<text x="135" y="544" fill="#c2410c" font-size="9">Keep last 5 + summary; more aggressive than autoCompact</text>
</svg>
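
Reviewer note: the ordering in this diagram can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's code.py. The function names `auto_compact_threshold`, `snip_compact`, and `prepare_context`, the `head=2` split, and the callback signatures are all assumptions made for the example; only the 50-message limit, the 13,000-token reserve, and the L3 → L1 → L2 → L4 order come from the diagram.

```python
Messages = list[dict]

SNIP_LIMIT = 50          # message-count limit shown for L1 in the diagram
RESERVE_TOKENS = 13_000  # reserve from the L4 threshold formula in the diagram


def auto_compact_threshold(context_window: int, max_output_tokens: int) -> int:
    """Threshold = contextWindow - maxOutputTokens - 13,000."""
    return context_window - max_output_tokens - RESERVE_TOKENS


def snip_compact(messages: Messages, limit: int = SNIP_LIMIT, head: int = 2) -> Messages:
    """L1: when there are more than `limit` messages, keep head and tail, drop the middle.

    The head/tail split (head=2) is an assumption; the diagram only says "keep head/tail".
    """
    if len(messages) <= limit:
        return messages
    head = max(0, min(head, limit))
    tail = limit - head
    return messages[:head] + messages[len(messages) - tail:]


def prepare_context(messages, stages, count_tokens, summarize, threshold):
    """Run the zero-API stages in order, then summarize only if still over threshold."""
    for stage in stages:  # expected order: L3, L1, L2
        messages = stage(messages)
    if count_tokens(messages) > threshold:
        messages = summarize(messages)  # L4: the one step that costs an API call
    return messages
```

Passing the stages as a list keeps the order explicit at the call site, which matters here because L3 has to see full tool results before L2 replaces them with placeholders.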

@@ -0,0 +1,98 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 590" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<linearGradient id="pre" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#dbeafe"/><stop offset="100%" stop-color="#bfdbfe"/>
</linearGradient>
<linearGradient id="auto" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
</linearGradient>
<linearGradient id="emergency" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
</linearGradient>
<marker id="arrow-d" viewBox="0 0 10 10" refX="5" refY="10" markerWidth="6" markerHeight="6" orient="auto">
<path d="M 0 0 L 5 10 L 10 0 z" fill="#94a3b8"/>
</marker>
</defs>
<rect width="760" height="590" fill="#fafbfc" rx="8"/>
<!-- タイトルバー -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">コンテキスト圧縮 — 前処理パイプライン + 自動圧縮 + 緊急フォールバック</text>
<!-- 設計原則(左側) -->
<rect x="20" y="62" width="220" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="130" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">設計原則</text>
<text x="130" y="100" fill="#475569" font-size="10" text-anchor="middle">安価な処理を先に、高価な処理を後に</text>
<text x="130" y="116" fill="#475569" font-size="10" text-anchor="middle">テキスト修正 → メッセージ削除の順</text>
<text x="130" y="132" fill="#475569" font-size="10" text-anchor="middle">メッセージ削除 → LLM 呼び出しの順</text>
<!-- コスト増加(右側) -->
<rect x="530" y="62" width="210" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="635" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">コスト増加</text>
<text x="635" y="104" fill="#475569" font-size="10" text-anchor="middle">テキスト操作 → LLM 要約 → 緊急トリム</text>
<text x="635" y="124" fill="#94a3b8" font-size="9" text-anchor="middle">0 API · 0 API · 0 API · 1 API · 1 API</text>
<!-- ===== 前処理パイプラインタイトル ===== -->
<rect x="20" y="146" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="163" fill="#64748b" font-size="11" font-weight="600">前処理パイプライン(実行順:L3 → L1 → L2、各 LLM 呼び出し前に自動実行、0 API)</text>
<!-- L3: toolResultBudget -->
<rect x="80" y="180" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="200" fill="#1e40af" font-size="12" font-weight="600">L3</text>
<text x="135" y="200" fill="#1e40af" font-size="13" font-weight="700">toolResultBudget</text>
<text x="260" y="200" fill="#1e40af" font-size="11">tool_result 合計 &gt; 200KB → 最大項目を退避</text>
<text x="650" y="200" fill="#1e40af" font-size="10" text-anchor="end">完全内容を保持</text>
<text x="135" y="218" fill="#2563eb" font-size="9">トリガー:毎ターン、microCompact が完全内容を置換する前に実行</text>
<!-- 矢印 L3→L1 -->
<line x1="380" y1="226" x2="380" y2="238" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
<!-- L1: snipCompact -->
<rect x="80" y="240" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="260" fill="#1e40af" font-size="12" font-weight="600">L1</text>
<text x="135" y="260" fill="#1e40af" font-size="13" font-weight="700">snipCompact</text>
<text x="260" y="260" fill="#1e40af" font-size="11">メッセージ &gt; 50 → 中間をトリム</text>
<text x="650" y="260" fill="#1e40af" font-size="10" text-anchor="end">先頭/末尾保持</text>
<text x="135" y="278" fill="#2563eb" font-size="9">トリガー:メッセージ数が閾値を超過</text>
<!-- 矢印 L1→L2 -->
<line x1="380" y1="286" x2="380" y2="298" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
<!-- L2: microCompact -->
<rect x="80" y="300" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="320" fill="#1e40af" font-size="12" font-weight="600">L2</text>
<text x="135" y="320" fill="#1e40af" font-size="13" font-weight="700">microCompact</text>
<text x="260" y="320" fill="#1e40af" font-size="11">古い tool_result → プレースホルダー(最新 3 件保持)</text>
<text x="650" y="320" fill="#1e40af" font-size="10" text-anchor="end">旧結果を圧縮</text>
<text x="135" y="338" fill="#2563eb" font-size="9">トリガー:毎ターン自動実行、チュートリアル版はテキストプレースホルダーで模擬</text>
<!-- ===== 自動圧縮タイトル ===== -->
<rect x="20" y="358" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="70" y="375" fill="#64748b" font-size="11" font-weight="600">自動圧縮判定(前処理で不足時にトリガー、1 API 呼び出し)</text>
<!-- L4: autoCompact -->
<rect x="80" y="390" width="600" height="58" rx="7" fill="url(#auto)" stroke="#dc2626" stroke-width="2"/>
<text x="100" y="412" fill="#991b1b" font-size="12" font-weight="600">L4</text>
<text x="135" y="412" fill="#991b1b" font-size="13" font-weight="700">autoCompact</text>
<text x="260" y="412" fill="#991b1b" font-size="11">トークンが閾値超過 → LLM 全量要約</text>
<text x="590" y="412" fill="#991b1b" font-size="10" text-anchor="end">1 API 呼び出し</text>
<text x="135" y="428" fill="#dc2626" font-size="9">閾値: contextWindow - maxOutputTokens - 13,000 · sessionMemoryCompact を先に試行、不足時のみ LLM 呼び出し</text>
<text x="135" y="442" fill="#dc2626" font-size="9">サーキットブレーカー:連続 3 回失敗後にリトライ停止</text>
<!-- ===== 緊急フォールバックタイトル ===== -->
<rect x="20" y="460" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="477" fill="#64748b" font-size="11" font-weight="600">緊急フォールバック(API が引き続き prompt_too_long を返す場合にトリガー)</text>
<!-- 緊急: reactiveCompact -->
<rect x="80" y="492" width="600" height="62" rx="7" fill="url(#emergency)" stroke="#c2410c" stroke-width="1.5"/>
<text x="100" y="512" fill="#9a3412" font-size="12" font-weight="600">緊急</text>
<text x="135" y="512" fill="#9a3412" font-size="13" font-weight="700">reactiveCompact</text>
<text x="135" y="528" fill="#9a3412" font-size="10">API が 413 / prompt_too_long を返す → バイト単位でトリム</text>
<text x="135" y="544" fill="#c2410c" font-size="9">最後の 5 件 + 要約を保持、autoCompact より積極的</text>
</svg>

@@ -0,0 +1,98 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 590" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<linearGradient id="pre" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#dbeafe"/><stop offset="100%" stop-color="#bfdbfe"/>
</linearGradient>
<linearGradient id="auto" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
</linearGradient>
<linearGradient id="emergency" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
</linearGradient>
<marker id="arrow-d" viewBox="0 0 10 10" refX="5" refY="10" markerWidth="6" markerHeight="6" orient="auto">
<path d="M 0 0 L 5 10 L 10 0 z" fill="#94a3b8"/>
</marker>
</defs>
<rect width="760" height="590" fill="#fafbfc" rx="8"/>
<!-- 标题栏 -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">上下文压缩 — 预处理管线 + 自动压缩 + 应急兜底</text>
<!-- 左侧说明 -->
<rect x="20" y="62" width="220" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="130" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">设计原则</text>
<text x="130" y="100" fill="#475569" font-size="10" text-anchor="middle">便宜的先跑,贵的后跑</text>
<text x="130" y="116" fill="#475569" font-size="10" text-anchor="middle">能改文本 → 不删整条</text>
<text x="130" y="132" fill="#475569" font-size="10" text-anchor="middle">能删整条 → 不调 LLM</text>
<!-- 右侧代价箭头 -->
<rect x="530" y="62" width="210" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="635" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">代价递增</text>
<text x="635" y="104" fill="#475569" font-size="10" text-anchor="middle">文本操作 → LLM 摘要 → 应急裁剪</text>
<text x="635" y="124" fill="#94a3b8" font-size="9" text-anchor="middle">0 API · 0 API · 0 API · 1 API · 1 API</text>
<!-- ===== 预处理管线标题 ===== -->
<rect x="20" y="146" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="163" fill="#64748b" font-size="11" font-weight="600">预处理管线(执行顺序:L3 → L1 → L2,每轮 LLM 调用前自动执行,0 API)</text>
<!-- L3: toolResultBudget -->
<rect x="80" y="180" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="200" fill="#1e40af" font-size="12" font-weight="600">L3</text>
<text x="135" y="200" fill="#1e40af" font-size="13" font-weight="700">toolResultBudget</text>
<text x="260" y="200" fill="#1e40af" font-size="11">tool_result 总和 &gt; 200KB → 最大项落盘</text>
<text x="650" y="200" fill="#1e40af" font-size="10" text-anchor="end">保留完整内容</text>
<text x="135" y="218" fill="#2563eb" font-size="9">触发:每轮自动,必须在 microCompact 之前保留完整内容</text>
<!-- 箭头 L3→L1 -->
<line x1="380" y1="226" x2="380" y2="238" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
<!-- L1: snipCompact -->
<rect x="80" y="240" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="260" fill="#1e40af" font-size="12" font-weight="600">L1</text>
<text x="135" y="260" fill="#1e40af" font-size="13" font-weight="700">snipCompact</text>
<text x="260" y="260" fill="#1e40af" font-size="11">消息 &gt; 50 条 → 裁掉中间</text>
<text x="650" y="260" fill="#1e40af" font-size="10" text-anchor="end">保留头尾</text>
<text x="135" y="278" fill="#2563eb" font-size="9">触发:消息数超过阈值</text>
<!-- 箭头 L1→L2 -->
<line x1="380" y1="286" x2="380" y2="298" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
<!-- L2: microCompact -->
<rect x="80" y="300" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="320" fill="#1e40af" font-size="12" font-weight="600">L2</text>
<text x="135" y="320" fill="#1e40af" font-size="13" font-weight="700">microCompact</text>
<text x="260" y="320" fill="#1e40af" font-size="11">旧 tool_result → 占位符(保留最近 3 条)</text>
<text x="650" y="320" fill="#1e40af" font-size="10" text-anchor="end">压旧结果</text>
<text x="135" y="338" fill="#2563eb" font-size="9">触发:每轮自动,教学版用文本占位符模拟</text>
<!-- ===== 自动压缩标题 ===== -->
<rect x="20" y="358" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="70" y="375" fill="#64748b" font-size="11" font-weight="600">自动压缩决策(预处理不够时触发,1 API 调用)</text>
<!-- L4: autoCompact -->
<rect x="80" y="390" width="600" height="58" rx="7" fill="url(#auto)" stroke="#dc2626" stroke-width="2"/>
<text x="100" y="412" fill="#991b1b" font-size="12" font-weight="600">L4</text>
<text x="135" y="412" fill="#991b1b" font-size="13" font-weight="700">autoCompact</text>
<text x="260" y="412" fill="#991b1b" font-size="11">token 超阈值 → LLM 全量摘要</text>
<text x="590" y="412" fill="#991b1b" font-size="10" text-anchor="end">1 API 调用</text>
<text x="135" y="428" fill="#dc2626" font-size="9">阈值: contextWindow - maxOutputTokens - 13,000 · 先尝试 sessionMemoryCompact,不够才调 LLM</text>
<text x="135" y="442" fill="#dc2626" font-size="9">熔断:连续失败 3 次后停止重试</text>
<!-- ===== 应急兜底标题 ===== -->
<rect x="20" y="460" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="477" fill="#64748b" font-size="11" font-weight="600">应急兜底(API 仍然返回 prompt_too_long 时触发)</text>
<!-- 应急: reactiveCompact -->
<rect x="80" y="492" width="600" height="62" rx="7" fill="url(#emergency)" stroke="#c2410c" stroke-width="1.5"/>
<text x="100" y="512" fill="#9a3412" font-size="12" font-weight="600">应急</text>
<text x="135" y="512" fill="#9a3412" font-size="13" font-weight="700">reactiveCompact</text>
<text x="135" y="528" fill="#9a3412" font-size="10">API 返回 413 / prompt_too_long → 字节级裁剪</text>
<text x="135" y="544" fill="#c2410c" font-size="9">保留最后 5 条 + 摘要,比 autoCompact 更激进</text>
</svg>

@@ -0,0 +1,50 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 356" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>
<rect width="720" height="356" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L3: toolResultBudget — Large Result Persistence</text>
<!-- Pain Point -->
<rect x="20" y="54" width="680" height="42" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="72" fill="#991b1b" font-size="11" font-weight="600">Pain Point</text>
<text x="105" y="72" fill="#991b1b" font-size="11">Model read 30 files in one turn; total tool_result adds up to 500KB, filling the entire context window</text>
<!-- Before -->
<text x="155" y="118" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">Before</text>
<rect x="20" y="128" width="270" height="82" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="35" y="148" fill="#475569" font-size="10" font-family="monospace">tool_result: (78KB) ...</text>
<text x="35" y="164" fill="#475569" font-size="10" font-family="monospace">tool_result: (142KB) ...</text>
<text x="35" y="180" fill="#475569" font-size="10" font-family="monospace">tool_result: (290KB) ...</text>
<text x="155" y="202" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">Total 510KB → over budget</text>
<!-- Arrow -->
<line x1="295" y1="163" x2="360" y2="163" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>
<!-- After -->
<text x="485" y="118" fill="#16a34a" font-size="12" font-weight="600" text-anchor="middle">After</text>
<rect x="365" y="128" width="335" height="82" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="380" y="148" fill="#166534" font-size="10" font-family="monospace">tool_result: &lt;persisted-output&gt;</text>
<text x="395" y="164" fill="#166534" font-size="9">Full output: .task_outputs/t1.txt</text>
<text x="395" y="178" fill="#166534" font-size="9">Preview: (first 2000 chars) ...</text>
<text x="532" y="202" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">Total 18KB → normal</text>
<!-- How it works -->
<rect x="20" y="214" width="680" height="64" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="234" fill="#1e3a5f" font-size="11" font-weight="600">How</text>
<text x="70" y="234" fill="#475569" font-size="10">1. Sum the size of all tool_result in the latest turn</text>
<text x="70" y="250" fill="#475569" font-size="10">2. Over 200KB → sort by size, persist the largest to .task_outputs/tool-results/</text>
<text x="70" y="266" fill="#475569" font-size="10">3. Keep only &lt;persisted-output&gt; marker + first 2000 chars preview in context</text>
<!-- Result summary -->
<rect x="20" y="290" width="680" height="36" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="35" y="312" fill="#166534" font-size="11">Result: No data lost (full data on disk), context drops from 510KB to ~18KB, 0 API calls</text>
</svg>
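
Reviewer note: the three "How" steps in this diagram could look roughly like the sketch below. It is an illustration under stated assumptions, not the implementation from code.py: the function name `apply_tool_result_budget`, the `{"tool_use_id", "content"}` result shape, the per-result file naming, and the exact marker layout are hypothetical. The 200KB budget, the 2000-character preview, and the `<persisted-output>` marker are taken from the diagram. The loop stops as soon as the total is back under budget, which is one possible reading of "sort by size, persist the largest".

```python
from pathlib import Path

BUDGET_BYTES = 200 * 1024  # 200KB budget from the diagram
PREVIEW_CHARS = 2000       # preview length from the diagram


def _size(result: dict) -> int:
    """Size of one tool_result's content in UTF-8 bytes."""
    return len(result["content"].encode("utf-8"))


def apply_tool_result_budget(results: list[dict], out_dir: Path,
                             budget: int = BUDGET_BYTES) -> list[dict]:
    """Persist the largest tool results of the latest turn until the total fits.

    `results` is assumed to be [{"tool_use_id": str, "content": str}, ...].
    """
    total = sum(_size(r) for r in results)
    if total <= budget:
        return results

    out_dir.mkdir(parents=True, exist_ok=True)
    replaced: dict[str, str] = {}
    for r in sorted(results, key=_size, reverse=True):
        if total <= budget:
            break
        path = out_dir / f"{r['tool_use_id']}.txt"
        path.write_text(r["content"], encoding="utf-8")  # full data stays on disk
        marker = (f"<persisted-output>\nFull output: {path}\n"
                  f"Preview: {r['content'][:PREVIEW_CHARS]}")
        total -= _size(r) - len(marker.encode("utf-8"))
        replaced[r["tool_use_id"]] = marker

    # Return new dicts in the original order; inputs are not mutated.
    return [{**r, "content": replaced.get(r["tool_use_id"], r["content"])}
            for r in results]
```

With the 78KB / 142KB / 290KB inputs from the diagram, this version persists the two largest results and leaves the 78KB one inline, because the total is already under budget at that point.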

@@ -0,0 +1,50 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 356" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>
<rect width="720" height="356" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L3: toolResultBudget — 大結果の永続化</text>
<!-- ペインポイント -->
<rect x="20" y="54" width="680" height="42" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="72" fill="#991b1b" font-size="11" font-weight="600">ペインポイント</text>
<text x="100" y="72" fill="#991b1b" font-size="11">モデルが一度に 30 ファイルを読み込み、単一ターンの tool_result が合計 500KB に達し、コンテキストウィンドウを圧迫</text>
<!-- 圧縮前 -->
<text x="155" y="118" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">圧縮前</text>
<rect x="20" y="128" width="270" height="82" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="35" y="148" fill="#475569" font-size="10" font-family="monospace">tool_result: (78KB) ...</text>
<text x="35" y="164" fill="#475569" font-size="10" font-family="monospace">tool_result: (142KB) ...</text>
<text x="35" y="180" fill="#475569" font-size="10" font-family="monospace">tool_result: (290KB) ...</text>
<text x="155" y="202" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">合計 510KB → 予算超過</text>
<!-- 矢印 -->
<line x1="295" y1="163" x2="360" y2="163" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>
<!-- 圧縮後 -->
<text x="485" y="118" fill="#16a34a" font-size="12" font-weight="600" text-anchor="middle">圧縮後</text>
<rect x="365" y="128" width="335" height="82" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="380" y="148" fill="#166534" font-size="10" font-family="monospace">tool_result: &lt;persisted-output&gt;</text>
<text x="395" y="164" fill="#166534" font-size="9">Full output: .task_outputs/t1.txt</text>
<text x="395" y="178" fill="#166534" font-size="9">Preview: (先頭 2000 文字) ...</text>
<text x="532" y="202" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">合計 18KB → 正常</text>
<!-- 原理説明 -->
<rect x="20" y="214" width="680" height="64" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="234" fill="#1e3a5f" font-size="11" font-weight="600">方法</text>
<text x="70" y="234" fill="#475569" font-size="10">1. 最終ターンの全 tool_result の合計サイズを集計</text>
<text x="70" y="250" fill="#475569" font-size="10">2. 200KB 超過 → サイズ順にソートし、最大のものから .task_outputs/tool-results/ に永続化</text>
<text x="70" y="266" fill="#475569" font-size="10">3. コンテキストには &lt;persisted-output&gt; マーカー + 先頭 2000 文字のプレビューのみ残す</text>
<!-- 変更サマリー -->
<rect x="20" y="290" width="680" height="36" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="35" y="312" fill="#166534" font-size="11">結果:情報は失われていない(ディスクに完全なデータあり)、コンテキストは 510KB → ~18KB に削減、0 回 API 呼び出し</text>
</svg>

@@ -0,0 +1,50 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 356" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>
<rect width="720" height="356" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L3: toolResultBudget — 大结果落盘</text>
<!-- 痛点 -->
<rect x="20" y="54" width="680" height="42" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="72" fill="#991b1b" font-size="11" font-weight="600">痛点</text>
<text x="75" y="72" fill="#991b1b" font-size="11">模型一次读了 30 个文件,单轮 tool_result 加起来 500KB,直接把上下文窗口打满</text>
<!-- Before -->
<text x="155" y="118" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">压缩前</text>
<rect x="20" y="128" width="270" height="82" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="35" y="148" fill="#475569" font-size="10" font-family="monospace">tool_result: (78KB) ...</text>
<text x="35" y="164" fill="#475569" font-size="10" font-family="monospace">tool_result: (142KB) ...</text>
<text x="35" y="180" fill="#475569" font-size="10" font-family="monospace">tool_result: (290KB) ...</text>
<text x="155" y="202" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">合计 510KB → 超预算</text>
<!-- Arrow -->
<line x1="295" y1="163" x2="360" y2="163" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>
<!-- After -->
<text x="485" y="118" fill="#16a34a" font-size="12" font-weight="600" text-anchor="middle">压缩后</text>
<rect x="365" y="128" width="335" height="82" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="380" y="148" fill="#166534" font-size="10" font-family="monospace">tool_result: &lt;persisted-output&gt;</text>
<text x="395" y="164" fill="#166534" font-size="9">Full output: .task_outputs/t1.txt</text>
<text x="395" y="178" fill="#166534" font-size="9">Preview: (前 2000 字符) ...</text>
<text x="532" y="202" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">合计 18KB → 正常</text>
<!-- 原理说明 -->
<rect x="20" y="214" width="680" height="64" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="234" fill="#1e3a5f" font-size="11" font-weight="600">怎么做</text>
<text x="85" y="234" fill="#475569" font-size="10">1. 统计最后一轮所有 tool_result 的总大小</text>
<text x="85" y="250" fill="#475569" font-size="10">2. 超过 200KB → 按大小排序,从最大的开始落盘到 .task_outputs/tool-results/</text>
<text x="85" y="266" fill="#475569" font-size="10">3. 上下文里只留 &lt;persisted-output&gt; 标记 + 前 2000 字符预览</text>
<!-- 变化摘要 -->
<rect x="20" y="290" width="680" height="36" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="35" y="312" fill="#166534" font-size="11">结果:信息没丢(磁盘有完整数据),上下文从 510KB 降到 ~18KB,0 次 API 调用</text>
</svg>

@@ -0,0 +1,57 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 300" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#ca8a04"/>
</marker>
</defs>
<rect width="720" height="300" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L2: microCompact — Old Result Placeholder Replacement</text>
<!-- Pain Point -->
<rect x="20" y="54" width="680" height="36" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">Pain Point</text>
<text x="110" y="70" fill="#991b1b" font-size="11">Agent read 10 files in a row; the full content of reads 1-7 is still sitting in context, taking space but no longer useful</text>
<!-- Before -->
<text x="155" y="114" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">Before (all 10 tool_result complete)</text>
<rect x="20" y="122" width="310" height="95" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<rect x="30" y="130" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="138" fill="#94a3b8" font-size="8" font-family="monospace">Read file A: (full content, 3200 chars)...</text>
<rect x="30" y="145" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="153" fill="#94a3b8" font-size="8" font-family="monospace">Read file B: (full content, 1800 chars)...</text>
<rect x="30" y="160" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="168" fill="#94a3b8" font-size="8" font-family="monospace">Read file C: (full content, 4500 chars)...</text>
<rect x="30" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="38" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (full content, 2800 chars)</text>
<text x="175" y="212" fill="#ef4444" font-size="9" font-weight="600">7 old results waste ~25K chars</text>
<!-- Arrow -->
<line x1="335" y1="170" x2="375" y2="170" stroke="#ca8a04" stroke-width="2" marker-end="url(#arrow)"/>
<!-- After -->
<text x="535" y="114" fill="#ca8a04" font-size="12" font-weight="600" text-anchor="middle">After (keep only latest 3 complete)</text>
<rect x="390" y="122" width="310" height="95" rx="6" fill="#fefce8" stroke="#ca8a04" stroke-width="1"/>
<rect x="400" y="130" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="138" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="145" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="153" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="160" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="168" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (full content, 2800 chars)</text>
<text x="545" y="212" fill="#ca8a04" font-size="9" font-weight="600">Keep only latest 3; first 7 become placeholders</text>
<!-- How -->
<rect x="20" y="228" width="680" height="62" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="248" fill="#1e3a5f" font-size="11" font-weight="600">How (teaching version)</text>
<text x="155" y="248" fill="#475569" font-size="10">Iterate through tool_result, keep only latest 3 complete, replace older ones with placeholders.</text>
<text x="35" y="264" fill="#1e3a5f" font-size="11" font-weight="600">Real CC</text>
<text x="95" y="264" fill="#475569" font-size="10">Clears old results via API cache_edits (without breaking prompt cache prefix), only for COMPACTABLE_TOOLS:</text>
<text x="95" y="280" fill="#94a3b8" font-size="9">Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write. Teaching version uses text placeholders to simulate the same effect.</text>
</svg>
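
The "How (teaching version)" row in the diagram above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `micro_compact`, the `keep_recent` parameter, and the message shape (a list of dicts whose `content` is a list of blocks with `type == "tool_result"`) are assumptions made for this example. Only the placeholder string and the "keep latest 3" rule come from the diagram.

```python
from typing import Any

KEEP_RECENT = 3  # from the diagram: keep only the latest 3 results complete
PLACEHOLDER = "[Earlier result compacted. Re-run if needed.]"


def micro_compact(
    messages: list[dict[str, Any]], keep_recent: int = KEEP_RECENT
) -> list[dict[str, Any]]:
    """Replace all but the newest `keep_recent` tool_result contents in place.

    Hypothetical helper: the message layout is an assumption for this sketch.
    """
    # Collect every tool_result block in conversation order.
    results = [
        block
        for msg in messages
        if isinstance(msg.get("content"), list)
        for block in msg["content"]
        if isinstance(block, dict) and block.get("type") == "tool_result"
    ]
    # Everything except the last `keep_recent` blocks becomes a placeholder.
    old = results[:-keep_recent] if keep_recent > 0 else results
    for block in old:
        block["content"] = PLACEHOLDER
    return messages


if __name__ == "__main__":
    history = [
        {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": f"t{i}", "content": f"file {i}"}
            ],
        }
        for i in range(10)
    ]
    for msg in micro_compact(history):
        print(msg["content"][0]["content"])
```

With 10 results in the history, the first 7 are overwritten and the last 3 keep their full content, mirroring the Before/After panels. Unlike the real CC behavior described in the diagram, this sketch rewrites message text directly and does not filter by tool name.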

@@ -0,0 +1,57 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 300" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#ca8a04"/>
</marker>
</defs>
<rect width="720" height="300" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L2: microCompact — 旧結果のプレースホルダー置換</text>
<!-- ペインポイント -->
<rect x="20" y="54" width="680" height="36" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">ペインポイント</text>
<text x="115" y="70" fill="#991b1b" font-size="11">Agent が連続で 10 ファイルを読み込み、1〜7 回目の完全なファイル内容がコンテキストに残ったまま、場所を占有しつつ既に不要</text>
<!-- 圧縮前 -->
<text x="155" y="114" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">圧縮前(10 件の tool_result がすべて完全)</text>
<rect x="20" y="122" width="310" height="95" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<rect x="30" y="130" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="138" fill="#94a3b8" font-size="8" font-family="monospace">Read file A: (完全な内容, 3200 文字)...</text>
<rect x="30" y="145" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="153" fill="#94a3b8" font-size="8" font-family="monospace">Read file B: (完全な内容, 1800 文字)...</text>
<rect x="30" y="160" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="168" fill="#94a3b8" font-size="8" font-family="monospace">Read file C: (完全な内容, 4500 文字)...</text>
<rect x="30" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="38" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完全な内容, 2800 文字)</text>
<text x="175" y="212" fill="#ef4444" font-size="9" font-weight="600">7 件の旧結果が ~25K 文字を無駄に占有</text>
<!-- 矢印 -->
<line x1="335" y1="170" x2="375" y2="170" stroke="#ca8a04" stroke-width="2" marker-end="url(#arrow)"/>
<!-- 圧縮後 -->
<text x="535" y="114" fill="#ca8a04" font-size="12" font-weight="600" text-anchor="middle">圧縮後(最新 3 件のみ完全保持)</text>
<rect x="390" y="122" width="310" height="95" rx="6" fill="#fefce8" stroke="#ca8a04" stroke-width="1"/>
<rect x="400" y="130" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="138" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="145" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="153" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="160" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="168" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完全な内容, 2800 文字)</text>
<text x="545" y="212" fill="#ca8a04" font-size="9" font-weight="600">最新 3 件のみ保持、前 7 件はプレースホルダー化</text>
<!-- 原理 -->
<rect x="20" y="228" width="680" height="62" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="248" fill="#1e3a5f" font-size="11" font-weight="600">方法(教学版)</text>
<text x="130" y="248" fill="#475569" font-size="10">tool_result を走査し、最新 3 件のみ完全保持、古いものはプレースホルダーに置換。</text>
<text x="35" y="264" fill="#1e3a5f" font-size="11" font-weight="600">実際の CC</text>
<text x="110" y="264" fill="#475569" font-size="10">API cache_edits で旧結果をクリア(prompt cache プレフィックスを破壊しない)、COMPACTABLE_TOOLS のみ対象:</text>
<text x="110" y="280" fill="#94a3b8" font-size="9">Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write。教学版はテキストプレースホルダーで同様の効果を模擬。</text>
</svg>

@@ -0,0 +1,57 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 300" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#ca8a04"/>
</marker>
</defs>
<rect width="720" height="300" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L2: microCompact — 旧结果占位替换</text>
<!-- 痛点 -->
<rect x="20" y="54" width="680" height="36" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">痛点</text>
<text x="75" y="70" fill="#991b1b" font-size="11">Agent 连续读了 10 个文件,第 1-7 次的完整文件内容还躺在上下文里,占着位置但早就没用了</text>
<!-- Before -->
<text x="155" y="114" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">压缩前(10 条 tool_result 全部完整)</text>
<rect x="20" y="122" width="310" height="95" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<rect x="30" y="130" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="138" fill="#94a3b8" font-size="8" font-family="monospace">Read file A: (完整内容, 3200 字符)...</text>
<rect x="30" y="145" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="153" fill="#94a3b8" font-size="8" font-family="monospace">Read file B: (完整内容, 1800 字符)...</text>
<rect x="30" y="160" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="168" fill="#94a3b8" font-size="8" font-family="monospace">Read file C: (完整内容, 4500 字符)...</text>
<rect x="30" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="38" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完整内容, 2800 字符)</text>
<text x="175" y="212" fill="#ef4444" font-size="9" font-weight="600">7 条旧结果白占 ~25K 字符</text>
<!-- Arrow -->
<line x1="335" y1="170" x2="375" y2="170" stroke="#ca8a04" stroke-width="2" marker-end="url(#arrow)"/>
<!-- After -->
<text x="535" y="114" fill="#ca8a04" font-size="12" font-weight="600" text-anchor="middle">压缩后(只保留最近 3 条完整)</text>
<rect x="390" y="122" width="310" height="95" rx="6" fill="#fefce8" stroke="#ca8a04" stroke-width="1"/>
<rect x="400" y="130" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="138" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="145" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="153" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="160" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="168" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完整内容, 2800 字符)</text>
<text x="545" y="212" fill="#ca8a04" font-size="9" font-weight="600">只保留最近 3 条,前 7 条变占位</text>
<!-- 原理 -->
<rect x="20" y="228" width="680" height="62" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="248" fill="#1e3a5f" font-size="11" font-weight="600">怎么做(教学版)</text>
<text x="115" y="248" fill="#475569" font-size="10">遍历 tool_result,只保留最近 3 条完整,更旧的替换为占位符。</text>
<text x="35" y="264" fill="#1e3a5f" font-size="11" font-weight="600">真实 CC</text>
<text x="95" y="264" fill="#475569" font-size="10">通过 API cache_edits 清除旧结果(不破坏 prompt cache 前缀),仅对 COMPACTABLE_TOOLS 生效:</text>
<text x="95" y="280" fill="#94a3b8" font-size="9">Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write。教学版用文本占位模拟同样效果。</text>
</svg>
