* feat: s01-s14 docs quality overhaul — tool pipeline, single-agent, knowledge & resilience

  Rewrite code.py and README (zh/en/ja) for s01-s14, each chapter building incrementally on the previous.

  Key fixes across chapters:
  - s01-s04: agent loop, tool dispatch, permission pipeline, hooks
  - s05-s08: todo write, subagent, skill loading, context compact
  - s09-s11: memory system, system prompt assembly, error recovery
  - s12-s14: task graph, background tasks, cron scheduler

  All chapters CC source-verified. Code inherits fixes forward (PROMPT_SECTIONS, json.dumps cache, real-state context, can_start dep protection, etc.).

* feat: s15-s19 docs quality overhaul — multi-agent platform: teams, protocols, autonomy, worktree, MCP tools

  Rewrite code.py and README (zh/en/ja) for s15-s19, the multi-agent platform chapters. Each chapter inherits all previous fixes and adds one mechanism:
  - s15: agent teams (TeamCreate, teammate threads, shared task list)
  - s16: team protocols (plan approval, shutdown handshake, consume_inbox)
  - s17: autonomous agents (idle polling, auto-claim, consume_lead_inbox)
  - s18: worktree isolation (git worktree, bind_task, cwd switching, safety)
  - s19: MCP tools (MCPClient, normalize_mcp_name, assemble_tool_pool, no cache)

  All appendix source code references verified against CC source. Config priority corrected: claude.ai < plugin < user < project < local.

* fix: 5 regressions across s05-s19 — glob safety, todo validation, memory extraction, protocol types, dep crash
  - s05-s09: glob results now filter with is_relative_to(WORKDIR) (inherited from s02)
  - s06-s08: todo_write validates content/status required fields (inherited from s05)
  - s09: extract_memories uses pre-compression snapshot instead of compacted messages
  - s16: submit_plan docstring clarifies protocol-only (not code-level gate)
  - s17-s19: match_response restores type mismatch validation (from s16)
  - s17-s19: claim_task deps list handles missing dep files without crashing

* fix: s12 Todo V2 logic reversal, s14/s15 cron range validation, s18/s19 worktree name validation
  - s12 README (zh/en/ja): fix Todo V2 direction — interactive defaults to Task, non-interactive/SDK defaults to TodoWrite. Fix env var name to CLAUDE_CODE_ENABLE_TASKS (not TODO_V2).
  - s14/s15: add _validate_cron_field with per-field range checks (minute 0-59, hour 0-23, dom 1-31, month 1-12, dow 0-6), step > 0, range lo <= hi. Replace old try/except validation that only caught exceptions.
  - s18/s19: add validate_worktree_name() to remove_worktree and keep_worktree, not just create_worktree.

* fix: align s16-s19 teaching tool consistency
* fix pr265 chapter diagrams
* Add comprehensive s20 harness chapter
* Fix chapter smoke test regressions
* Clarify README tutorial track transition

---------

Co-authored-by: Haoran <bill-billion@outlook.com>
293
s08_context_compact/README.en.md
Normal file
@@ -0,0 +1,293 @@
# s08: Context Compact — Context Will Fill Up, Have a Way to Make Room

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s20

> *"Context will fill up — have a way to make room"* — Four-layer compression pipeline: cheap first, expensive last.
>
> **Harness Layer**: Compression — clean memory, unlimited sessions.

---

## The Problem

The agent is running along, then it freezes.

It has bash, read, and write: every capability it needs. But it read a 1000-line file (~4000 tokens), then read 30 more files and ran 20 commands. Every command's output and every file's contents pile up in the `messages` list.

The context window is finite. Once it is full, the API rejects the call outright: `prompt_too_long`.

Without compression, an agent cannot work on large projects.

---

## The Solution

The hook structure, skill loading, and subagent from s07 are preserved, with some tools omitted to keep the focus on compaction. The core change has three parts: run three pre-processors (0 API calls) before each LLM call, trigger an LLM summary (1 API call) if tokens still exceed the threshold, and emergency-trim if the API rejects the request.

Core design: cheap first, expensive last.

---

## How It Works

### L1: snip_compact — Trim Irrelevant Old Conversation

The agent has run 80 turns of conversation and accumulated 160 `messages`. The very first request, "help me create hello.py", is barely relevant to the current work, yet it still occupies space.

When the message count exceeds 50, keep the first 3 (initial context) and the last 47 (current work), and trim the middle:

```python
def snip_compact(messages, max_messages=50):
    """Drop the middle of a long conversation, keeping its head and tail."""
    if len(messages) <= max_messages:
        return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    # Leave a marker so the model knows messages were removed at this point
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
```

Whole messages are gone, but `tool_result` content inside the remaining messages still accumulates: message #34 may still hold 30KB of old file contents. → L2.
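
To make the arithmetic concrete, here is a small usage sketch of `snip_compact` on the 160-message scenario described above. The function is repeated from the listing so the example runs on its own; the dummy messages are invented purely for illustration.

```python
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages:
        return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]


# 80 turns -> 160 alternating user/assistant messages (dummy content)
messages = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
            for i in range(160)]

compacted = snip_compact(messages)
print(len(compacted))           # 51: 3 head + 1 placeholder + 47 tail
print(compacted[3]["content"])  # [snipped 110 messages from conversation middle]
```

The result holds 51 entries rather than 50, because the placeholder is added on top of the 50 messages that are kept.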

### L2: micro_compact — Placeholder for Old Tool Results

The agent read 10 files in a row. The full contents of reads 1–7 are still sitting in context: no longer needed, but taking up a lot of space.

Keep only the 3 most recent `tool_result` entries intact; replace older ones with a one-line placeholder:

```python
KEEP_RECENT_TOOL_RESULTS = 3


def micro_compact(messages):
    """Replace all but the most recent tool results with a one-line placeholder."""
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        # Short results are cheap to keep; only compact the long ones.
        # str() so that list-valued content is measured by its text size too.
        if len(str(block.get("content", ""))) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages
```

Old results are cleared, but a single new result can be 500KB: one `cat` of a large file can fill the context. → L3.
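
`micro_compact` relies on a helper, `collect_tool_result_blocks`, that this README does not show. The following is a minimal sketch of what such a helper might look like, assuming it returns `(message_index, block_index, block)` tuples in conversation order; that return shape is inferred from the `for _, _, block in ...` unpacking and is an assumption, not the chapter's actual implementation. `micro_compact` is repeated (with `str()` around the content) so the example runs on its own.

```python
KEEP_RECENT_TOOL_RESULTS = 3


def collect_tool_result_blocks(messages):
    """Hypothetical helper: one (msg_index, block_index, block) per tool_result, oldest first."""
    found = []
    for mi, message in enumerate(messages):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text messages carry no tool_result blocks
        for bi, block in enumerate(content):
            if isinstance(block, dict) and block.get("type") == "tool_result":
                found.append((mi, bi, block))
    return found


def micro_compact(messages):
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(str(block.get("content", ""))) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages


# Five file reads, one tool_result per user message (dummy data)
messages = [{"role": "user",
             "content": [{"type": "tool_result", "tool_use_id": f"t{i}",
                          "content": f"file {i}: " + "x" * 500}]}
            for i in range(5)]
micro_compact(messages)
intact = [m["content"][0]["content"].startswith("file") for m in messages]
print(intact)  # [False, False, True, True, True]
```

The blocks are mutated in place, which is why `micro_compact` can return the same `messages` list it was given.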

### L3: tool_result_budget — Persist Large Results to Disk

The model read 5 large files in one go; the `tool_result` blocks in the last user message total 500KB.

Sum the size of all `tool_result` blocks in the last user message. If the total exceeds 200KB, sort the blocks by size and, starting from the largest, persist them to `.task_outputs/tool-results/`, leaving only a `<persisted-output>` marker plus a 2000-character preview in context. The model sees the marker, knows the full content is on disk, and can re-read it when needed.

```python
def tool_result_budget(messages, max_bytes=200_000):
    """Persist the largest tool results of the last message until it fits the budget."""
    last = messages[-1] if messages else None
    # Only a message with block-list content can carry tool_result blocks
    if not last or not isinstance(last.get("content"), list):
        return messages
    blocks = [b for b in last["content"]
              if isinstance(b, dict) and b.get("type") == "tool_result"]

    def size(block):
        return len(str(block.get("content", "")))  # counts characters, not bytes

    total = sum(size(b) for b in blocks)
    if total <= max_bytes:
        return messages
    # Largest first: persisting one big block is often enough to fit the budget
    for block in sorted(blocks, key=size, reverse=True):
        if total <= max_bytes:
            break
        block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
        total = sum(size(b) for b in blocks)  # re-measure after the replacement
    return messages
```

The first three layers are all plain-text or structural operations (0 API calls), but they cannot "understand" the conversation. Context may still be too large. → L4.
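
`tool_result_budget` calls `persist_large_output`, which is not listed in this README. Below is a minimal sketch under stated assumptions: the directory and the 2000-character preview come from the L3 description, while the file naming, the `output_dir` parameter, and the exact attributes of the `<persisted-output>` marker are made up for illustration and may differ from the chapter's `code.py`.

```python
from pathlib import Path

OUTPUT_DIR = Path(".task_outputs/tool-results")  # directory named in the L3 description
PREVIEW_CHARS = 2000                             # preview length named in the L3 description


def persist_large_output(tool_use_id, content, output_dir=OUTPUT_DIR):
    """Write the full content to disk and return a short marker with a preview."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"{tool_use_id}.txt"     # file naming is an assumption
    path.write_text(content, encoding="utf-8")
    preview = content[:PREVIEW_CHARS]
    # Marker layout is an assumption; the text only says a <persisted-output> marker is kept
    return (f'<persisted-output path="{path}" chars="{len(content)}">\n'
            f"{preview}\n"
            f"</persisted-output>")
```

Because the marker carries the path, the model can fetch the full content later with its ordinary file-reading tool instead of a dedicated retrieval tool.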

### L4: compact_history — Full LLM Summary

All three previous layers have run, but after 30 minutes of continuous work on a huge project, tokens still exceed the threshold.

Three-step process:

1. **Save transcript**: Write the full conversation to `.transcripts/` in JSONL format. The transcript is a recoverable record, but the model's active context holds only the summary, so the details are no longer available to its current reasoning. The teaching code does not provide a transcript retrieval tool.
2. **LLM generates summary**: Send the conversation history to the LLM and ask it to preserve key information: current goals, important findings, modified files, remaining work, user constraints, and so on.
3. **Replace message list**: All old messages are replaced with a single summary. The teaching version keeps only the summary; the real Claude Code re-attaches some recent files, plans, and agent/skill/tool context after compaction.

```python
def compact_history(messages):
    transcript_path = write_transcript(messages)  # Save full conversation first
    summary = summarize_history(messages)         # LLM generates summary
    return [{"role": "user",
             "content": f"[Compacted]\n\n{summary}"}]
```
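
`compact_history` depends on `write_transcript`, which is not shown here. As a rough sketch of step 1, the function below writes one JSON object per line (JSONL) into `.transcripts/`. The file-name scheme and the `transcript_dir` parameter are assumptions made for this example, not necessarily what the chapter's `code.py` does.

```python
import json
import time
from pathlib import Path

TRANSCRIPT_DIR = Path(".transcripts")  # directory named in step 1


def write_transcript(messages, transcript_dir=TRANSCRIPT_DIR):
    """Save the full conversation as JSONL and return the file path."""
    transcript_dir = Path(transcript_dir)
    transcript_dir.mkdir(parents=True, exist_ok=True)
    # Nanosecond timestamp keeps successive transcripts from overwriting each other
    path = transcript_dir / f"transcript_{time.time_ns()}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for message in messages:
            # default=str is a fallback for objects json cannot serialize directly
            f.write(json.dumps(message, ensure_ascii=False, default=str) + "\n")
    return path
```

One message per line means a transcript can be inspected or partially recovered with ordinary line-oriented tools, even if the process is interrupted mid-write.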

**Circuit breaker**: After 3 consecutive failures, stop retrying, so a persistent failure cannot turn into an infinite loop that wastes API calls.
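
The circuit breaker can be sketched as a small wrapper around any compaction function. The class and method names below are hypothetical and chosen only for this illustration; the limit of 3 is the one stated above.

```python
MAX_CONSECUTIVE_FAILURES = 3  # limit stated in the text


class CompactCircuitBreaker:
    """Hypothetical wrapper: stop calling compaction after repeated failures."""

    def __init__(self, max_failures=MAX_CONSECUTIVE_FAILURES):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.max_failures

    def run(self, compact_fn, messages):
        if self.is_open:
            return messages      # breaker tripped: skip the call, keep context as-is
        try:
            result = compact_fn(messages)
        except Exception:
            self.failures += 1   # count the failure and fall back to the old context
            return messages
        self.failures = 0        # any success resets the consecutive-failure count
        return result
```

A call such as `breaker.run(compact_history, messages)` would then replace the direct `compact_history(messages)` call in the loop; once the breaker is open, the loop falls through to the API call and, if needed, to the reactive path.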

### Reactive: reactive_compact

Sometimes the API still returns `prompt_too_long` (413), for example when context grows faster than compression is triggered.

This triggers **reactive_compact**, the emergency path. In the teaching version it summarizes the history and keeps only that summary plus the last 5 messages, so the retried request is much smaller.

```python
def reactive_compact(messages):
    transcript = write_transcript(messages)
    summary = summarize_history(messages)
    tail = messages[-5:]
    return [{"role": "user",
             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
```

Reactive compact has a retry limit (default 1). If the call still fails, the exception is raised instead of looping forever. Full error recovery is deferred to s11.

### Putting It All Together

```python
def agent_loop(messages):
    reactive_retries = 0
    while True:
        # Three pre-processors (0 API calls)
        # Order: budget first, so large content is persisted before placeholders
        messages[:] = tool_result_budget(messages)  # L3: persist large results
        messages[:] = snip_compact(messages)        # L1: trim middle
        messages[:] = micro_compact(messages)       # L2: old result placeholders

        # Still too much? LLM summary (1 API call)
        if estimate_token_count(messages) > THRESHOLD:
            messages[:] = compact_history(messages)

        try:
            response = client.messages.create(...)
        except PromptTooLongError:
            if reactive_retries < MAX_REACTIVE_RETRIES:
                messages[:] = reactive_compact(messages)  # Emergency
                reactive_retries += 1
                continue
            raise  # retry limit exceeded, raise exception
        # ... tool execution ...

        # compact tool: when the model actively calls it, triggers compact_history
        if block.name == "compact":
            messages[:] = compact_history(messages)
            results.append({..., "content": "[Compacted. History summarized.]"})
            messages.append({"role": "user", "content": results})
            break  # end current turn, start fresh with compacted context
```

**The order must not be swapped.** L3 (budget) runs before L2 (micro) because micro replaces old large `tool_result` blocks with one-line placeholders, so budget must persist the full content before that happens. This is why the CC source puts `applyToolResultBudget` first.
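
The loop above compares `estimate_token_count(messages)` against `THRESHOLD` without showing either. A minimal sketch of a character-based estimator follows, assuming roughly 4 characters per token; that ratio is a common rule of thumb, not a value taken from this chapter, and a real tokenizer will give different numbers.

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb (assumption), not a real tokenizer


def estimate_token_count(messages):
    """Approximate the token count from the serialized size of the messages."""
    # default=str keeps the estimate working if a message holds a non-JSON object
    serialized = json.dumps(messages, ensure_ascii=False, default=str)
    return len(serialized) // CHARS_PER_TOKEN


sample = [{"role": "user", "content": "a" * 400}]
print(estimate_token_count(sample))  # a little over 100 for 400 characters of content
```

An estimate like this costs no API call, which fits the "cheap first" principle: it only has to be good enough to decide whether the expensive summary step is worth running.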

---

## Changes From s07

| Component | Before (s07) | After (s08) |
|-----------|-------------|-------------|
| Context management | None (context grows unbounded) | Four-layer compression pipeline + emergency |
| New functions | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |
| Tools | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |
| Loop | LLM call → tool execution | Three pre-processors before each turn + threshold-triggered compact_history |
| Design principle | — | Cheap first, expensive last |

---

## Try It

```sh
cd learn-claude-code
python s08_context_compact/code.py
```

Try these prompts:

1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (read several files in a row and watch L2 compress old results)
2. `Read every file in s08_context_compact/` (read a large amount of content at once and watch L3 persist to disk)
3. Chat for 20+ turns and watch for `[auto compact]` or `[reactive compact]`

What to watch for: after each tool execution, are old `tool_result` entries compressed? When tokens exceed the threshold after a long conversation, is summarization triggered automatically?

---

## What's Next

Context compression lets an agent run for a long time without crashing. But each compression can also drop the preferences and constraints the user stated earlier. Can the agent selectively remember the important things?

s09 Memory → three subsystems: choosing what to remember, extracting key information, consolidating and organizing. Across compressions, across sessions.

<details>
<summary>Deep Dive Into CC Source Code</summary>

> The following is based on analysis of CC source code `compact.ts`, `autoCompact.ts`, `microCompact.ts`, and `query.ts`.

### Execution Order Comparison

The teaching version labels the layers L1/L2/L3/L4 for pedagogical clarity, but the actual execution order does not follow the numbering:

| Dimension | Teaching Version | Claude Code |
|-----------|-----------------|-------------|
| Execution order | budget → snip → micro → auto | budget → snip → micro → collapse → auto (`query.ts:379-468`) |
| snip_compact | Keep head 3 + tail 47 | Enabled only on the main thread. The implementation is not in the open-source repo (`HISTORY_SNIP` feature gate), but the interface is visible: `snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`. CC also exposes `SnipTool` for model-initiated snipping. The teaching version's 3/47 are simplified parameters |
| micro_compact | Text placeholder replacement | Two paths: time-based clears content directly, cached uses API `cache_edits` (legacy path removed) |
| micro_compact whitelist | By position (most recent 3) | time-based triggers by time threshold; cached triggers by count (`microCompact.ts`) |
| tool_result_budget | 200,000 characters (`max_bytes=200_000`) | 200,000 characters (`toolLimits.ts:49`) |
| compact_history threshold | Character count estimate | Precise tokens: `contextWindow - maxOutputTokens - 13_000` |
| Summary requirements | 5 categories of info | 9 sections + `<analysis>`/`<summary>` dual tags |
| Compression prompt | Simple prompt | Hard guardrails at both the start and the end forbidding tool calls |
| PTL (prompt-too-long) retry | Yes (simplified) | `truncateHeadForPTLRetry()` retreats by message groups (`compact.ts:243-290`) |
| Post-compaction recovery | None (teaching version only keeps summary) | Automatically re-attaches recent files, plans, agent/skill/tool context |
| Circuit breaker | 3 times | 3 times (`autoCompact.ts:70`) |
| Reactive retry | 1 time | CC has more granular tiered retries |

### Execution Order Details

The real order in CC source `query.ts`:

1. `applyToolResultBudget` (L379): persist large results first, ensuring full content is saved
2. `snipCompact` (L403): trim middle messages
3. `microcompact` (L414): old result placeholders
4. `contextCollapse` (L441): independent context management system (not in teaching version)
5. `autoCompact` (L454): LLM full summary

The teaching version's budget → snip → micro order matches this. The teaching version does not have the contextCollapse mechanism.

### Full Constant Reference

| Constant | Value | Source File |
|----------|-------|-------------|
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
| Time micro_compact interval | 60 minutes | `timeBasedMCConfig.ts` |
| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |
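
As a worked example of the threshold formula `contextWindow - maxOutputTokens - 13_000` quoted in the comparison table, the sketch below plugs in two constants from the table above. The 200,000-token context window is an illustrative figure, and using `MAX_OUTPUT_TOKENS_FOR_SUMMARY` as the `maxOutputTokens` term is an assumption made for the arithmetic, not something this README states.

```python
AUTOCOMPACT_BUFFER_TOKENS = 13_000       # from the constant table
MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000   # from the constant table


def autocompact_threshold(context_window,
                          max_output_tokens=MAX_OUTPUT_TOKENS_FOR_SUMMARY):
    """contextWindow - maxOutputTokens - buffer, as in the comparison table."""
    return context_window - max_output_tokens - AUTOCOMPACT_BUFFER_TOKENS


# Illustrative context window of 200,000 tokens (assumption)
print(autocompact_threshold(200_000))  # 167000
```

Under these assumptions, automatic compaction would start once the conversation passes about 167,000 tokens, leaving room for both the summary output and a safety buffer.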

### contextCollapse and sessionMemoryCompact

The CC source code has two additional mechanisms not covered in this teaching version:

- **contextCollapse**: An independent context management system. When enabled, it suppresses proactive autocompact (`autoCompact.ts:215-222`), and collapse's commit/blocking flow takes over context management. Manual `/compact` and the reactive fallback remain independent paths, unaffected by contextCollapse.
- **sessionMemoryCompact**: Before compact_history, CC first attempts a lightweight summary built from existing session memory (covered in s09), without calling the LLM. This mechanism becomes clearer after s09.

### What Does the Compression Prompt Look Like?

CC's compression prompt has two hard requirements:

1. **Absolutely no tool calls**: It begins with `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` and appends another REMINDER at the end.
2. **Analyze first, then summarize**: The model must first reason inside an `<analysis>` tag, then output the formal summary inside a `<summary>` tag. The analysis is stripped during formatting.
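
The "analysis is stripped during formatting" step can be illustrated with a few lines of Python. This is a sketch of the idea only: the function name `extract_summary` is hypothetical, and CC's own formatting code (TypeScript) is not reproduced here.

```python
import re


def extract_summary(raw):
    """Drop the <analysis> scratchpad and return the text inside <summary>."""
    without_analysis = re.sub(r"<analysis>.*?</analysis>", "", raw, flags=re.DOTALL)
    match = re.search(r"<summary>(.*?)</summary>", without_analysis, flags=re.DOTALL)
    # Fall back to the remaining text if the model omitted the <summary> tag
    return (match.group(1) if match else without_analysis).strip()


raw = ("<analysis>The user wants the bug in parser.py fixed...</analysis>\n"
       "<summary>\nGoal: fix parser bug\nRemaining: add tests\n</summary>")
print(extract_summary(raw))
```

Letting the model reason in a scratchpad and then discarding it keeps the benefit of the reasoning without paying for it in every later request.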

### Teaching Version Simplifications Are Intentional

- micro_compact uses text placeholders → we don't have API-level `cache_edits` access
- Tokens estimated via character count → precise tokenizers are out of scope
- Post-compaction recovery omitted → the teaching version keeps only the summary and does not auto re-attach files
- Two auxiliary mechanisms not covered → they belong to the remaining 10% of detail

The core design principle, cheap first, expensive last, is fully preserved.

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
293
s08_context_compact/README.ja.md
Normal file
@@ -0,0 +1,293 @@
|
||||
# s08: Context Compact — コンテキストはいつか満杯になる、場所を空ける方法が必要
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s20
|
||||
> *"Context will fill up — have a way to make room"* — 4層圧縮戦略、安価なものを先に、高価なものを後に実行。
|
||||
>
|
||||
> **Harness レイヤー**: 圧縮 — クリーンな記憶、無限のセッション。
|
||||
|
||||
---
|
||||
|
||||
## 課題
|
||||
|
||||
Agent が動いている途中で、止まってしまう。
|
||||
|
||||
bash、read、write は揃っており、能力は十分。しかし 1000 行のファイル(~4000 token)を読み、さらに 30 のファイルを読み、20 のコマンドを実行したとします。各コマンドの出力、各ファイルの内容がすべて `messages` リストに蓄積されます。
|
||||
|
||||
コンテキストウィンドウには上限があります。満杯になると、API は即座に拒否します:`prompt_too_long`。
|
||||
|
||||
圧縮しなければ、Agent は大規模プロジェクトではまともに動けません。
|
||||
|
||||
---
|
||||
|
||||
## ソリューション
|
||||
|
||||

|
||||
|
||||
s07 のフック構造、スキルロード、サブ Agent の骨格を維持し、圧縮に焦点を当てるため一部のツールは省略。コアの変更点:各 LLM 呼び出し前に 3 層のプリプロセッサ(0 API)を挿入し、token が閾値を超えた場合は LLM 要約(1 API)をトリガー、API エラー時には緊急トリムを実行。
|
||||
|
||||
コア設計:安価なものを先に、高価なものを後に。
|
||||
|
||||
---
|
||||
|
||||
## 仕組み
|
||||
|
||||

|
||||
|
||||
### L1: snip_compact — 無関係な古い会話を切り捨て
|
||||
|
||||
Agent が 80 ラウンドの会話を実行し、`messages` が 160 件まで溜まった。先頭の「hello.py を作って」は現在の作業とほぼ無関係だが、スペースを占有し続けている。
|
||||
|
||||
メッセージ数が 50 を超えた場合 → 先頭 3 件(初期コンテキスト)と末尾 47 件(現在の作業)を保持し、中間を切り捨て:
|
||||
|
||||
```python
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages:
        return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
```
|
||||
|
||||
メッセージ全体は切り捨てたが、残ったメッセージ内の `tool_result` 内容はまだ蓄積され続けている。34 番目のメッセージに 30KB の古いファイル内容が残っているかもしれない。→ L2。
|
||||
|
||||
### L2: micro_compact — 古いツール結果をプレースホルダに置換
|
||||
|
||||

|
||||
|
||||
Agent が連続して 10 個のファイルを読んだ。1〜7 回目の完全な内容はまだコンテキストに残っており、もう不要だが、大量のスペースを占有している。
|
||||
|
||||
直近 3 件の `tool_result` の完全な内容のみを保持し、それより古いものは 1 行のプレースホルダに置換:
|
||||
|
||||
```python
KEEP_RECENT_TOOL_RESULTS = 3

def micro_compact(messages):
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages
```
|
||||
|
||||
古い結果はクリーンアップされたが、1 件の新しい結果だけで 500KB の可能性がある。大きなファイルを `cat` するだけでコンテキストがいっぱいになる。→ L3。
|
||||
|
||||
### L3: tool_result_budget — 大きな結果をディスクに退避
|
||||
|
||||

|
||||
|
||||
モデルが一度に 5 つの大きなファイルを読み、1 つの user メッセージ内の全 `tool_result` の合計が 500KB に達した。
|
||||
|
||||
最後の user メッセージ内のすべての `tool_result` の合計サイズを集計。200KB を超えた場合 → サイズ順にソートし、最大のものから順に `.task_outputs/tool-results/` に退避。コンテキストには `<persisted-output>` マーカー + 先頭 2000 文字のプレビューのみを残す。モデルはマーカーを見て完全な内容がディスク上にあることを認識し、必要に応じて再読み込みできる。
|
||||
|
||||
```python
def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1]
    blocks = [(i, b) for i, b in enumerate(last["content"])
              if b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    if total <= max_bytes:
        return messages
    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
    for idx, block in ranked:
        if total <= max_bytes:
            break
        block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
        total = recalculate_total(blocks)
    return messages
```
|
||||
|
||||
最初の 3 層はすべて純粋なテキスト/構造操作(0 API 呼び出し)だが、会話内容を「理解」することはできない。コンテキストがまだ大きすぎる可能性がある。→ L4。
|
||||
|
||||
### L4: compact_history — LLM 全量要約
|
||||
|
||||

|
||||
|
||||
最初の 3 層がすべて実行されたが、超大規模プロジェクトで 30 分間連続作業すると、token がまだ閾値を超えている。
|
||||
|
||||
3 ステップのフロー:
|
||||
|
||||
1. **transcript を保存**:完全な会話を `.transcripts/` に JSONL 形式で書き出す。transcript は回復可能な記録として保存されるが、モデルのアクティブなコンテキストには要約しか残らない。モデルの現在の推論にとって、詳細はすでにコンテキストにない。教学コードは transcript 検索ツールを提供しない。
|
||||
2. **LLM で要約を生成**:会話履歴を LLM に送り、現在の目標、重要な発見、変更済みファイル、残りの作業、ユーザーの制約などの重要な情報を保持するよう指示。
|
||||
3. **メッセージリストを置換**:すべての古いメッセージが 1 件の要約に置き換えられる。教学版は要約のみを保持する。実際の Claude Code は compact 後に直近のファイル、計画、agent/skill/tool などのコンテキストを再付加する。
|
||||
|
||||
```python
def compact_history(messages):
    transcript_path = write_transcript(messages)  # 先に完全な会話を保存
    summary = summarize_history(messages)         # LLM で要約を生成
    return [{"role": "user",
             "content": f"[Compacted]\n\n{summary}"}]
```
|
||||
|
||||
**サーキットブレーカー**:連続 3 回失敗したらリトライを停止し、無限ループによる API 呼び出しの浪費を防止。
|
||||
|
||||
### 緊急: reactive_compact
|
||||
|
||||
API がまだ `prompt_too_long`(413)を返すことがある。コンテキストの増加速度が圧縮のトリガー速度を上回る場合。
|
||||
|
||||
この時 **reactive_compact** がトリガーされる:compact_history よりもさらに積極的で、末尾からバイト単位の精度で API が受け入れ可能なサイズまで切り詰め、最後の 5 件のメッセージ + 要約のみを保持。
|
||||
|
||||
```python
def reactive_compact(messages):
    transcript = write_transcript(messages)
    summary = summarize_history(messages)
    tail = messages[-5:]
    return [{"role": "user",
             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
```
|
||||
|
||||
reactive compact にはリトライ上限がある(デフォルト 1 回)。さらに失敗した場合は例外をスローし、無限ループしない。完全なエラー回復ロジックは s11 に委ねる。
|
||||
|
||||
### 合わせて実行
|
||||
|
||||
```python
def agent_loop(messages):
    reactive_retries = 0
    while True:
        # 3 つのプリプロセッサ(0 API 呼び出し)
        # 順序:budget を先に実行し、大きな内容をプレースホルダ化する前に退避
        messages[:] = tool_result_budget(messages)  # L3: 大きな結果を退避
        messages[:] = snip_compact(messages)        # L1: 中間を切り捨て
        messages[:] = micro_compact(messages)       # L2: 古い結果をプレースホルダに

        # まだ足りない?LLM 要約(1 API 呼び出し)
        if estimate_token_count(messages) > THRESHOLD:
            messages[:] = compact_history(messages)

        try:
            response = client.messages.create(...)
        except PromptTooLongError:
            if reactive_retries < MAX_REACTIVE_RETRIES:
                messages[:] = reactive_compact(messages)  # 緊急対応
                reactive_retries += 1
                continue
            raise  # リトライ上限超過、例外をスロー
        # ... ツール実行 ...

        # compact ツール:モデルが能動的に呼び出した場合、compact_history をトリガー
        if block.name == "compact":
            messages[:] = compact_history(messages)
            results.append({..., "content": "[Compacted. History summarized.]"})
            messages.append({"role": "user", "content": results})
            break  # 現在のターンを終了し、圧縮後のコンテキストで新しく開始
```
|
||||
|
||||
**順序は変えられない。** L3(budget)が L2(micro)の前に実行される理由:micro は古い大きな tool_result を 1 行のプレースホルダに置換するため、budget はその前に完全な内容を退避させる必要がある。CC ソースが `applyToolResultBudget` を最初に配置する理由も同じ。
|
||||
|
||||
---
|
||||
|
||||
## s07 からの変更点
|
||||
|
||||
| コンポーネント | 変更前 (s07) | 変更後 (s08) |
|
||||
|------|-----------|-----------|
|
||||
| コンテキスト管理 | なし(コンテキストが無限に膨張) | 4 層圧縮パイプライン + 緊急対応 |
|
||||
| 新規関数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |
|
||||
| ツール | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |
|
||||
| ループ | LLM 呼び出し → ツール実行 | 各ラウンド前に 3 層プリプロセッサを実行 + 閾値で compact_history をトリガー |
|
||||
| 設計原則 | — | 安価なものを先に、高価なものを後に |
|
||||
|
||||
---
|
||||
|
||||
## 試してみよう
|
||||
|
||||
```sh
cd learn-claude-code
python s08_context_compact/code.py
```
|
||||
|
||||
以下のプロンプトを試してみてください:
|
||||
|
||||
1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`(連続して複数のファイルを読み、L2 の古い結果圧縮を観察)
|
||||
2. `Read every file in s08_context_compact/`(一度に大量の内容を読み込み、L3 のディスク退避を観察)
|
||||
3. 20+ ラウンドの対話を繰り返し、`[auto compact]` または `[reactive compact]` が表示されるか観察
|
||||
|
||||
観察のポイント:ツール実行のたびに、古い tool_result は圧縮されているか?連続対話で token が閾値を超えたとき、要約が自動的にトリガーされたか?
|
||||
|
||||
---
|
||||
|
||||
## 次へ
|
||||
|
||||
コンテキスト圧縮により、Agent は長時間クラッシュせずに動けるようになった。しかし、圧縮のたびにユーザーが以前に伝えた偏好や制約も一緒に失われてしまう。Agent が重要なことを選択的に記憶できるようにできないか?
|
||||
|
||||
s09 Memory → 3 つのサブシステム:何を記憶するかの選択、重要情報の抽出、整理と統合。圧縮を越え、セッションを越えて。
|
||||
|
||||
<details>
|
||||
<summary>CC ソースコードの詳細</summary>
|
||||
|
||||
> 以下は CC ソースコード `compact.ts`、`autoCompact.ts`、`microCompact.ts`、`query.ts` の分析に基づく。
|
||||
|
||||
### 実行順序の対応
|
||||
|
||||
教学版は説明の便宜上 L1/L2/L3/L4 と番号を振っているが、実際の実行順序は番号と完全には一致しない:
|
||||
|
||||
| 項目 | 教学版 | Claude Code |
|
||||
|------|--------|-------------|
|
||||
| 実行順序 | budget → snip → micro → auto | budget → snip → micro → collapse → auto(`query.ts:379-468`) |
|
||||
| snip_compact | 先頭 3 + 末尾 47 を保持 | CC はメインスレッドのみ有効;実装はオープンソースリポジトリにない(`HISTORY_SNIP` feature gate)、インターフェースは確認可能:`snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`、`SnipTool` もモデルが能動的に呼び出し可能。教学版の 3/47 は簡略パラメータ |
|
||||
| micro_compact | テキストプレースホルダで置換 | 2 つのパス:time-based は直接内容をクリア、cached は API の `cache_edits` を使用(legacy パスは削除済み) |
|
||||
| micro_compact ホワイトリスト | 位置による(直近 3 件) | time-based は時間閾値でトリガー、cached はカウントでトリガー(`microCompact.ts`) |
|
||||
| tool_result_budget | 200KB 文字 | 200,000 文字(`toolLimits.ts:49`) |
|
||||
| compact_history 閾値 | 文字数で推定 | 精密な token 数:`contextWindow - maxOutputTokens - 13_000` |
|
||||
| 要約の要求 | 5 種類の情報 | 9 つのセクション + `<analysis>`/`<summary>` デュアルタグ |
|
||||
| 圧縮プロンプト | シンプルなプロンプト | 先頭と末尾に二重の安全ガードでツール呼び出しを禁止 |
|
||||
| PTL retry | あり(簡略版) | `truncateHeadForPTLRetry()` がメッセージグループ単位でロールバック(`compact.ts:243-290`) |
|
||||
| 圧縮後のリカバリ | なし(教学版は要約のみ保持) | 直近のファイル、計画、agent/skill/tool などの自動再付加 |
|
||||
| サーキットブレーカー | 3 回 | 3 回(`autoCompact.ts:70`) |
|
||||
| reactive リトライ | 1 回 | CC にはより精緻な段階別リトライがある |
|
||||
|
||||
### 実行順序の詳細
|
||||
|
||||
CC ソース `query.ts` での実際の順序:
|
||||
|
||||
1. `applyToolResultBudget`(L379):まず大きな結果を処理し、完全な内容を退避
|
||||
2. `snipCompact`(L403):中間メッセージを切り捨て
|
||||
3. `microcompact`(L414):古い結果のプレースホルダ化
|
||||
4. `contextCollapse`(L441):独立したコンテキスト管理システム(教学版にはなし)
|
||||
5. `autoCompact`(L454):LLM 全量要約
|
||||
|
||||
教学版の budget → snip → micro の順序はこれと一致する。教学版には contextCollapse メカニズムがない。
|
||||
|
||||
### 完全な定数リファレンス
|
||||
|
||||
| 定数 | 値 | ソースファイル |
|
||||
|------|-----|--------|
|
||||
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
|
||||
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
|
||||
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
|
||||
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
|
||||
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
|
||||
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
|
||||
| 時間ベース micro_compact 間隔 | 60 分 | `timeBasedMCConfig.ts` |
|
||||
| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |
|
||||
|
||||
### contextCollapse と sessionMemoryCompact
|
||||
|
||||
CC ソースコードには、この教学版では展開していない 2 つのメカニズムが存在する:
|
||||
|
||||
- **contextCollapse**:独立したコンテキスト管理システム。有効時には proactive autocompact を抑制し(`autoCompact.ts:215-222`)、collapse の commit/blocking フローがコンテキスト管理を引き継ぐ。ただし manual `/compact` と reactive fallback は独立パスのままで、contextCollapse の影響を受けない。
|
||||
- **sessionMemoryCompact**:compact_history の前に、CC は既存の session memory(s09 で解説)を使った軽量要約を先に試みる。LLM を呼び出さない。このメカニズムは s09 を学んだ後に振り返るとより理解しやすい。
|
||||
|
||||
### 圧縮プロンプトの中身
|
||||
|
||||
CC の圧縮プロンプトには 2 つの厳格な要件がある:
|
||||
|
||||
1. **ツール呼び出しの絶対禁止**:冒頭が `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` で、末尾にも再度 REMINDER がある
|
||||
2. **先に分析してから要約**:モデルはまず `<analysis>` タグで思考を整理し、その後 `<summary>` タグで正式な要約を出力する。analysis はフォーマット時に除去される
|
||||
|
||||
### 教学版の簡略化は意図的
|
||||
|
||||
- micro_compact でテキストプレースホルダを使用 → API 層の `cache_edits` 権限がないため
|
||||
- token を文字数で推定 → 精密な tokenizer は教学の対象外
|
||||
- 圧縮後のリカバリを省略 → 教学版は要約のみを保持し、ファイルの自動再付加を行わない
|
||||
- 2 つの補助メカニズムを展開しない → 10% の細部に属する
|
||||
|
||||
コア設計思想、安価なものを先に高価なものを後に、は完全に保持されている。
|
||||
|
||||
</details>
|
||||
|
||||
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
|
||||
293
s08_context_compact/README.md
Normal file
@@ -0,0 +1,293 @@
|
||||
# s08: Context Compact — 上下文总会满,要有办法腾地方
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s20
|
||||
> *"上下文总会满, 要有办法腾地方"* — 四层压缩策略, 便宜的先跑贵的后跑。
|
||||
>
|
||||
> **Harness 层**: 压缩 — 干净的记忆, 无限的会话。
|
||||
|
||||
---
|
||||
|
||||
## 问题
|
||||
|
||||
Agent 跑着跑着,不动了。
|
||||
|
||||
手里有 bash、有 read、有 write,能力是够的。但它读了一个 1000 行的文件(~4000 token),又读了 30 个文件,跑了 20 条命令。每条命令的输出、每个文件的内容,全都堆在 `messages` 列表里。
|
||||
|
||||
上下文窗口是有限的。满了之后,API 直接拒绝:`prompt_too_long`。
|
||||
|
||||
不压缩,Agent 根本没法在大项目里干活。
|
||||
|
||||
---
|
||||
|
||||
## 解决方案
|
||||
|
||||

|
||||
|
||||
保留 s07 的 hook 结构、技能加载、子 Agent 等骨架,省略部分工具细节以聚焦压缩。核心变动:每轮 LLM 调用前插入三层预处理器(0 API),token 仍超阈值时触发 LLM 摘要(1 API),API 报错时应急裁剪。
|
||||
|
||||
核心设计:便宜的先跑,贵的后跑。
|
||||
|
||||
---
|
||||
|
||||
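## How It Works

![Four-Layer Compaction Pipeline](images/four-layer-pipeline.svg)

### L1: snip_compact (drop stale conversation)

The agent has run 80 turns and `messages` holds 160 entries. The earliest ones ("create hello.py for me") have almost nothing to do with the current work, yet they still take up space.

When the message count exceeds 50, keep the first 3 messages (initial context) and the last 47 (current work), and cut the middle:
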
```python
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages:
        return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
```

Whole messages are gone, but `tool_result` content inside the surviving messages keeps accumulating: message 34 may still hold 30KB of old file content. → L2.

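As a quick sanity check of the snipping rule, the sketch below repeats `snip_compact` from the listing above and runs it on 160 synthetic messages. The message contents are made up for illustration; only the function itself comes from this chapter.

```python
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages:
        return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]


# 160 synthetic messages alternating user/assistant (illustrative data only)
history = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
           for i in range(160)]

compacted = snip_compact(history)
print(len(compacted))            # 3 head + 1 placeholder + 47 tail = 51
print(compacted[3]["content"])   # [snipped 110 messages from conversation middle]
```

Note that the result has 51 entries, one more than `max_messages`, because the placeholder is added on top of the 50 kept messages. A cut at a fixed position can also separate a `tool_use` from its `tool_result`; the teaching code does not guard against that.
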
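### L2: micro_compact (placeholders for old tool results)

![microCompact: old results replaced with placeholders](images/micro-compact.svg)

The agent has read 10 files in a row. The full contents of reads 1 to 7 are still in the context; they stopped being useful long ago but take up a lot of space.

Keep the full content of only the 3 most recent `tool_result` blocks and replace older ones with a one-line placeholder (results of 120 characters or fewer are left as they are):
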
```python
KEEP_RECENT_TOOL_RESULTS = 3

def micro_compact(messages):
    # collect_tool_result_blocks returns (msg_index, block_index, block) tuples, oldest first
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages
```

Old results are cleared, but a single new result can be 500KB: the output of one `cat` on a large file can fill the context. → L3.

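To see the placeholder rule in action, here is a self-contained sketch. `collect_tool_result_blocks` is written out to mirror the `collect_tool_results` helper in this chapter's `code.py`; the ten fake tool results and the `PLACEHOLDER` constant name are assumptions made for the demo.

```python
KEEP_RECENT_TOOL_RESULTS = 3
PLACEHOLDER = "[Earlier tool result compacted. Re-run if needed.]"


def collect_tool_result_blocks(messages):
    """Return (message_index, block_index, block) for every tool_result, oldest first."""
    found = []
    for mi, msg in enumerate(messages):
        if msg.get("role") != "user" or not isinstance(msg.get("content"), list):
            continue
        for bi, block in enumerate(msg["content"]):
            if isinstance(block, dict) and block.get("type") == "tool_result":
                found.append((mi, bi, block))
    return found


def micro_compact(messages):
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(block.get("content", "")) > 120:   # short results are left alone
            block["content"] = PLACEHOLDER        # mutates the block in place
    return messages


# Ten fake file reads, 1000 characters each (illustrative data only)
messages = [{"role": "user",
             "content": [{"type": "tool_result", "tool_use_id": f"t{i}", "content": "x" * 1000}]}
            for i in range(10)]
micro_compact(messages)
sizes = [len(m["content"][0]["content"]) for m in messages]
print(sizes)   # seven placeholder-sized entries, then three untouched 1000-character results
```

Because the blocks are edited in place, the caller's list is modified even if the return value is ignored.
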
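### L3: tool_result_budget (persist large results to disk)

![toolResultBudget: large results persisted to disk](images/tool-result-budget.svg)

The model reads 5 large files in one turn, and the `tool_result` blocks in a single user message add up to 500KB.

Sum the size of all `tool_result` blocks in the last user message. If the total exceeds 200KB, sort by size and persist them to `.task_outputs/tool-results/` starting from the largest; the context keeps only a `<persisted-output>` marker plus a preview of the first 2000 characters. When the model sees the marker it knows the full content is on disk and can read it again when needed.
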
```python
def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1]
    blocks = [(i, b) for i, b in enumerate(last["content"])
              if b.get("type") == "tool_result"]

    def total_size():
        return sum(len(str(b.get("content", ""))) for _, b in blocks)

    if total_size() <= max_bytes:
        return messages
    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
    for _, block in ranked:
        if total_size() <= max_bytes:
            break
        # persist_large_output writes the full text to disk and returns marker + preview
        block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
    return messages
```

The first three layers are pure text/structure operations with 0 API calls, but they cannot "understand" the conversation. The context may still be too large. → L4.

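The persist step that L3 relies on can be sketched on its own. This version follows `persist_large_output` from this chapter's `code.py`, but takes the output directory as an explicit `out_dir` parameter so it can run against a temporary directory; that parameter, the `PREVIEW_CHARS` name, and the `toolu_demo` id are assumptions made for the demo.

```python
import tempfile
from pathlib import Path

PERSIST_THRESHOLD = 30_000   # same value as code.py
PREVIEW_CHARS = 2_000


def persist_large_output(tool_use_id, output, out_dir):
    """Write oversized output to out_dir and return a marker with a short preview."""
    if len(output) <= PERSIST_THRESHOLD:
        return output
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{tool_use_id}.txt"
    if not path.exists():
        path.write_text(output)
    return (f"<persisted-output>\nFull output: {path}\n"
            f"Preview:\n{output[:PREVIEW_CHARS]}\n</persisted-output>")


with tempfile.TemporaryDirectory() as tmp:
    big = "line\n" * 20_000          # 100,000 characters
    marker = persist_large_output("toolu_demo", big, tmp)
    saved = (Path(tmp) / "toolu_demo.txt").read_text()
    small = persist_large_output("toolu_small", "ok", tmp)

print(len(big), len(marker), saved == big, small)
```

The marker keeps the on-disk path in the context, so the model can read the full output back with its normal file tool instead of re-running the command.
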
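### L4: compact_history (full LLM summary)

![autoCompact: full LLM summary](images/auto-compact.svg)

All three layers have run, but after 30 minutes of continuous work in a very large project the token count is still above the threshold.

Three steps:

1. **Save the transcript**: write the full conversation to `.transcripts/` in JSONL format. The transcript keeps a recoverable record, but only the summary remains in the model's active context; for the model's reasoning from this point on, the details are gone. The teaching code does not provide a transcript retrieval tool.
2. **LLM generates a summary**: send the conversation history to the LLM and ask it to preserve the current goal, key findings, files changed, remaining work, user constraints, and similar key information.
3. **Replace the message list**: all old messages are replaced by a single summary message. The teaching version keeps only the summary; the real Claude Code re-attaches some context after compacting, such as recent files, plans, and agent/skill/tool state.
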
```python
def compact_history(messages):
    transcript_path = write_transcript(messages)   # save the full conversation first
    summary = summarize_history(messages)          # LLM generates the summary
    return [{"role": "user",
             "content": f"[Compacted]\n\n{summary}"}]
```

**Circuit breaker**: after 3 consecutive failures, stop retrying, so a failure loop cannot keep burning API calls.

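The `code.py` listing in this diff does not appear to contain the breaker itself, so the class below is a hypothetical sketch of the idea rather than the chapter's implementation. `CompactBreaker`, `failing_summarizer`, and the choice to return the messages unchanged on failure are all assumptions.

```python
MAX_CONSECUTIVE_FAILURES = 3


class CompactBreaker:
    """Counts consecutive compaction failures and opens after the limit."""

    def __init__(self, limit=MAX_CONSECUTIVE_FAILURES):
        self.limit = limit
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.limit

    def run(self, compact_fn, messages):
        """Try to compact; on failure return the messages unchanged."""
        if self.open:
            return messages            # breaker open: skip the API call entirely
        try:
            result = compact_fn(messages)
        except Exception:
            self.failures += 1
            return messages
        self.failures = 0              # any success resets the streak
        return result


calls = []

def failing_summarizer(messages):      # stand-in for a summarizer whose API call errors
    calls.append(1)
    raise RuntimeError("summary request failed")


breaker = CompactBreaker()
history = [{"role": "user", "content": "hi"}]
for _ in range(10):
    history = breaker.run(failing_summarizer, history)
print(len(calls), breaker.open)   # 3 True
```

Ten loop iterations cost only three failed calls; after that the breaker short-circuits and the loop has to rely on the other layers or surface the error.
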
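### Emergency: reactive_compact

Sometimes the API still returns `prompt_too_long` (413), typically when the context grows faster than compaction is triggered.

That triggers **reactive_compact**. The teaching version saves the transcript, summarizes the history, and keeps only the summary plus the last 5 messages. Claude Code's own retry path is finer-grained: `truncateHeadForPTLRetry()` backs off by message group.
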
```python
def reactive_compact(messages):
    transcript = write_transcript(messages)
    summary = summarize_history(messages)
    tail = messages[-5:]
    return [{"role": "user",
             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
```

reactive compact has a retry limit (1 by default). If it fails again, the exception is raised instead of looping forever. Full error-recovery logic is left to s11.

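One thing to watch when keeping a fixed-size tail: `messages[-5:]` can begin with a user message whose `tool_result` blocks answer a `tool_use` that was just cut away, and the Anthropic Messages API expects every `tool_result` to follow its `tool_use` in the preceding assistant message. A minimal sketch of a guard, assuming dict-shaped messages as in this chapter; `safe_tail` and `has_tool_result` are hypothetical helpers, not part of `code.py`.

```python
def has_tool_result(msg):
    content = msg.get("content")
    return isinstance(content, list) and any(
        isinstance(b, dict) and b.get("type") == "tool_result" for b in content)


def safe_tail(messages, keep=5):
    """Last `keep` messages, minus leading tool_result messages that lost their tool_use."""
    tail = list(messages[-keep:])
    while tail and tail[0].get("role") == "user" and has_tool_result(tail[0]):
        tail.pop(0)
    return tail


# Illustrative conversation: two tool round trips, then a final answer
history = [
    {"role": "user", "content": "fix the bug"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "t1", "name": "bash",
                                       "input": {"command": "ls"}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "t1", "content": "a.py"}]},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "t2", "name": "read_file",
                                       "input": {"path": "a.py"}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "t2", "content": "print(1)"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "done"}]},
]

roles = [m["role"] for m in safe_tail(history, keep=4)]
print(roles)   # ['assistant', 'user', 'assistant']
```

With `keep=4` the raw tail would start at the `t1` result, whose `tool_use` is outside the tail; the guard drops it so the kept messages start at an intact pair.
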
### Putting it together

```python
def agent_loop(messages):
    reactive_retries = 0
    while True:
        # Three preprocessors (0 API calls)
        # Order: budget runs first so large content is on disk before placeholders and snipping
        messages[:] = tool_result_budget(messages)   # L3: persist large results
        messages[:] = snip_compact(messages)         # L1: cut the middle
        messages[:] = micro_compact(messages)        # L2: placeholder old results

        # Still too big? LLM summary (1 API call)
        if estimate_token_count(messages) > THRESHOLD:
            messages[:] = compact_history(messages)

        try:
            response = client.messages.create(...)
        except PromptTooLongError:
            if reactive_retries < MAX_REACTIVE_RETRIES:
                messages[:] = reactive_compact(messages)   # emergency path
                reactive_retries += 1
                continue
            raise   # retry limit exceeded

        # ... tool execution ...
        for block in response.content:
            # compact tool: the model calls it to trigger compact_history
            if block.name == "compact":
                # No tool_result is appended: the matching tool_use was summarized
                # away, and a tool_result without its tool_use is rejected by the API.
                messages[:] = compact_history(messages)
                break   # end the current turn; the next one starts from the summary
```

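**The order matters.** L3 (budget) runs before L2 (micro) because micro replaces old, large tool_result content with a one-line placeholder; budget has to persist the full content to disk before that happens. This is also why the CC source puts `applyToolResultBudget` first.

---
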
## Changes from s07

| Component | Before (s07) | After (s08) |
|------|-----------|-----------|
| Context management | None (context grows without bound) | Four-layer compaction pipeline + emergency path |
| New functions | (none) | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |
| Tools | bash, read, write, edit, glob, todo_write, task, load_skill (8) | 8 + compact (9) |
| Loop | LLM call → tool execution | Three preprocessors before every turn + threshold-triggered compact_history |
| Design principle | (none) | Cheap first, expensive last |

---

## Try It

```sh
cd learn-claude-code
python s08_context_compact/code.py
```

Try these prompts:

1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (read several files in a row and watch L2 compact the older results)
2. `Read every file in s08_context_compact/` (read a lot of content at once and watch L3 persist it to disk)
3. Keep the conversation going for 20+ turns and watch for `[auto compact]` or `[reactive compact]`

What to look for: after each tool execution, are old tool_result blocks compacted? After a long conversation pushes the size estimate over the threshold, is the summary triggered automatically?

---

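## What's Next

Context compaction lets the agent run for a long time without crashing. But after every compaction, the preferences and constraints the user stated earlier are lost along with everything else. Can the agent selectively remember what matters?

s09 Memory → three subsystems: choosing what to remember, extracting key information, and consolidating it. It persists across compactions and across sessions.

<details>
<summary>Deep dive into the CC source</summary>

> The following is based on an analysis of the CC source files `compact.ts`, `autoCompact.ts`, `microCompact.ts`, and `query.ts`.
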
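### Execution order comparison

For teaching purposes the layers are numbered L1/L2/L3/L4, but the numbering does not fully match the actual execution order:

| Dimension | Teaching version | Claude Code |
|------|--------|-------------|
| Execution order | budget → snip → micro → auto | budget → snip → micro → collapse → auto (`query.ts:379-468`) |
| snip_compact | Keep first 3 + last 47 | Enabled only on the main thread in CC; the implementation is not in the open-source repository (`HISTORY_SNIP` feature gate), but the interface is visible: `snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`. CC also exposes a `SnipTool` that the model can call on its own. The 3/47 split in the teaching version is a simplified parameter |
| micro_compact | Text placeholder replacement | Two paths: time-based clears content directly; cached goes through the API's `cache_edits` (the legacy path has been removed) |
| micro_compact whitelist | By position (most recent 3) | time-based triggers on a time threshold; cached triggers on a count (`microCompact.ts`) |
| tool_result_budget | 200KB of characters | 200,000 characters (`toolLimits.ts:49`) |
| compact_history threshold | Character-count estimate | Exact tokens: `contextWindow - maxOutputTokens - 13_000` |
| Summary requirements | 5 kinds of information | 9 sections + `<analysis>`/`<summary>` tag pair |
| Compaction prompt | Simple prompt | Tool calls forbidden at both the start and the end of the prompt |
| PTL retry | Yes (simplified) | `truncateHeadForPTLRetry()` backs off by message group (`compact.ts:243-290`) |
| Post-compaction recovery | None (teaching version keeps only the summary) | Automatically re-reads recent files, plans, agent/skill/tool state, etc. |
| Circuit breaker | 3 failures | 3 failures (`autoCompact.ts:70`) |
| reactive retries | 1 | CC has finer-grained tiered retries |
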
### Execution order in detail

The real order in the CC source file `query.ts`:

1. `applyToolResultBudget` (L379): handle large results first so the full content is persisted
2. `snipCompact` (L403): cut middle messages
3. `microcompact` (L414): placeholder old results
4. `contextCollapse` (L441): a separate context-management system (not in the teaching version)
5. `autoCompact` (L454): full LLM summary

The teaching version's budget → snip → micro order matches this. The teaching version has no contextCollapse mechanism.

### Full constant reference

| Constant | Value | Source file |
|------|-----|--------|
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
| Time-based micro_compact interval | 60 minutes | `timeBasedMCConfig.ts` |
| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |

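The threshold formula quoted in the comparison table (`contextWindow - maxOutputTokens - 13_000`) and the character-based estimate used by the teaching version can be put side by side in a small sketch. The function names, the 200k/20k example numbers, and the 4-characters-per-token rule of thumb are assumptions for illustration; `code.py` itself compares raw character counts against `CONTEXT_LIMIT` without dividing.

```python
AUTOCOMPACT_BUFFER_TOKENS = 13_000   # value from the constants table


def autocompact_threshold(context_window, max_output_tokens):
    """Token count above which a proactive summary would be triggered."""
    return context_window - max_output_tokens - AUTOCOMPACT_BUFFER_TOKENS


def rough_token_estimate(messages, chars_per_token=4):
    """Character-based estimate; chars_per_token=4 is a rule of thumb, not a tokenizer."""
    return len(str(messages)) // chars_per_token


# Illustrative numbers only: a 200k window with 20k reserved for output
threshold = autocompact_threshold(200_000, 20_000)
messages = [{"role": "user", "content": "x" * 800_000}]
print(threshold)                                   # 167000
print(rough_token_estimate(messages) > threshold)  # True
```

A character count is cheap and needs no dependency, which fits a tutorial; the cost is that the trigger point drifts for code-heavy or non-English content, where the characters-per-token ratio differs.
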
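### contextCollapse and sessionMemoryCompact

The CC source has two more mechanisms that this teaching version does not expand on:

- **contextCollapse**: a separate context-management system. When enabled, it suppresses proactive autocompact (`autoCompact.ts:215-222`), and the collapse commit/blocking flow takes over context management. Manual `/compact` and the reactive fallback remain independent paths and are not affected by contextCollapse.
- **sessionMemoryCompact**: before compact_history, CC first tries a lightweight summary built from the existing session memory (covered in s09), without calling the LLM. This mechanism is easier to follow once you have finished s09.
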
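### What does the compaction prompt look like?

CC's compaction prompt has two hard requirements:

1. **No tool calls, ever**: it opens with `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` and repeats a REMINDER at the end.
2. **Analyze first, then summarize**: the model first works through its reasoning inside `<analysis>` tags, then emits the formal summary inside `<summary>` tags. The analysis is stripped during formatting.
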
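The "analysis is stripped during formatting" step can be sketched with two regular expressions. This is a hypothetical formatter written for illustration; CC's actual formatting code is not shown in this chapter, and `format_summary` and the sample text are assumptions.

```python
import re


def format_summary(raw):
    """Drop the <analysis> scratchpad and return the text inside <summary>."""
    without_analysis = re.sub(r"<analysis>.*?</analysis>", "", raw, flags=re.DOTALL)
    match = re.search(r"<summary>(.*?)</summary>", without_analysis, flags=re.DOTALL)
    # Fall back to the remaining text if the model omitted the <summary> tags
    return (match.group(1) if match else without_analysis).strip()


raw = """<analysis>
User wants a CLI. I read main.py and found the bug in parse_args.
</analysis>
<summary>
1. Goal: fix the CLI flag parsing.
2. Files: main.py (edited).
</summary>"""

print(format_summary(raw))
```

The scratchpad gives the model room to reason before committing to a summary, while only the `<summary>` body is carried into the compacted context.
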
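### The teaching version's simplifications are deliberate

- micro_compact uses text placeholders → we have no access to the API-level `cache_edits`
- tokens are estimated from character counts → an exact tokenizer is out of scope for this tutorial
- post-compaction recovery is omitted → the teaching version keeps only the summary and does not re-attach files automatically
- the two auxiliary mechanisms are not expanded → they belong to the last 10% of detail

The core design idea, cheap first and expensive last, is fully preserved.
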
</details>
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
469
s08_context_compact/code.py
Normal file
@@ -0,0 +1,469 @@
#!/usr/bin/env python3
"""
s08_context_compact.py - Context Compact

Four-layer compaction pipeline inserted before LLM calls:

    L1: snip_compact       - trim middle messages when count > 50
    L2: micro_compact      - replace old tool_results with placeholders
    L3: tool_result_budget - persist large results to disk
    L4: compact_history    - LLM full summary (1 API call)

    Emergency: reactive_compact - when API still returns prompt_too_long

    ┌─────────────────────────────────────────────────────────────┐
    │  messages[]                                                 │
    │      ↓                                                      │
    │  L3 budget ─→ L1 snip ─→ L2 micro ─→ [token > threshold?]   │
    │                                        ├─ No → LLM          │
    │                                        └─ Yes → L4 summary  │
    │                                                     ↓       │
    │                                                 LLM call    │
    │                                        [prompt_too_long?]   │
    │                                        └─ Yes → reactive    │
    └─────────────────────────────────────────────────────────────┘

Core principle: cheap first, expensive last.
Execution order matches CC source: budget → snip → micro → auto.

Builds on s07 (skill loading). Usage:

    python s08_context_compact/code.py
    Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY in .env
"""

import os, subprocess, json, time
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"): os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
SKILLS_DIR = WORKDIR / "skills"
TRANSCRIPT_DIR = WORKDIR / ".transcripts"
TOOL_RESULTS_DIR = WORKDIR / ".task_outputs" / "tool-results"
TASKS_DIR = WORKDIR / ".tasks"; TASKS_DIR.mkdir(exist_ok=True)
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]

# s07: Skill catalog scan (inherited from s07)
def _parse_frontmatter(text: str) -> tuple[dict, str]:
    if not text.startswith("---"):
        return {}, text
    parts = text.split("---", 2)
    if len(parts) < 3:
        return {}, text
    meta = {}
    for line in parts[1].strip().splitlines():
        if ":" in line:
            k, v = line.split(":", 1)
            meta[k.strip()] = v.strip().strip('"').strip("'")
    return meta, parts[2].strip()

SKILL_REGISTRY: dict[str, dict] = {}

def _scan_skills():
    if not SKILLS_DIR.exists():
        return
    for d in sorted(SKILLS_DIR.iterdir()):
        if not d.is_dir():
            continue
        manifest = d / "SKILL.md"
        if manifest.exists():
            raw = manifest.read_text()
            meta, body = _parse_frontmatter(raw)
            name = meta.get("name", d.name)
            desc = meta.get("description", raw.split("\n")[0].lstrip("#").strip())
            SKILL_REGISTRY[name] = {"name": name, "description": desc, "content": raw}

_scan_skills()

def list_skills() -> str:
    if not SKILL_REGISTRY:
        return "(no skills found)"
    return "\n".join(f"- **{s['name']}**: {s['description']}" for s in SKILL_REGISTRY.values())

def load_skill(name: str) -> str:
    skill = SKILL_REGISTRY.get(name)
    if not skill:
        return f"Skill not found: {name}"
    return skill["content"]

# s08: SYSTEM includes skill catalog (inherited from s07 build_system)
def build_system() -> str:
    catalog = list_skills()
    return (
        f"You are a coding agent at {WORKDIR}. "
        f"Skills available:\n{catalog}\n"
        "Use load_skill to get full details when needed."
    )

SYSTEM = build_system()

# s08: subagent gets its own system prompt: no compact, no skill loading
SUB_SYSTEM = (
    f"You are a coding agent at {WORKDIR}. "
    "Complete the task you were given, then return a concise summary. "
    "Do not delegate further."
)


# ═══════════════════════════════════════════════════════════
# FROM s02-s07 (unchanged): Basic Tools
# ═══════════════════════════════════════════════════════════

def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR): raise ValueError(f"Path escapes workspace: {p}")
    return path

def run_bash(command: str) -> str:
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR, capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired: return "Error: Timeout (120s)"

def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines): lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e: return f"Error: {e}"

def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path); file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content); return f"Wrote {len(content)} bytes to {path}"
    except Exception as e: return f"Error: {e}"

def run_edit(path: str, old_text: str, new_text: str) -> str:
    try:
        file_path = safe_path(path)
        text = file_path.read_text()
        if old_text not in text: return f"Error: text not found in {path}"
        file_path.write_text(text.replace(old_text, new_text, 1))
        return f"Edited {path}"
    except Exception as e: return f"Error: {e}"

def run_glob(pattern: str) -> str:
    import glob as g
    try:
        results = []
        for match in g.glob(pattern, root_dir=WORKDIR):
            if (WORKDIR / match).resolve().is_relative_to(WORKDIR):
                results.append(match)
        return "\n".join(results) if results else "(no matches)"
    except Exception as e: return f"Error: {e}"

def run_todo_write(todos: list) -> str:
    for i, t in enumerate(todos):
        if "content" not in t or "status" not in t:
            return f"Error: todos[{i}] missing 'content' or 'status'"
        if t["status"] not in ("pending", "in_progress", "completed"):
            return f"Error: todos[{i}] has invalid status '{t['status']}'"
    tasks_file = TASKS_DIR / "current_todos.json"
    tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False))
    lines = ["\n\033[33m## Current Tasks\033[0m"]
    for t in todos:
        icon = {"pending": " ", "in_progress": "\033[36m▸\033[0m", "completed": "\033[32m✓\033[0m"}[t["status"]]
        lines.append(f"  [{icon}] {t['content']}")
    print("\n".join(lines))
    return f"Updated {len(todos)} tasks"

def extract_text(content) -> str:
    if not isinstance(content, list): return str(content)
    return "\n".join(getattr(b, "text", "") for b in content if getattr(b, "type", None) == "text")


# ═══════════════════════════════════════════════════════════
# FROM s06-s07 (unchanged): Subagent
# ═══════════════════════════════════════════════════════════

SUB_TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
]
SUB_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write,
                "edit_file": run_edit, "glob": run_glob}

# The parameter is named `description` to match the `task` tool's input schema:
# the agent loop calls handler(**block.input), so the keyword must line up.
def spawn_subagent(description: str) -> str:
    print(f"\n\033[35m[Subagent spawned]\033[0m")
    messages = [{"role": "user", "content": description}]
    for _ in range(30):
        response = client.messages.create(model=MODEL, system=SUB_SYSTEM,
                                          messages=messages, tools=SUB_TOOLS, max_tokens=8000)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            break
        results = []
        for block in response.content:
            if block.type == "tool_use":
                blocked = trigger_hooks("PreToolUse", block)
                if blocked:
                    results.append({"type": "tool_result", "tool_use_id": block.id,
                                    "content": str(blocked)})
                    continue
                handler = SUB_HANDLERS.get(block.name)
                output = handler(**block.input) if handler else f"Unknown: {block.name}"
                trigger_hooks("PostToolUse", block, output)
                print(f"  \033[90m[sub] {block.name}: {str(output)[:100]}\033[0m")
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})
    result = extract_text(messages[-1]["content"])
    if not result:
        for msg in reversed(messages):
            if msg["role"] == "assistant":
                result = extract_text(msg["content"])
                if result:
                    break
    if not result:
        result = "Subagent stopped after 30 turns without final answer."
    print(f"\033[35m[Subagent done]\033[0m")
    return result


# ═══════════════════════════════════════════════════════════
# NEW in s08: Four-Layer Compaction Pipeline
# ═══════════════════════════════════════════════════════════

CONTEXT_LIMIT = 50000
KEEP_RECENT = 3
PERSIST_THRESHOLD = 30000

def estimate_size(msgs): return len(str(msgs))


# L1: snipCompact - trim middle messages
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages: return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    return messages[:keep_head] + [{"role": "user", "content": f"[snipped {snipped} messages]"}] + messages[-keep_tail:]


# L2: microCompact - old result placeholders
def collect_tool_results(messages):
    blocks = []
    for mi, msg in enumerate(messages):
        if msg.get("role") != "user" or not isinstance(msg.get("content"), list): continue
        for bi, block in enumerate(msg["content"]):
            if isinstance(block, dict) and block.get("type") == "tool_result":
                blocks.append((mi, bi, block))
    return blocks

def micro_compact(messages):
    tool_results = collect_tool_results(messages)
    if len(tool_results) <= KEEP_RECENT: return messages
    for _, _, block in tool_results[:-KEEP_RECENT]:
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages


# L3: toolResultBudget - persist large results to disk
def persist_large_output(tool_use_id, output):
    if len(output) <= PERSIST_THRESHOLD: return output
    TOOL_RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    path = TOOL_RESULTS_DIR / f"{tool_use_id}.txt"
    if not path.exists(): path.write_text(output)
    return f"<persisted-output>\nFull output: {path}\nPreview:\n{output[:2000]}\n</persisted-output>"

def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1] if messages else None
    if not last or last.get("role") != "user" or not isinstance(last.get("content"), list): return messages
    blocks = [(i, b) for i, b in enumerate(last["content"]) if isinstance(b, dict) and b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    if total <= max_bytes: return messages
    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
    for _, block in ranked:
        if total <= max_bytes: break
        content = str(block.get("content", ""))
        if len(content) <= PERSIST_THRESHOLD: continue
        tid = block.get("tool_use_id", "unknown")
        block["content"] = persist_large_output(tid, content)
        total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    return messages


# L4: autoCompact - LLM full summary
def write_transcript(messages):
    TRANSCRIPT_DIR.mkdir(parents=True, exist_ok=True)
    path = TRANSCRIPT_DIR / f"transcript_{int(time.time())}.jsonl"
    with path.open("w") as f:
        for msg in messages: f.write(json.dumps(msg, default=str) + "\n")
    return path

def summarize_history(messages):
    conversation = json.dumps(messages, default=str)[:80000]
    prompt = ("Summarize this coding-agent conversation so work can continue.\n"
              "Preserve: 1. current goal, 2. key findings/decisions, 3. files read/changed, "
              "4. remaining work, 5. user constraints.\nBe compact but concrete.\n\n" + conversation)
    response = client.messages.create(model=MODEL, messages=[{"role": "user", "content": prompt}], max_tokens=2000)
    return "\n".join(
        getattr(block, "text", "")
        for block in response.content
        if getattr(block, "type", None) == "text").strip() or "(empty summary)"

def compact_history(messages):
    transcript_path = write_transcript(messages)
    print(f"[transcript saved: {transcript_path}]")
    summary = summarize_history(messages)
    return [{"role": "user", "content": f"[Compacted]\n\n{summary}"}]


# Emergency: reactiveCompact - on API error
def reactive_compact(messages):
    transcript = write_transcript(messages)
    summary = summarize_history(messages)
    return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *messages[-5:]]


# ═══════════════════════════════════════════════════════════
# FROM s07: Tool Definitions
# ═══════════════════════════════════════════════════════════

TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
    {"name": "todo_write", "description": "Create and manage a task list for your current coding session.",
     "input_schema": {"type": "object", "properties": {"todos": {"type": "array", "items": {"type": "object", "properties": {"content": {"type": "string"}, "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}}, "required": ["content", "status"]}}}, "required": ["todos"]}},
    {"name": "task", "description": "Launch a subagent to handle a complex subtask. Returns only the final conclusion.",
     "input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}},
    {"name": "load_skill", "description": "Load the full content of a skill by name.",
     "input_schema": {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}},
    # s08 change: new compact tool, triggers compact_history (not a no-op)
    {"name": "compact", "description": "Summarize earlier conversation to free context space.",
     "input_schema": {"type": "object", "properties": {"focus": {"type": "string"}}}},
]

TOOL_HANDLERS = {
    "bash": run_bash, "read_file": run_read, "write_file": run_write,
    "edit_file": run_edit, "glob": run_glob, "todo_write": run_todo_write,
    "task": spawn_subagent, "load_skill": load_skill,
}

# FROM s04 (unchanged): Hooks
HOOKS = {"PreToolUse": [], "PostToolUse": []}
def trigger_hooks(event, *args):
    for cb in HOOKS[event]:
        r = cb(*args)
        if r is not None: return r
    return None

DENY_LIST = ["rm -rf /", "sudo", "shutdown"]
def permission_hook(block):
    if block.name == "bash":
        for p in DENY_LIST:
            if p in block.input.get("command", ""): return "Permission denied"
    return None
def log_hook(block):
    print(f"\033[90m[HOOK] {block.name}\033[0m")
    return None

HOOKS["PreToolUse"].append(permission_hook)
HOOKS["PreToolUse"].append(log_hook)


# ═══════════════════════════════════════════════════════════
# agent_loop - s08 core: run compaction pipeline before LLM
# ═══════════════════════════════════════════════════════════

MAX_REACTIVE_RETRIES = 1  # retry limit for reactive compact

def agent_loop(messages: list):
    reactive_retries = 0
    while True:
        # s08 change: three preprocessors (0 API calls, cheap first)
        # Order matches CC source: budget → snip → micro
        messages[:] = tool_result_budget(messages)   # L3: persist large results first
        messages[:] = snip_compact(messages)         # L1: trim middle
        messages[:] = micro_compact(messages)        # L2: old result placeholders

        # s08 change: size estimate still over threshold → LLM summary (1 API call)
        if estimate_size(messages) > CONTEXT_LIMIT:
            print("[auto compact]")
            messages[:] = compact_history(messages)

        try:
            response = client.messages.create(model=MODEL, system=SYSTEM, messages=messages, tools=TOOLS, max_tokens=8000)
            reactive_retries = 0  # reset on successful API call
        except Exception as e:
            if ("prompt_too_long" in str(e).lower() or "too many tokens" in str(e).lower()) and reactive_retries < MAX_REACTIVE_RETRIES:
                print("[reactive compact]")
                messages[:] = reactive_compact(messages)
                reactive_retries += 1
                continue
            raise

        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use": return

        results = []
        for block in response.content:
            if block.type != "tool_use": continue
            print(f"\033[36m> {block.name}\033[0m")

            # s08: compact tool triggers compact_history, not a no-op string
            if block.name == "compact":
                # No tool_result is appended here: the matching tool_use has been
                # summarized away, and the API rejects a tool_result without one.
                messages[:] = compact_history(messages)
                break  # end current turn, start fresh with compacted context

            blocked = trigger_hooks("PreToolUse", block)
            if blocked:
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": str(blocked)})
                continue
            handler = TOOL_HANDLERS.get(block.name)
            output = handler(**block.input) if handler else f"Unknown: {block.name}"
            trigger_hooks("PostToolUse", block, output)
            print(str(output)[:200])
            results.append({"type": "tool_result", "tool_use_id": block.id, "content": str(output)})
        else:
            # normal path: no compact was called
            messages.append({"role": "user", "content": results})
            continue
        # compact was called: history is now just the summary message
        continue


if __name__ == "__main__":
    print("s08: Context Compact - four-layer compaction pipeline")
    print("Type a question and press Enter to send. Type q to quit.\n")
    history = []
    while True:
        try: query = input("\033[36ms08 >> \033[0m")
        except (EOFError, KeyboardInterrupt): break
        if query.strip().lower() in ("q", "exit", ""): break
        history.append({"role": "user", "content": query})
        agent_loop(history)
        for block in history[-1]["content"]:
            if getattr(block, "type", None) == "text": print(block.text)
        print()
72
s08_context_compact/images/auto-compact.en.svg
Normal file
@@ -0,0 +1,72 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 400" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#991b1b"/><stop offset="100%" stop-color="#dc2626"/>
|
||||
</linearGradient>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
|
||||
</marker>
|
||||
</defs>
|
||||
|
||||
<rect width="720" height="400" fill="#fafbfc" rx="8"/>
|
||||
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
|
||||
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L4: autoCompact — LLM Full Summary</text>
|
||||
|
||||
<!-- Trigger Condition -->
|
||||
<rect x="20" y="54" width="680" height="44" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
|
||||
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">Trigger Condition</text>
|
||||
<text x="140" y="70" fill="#991b1b" font-size="11">All three preprocessing layers have run and estimated tokens > contextWindow - maxOutputTokens - 13_000.</text>
|
||||
<text x="140" y="86" fill="#991b1b" font-size="10">Tries sessionMemoryCompact first (lightweight summary from existing memory); calls the LLM only if that is insufficient.</text>
|
||||
|
||||
<!-- Steps -->
|
||||
<rect x="20" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
|
||||
<text x="120" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Step 1: Save transcript</text>
|
||||
<text x="40" y="152" fill="#475569" font-size="10">Write full conversation to .transcripts/</text>
|
||||
<text x="40" y="168" fill="#475569" font-size="10">JSONL format, one message per line</text>
|
||||
<text x="40" y="184" fill="#475569" font-size="10">Filename: transcript_{timestamp}.jsonl</text>
|
||||
<text x="40" y="200" fill="#94a3b8" font-size="9">No data lost, just moved out of active area</text>
|
||||
|
||||
<line x1="225" y1="161" x2="265" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="270" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
|
||||
<text x="370" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Step 2: LLM generates summary</text>
|
||||
<text x="290" y="152" fill="#475569" font-size="10">Send conversation history to LLM</text>
|
||||
<text x="290" y="166" fill="#475569" font-size="9">Summary must include 9 sections:</text>
|
||||
<text x="290" y="180" fill="#94a3b8" font-size="8">request · concepts · files · errors · resolutions</text>
|
||||
<text x="290" y="192" fill="#94a3b8" font-size="8">user messages · todos · current state · next steps</text>
|
||||
<text x="290" y="206" fill="#94a3b8" font-size="9">Generated only once</text>
|
||||
|
||||
<line x1="475" y1="161" x2="515" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="520" y="106" width="180" height="110" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
|
||||
<text x="610" y="130" fill="#991b1b" font-size="12" font-weight="700" text-anchor="middle">Step 3: Replace message list</text>
|
||||
<text x="540" y="152" fill="#991b1b" font-size="10">All old messages → 1 summary</text>
|
||||
<text x="540" y="168" fill="#991b1b" font-size="10">Model continues from summary</text>
|
||||
<text x="540" y="184" fill="#991b1b" font-size="10">Includes recently_read file list</text>
|
||||
<text x="540" y="200" fill="#ef4444" font-size="9">⚠ This is an irreversible operation</text>
|
||||
|
||||
<!-- Before/After comparison -->
|
||||
<rect x="20" y="234" width="320" height="94" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
|
||||
<text x="180" y="256" fill="#64748b" font-size="11" font-weight="600" text-anchor="middle">Messages before compaction</text>
|
||||
<rect x="35" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="40" y="276" fill="#475569" font-size="8">user</text>
|
||||
<rect x="92" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="97" y="276" fill="#475569" font-size="8">assistant</text>
|
||||
<rect x="149" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="154" y="276" fill="#475569" font-size="8">user</text>
|
||||
<rect x="206" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="211" y="276" fill="#475569" font-size="8">assistant</text>
|
||||
<rect x="263" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="268" y="276" fill="#475569" font-size="8">user</text>
|
||||
<text x="180" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~180 messages, occupying 62K tokens</text>
|
||||
|
||||
<line x1="345" y1="281" x2="375" y2="281" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="380" y="234" width="320" height="94" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
|
||||
<text x="540" y="256" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">Messages after compaction</text>
|
||||
<rect x="395" y="264" width="290" height="32" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
|
||||
<text x="540" y="276" fill="#991b1b" font-size="9" text-anchor="middle">[Compacted] Summary: goal → create hello.py ...</text>
|
||||
<text x="540" y="290" fill="#991b1b" font-size="9" text-anchor="middle">Recent files: hello.py, README.md ...</text>
|
||||
<text x="540" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">1 message, occupying ~1K tokens</text>
|
||||
|
||||
<!-- Circuit breaker -->
|
||||
<rect x="20" y="340" width="680" height="36" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
|
||||
<text x="35" y="362" fill="#475569" font-size="11" font-weight="600">Circuit breaker:</text>
|
||||
<text x="130" y="362" fill="#475569" font-size="10">3 consecutive autocompact failures → stop retrying. Prevents wasting API calls when context is unrecoverable.</text>
|
||||
</svg>
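The trigger condition and circuit breaker in the diagram above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's actual code: `autocompact_threshold`, `CompactBreaker`, and `should_autocompact` are hypothetical names, and only the 13_000-token margin and the limit of 3 consecutive failures come from the diagram.

```python
from dataclasses import dataclass

AUTOCOMPACT_BUFFER_TOKENS = 13_000  # safety margin quoted in the diagram
MAX_CONSECUTIVE_FAILURES = 3        # circuit-breaker limit quoted in the diagram


def autocompact_threshold(context_window: int, max_output_tokens: int) -> int:
    """Token count above which a full LLM summary is attempted."""
    return context_window - max_output_tokens - AUTOCOMPACT_BUFFER_TOKENS


@dataclass
class CompactBreaker:
    """Hypothetical helper that counts consecutive autoCompact failures."""
    failures: int = 0

    @property
    def is_open(self) -> bool:
        # Open means: stop retrying, the context looks unrecoverable.
        return self.failures >= MAX_CONSECUTIVE_FAILURES

    def record(self, ok: bool) -> None:
        # A success resets the streak; a failure extends it.
        self.failures = 0 if ok else self.failures + 1


def should_autocompact(estimated_tokens: int, context_window: int,
                       max_output_tokens: int, breaker: CompactBreaker) -> bool:
    """Return True when the estimate exceeds the threshold and the breaker is closed."""
    if breaker.is_open:
        return False
    return estimated_tokens > autocompact_threshold(context_window, max_output_tokens)
```

With an assumed 200_000-token window and 20_000 reserved output tokens, the threshold works out to 167_000 tokens. Once `record(False)` has been called three times in a row, `should_autocompact` returns False regardless of the estimate, which is what keeps a hopeless context from burning further API calls.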
72
s08_context_compact/images/auto-compact.ja.svg
Normal file
@@ -0,0 +1,72 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 400" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#991b1b"/><stop offset="100%" stop-color="#dc2626"/>
|
||||
</linearGradient>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
|
||||
</marker>
|
||||
</defs>
|
||||
|
||||
<rect width="720" height="400" fill="#fafbfc" rx="8"/>
|
||||
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
|
||||
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L4: autoCompact — LLM 完全要約</text>
|
||||
|
||||
<!-- トリガー条件 -->
|
||||
<rect x="20" y="54" width="680" height="44" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
|
||||
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">トリガー条件</text>
|
||||
<text x="115" y="70" fill="#991b1b" font-size="11">前 3 層の前処理を全て実行後、推定 token > contextWindow - maxOutputTokens - 13_000。</text>
|
||||
<text x="115" y="86" fill="#991b1b" font-size="10">まず sessionMemoryCompact を試行(既存のメモリで軽量要約)、不足時のみ LLM を呼び出し。</text>
|
||||
|
||||
<!-- ステップ -->
|
||||
<rect x="20" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
|
||||
<text x="120" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">ステップ 1:transcript 保存</text>
|
||||
<text x="40" y="152" fill="#475569" font-size="10">完全な対話を .transcripts/ に書き込み</text>
|
||||
<text x="40" y="168" fill="#475569" font-size="10">JSONL 形式、1 行 1 メッセージ</text>
|
||||
<text x="40" y="184" fill="#475569" font-size="10">ファイル名:transcript_{timestamp}.jsonl</text>
|
||||
<text x="40" y="200" fill="#94a3b8" font-size="9">情報は失われていない、アクティブ領域から移動のみ</text>
|
||||
|
||||
<line x1="225" y1="161" x2="265" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="270" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
|
||||
<text x="370" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">ステップ 2:LLM 要約生成</text>
|
||||
<text x="290" y="152" fill="#475569" font-size="10">対話履歴を LLM に送信</text>
|
||||
<text x="290" y="166" fill="#475569" font-size="9">要約は 9 つのセクションを含む:</text>
|
||||
<text x="290" y="180" fill="#94a3b8" font-size="8">リクエスト・概念・ファイル・エラー・解決</text>
|
||||
<text x="290" y="192" fill="#94a3b8" font-size="8">ユーザーメッセージ・TODO・現在・次ステップ</text>
|
||||
<text x="290" y="206" fill="#94a3b8" font-size="9">1 回のみ生成</text>
|
||||
|
||||
<line x1="475" y1="161" x2="515" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="520" y="106" width="180" height="110" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
|
||||
<text x="610" y="130" fill="#991b1b" font-size="12" font-weight="700" text-anchor="middle">ステップ 3:メッセージリスト置換</text>
|
||||
<text x="540" y="152" fill="#991b1b" font-size="10">全旧メッセージ → 1 件の要約に</text>
|
||||
<text x="540" y="168" fill="#991b1b" font-size="10">モデルは要約から作業を継続</text>
|
||||
<text x="540" y="184" fill="#991b1b" font-size="10">recently_read ファイルリストを付与</text>
|
||||
<text x="540" y="200" fill="#ef4444" font-size="9">⚠ これは復元不可能な操作</text>
|
||||
|
||||
<!-- 圧縮前/後 比較 -->
|
||||
<rect x="20" y="234" width="320" height="94" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
|
||||
<text x="180" y="256" fill="#64748b" font-size="11" font-weight="600" text-anchor="middle">圧縮前 messages</text>
|
||||
<rect x="35" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="40" y="276" fill="#475569" font-size="8">user</text>
|
||||
<rect x="92" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="97" y="276" fill="#475569" font-size="8">assistant</text>
|
||||
<rect x="149" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="154" y="276" fill="#475569" font-size="8">user</text>
|
||||
<rect x="206" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="211" y="276" fill="#475569" font-size="8">assistant</text>
|
||||
<rect x="263" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="268" y="276" fill="#475569" font-size="8">user</text>
|
||||
<text x="180" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~180 件のメッセージ、62K トークンを占有</text>
|
||||
|
||||
<line x1="345" y1="281" x2="375" y2="281" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="380" y="234" width="320" height="94" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
|
||||
<text x="540" y="256" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">圧縮後 messages</text>
|
||||
<rect x="395" y="264" width="290" height="32" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
|
||||
<text x="540" y="276" fill="#991b1b" font-size="9" text-anchor="middle">[Compacted] 要約:目標 → hello.py を作成 ...</text>
|
||||
<text x="540" y="290" fill="#991b1b" font-size="9" text-anchor="middle">最近のファイル:hello.py, README.md ...</text>
|
||||
<text x="540" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~1 件のメッセージ、1K トークンを占有</text>
|
||||
|
||||
<!-- サーキットブレーカー -->
|
||||
<rect x="20" y="340" width="680" height="36" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
|
||||
<text x="35" y="362" fill="#475569" font-size="11" font-weight="600">サーキットブレーカー:</text>
|
||||
<text x="145" y="362" fill="#475569" font-size="10">autocompact が連続 3 回失敗 → リトライ停止。コンテキストが復元不可能な場合の API 呼び出しの無駄な反復を防止。</text>
|
||||
</svg>
72
s08_context_compact/images/auto-compact.svg
Normal file
@@ -0,0 +1,72 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 400" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#991b1b"/><stop offset="100%" stop-color="#dc2626"/>
|
||||
</linearGradient>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
|
||||
</marker>
|
||||
</defs>
|
||||
|
||||
<rect width="720" height="400" fill="#fafbfc" rx="8"/>
|
||||
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
|
||||
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L4: autoCompact — LLM 全量摘要</text>
|
||||
|
||||
<!-- 触发条件 -->
|
||||
<rect x="20" y="54" width="680" height="44" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
|
||||
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">触发条件</text>
|
||||
<text x="105" y="70" fill="#991b1b" font-size="11">前三层预处理全跑完,估算 token > contextWindow - maxOutputTokens - 13_000。</text>
|
||||
<text x="105" y="86" fill="#991b1b" font-size="10">先尝试 sessionMemoryCompact(用已有记忆做轻量摘要),不足才调 LLM。</text>
|
||||
|
||||
<!-- 步骤 -->
|
||||
<rect x="20" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
|
||||
<text x="120" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">步骤 1:保存 transcript</text>
|
||||
<text x="40" y="152" fill="#475569" font-size="10">完整对话写入 .transcripts/</text>
|
||||
<text x="40" y="168" fill="#475569" font-size="10">JSONL 格式,一行一条消息</text>
|
||||
<text x="40" y="184" fill="#475569" font-size="10">文件名:transcript_{timestamp}.jsonl</text>
|
||||
<text x="40" y="200" fill="#94a3b8" font-size="9">信息没有丢失,只是移出活跃区</text>
|
||||
|
||||
<line x1="225" y1="161" x2="265" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="270" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
|
||||
<text x="370" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">步骤 2:LLM 生成摘要</text>
|
||||
<text x="290" y="152" fill="#475569" font-size="10">把对话历史发给 LLM</text>
|
||||
<text x="290" y="166" fill="#475569" font-size="9">摘要需包含 9 个部分:</text>
|
||||
<text x="290" y="180" fill="#94a3b8" font-size="8">请求·概念·文件·错误·解决</text>
|
||||
<text x="290" y="192" fill="#94a3b8" font-size="8">用户消息·待办·当前·下一步</text>
|
||||
<text x="290" y="206" fill="#94a3b8" font-size="9">只生成一次</text>
|
||||
|
||||
<line x1="475" y1="161" x2="515" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="520" y="106" width="180" height="110" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
|
||||
<text x="610" y="130" fill="#991b1b" font-size="12" font-weight="700" text-anchor="middle">步骤 3:替换消息列表</text>
|
||||
<text x="540" y="152" fill="#991b1b" font-size="10">所有旧消息 → 1 条摘要</text>
|
||||
<text x="540" y="168" fill="#991b1b" font-size="10">模型从摘要继续工作</text>
|
||||
<text x="540" y="184" fill="#991b1b" font-size="10">附带 recently_read 文件列表</text>
|
||||
<text x="540" y="200" fill="#ef4444" font-size="9">⚠ 这是无法恢复的操作</text>
|
||||
|
||||
<!-- Before/After 对比 -->
|
||||
<rect x="20" y="234" width="320" height="94" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
|
||||
<text x="180" y="256" fill="#64748b" font-size="11" font-weight="600" text-anchor="middle">压缩前 messages</text>
|
||||
<rect x="35" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="40" y="276" fill="#475569" font-size="8">user</text>
|
||||
<rect x="92" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="97" y="276" fill="#475569" font-size="8">assistant</text>
|
||||
<rect x="149" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="154" y="276" fill="#475569" font-size="8">user</text>
|
||||
<rect x="206" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="211" y="276" fill="#475569" font-size="8">assistant</text>
|
||||
<rect x="263" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="268" y="276" fill="#475569" font-size="8">user</text>
|
||||
<text x="180" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~180 条消息,占 62K token</text>
|
||||
|
||||
<line x1="345" y1="281" x2="375" y2="281" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="380" y="234" width="320" height="94" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
|
||||
<text x="540" y="256" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">压缩后 messages</text>
|
||||
<rect x="395" y="264" width="290" height="32" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
|
||||
<text x="540" y="276" fill="#991b1b" font-size="9" text-anchor="middle">[Compacted] 摘要:目标 → 创建 hello.py ...</text>
|
||||
<text x="540" y="290" fill="#991b1b" font-size="9" text-anchor="middle">最近文件:hello.py, README.md ...</text>
|
||||
<text x="540" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~1 条消息,占 1K token</text>
|
||||
|
||||
<!-- 熔断器 -->
|
||||
<rect x="20" y="340" width="680" height="36" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
|
||||
<text x="35" y="362" fill="#475569" font-size="11" font-weight="600">熔断器:</text>
|
||||
<text x="95" y="362" fill="#475569" font-size="10">连续 autocompact 失败 3 次 → 停止重试。防止上下文不可恢复时反复浪费 API 调用。</text>
|
||||
</svg>
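The three steps shown in the diagram (save the transcript, generate a summary, replace the message list) can be sketched as follows. This is an illustrative sketch under stated assumptions: `save_transcript` and `compact_messages` are hypothetical names, the summarizer is passed in as a plain callable instead of a real LLM call, and only the `.transcripts/` directory, the JSONL format, the `transcript_{timestamp}.jsonl` filename, and the `[Compacted]` prefix are taken from the diagram.

```python
import json
import time
from pathlib import Path
from typing import Callable


def save_transcript(messages: list[dict], directory: Path) -> Path:
    """Step 1: write the full conversation as JSONL, one message per line."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"transcript_{int(time.time())}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for msg in messages:
            # default=str keeps non-JSON values (e.g. SDK objects) from raising.
            f.write(json.dumps(msg, ensure_ascii=False, default=str) + "\n")
    return path


def compact_messages(messages: list[dict],
                     summarize: Callable[[list[dict]], str],
                     recent_files: list[str],
                     directory: Path = Path(".transcripts")) -> Path:
    """Run the three steps and return the path of the saved transcript."""
    path = save_transcript(messages, directory)        # step 1
    summary = summarize(messages)                      # step 2 (an LLM call in practice)
    text = f"[Compacted] {summary}\nRecent files: {', '.join(recent_files)}"
    # Step 3: replace in place so every holder of this list sees the summary.
    messages[:] = [{"role": "user", "content": text}]
    return path
```

The slice assignment `messages[:] = ...` is a deliberate choice in this sketch: a caller that keeps its own reference to the list, such as a REPL holding `history`, observes the compacted state without having to rebind anything.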
138
s08_context_compact/images/compact-overview.en.svg
Normal file
@@ -0,0 +1,138 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 520" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
|
||||
</marker>
|
||||
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
|
||||
</marker>
|
||||
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
|
||||
</marker>
|
||||
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
|
||||
</marker>
|
||||
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
|
||||
</marker>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/>
|
||||
<stop offset="100%" stop-color="#2563eb"/>
|
||||
</linearGradient>
|
||||
</defs>
|
||||
|
||||
<!-- Background -->
|
||||
<rect width="820" height="520" fill="#fafbfc" rx="8"/>
|
||||
|
||||
<!-- Title -->
|
||||
<rect x="0" y="0" width="820" height="48" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="40" width="820" height="8" fill="url(#header)"/>
|
||||
<text x="410" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Context Compact — Compression Before LLM Call, Three Trigger Modes</text>
|
||||
|
||||
<!-- Labels -->
|
||||
<text x="50" y="74" fill="#94a3b8" font-size="11" font-weight="600">s07 Preserved</text>
|
||||
<text x="180" y="74" fill="#d97706" font-size="11" font-weight="600">s08 New</text>
|
||||
|
||||
<!-- ===== ① messages[] ===== -->
|
||||
<rect x="40" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="90" y="155" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
|
||||
<text x="90" y="172" fill="#64748b" font-size="9" text-anchor="middle">(s07 preserved)</text>
|
||||
|
||||
<!-- messages → pipeline entry -->
|
||||
<line x1="140" y1="158" x2="168" y2="158" stroke="#d97706" stroke-width="2" marker-end="url(#arrow-amber)"/>
|
||||
|
||||
<!-- ===== ② Compression Pipeline ===== -->
|
||||
<rect x="170" y="82" width="200" height="252" rx="10" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
|
||||
<text x="270" y="102" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">Compression Pipeline</text>
|
||||
|
||||
<!-- ── ① Every Turn Auto ── -->
|
||||
<rect x="186" y="110" width="168" height="16" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="0.8"/>
|
||||
<text x="270" y="122" fill="#92400e" font-size="8" font-weight="700" text-anchor="middle">① Every Turn · Unconditional · 0 API</text>
|
||||
|
||||
<rect x="186" y="130" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="146" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L3 tool_result_budget</text>
|
||||
|
||||
<rect x="186" y="158" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="174" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L1 snip_compact</text>
|
||||
|
||||
<rect x="186" y="186" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="202" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L2 micro_compact</text>
|
||||
|
||||
<!-- ↓ → ◇ -->
|
||||
<line x1="270" y1="210" x2="270" y2="222" stroke="#555" stroke-width="1.2" marker-end="url(#arrow)"/>
|
||||
|
||||
<!-- ◇ Decision Diamond -->
|
||||
<polygon points="270,226 300,244 270,262 240,244" fill="#f0f4ff" stroke="#ea580c" stroke-width="1.5"/>
|
||||
<text x="270" y="247" fill="#9a3412" font-size="7" font-weight="600" text-anchor="middle">Over threshold?</text>
|
||||
|
||||
<!-- No: right annotation -->
|
||||
<text x="306" y="240" fill="#16a34a" font-size="9" font-weight="700">No → Pass</text>
|
||||
<text x="306" y="252" fill="#94a3b8" font-size="7">Straight to LLM</text>
|
||||
|
||||
<!-- Yes: below annotation -->
|
||||
<text x="284" y="260" fill="#ea580c" font-size="8" font-weight="600">Yes↓</text>
|
||||
|
||||
<!-- ── ② Conditional Trigger ── -->
|
||||
<rect x="186" y="268" width="168" height="16" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="0.8"/>
|
||||
<text x="270" y="280" fill="#9a3412" font-size="8" font-weight="700" text-anchor="middle">② Conditional · Token Over Threshold · 1 API</text>
|
||||
|
||||
<rect x="186" y="288" width="168" height="24" rx="4" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
|
||||
<text x="270" y="304" fill="#9a3412" font-size="10" font-weight="600" text-anchor="middle">L4 compact_history</text>
|
||||
|
||||
<!-- Pipeline exit → LLM -->
|
||||
<line x1="370" y1="158" x2="438" y2="158" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
|
||||
|
||||
<!-- ===== ③ LLM ===== -->
|
||||
<rect x="440" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="490" y="155" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
|
||||
<text x="490" y="172" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>
|
||||
|
||||
<!-- LLM No → Return -->
|
||||
<line x1="490" y1="184" x2="490" y2="278" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
|
||||
<text x="502" y="262" fill="#16a34a" font-size="10" font-weight="600">No</text>
|
||||
<rect x="435" y="280" width="110" height="26" rx="13" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
|
||||
<text x="490" y="297" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return Result</text>
|
||||
|
||||
<!-- LLM Yes → TOOL_HANDLERS -->
|
||||
<line x1="540" y1="158" x2="578" y2="158" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
<text x="554" y="150" fill="#64748b" font-size="10" font-weight="600">Yes</text>
|
||||
|
||||
<!-- ④ TOOL_HANDLERS -->
|
||||
<rect x="580" y="126" width="130" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="645" y="150" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
|
||||
<text x="645" y="166" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
|
||||
<text x="645" y="180" fill="#64748b" font-size="9" text-anchor="middle">task · load_skill · ...</text>
|
||||
|
||||
<!-- LLM API error → emergency compact → retry next turn -->
|
||||
<path d="M 535 184 L 570 216 L 580 228" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
|
||||
<text x="552" y="204" fill="#991b1b" font-size="8" font-weight="600">API error</text>
|
||||
<path d="M 665 266 L 665 340 L 160 340 L 160 142 L 186 142" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
|
||||
<text x="530" y="328" fill="#991b1b" font-size="8" font-weight="600">retry via compression pipeline</text>
|
||||
|
||||
<!-- ===== ③ Emergency Trigger (after LLM API failure) ===== -->
|
||||
<rect x="580" y="210" width="170" height="56" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
|
||||
<text x="665" y="228" fill="#991b1b" font-size="9" font-weight="700" text-anchor="middle">③ Emergency Trigger</text>
|
||||
<text x="665" y="242" fill="#991b1b" font-size="8" text-anchor="middle">API returns prompt_too_long</text>
|
||||
<text x="665" y="256" fill="#991b1b" font-size="8" text-anchor="middle">→ reactive_compact → retry</text>
|
||||
|
||||
<!-- ===== Loop Back ===== -->
|
||||
<path d="M 710 158 L 760 158 L 760 348 L 90 348 L 90 184" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
|
||||
<text x="410" y="366" fill="#64748b" font-size="10" text-anchor="middle">Tool results appended to messages[] → next turn → compress again → LLM</text>
|
||||
|
||||
<!-- ===== Legend ===== -->
|
||||
<rect x="50" y="390" width="720" height="116" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
|
||||
|
||||
<rect x="70" y="404" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="94" y="414" fill="#334155" font-size="10">s07 Preserved: loop, hooks, skill loading, sub-agents</text>
|
||||
|
||||
<rect x="70" y="426" width="16" height="12" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="94" y="436" fill="#334155" font-size="10">① Every Turn Auto: L3→L1→L2 run unconditionally before each LLM call, 0 API calls</text>
|
||||
|
||||
<rect x="70" y="448" width="16" height="12" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
|
||||
<text x="94" y="458" fill="#334155" font-size="10">② Conditional: after L3/L1/L2, tokens still over threshold → compact_history, 1 API call</text>
|
||||
|
||||
<rect x="70" y="470" width="16" height="12" rx="3" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="3,2"/>
|
||||
<text x="94" y="480" fill="#334155" font-size="10">③ Emergency: API returns prompt_too_long → reactive_compact → retry</text>
|
||||
|
||||
<text x="70" y="498" fill="#94a3b8" font-size="9">Three modes with increasing cost: 0 API calls → 1 API call → 1 API call plus more aggressive trimming</text>
|
||||
</svg>
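The three trigger modes in the overview diagram can be read as one function that runs before every LLM call. The sketch below is a simplified illustration, not the chapter's implementation: `run_turn` and `PromptTooLong` are hypothetical names, every layer is injected as a callable, and the emergency path retries the call directly once, whereas the diagram routes the retry back through the compression pipeline.

```python
from typing import Any, Callable

Messages = list[dict]
Layer = Callable[[Messages], Messages]


class PromptTooLong(Exception):
    """Stand-in for the API's prompt_too_long error."""


def run_turn(messages: Messages,
             cheap_layers: list[Layer],
             estimate_tokens: Callable[[Messages], int],
             threshold: int,
             compact_history: Layer,
             reactive_compact: Layer,
             call_llm: Callable[[Messages], Any]) -> tuple[Messages, Any]:
    # Mode 1: every turn, unconditional, no API call (L3 -> L1 -> L2 in the diagram).
    for layer in cheap_layers:
        messages = layer(messages)
    # Mode 2: conditional, only when the estimate is still over the threshold.
    if estimate_tokens(messages) > threshold:
        messages = compact_history(messages)
    # Mode 3: emergency, only after the API rejects the prompt as too long.
    try:
        return messages, call_llm(messages)
    except PromptTooLong:
        messages = reactive_compact(messages)
        return messages, call_llm(messages)
```

Passing the layers in as callables keeps the ordering visible in one place and makes each mode easy to test with stubs; a real loop would more likely call its own module-level functions.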
138
s08_context_compact/images/compact-overview.ja.svg
Normal file
@@ -0,0 +1,138 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 520" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
|
||||
</marker>
|
||||
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
|
||||
</marker>
|
||||
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
|
||||
</marker>
|
||||
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
|
||||
</marker>
|
||||
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
|
||||
</marker>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/>
|
||||
<stop offset="100%" stop-color="#2563eb"/>
|
||||
</linearGradient>
|
||||
</defs>
|
||||
|
||||
<!-- 背景 -->
|
||||
<rect width="820" height="520" fill="#fafbfc" rx="8"/>
|
||||
|
||||
<!-- タイトル -->
|
||||
<rect x="0" y="0" width="820" height="48" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="40" width="820" height="8" fill="url(#header)"/>
|
||||
<text x="410" y="31" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Context Compact — LLM 呼び出し前に圧縮、3 つのトリガーモード</text>
|
||||
|
||||
<!-- ラベル -->
|
||||
<text x="50" y="74" fill="#94a3b8" font-size="11" font-weight="600">s07 保持</text>
|
||||
<text x="180" y="74" fill="#d97706" font-size="11" font-weight="600">s08 新規</text>
|
||||
|
||||
<!-- ===== ① messages[] ===== -->
|
||||
<rect x="40" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="90" y="155" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
|
||||
<text x="90" y="172" fill="#64748b" font-size="9" text-anchor="middle">(s07 保持)</text>
|
||||
|
||||
<!-- messages → パイプライン入口 -->
|
||||
<line x1="140" y1="158" x2="168" y2="158" stroke="#d97706" stroke-width="2" marker-end="url(#arrow-amber)"/>
|
||||
|
||||
<!-- ===== ② 圧縮パイプライン ===== -->
|
||||
<rect x="170" y="82" width="200" height="252" rx="10" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
|
||||
<text x="270" y="102" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">圧縮パイプライン</text>
|
||||
|
||||
<!-- ── ① 毎ターン自動 ── -->
|
||||
<rect x="186" y="110" width="168" height="16" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="0.8"/>
|
||||
<text x="270" y="122" fill="#92400e" font-size="8" font-weight="700" text-anchor="middle">① 毎ターン自動 · 無条件 · 0 API</text>
|
||||
|
||||
<rect x="186" y="130" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="146" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L3 tool_result_budget</text>
|
||||
|
||||
<rect x="186" y="158" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="174" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L1 snip_compact</text>
|
||||
|
||||
<rect x="186" y="186" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="202" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L2 micro_compact</text>
|
||||
|
||||
<!-- ↓ → ◇ -->
|
||||
<line x1="270" y1="210" x2="270" y2="222" stroke="#555" stroke-width="1.2" marker-end="url(#arrow)"/>
|
||||
|
||||
<!-- ◇ 判定ダイヤモンド -->
|
||||
<polygon points="270,226 300,244 270,262 240,244" fill="#f0f4ff" stroke="#ea580c" stroke-width="1.5"/>
|
||||
<text x="270" y="247" fill="#9a3412" font-size="7" font-weight="600" text-anchor="middle">閾値超過?</text>
|
||||
|
||||
<!-- いいえ:右側注釈 -->
|
||||
<text x="306" y="240" fill="#16a34a" font-size="9" font-weight="700">No → 通過</text>
|
||||
<text x="306" y="252" fill="#94a3b8" font-size="7">直接 LLM へ</text>
|
||||
|
||||
<!-- はい:下注釈 -->
|
||||
<text x="284" y="260" fill="#ea580c" font-size="8" font-weight="600">Yes↓</text>
|
||||
|
||||
<!-- ── ② 条件トリガー ── -->
|
||||
<rect x="186" y="268" width="168" height="16" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="0.8"/>
|
||||
<text x="270" y="280" fill="#9a3412" font-size="8" font-weight="700" text-anchor="middle">② 条件 · トークン閾値超過 · 1 API</text>
|
||||
|
||||
<rect x="186" y="288" width="168" height="24" rx="4" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
|
||||
<text x="270" y="304" fill="#9a3412" font-size="10" font-weight="600" text-anchor="middle">L4 compact_history</text>
|
||||
|
||||
<!-- パイプライン出口 → LLM -->
|
||||
<line x1="370" y1="158" x2="438" y2="158" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
|
||||
|
||||
<!-- ===== ③ LLM ===== -->
|
||||
<rect x="440" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="490" y="155" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
|
||||
<text x="490" y="172" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>
|
||||
|
||||
<!-- LLM No → 返却 -->
|
||||
<line x1="490" y1="184" x2="490" y2="278" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
|
||||
<text x="502" y="262" fill="#16a34a" font-size="10" font-weight="600">No</text>
|
||||
<rect x="435" y="280" width="110" height="26" rx="13" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
|
||||
<text x="490" y="297" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">結果を返す</text>
|
||||
|
||||
<!-- LLM Yes → TOOL_HANDLERS -->
|
||||
<line x1="540" y1="158" x2="578" y2="158" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
<text x="554" y="150" fill="#64748b" font-size="10" font-weight="600">Yes</text>
|
||||
|
||||
<!-- ④ TOOL_HANDLERS -->
|
||||
<rect x="580" y="126" width="130" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="645" y="150" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
|
||||
<text x="645" y="166" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
|
||||
<text x="645" y="180" fill="#64748b" font-size="9" text-anchor="middle">task · load_skill · ...</text>
|
||||
|
||||
<!-- LLM API 例外 → 緊急圧縮 → 次ターンで再試行 -->
|
||||
<path d="M 535 184 L 570 216 L 580 228" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
|
||||
<text x="552" y="204" fill="#991b1b" font-size="8" font-weight="600">API 例外</text>
|
||||
<path d="M 665 266 L 665 340 L 160 340 L 160 142 L 186 142" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
|
||||
<text x="530" y="328" fill="#991b1b" font-size="8" font-weight="600">圧縮パイプラインへ再試行</text>
|
||||
|
||||
<!-- ===== ③ 緊急トリガー(LLM API 失敗後) ===== -->
|
||||
<rect x="580" y="210" width="170" height="56" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
|
||||
<text x="665" y="228" fill="#991b1b" font-size="9" font-weight="700" text-anchor="middle">③ 緊急トリガー</text>
|
||||
<text x="665" y="242" fill="#991b1b" font-size="8" text-anchor="middle">API が prompt_too_long を返す</text>
|
||||
<text x="665" y="256" fill="#991b1b" font-size="8" text-anchor="middle">→ reactive_compact → リトライ</text>
|
||||
|
||||
<!-- ===== ループバック ===== -->
|
||||
<path d="M 710 158 L 760 158 L 760 348 L 90 348 L 90 184" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
|
||||
<text x="410" y="366" fill="#64748b" font-size="10" text-anchor="middle">ツール結果を messages[] に追加 → 次ターン → 再圧縮 → LLM</text>
|
||||
|
||||
<!-- ===== 凡例 ===== -->
|
||||
<rect x="50" y="390" width="720" height="116" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
|
||||
|
||||
<rect x="70" y="404" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="94" y="414" fill="#334155" font-size="10">s07 保持:ループ、フック、スキルロード、サブエージェント</text>
|
||||
|
||||
<rect x="70" y="426" width="16" height="12" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="94" y="436" fill="#334155" font-size="10">① 毎ターン自動:L3→L1→L2 が各 LLM 呼び出し前に無条件実行、0 API</text>
|
||||
|
||||
<rect x="70" y="448" width="16" height="12" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
|
||||
<text x="94" y="458" fill="#334155" font-size="10">② 条件トリガー:L3/L1/L2 後もトークン超過 → compact_history、1 API</text>
|
||||
|
||||
<rect x="70" y="470" width="16" height="12" rx="3" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="3,2"/>
|
||||
<text x="94" y="480" fill="#334155" font-size="10">③ 緊急トリガー:API が prompt_too_long を返す → reactive_compact → リトライ</text>
|
||||
|
||||
<text x="70" y="498" fill="#94a3b8" font-size="9">3 つのモードはコスト増加:0 API → 1 API → 1 API + より積極的なトリム</text>
|
||||
</svg>
|
||||
138
s08_context_compact/images/compact-overview.svg
Normal file
@@ -0,0 +1,138 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 520" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
|
||||
</marker>
|
||||
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
|
||||
</marker>
|
||||
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
|
||||
</marker>
|
||||
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
|
||||
</marker>
|
||||
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
|
||||
</marker>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/>
|
||||
<stop offset="100%" stop-color="#2563eb"/>
|
||||
</linearGradient>
|
||||
</defs>
|
||||
|
||||
<!-- 背景 -->
|
||||
<rect width="820" height="520" fill="#fafbfc" rx="8"/>
|
||||
|
||||
<!-- 标题 -->
|
||||
<rect x="0" y="0" width="820" height="48" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="40" width="820" height="8" fill="url(#header)"/>
|
||||
<text x="410" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Context Compact — 压缩插在 LLM 调用前,三种触发模式</text>
|
||||
|
||||
<!-- 标签 -->
|
||||
<text x="50" y="74" fill="#94a3b8" font-size="11" font-weight="600">s07 保留</text>
|
||||
<text x="180" y="74" fill="#d97706" font-size="11" font-weight="600">s08 新增</text>
|
||||
|
||||
<!-- ===== ① messages[] ===== -->
|
||||
<rect x="40" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="90" y="155" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
|
||||
<text x="90" y="172" fill="#64748b" font-size="9" text-anchor="middle">(s07 保留)</text>
|
||||
|
||||
<!-- messages → 管线入口 -->
|
||||
<line x1="140" y1="158" x2="168" y2="158" stroke="#d97706" stroke-width="2" marker-end="url(#arrow-amber)"/>
|
||||
|
||||
<!-- ===== ② 压缩管线(内部只放标签,不画路径线) ===== -->
|
||||
<rect x="170" y="82" width="200" height="252" rx="10" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
|
||||
<text x="270" y="102" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">压缩管线</text>
|
||||
|
||||
<!-- ── ① 每轮自动 ── -->
|
||||
<rect x="186" y="110" width="168" height="16" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="0.8"/>
|
||||
<text x="270" y="122" fill="#92400e" font-size="8" font-weight="700" text-anchor="middle">① 每轮自动 · 无条件 · 0 API</text>
|
||||
|
||||
<rect x="186" y="130" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="146" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L3 tool_result_budget</text>
|
||||
|
||||
<rect x="186" y="158" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="174" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L1 snip_compact</text>
|
||||
|
||||
<rect x="186" y="186" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="202" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L2 micro_compact</text>
|
||||
|
||||
<!-- ↓ → ◇ -->
|
||||
<line x1="270" y1="210" x2="270" y2="222" stroke="#555" stroke-width="1.2" marker-end="url(#arrow)"/>
|
||||
|
||||
<!-- ◇ 判断菱形(紧凑) -->
|
||||
<polygon points="270,226 300,244 270,262 240,244" fill="#f0f4ff" stroke="#ea580c" stroke-width="1.5"/>
|
||||
<text x="270" y="247" fill="#9a3412" font-size="7" font-weight="600" text-anchor="middle">超阈值?</text>
|
||||
|
||||
<!-- 否:右侧文字标注 -->
|
||||
<text x="306" y="240" fill="#16a34a" font-size="9" font-weight="700">否 → 通过</text>
|
||||
<text x="306" y="252" fill="#94a3b8" font-size="7">直接进 LLM</text>
|
||||
|
||||
<!-- 是:下方文字标注 -->
|
||||
<text x="284" y="260" fill="#ea580c" font-size="8" font-weight="600">是↓</text>
|
||||
|
||||
<!-- ── ② 条件触发 ── -->
|
||||
<rect x="186" y="268" width="168" height="16" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="0.8"/>
|
||||
<text x="270" y="280" fill="#9a3412" font-size="8" font-weight="700" text-anchor="middle">② 条件触发 · token 超阈值 · 1 API</text>
|
||||
|
||||
<rect x="186" y="288" width="168" height="24" rx="4" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
|
||||
<text x="270" y="304" fill="#9a3412" font-size="10" font-weight="600" text-anchor="middle">L4 compact_history</text>
|
||||
|
||||
<!-- 管线出口 → LLM -->
|
||||
<line x1="370" y1="158" x2="438" y2="158" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
|
||||
|
||||
<!-- ===== ③ LLM ===== -->
|
||||
<rect x="440" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="490" y="155" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
|
||||
<text x="490" y="172" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>
|
||||
|
||||
<!-- LLM 否 → 返回 -->
|
||||
<line x1="490" y1="184" x2="490" y2="278" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
|
||||
<text x="502" y="262" fill="#16a34a" font-size="10" font-weight="600">否</text>
|
||||
<rect x="435" y="280" width="110" height="26" rx="13" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
|
||||
<text x="490" y="297" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">返回结果</text>
|
||||
|
||||
<!-- LLM 是 → TOOL_HANDLERS -->
|
||||
<line x1="540" y1="158" x2="578" y2="158" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
<text x="554" y="150" fill="#64748b" font-size="10" font-weight="600">是</text>
|
||||
|
||||
<!-- ④ TOOL_HANDLERS -->
|
||||
<rect x="580" y="126" width="130" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="645" y="150" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
|
||||
<text x="645" y="166" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
|
||||
<text x="645" y="180" fill="#64748b" font-size="9" text-anchor="middle">task · load_skill · ...</text>
|
||||
|
||||
<!-- LLM API 异常 → 应急压缩 → 下一轮重试 -->
|
||||
<path d="M 535 184 L 570 216 L 580 228" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
|
||||
<text x="552" y="204" fill="#991b1b" font-size="8" font-weight="600">API 异常</text>
|
||||
<path d="M 665 266 L 665 340 L 160 340 L 160 142 L 186 142" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
|
||||
<text x="530" y="328" fill="#991b1b" font-size="8" font-weight="600">重试回到压缩管线</text>
|
||||
|
||||
<!-- ===== ③ 异常触发(LLM API 调用失败后) ===== -->
|
||||
<rect x="580" y="210" width="170" height="56" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
|
||||
<text x="665" y="228" fill="#991b1b" font-size="9" font-weight="700" text-anchor="middle">③ 异常触发</text>
|
||||
<text x="665" y="242" fill="#991b1b" font-size="8" text-anchor="middle">API 返回 prompt_too_long</text>
|
||||
<text x="665" y="256" fill="#991b1b" font-size="8" text-anchor="middle">→ reactive_compact → 重试</text>
|
||||
|
||||
<!-- ===== 回环(y=348 在管线框底 y=334 下方,完全不穿过) ===== -->
|
||||
<path d="M 710 158 L 760 158 L 760 348 L 90 348 L 90 184" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
|
||||
<text x="410" y="366" fill="#64748b" font-size="10" text-anchor="middle">工具结果追加到 messages[] → 下一轮 → 再次压缩 → LLM</text>
|
||||
|
||||
<!-- ===== 图例 ===== -->
|
||||
<rect x="50" y="390" width="720" height="116" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
|
||||
|
||||
<rect x="70" y="404" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="94" y="414" fill="#334155" font-size="10">s07 保留:循环、hook、技能加载、子 Agent</text>
|
||||
|
||||
<rect x="70" y="426" width="16" height="12" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="94" y="436" fill="#334155" font-size="10">① 每轮自动:L3→L1→L2 在每次 LLM 调用前无条件执行,0 API</text>
|
||||
|
||||
<rect x="70" y="448" width="16" height="12" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
|
||||
<text x="94" y="458" fill="#334155" font-size="10">② 条件触发:L3/L1/L2 跑完 token 仍超阈值 → compact_history,1 API</text>
|
||||
|
||||
<rect x="70" y="470" width="16" height="12" rx="3" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="3,2"/>
|
||||
<text x="94" y="480" fill="#334155" font-size="10">③ 异常触发:API 返回 prompt_too_long → reactive_compact → 重试</text>
|
||||
|
||||
<text x="70" y="498" fill="#94a3b8" font-size="9">三种模式的代价递增:0 API → 1 API → 1 API + 更激进的裁剪</text>
|
||||
</svg>
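The control flow drawn in compact-overview.svg can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's actual code.py: `PromptTooLong`, `estimate_tokens`, `prepare_context`, and `call_with_compaction` are hypothetical names, and the 4-characters-per-token estimate is a placeholder heuristic. It shows the three trigger modes from the legend: unconditional cheap passes, a conditional summary when the token threshold is exceeded, and an emergency compaction that loops back into the pipeline before retrying.

```python
from typing import Callable

Message = dict
Pass = Callable[[list], list]


class PromptTooLong(Exception):
    """Hypothetical stand-in for the API's prompt_too_long error."""


def estimate_tokens(messages: list[Message]) -> int:
    # Assumption: a crude estimate of about 4 characters per token.
    return sum(len(str(m.get("content", ""))) for m in messages) // 4


def prepare_context(messages, threshold: int, cheap_passes: list[Pass],
                    compact_history: Pass) -> list[Message]:
    # Mode 1: zero-API passes run unconditionally, in the given order
    # (the diagram uses L3 -> L1 -> L2).
    for run_pass in cheap_passes:
        messages = run_pass(messages)
    # Mode 2: only if still over the threshold, pay for one summary call (L4).
    if estimate_tokens(messages) > threshold:
        messages = compact_history(messages)
    return messages


def call_with_compaction(messages, call_llm, threshold: int,
                         cheap_passes: list[Pass], compact_history: Pass,
                         reactive_compact: Pass, max_retries: int = 1):
    for attempt in range(max_retries + 1):
        messages = prepare_context(messages, threshold, cheap_passes,
                                   compact_history)
        try:
            return call_llm(messages), messages
        except PromptTooLong:
            if attempt == max_retries:
                raise
            # Mode 3: emergency trim, then loop back through the pipeline.
            messages = reactive_compact(messages)
    raise AssertionError("unreachable when max_retries >= 0")
```

Passing the passes in as callables keeps the loop independent of how each layer is implemented, so the ordering and the retry path can be tested with stubs.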
|
||||
98
s08_context_compact/images/compaction-layers.en.svg
Normal file
@@ -0,0 +1,98 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 590" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="pre" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#dbeafe"/><stop offset="100%" stop-color="#bfdbfe"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="auto" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="emergency" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
|
||||
</linearGradient>
|
||||
<marker id="arrow-d" viewBox="0 0 10 10" refX="5" refY="10" markerWidth="6" markerHeight="6" orient="auto">
|
||||
<path d="M 0 0 L 5 10 L 10 0 z" fill="#94a3b8"/>
|
||||
</marker>
|
||||
</defs>
|
||||
|
||||
<rect width="760" height="590" fill="#fafbfc" rx="8"/>
|
||||
|
||||
<!-- Title bar -->
|
||||
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
|
||||
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Context Compaction — Pre-processing Pipeline + Auto-compact + Emergency Fallback</text>
|
||||
|
||||
<!-- Design principles (left) -->
|
||||
<rect x="20" y="62" width="220" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
|
||||
<text x="130" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Design Principles</text>
|
||||
<text x="130" y="100" fill="#475569" font-size="10" text-anchor="middle">Cheap operations first, expensive later</text>
|
||||
<text x="130" y="116" fill="#475569" font-size="10" text-anchor="middle">Trim text before dropping messages</text>
|
||||
<text x="130" y="132" fill="#475569" font-size="10" text-anchor="middle">Drop messages before calling LLM</text>
|
||||
|
||||
<!-- Cost escalation (right) -->
|
||||
<rect x="530" y="62" width="210" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
|
||||
<text x="635" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Increasing Cost</text>
|
||||
<text x="635" y="104" fill="#475569" font-size="10" text-anchor="middle">Text ops → LLM summary → Emergency trim</text>
|
||||
<text x="635" y="124" fill="#94a3b8" font-size="9" text-anchor="middle">0 API · 0 API · 0 API · 1 API · 1 API</text>
|
||||
|
||||
<!-- ===== Pre-processing pipeline title ===== -->
|
||||
<rect x="20" y="146" width="720" height="24" rx="4" fill="#f1f5f9"/>
|
||||
<text x="55" y="163" fill="#64748b" font-size="11" font-weight="600">Pre-processing Pipeline (execution order: L3 → L1 → L2, before every LLM call, 0 API)</text>
|
||||
|
||||
<!-- L3: toolResultBudget -->
|
||||
<rect x="80" y="180" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="100" y="200" fill="#1e40af" font-size="12" font-weight="600">L3</text>
|
||||
<text x="135" y="200" fill="#1e40af" font-size="13" font-weight="700">toolResultBudget</text>
|
||||
<text x="260" y="200" fill="#1e40af" font-size="11">tool_result total > 200KB → spill largest item</text>
|
||||
<text x="650" y="200" fill="#1e40af" font-size="10" text-anchor="end">keep full content</text>
|
||||
<text x="135" y="218" fill="#2563eb" font-size="9">Trigger: every turn, before microCompact can replace full content</text>
|
||||
|
||||
<!-- Arrow L3→L1 -->
|
||||
<line x1="380" y1="226" x2="380" y2="238" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
|
||||
|
||||
<!-- L1: snipCompact -->
|
||||
<rect x="80" y="240" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="100" y="260" fill="#1e40af" font-size="12" font-weight="600">L1</text>
|
||||
<text x="135" y="260" fill="#1e40af" font-size="13" font-weight="700">snipCompact</text>
|
||||
<text x="260" y="260" fill="#1e40af" font-size="11">messages > 50 → trim middle</text>
|
||||
<text x="650" y="260" fill="#1e40af" font-size="10" text-anchor="end">keep head/tail</text>
|
||||
<text x="135" y="278" fill="#2563eb" font-size="9">Trigger: message count exceeds threshold</text>
|
||||
|
||||
<!-- Arrow L1→L2 -->
|
||||
<line x1="380" y1="286" x2="380" y2="298" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
|
||||
|
||||
<!-- L2: microCompact -->
|
||||
<rect x="80" y="300" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="100" y="320" fill="#1e40af" font-size="12" font-weight="600">L2</text>
|
||||
<text x="135" y="320" fill="#1e40af" font-size="13" font-weight="700">microCompact</text>
|
||||
<text x="260" y="320" fill="#1e40af" font-size="11">old tool_result → placeholder (keep latest 3)</text>
|
||||
<text x="650" y="320" fill="#1e40af" font-size="10" text-anchor="end">compact old</text>
|
||||
<text x="135" y="338" fill="#2563eb" font-size="9">Trigger: every turn automatically; tutorial uses text placeholder</text>
|
||||
|
||||
<!-- ===== Auto-compact title ===== -->
|
||||
<rect x="20" y="358" width="720" height="24" rx="4" fill="#f1f5f9"/>
|
||||
<text x="70" y="375" fill="#64748b" font-size="11" font-weight="600">Auto-compact Decision (triggered when pre-processing is insufficient, 1 API call)</text>
|
||||
|
||||
<!-- L4: autoCompact -->
|
||||
<rect x="80" y="390" width="600" height="58" rx="7" fill="url(#auto)" stroke="#dc2626" stroke-width="2"/>
|
||||
<text x="100" y="412" fill="#991b1b" font-size="12" font-weight="600">L4</text>
|
||||
<text x="135" y="412" fill="#991b1b" font-size="13" font-weight="700">autoCompact</text>
|
||||
<text x="260" y="412" fill="#991b1b" font-size="11">tokens over threshold → LLM summary</text>
|
||||
<text x="650" y="412" fill="#991b1b" font-size="10" text-anchor="end">1 API call</text>
|
||||
<text x="135" y="428" fill="#dc2626" font-size="9">Threshold: contextWindow - maxOutputTokens - 13,000 · Try sessionMemoryCompact first, then LLM</text>
|
||||
<text x="135" y="442" fill="#dc2626" font-size="9">Circuit breaker: stop retrying after 3 consecutive failures</text>
|
||||
|
||||
<!-- ===== Emergency fallback title ===== -->
|
||||
<rect x="20" y="460" width="720" height="24" rx="4" fill="#f1f5f9"/>
|
||||
<text x="55" y="477" fill="#64748b" font-size="11" font-weight="600">Emergency Fallback (triggered when API still returns prompt_too_long)</text>
|
||||
|
||||
<!-- Emergency: reactiveCompact -->
|
||||
<rect x="80" y="492" width="600" height="62" rx="7" fill="url(#emergency)" stroke="#c2410c" stroke-width="1.5"/>
|
||||
<text x="100" y="512" fill="#9a3412" font-size="12" font-weight="600">Emrg</text>
|
||||
<text x="135" y="512" fill="#9a3412" font-size="13" font-weight="700">reactiveCompact</text>
|
||||
<text x="135" y="528" fill="#9a3412" font-size="10">API returns 413 / prompt_too_long → byte-level trim</text>
|
||||
<text x="135" y="544" fill="#c2410c" font-size="9">Keep last 5 + summary; more aggressive than autoCompact</text>
|
||||
|
||||
</svg>
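The three zero-API layers in compaction-layers.en.svg could look something like the sketch below. It is an illustration only: the function names mirror the diagram labels, but the message shape (Anthropic-style `tool_result` blocks inside list content), the `keep_head`/`keep_tail` split, the placeholder strings, and the rule that L2 leaves spill markers alone are all assumptions rather than the tutorial's real implementation. The sketch also ignores tool_use/tool_result pairing, which a real snip would have to preserve.

```python
from pathlib import Path

SPILL_PREFIX = "[spilled "                   # assumed marker format
CLEARED = "[old tool result cleared]"        # assumed placeholder text


def _tool_results(messages):
    """Yield every tool_result block found in list-style message content."""
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "tool_result":
                    yield block


def _size(block) -> int:
    return len(block["content"].encode("utf-8"))


def tool_result_budget(messages, spill_dir: Path, budget_bytes: int = 200_000):
    """L3: while tool results exceed the budget, move the largest one to disk."""
    blocks = [b for b in _tool_results(messages) if isinstance(b.get("content"), str)]
    while blocks and sum(_size(b) for b in blocks) > budget_bytes:
        largest = max(blocks, key=_size)
        spill_dir.mkdir(parents=True, exist_ok=True)
        path = spill_dir / f"tool_result_{len(list(spill_dir.iterdir()))}.txt"
        path.write_text(largest["content"], encoding="utf-8")
        largest["content"] = f"{SPILL_PREFIX}{_size(largest)} bytes to {path}]"
        blocks = [b for b in blocks if b is not largest]
    return messages


def snip_compact(messages, max_messages: int = 50, keep_head: int = 2,
                 keep_tail: int = 20):
    """L1: past the message-count threshold, drop the middle, keep head and tail."""
    if len(messages) <= max_messages:
        return messages
    dropped = len(messages) - keep_head - keep_tail
    marker = {"role": "user", "content": f"[{dropped} earlier messages snipped]"}
    return messages[:keep_head] + [marker] + messages[-keep_tail:]


def micro_compact(messages, keep_recent: int = 3):
    """L2: replace all but the latest tool results with a text placeholder."""
    blocks = list(_tool_results(messages))
    old = blocks[:-keep_recent] if keep_recent else blocks
    for block in old:
        content = block.get("content")
        if isinstance(content, str) and content.startswith(SPILL_PREFIX):
            continue  # assumption: keep the pointer to the spilled file
        block["content"] = CLEARED
    return messages


def preprocess(messages, spill_dir: Path):
    # Order from the diagram: L3 first so full content is saved before L2 clears it.
    messages = tool_result_budget(messages, spill_dir)
    messages = snip_compact(messages)
    return micro_compact(messages)
```

Running L3 first matters in this sketch for the reason the diagram gives: once L2 has overwritten an old result with a placeholder, there is nothing left to spill.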
|
||||
98
s08_context_compact/images/compaction-layers.ja.svg
Normal file
@@ -0,0 +1,98 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 590" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="pre" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#dbeafe"/><stop offset="100%" stop-color="#bfdbfe"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="auto" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="emergency" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
|
||||
</linearGradient>
|
||||
<marker id="arrow-d" viewBox="0 0 10 10" refX="5" refY="10" markerWidth="6" markerHeight="6" orient="auto">
|
||||
<path d="M 0 0 L 5 10 L 10 0 z" fill="#94a3b8"/>
|
||||
</marker>
|
||||
</defs>
|
||||
|
||||
<rect width="760" height="590" fill="#fafbfc" rx="8"/>
|
||||
|
||||
<!-- タイトルバー -->
|
||||
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
|
||||
<text x="380" y="28" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">コンテキスト圧縮 — 前処理パイプライン + 自動圧縮 + 緊急フォールバック</text>
|
||||
|
||||
<!-- 設計原則(左側) -->
|
||||
<rect x="20" y="62" width="220" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
|
||||
<text x="130" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">設計原則</text>
|
||||
<text x="130" y="100" fill="#475569" font-size="10" text-anchor="middle">安価な処理を先に、高価な処理を後に</text>
|
||||
<text x="130" y="116" fill="#475569" font-size="10" text-anchor="middle">テキスト修正 → メッセージ削除の順</text>
|
||||
<text x="130" y="132" fill="#475569" font-size="10" text-anchor="middle">メッセージ削除 → LLM 呼び出しの順</text>
|
||||
|
||||
<!-- コスト増加(右側) -->
|
||||
<rect x="530" y="62" width="210" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
|
||||
<text x="635" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">コスト増加</text>
|
||||
<text x="635" y="104" fill="#475569" font-size="10" text-anchor="middle">テキスト操作 → LLM 要約 → 緊急トリム</text>
|
||||
<text x="635" y="124" fill="#94a3b8" font-size="9" text-anchor="middle">0 API · 0 API · 0 API · 1 API · 1 API</text>
|
||||
|
||||
<!-- ===== 前処理パイプラインタイトル ===== -->
|
||||
<rect x="20" y="146" width="720" height="24" rx="4" fill="#f1f5f9"/>
|
||||
<text x="55" y="163" fill="#64748b" font-size="11" font-weight="600">前処理パイプライン(実行順:L3 → L1 → L2、各 LLM 呼び出し前に自動実行、0 API)</text>
|
||||
|
||||
<!-- L3: toolResultBudget -->
|
||||
<rect x="80" y="180" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="100" y="200" fill="#1e40af" font-size="12" font-weight="600">L3</text>
|
||||
<text x="135" y="200" fill="#1e40af" font-size="13" font-weight="700">toolResultBudget</text>
|
||||
<text x="260" y="200" fill="#1e40af" font-size="11">tool_result 合計 > 200KB → 最大項目を退避</text>
|
||||
<text x="650" y="200" fill="#1e40af" font-size="10" text-anchor="end">完全内容を保持</text>
|
||||
<text x="135" y="218" fill="#2563eb" font-size="9">トリガー:毎ターン、microCompact が完全内容を置換する前に実行</text>
|
||||
|
||||
<!-- 矢印 L3→L1 -->
|
||||
<line x1="380" y1="226" x2="380" y2="238" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
|
||||
|
||||
<!-- L1: snipCompact -->
|
||||
<rect x="80" y="240" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="100" y="260" fill="#1e40af" font-size="12" font-weight="600">L1</text>
|
||||
<text x="135" y="260" fill="#1e40af" font-size="13" font-weight="700">snipCompact</text>
|
||||
<text x="260" y="260" fill="#1e40af" font-size="11">メッセージ > 50 → 中間をトリム</text>
|
||||
<text x="650" y="260" fill="#1e40af" font-size="10" text-anchor="end">先頭/末尾保持</text>
|
||||
<text x="135" y="278" fill="#2563eb" font-size="9">トリガー:メッセージ数が閾値を超過</text>
|
||||
|
||||
<!-- 矢印 L1→L2 -->
|
||||
<line x1="380" y1="286" x2="380" y2="298" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
|
||||
|
||||
<!-- L2: microCompact -->
|
||||
<rect x="80" y="300" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="100" y="320" fill="#1e40af" font-size="12" font-weight="600">L2</text>
|
||||
<text x="135" y="320" fill="#1e40af" font-size="13" font-weight="700">microCompact</text>
|
||||
<text x="260" y="320" fill="#1e40af" font-size="11">古い tool_result → プレースホルダー(最新 3 件保持)</text>
|
||||
<text x="650" y="320" fill="#1e40af" font-size="10" text-anchor="end">旧結果を圧縮</text>
|
||||
<text x="135" y="338" fill="#2563eb" font-size="9">トリガー:毎ターン自動実行、チュートリアル版はテキストプレースホルダーで模擬</text>
|
||||
|
||||
<!-- ===== 自動圧縮タイトル ===== -->
|
||||
<rect x="20" y="358" width="720" height="24" rx="4" fill="#f1f5f9"/>
|
||||
<text x="70" y="375" fill="#64748b" font-size="11" font-weight="600">自動圧縮判定(前処理で不足時にトリガー、1 API 呼び出し)</text>
|
||||
|
||||
<!-- L4: autoCompact -->
|
||||
<rect x="80" y="390" width="600" height="58" rx="7" fill="url(#auto)" stroke="#dc2626" stroke-width="2"/>
|
||||
<text x="100" y="412" fill="#991b1b" font-size="12" font-weight="600">L4</text>
|
||||
<text x="135" y="412" fill="#991b1b" font-size="13" font-weight="700">autoCompact</text>
|
||||
<text x="260" y="412" fill="#991b1b" font-size="11">トークンが閾値超過 → LLM 全量要約</text>
|
||||
<text x="590" y="412" fill="#991b1b" font-size="10" text-anchor="end">1 API 呼び出し</text>
|
||||
<text x="135" y="428" fill="#dc2626" font-size="9">閾値: contextWindow - maxOutputTokens - 13,000 · sessionMemoryCompact を先に試行、不足時のみ LLM 呼び出し</text>
|
||||
<text x="135" y="442" fill="#dc2626" font-size="9">サーキットブレーカー:連続 3 回失敗後にリトライ停止</text>
|
||||
|
||||
<!-- ===== 緊急フォールバックタイトル ===== -->
|
||||
<rect x="20" y="460" width="720" height="24" rx="4" fill="#f1f5f9"/>
|
||||
<text x="55" y="477" fill="#64748b" font-size="11" font-weight="600">緊急フォールバック(API が引き続き prompt_too_long を返す場合にトリガー)</text>
|
||||
|
||||
<!-- 緊急: reactiveCompact -->
|
||||
<rect x="80" y="492" width="600" height="62" rx="7" fill="url(#emergency)" stroke="#c2410c" stroke-width="1.5"/>
|
||||
<text x="100" y="512" fill="#9a3412" font-size="12" font-weight="600">緊急</text>
|
||||
<text x="135" y="512" fill="#9a3412" font-size="13" font-weight="700">reactiveCompact</text>
|
||||
<text x="135" y="528" fill="#9a3412" font-size="10">API が 413 / prompt_too_long を返す → バイト単位でトリム</text>
|
||||
<text x="135" y="544" fill="#c2410c" font-size="9">最後の 5 件 + 要約を保持、autoCompact より積極的</text>
|
||||
|
||||
</svg>
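The L4 row of the diagram states a threshold of `contextWindow - maxOutputTokens - 13,000` and a circuit breaker that stops retrying after 3 consecutive failures. A small Python sketch of just those two rules is given below. The class and function names are hypothetical, the `summarize` callable stands in for the single LLM summary call, and the sessionMemoryCompact step mentioned in the diagram is deliberately left out.

```python
from dataclasses import dataclass

AUTOCOMPACT_BUFFER_TOKENS = 13_000   # buffer taken from the diagram
MAX_CONSECUTIVE_FAILURES = 3         # breaker limit taken from the diagram


def autocompact_threshold(context_window: int, max_output_tokens: int) -> int:
    """Token count above which the summary pass should run."""
    return context_window - max_output_tokens - AUTOCOMPACT_BUFFER_TOKENS


@dataclass
class AutoCompactor:
    context_window: int
    max_output_tokens: int
    consecutive_failures: int = 0

    def should_compact(self, token_count: int) -> bool:
        if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
            return False  # breaker open: stop paying for failing summary calls
        return token_count > autocompact_threshold(
            self.context_window, self.max_output_tokens
        )

    def run(self, messages, token_count: int, summarize):
        """Return compacted messages, or the originals if skipped or failed."""
        if not self.should_compact(token_count):
            return messages
        try:
            compacted = summarize(messages)
        except Exception:
            self.consecutive_failures += 1
            return messages
        self.consecutive_failures = 0  # any success closes the breaker again
        return compacted
```

With a 200,000-token window and 20,000 reserved for output, the threshold in this sketch works out to 167,000 tokens. When the breaker is open the messages pass through unchanged, which leaves the emergency fallback as the remaining safety net.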
|
||||
98
s08_context_compact/images/compaction-layers.svg
Normal file
@@ -0,0 +1,98 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 590" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="pre" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#dbeafe"/><stop offset="100%" stop-color="#bfdbfe"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="auto" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="emergency" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
|
||||
</linearGradient>
|
||||
<marker id="arrow-d" viewBox="0 0 10 10" refX="5" refY="10" markerWidth="6" markerHeight="6" orient="auto">
|
||||
<path d="M 0 0 L 5 10 L 10 0 z" fill="#94a3b8"/>
|
||||
</marker>
|
||||
</defs>
|
||||
|
||||
<rect width="760" height="590" fill="#fafbfc" rx="8"/>
|
||||
|
||||
<!-- 标题栏 -->
|
||||
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
|
||||
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">上下文压缩 — 预处理管线 + 自动压缩 + 应急兜底</text>
|
||||
|
||||
<!-- 左侧说明 -->
|
||||
<rect x="20" y="62" width="220" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
|
||||
<text x="130" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">设计原则</text>
|
||||
<text x="130" y="100" fill="#475569" font-size="10" text-anchor="middle">便宜的先跑,贵的后跑</text>
|
||||
<text x="130" y="116" fill="#475569" font-size="10" text-anchor="middle">能改文本 → 不删整条</text>
<text x="130" y="132" fill="#475569" font-size="10" text-anchor="middle">能删整条 → 不调 LLM</text>

<!-- 右侧代价箭头 -->
<rect x="530" y="62" width="210" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="635" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">代价递增</text>
<text x="635" y="104" fill="#475569" font-size="10" text-anchor="middle">文本操作 → LLM 摘要 → 应急裁剪</text>
<text x="635" y="124" fill="#94a3b8" font-size="9" text-anchor="middle">0 API · 0 API · 0 API · 1 API · 1 API</text>

<!-- ===== 预处理管线标题 ===== -->
<rect x="20" y="146" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="163" fill="#64748b" font-size="11" font-weight="600">预处理管线(执行顺序:L3 → L1 → L2,每轮 LLM 调用前自动执行,0 API)</text>

<!-- L3: toolResultBudget -->
<rect x="80" y="180" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="200" fill="#1e40af" font-size="12" font-weight="600">L3</text>
<text x="135" y="200" fill="#1e40af" font-size="13" font-weight="700">toolResultBudget</text>
<text x="260" y="200" fill="#1e40af" font-size="11">tool_result 总和 &gt; 200KB → 最大项落盘</text>
<text x="650" y="200" fill="#1e40af" font-size="10" text-anchor="end">保留完整内容</text>
<text x="135" y="218" fill="#2563eb" font-size="9">触发:每轮自动,必须在 microCompact 之前保留完整内容</text>

<!-- 箭头 L3→L1 -->
<line x1="380" y1="226" x2="380" y2="238" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>

<!-- L1: snipCompact -->
<rect x="80" y="240" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="260" fill="#1e40af" font-size="12" font-weight="600">L1</text>
<text x="135" y="260" fill="#1e40af" font-size="13" font-weight="700">snipCompact</text>
<text x="260" y="260" fill="#1e40af" font-size="11">消息 &gt; 50 条 → 裁掉中间</text>
<text x="650" y="260" fill="#1e40af" font-size="10" text-anchor="end">保留头尾</text>
<text x="135" y="278" fill="#2563eb" font-size="9">触发:消息数超过阈值</text>

<!-- 箭头 L1→L2 -->
<line x1="380" y1="286" x2="380" y2="298" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>

<!-- L2: microCompact -->
<rect x="80" y="300" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="320" fill="#1e40af" font-size="12" font-weight="600">L2</text>
<text x="135" y="320" fill="#1e40af" font-size="13" font-weight="700">microCompact</text>
<text x="260" y="320" fill="#1e40af" font-size="11">旧 tool_result → 占位符(保留最近 3 条)</text>
<text x="650" y="320" fill="#1e40af" font-size="10" text-anchor="end">压旧结果</text>
<text x="135" y="338" fill="#2563eb" font-size="9">触发:每轮自动,教学版用文本占位符模拟</text>

<!-- ===== 自动压缩标题 ===== -->
<rect x="20" y="358" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="70" y="375" fill="#64748b" font-size="11" font-weight="600">自动压缩决策(预处理不够时触发,1 API 调用)</text>

<!-- L4: autoCompact -->
<rect x="80" y="390" width="600" height="58" rx="7" fill="url(#auto)" stroke="#dc2626" stroke-width="2"/>
<text x="100" y="412" fill="#991b1b" font-size="12" font-weight="600">L4</text>
<text x="135" y="412" fill="#991b1b" font-size="13" font-weight="700">autoCompact</text>
<text x="260" y="412" fill="#991b1b" font-size="11">token 超阈值 → LLM 全量摘要</text>
<text x="590" y="412" fill="#991b1b" font-size="10" text-anchor="end">1 API 调用</text>
<text x="135" y="428" fill="#dc2626" font-size="9">阈值: contextWindow - maxOutputTokens - 13,000 · 先尝试 sessionMemoryCompact,不够才调 LLM</text>
<text x="135" y="442" fill="#dc2626" font-size="9">熔断:连续失败 3 次后停止重试</text>

<!-- ===== 应急兜底标题 ===== -->
<rect x="20" y="460" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="477" fill="#64748b" font-size="11" font-weight="600">应急兜底(API 仍然返回 prompt_too_long 时触发)</text>

<!-- 应急: reactiveCompact -->
<rect x="80" y="492" width="600" height="62" rx="7" fill="url(#emergency)" stroke="#c2410c" stroke-width="1.5"/>
<text x="100" y="512" fill="#9a3412" font-size="12" font-weight="600">应急</text>
<text x="135" y="512" fill="#9a3412" font-size="13" font-weight="700">reactiveCompact</text>
<text x="135" y="528" fill="#9a3412" font-size="10">API 返回 413 / prompt_too_long → 字节级裁剪</text>
<text x="135" y="544" fill="#c2410c" font-size="9">保留最后 5 条 + 摘要,比 autoCompact 更激进</text>

</svg>
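The L4 box in the diagram above gives a threshold formula (`contextWindow - maxOutputTokens - 13,000`) and a circuit breaker (stop retrying after 3 consecutive failures). A minimal Python sketch of that decision follows. The class and function names are hypothetical and are not taken from the chapter's code.py; only the two constants come from the diagram.

```python
AUTOCOMPACT_BUFFER_TOKENS = 13_000  # safety buffer shown in the diagram
MAX_CONSECUTIVE_FAILURES = 3        # circuit-breaker limit shown in the diagram


def autocompact_threshold(context_window: int, max_output_tokens: int) -> int:
    """Token count above which autoCompact should be considered."""
    return context_window - max_output_tokens - AUTOCOMPACT_BUFFER_TOKENS


class AutoCompactGate:
    """Hypothetical helper: decides whether to attempt an LLM summary."""

    def __init__(self, context_window: int, max_output_tokens: int) -> None:
        self.threshold = autocompact_threshold(context_window, max_output_tokens)
        self.consecutive_failures = 0

    def should_compact(self, token_count: int) -> bool:
        # Circuit open: after repeated failures, stop attempting compaction.
        if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
            return False
        return token_count > self.threshold

    def record(self, success: bool) -> None:
        # Any success resets the failure streak; a failure extends it.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1
```

For example, with an assumed 200,000-token window and 32,000 reserved output tokens, the threshold works out to 155,000 tokens. The window and output sizes here are illustrative values, not figures from the source.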
50
s08_context_compact/images/layer1-budget.en.svg
Normal file
@@ -0,0 +1,50 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 356" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>

<rect width="720" height="356" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L3: toolResultBudget — Large Result Persistence</text>

<!-- Pain Point -->
<rect x="20" y="54" width="680" height="42" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="72" fill="#991b1b" font-size="11" font-weight="600">Pain Point</text>
<text x="105" y="72" fill="#991b1b" font-size="11">The model read 30 files in one turn; their tool_result blocks total over 500KB and fill the context window</text>

<!-- Before -->
<text x="155" y="118" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">Before</text>
<rect x="20" y="128" width="270" height="82" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="35" y="148" fill="#475569" font-size="10" font-family="monospace">tool_result: (78KB) ...</text>
<text x="35" y="164" fill="#475569" font-size="10" font-family="monospace">tool_result: (142KB) ...</text>
<text x="35" y="180" fill="#475569" font-size="10" font-family="monospace">tool_result: (290KB) ...</text>
<text x="155" y="202" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">Total 510KB → over budget</text>

<!-- Arrow -->
<line x1="295" y1="163" x2="360" y2="163" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>

<!-- After -->
<text x="485" y="118" fill="#16a34a" font-size="12" font-weight="600" text-anchor="middle">After</text>
<rect x="365" y="128" width="335" height="82" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="380" y="148" fill="#166534" font-size="10" font-family="monospace">tool_result: &lt;persisted-output&gt;</text>
<text x="395" y="164" fill="#166534" font-size="9">Full output: .task_outputs/tool-results/t1.txt</text>
<text x="395" y="178" fill="#166534" font-size="9">Preview: (first 2000 chars) ...</text>
<text x="532" y="202" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">Total 18KB → within budget</text>

<!-- How it works -->
<rect x="20" y="214" width="680" height="64" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="234" fill="#1e3a5f" font-size="11" font-weight="600">How</text>
<text x="70" y="234" fill="#475569" font-size="10">1. Sum the sizes of all tool_result blocks in the latest turn</text>
<text x="70" y="250" fill="#475569" font-size="10">2. Over 200KB → sort by size and persist the largest first to .task_outputs/tool-results/</text>
<text x="70" y="266" fill="#475569" font-size="10">3. Keep only the &lt;persisted-output&gt; marker + a preview of the first 2000 chars in context</text>

<!-- Result summary -->
<rect x="20" y="290" width="680" height="36" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="35" y="312" fill="#166534" font-size="11">Result: no data lost (full output stays on disk); context drops from 510KB to ~18KB with 0 API calls</text>
</svg>
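The three "How" steps in the diagram above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the chapter's actual implementation: the function name, the `{"id", "content"}` result shape, the per-result file naming, and the stop condition (persist until the turn fits the budget) are all hypothetical. Only the 200KB budget, the 2000-character preview, and the `<persisted-output>` marker come from the diagram.

```python
from pathlib import Path

BUDGET_BYTES = 200 * 1024  # 200KB per-turn budget from the diagram
PREVIEW_CHARS = 2000       # preview length from the diagram


def apply_tool_result_budget(results: list[dict], out_dir: Path) -> list[dict]:
    """Move the largest results to disk until the turn fits the budget.

    `results` is an assumed list of {"id": str, "content": str} dicts;
    entries are modified in place and the same list is returned.
    """
    def size(result: dict) -> int:
        return len(result["content"].encode("utf-8"))

    # Step 1: total size of all tool results in the turn.
    total = sum(size(r) for r in results)
    if total <= BUDGET_BYTES:
        return results

    out_dir.mkdir(parents=True, exist_ok=True)
    # Step 2: largest first, so as few results as possible leave the context.
    for result in sorted(results, key=size, reverse=True):
        if total <= BUDGET_BYTES:
            break
        before = size(result)
        path = out_dir / f"{result['id']}.txt"
        path.write_text(result["content"], encoding="utf-8")
        # Step 3: keep only a marker, the file path, and a short preview.
        result["content"] = (
            "<persisted-output>\n"
            f"Full output: {path}\n"
            f"Preview: {result['content'][:PREVIEW_CHARS]}"
        )
        total -= before - size(result)
    return results
```

Stopping as soon as the total fits keeps smaller results fully in context. A variant that persists every oversized result would shrink the context further, at the cost of more re-reads from disk.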
50
s08_context_compact/images/layer1-budget.ja.svg
Normal file
@@ -0,0 +1,50 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 356" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>

<rect width="720" height="356" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L3: toolResultBudget — 大結果の永続化</text>

<!-- ペインポイント -->
<rect x="20" y="54" width="680" height="42" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="72" fill="#991b1b" font-size="11" font-weight="600">ペインポイント</text>
<text x="100" y="72" fill="#991b1b" font-size="11">モデルが一度に 30 ファイルを読み込み、単一ターンの tool_result が合計 500KB に達し、コンテキストウィンドウを圧迫</text>

<!-- 圧縮前 -->
<text x="155" y="118" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">圧縮前</text>
<rect x="20" y="128" width="270" height="82" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="35" y="148" fill="#475569" font-size="10" font-family="monospace">tool_result: (78KB) ...</text>
<text x="35" y="164" fill="#475569" font-size="10" font-family="monospace">tool_result: (142KB) ...</text>
<text x="35" y="180" fill="#475569" font-size="10" font-family="monospace">tool_result: (290KB) ...</text>
<text x="155" y="202" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">合計 510KB → 予算超過</text>

<!-- 矢印 -->
<line x1="295" y1="163" x2="360" y2="163" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>

<!-- 圧縮後 -->
<text x="485" y="118" fill="#16a34a" font-size="12" font-weight="600" text-anchor="middle">圧縮後</text>
<rect x="365" y="128" width="335" height="82" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="380" y="148" fill="#166534" font-size="10" font-family="monospace">tool_result: &lt;persisted-output&gt;</text>
<text x="395" y="164" fill="#166534" font-size="9">Full output: .task_outputs/tool-results/t1.txt</text>
<text x="395" y="178" fill="#166534" font-size="9">Preview: (先頭 2000 文字) ...</text>
<text x="532" y="202" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">合計 18KB → 正常</text>

<!-- 原理説明 -->
<rect x="20" y="214" width="680" height="64" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="234" fill="#1e3a5f" font-size="11" font-weight="600">方法</text>
<text x="70" y="234" fill="#475569" font-size="10">1. 最終ターンの全 tool_result の合計サイズを集計</text>
<text x="70" y="250" fill="#475569" font-size="10">2. 200KB 超過 → サイズ順にソートし、最大のものから .task_outputs/tool-results/ に永続化</text>
<text x="70" y="266" fill="#475569" font-size="10">3. コンテキストには &lt;persisted-output&gt; マーカー + 先頭 2000 文字のプレビューのみ残す</text>

<!-- 変更サマリー -->
<rect x="20" y="290" width="680" height="36" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="35" y="312" fill="#166534" font-size="11">結果:情報は失われていない(ディスクに完全なデータあり)、コンテキストは 510KB → ~18KB に削減、0 回 API 呼び出し</text>
</svg>
50
s08_context_compact/images/layer1-budget.svg
Normal file
@@ -0,0 +1,50 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 356" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>

<rect width="720" height="356" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L3: toolResultBudget — 大结果落盘</text>

<!-- 痛点 -->
<rect x="20" y="54" width="680" height="42" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="72" fill="#991b1b" font-size="11" font-weight="600">痛点</text>
<text x="75" y="72" fill="#991b1b" font-size="11">模型一次读了 30 个文件,单轮 tool_result 加起来 500KB,直接把上下文窗口打满</text>

<!-- Before -->
<text x="155" y="118" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">压缩前</text>
<rect x="20" y="128" width="270" height="82" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="35" y="148" fill="#475569" font-size="10" font-family="monospace">tool_result: (78KB) ...</text>
<text x="35" y="164" fill="#475569" font-size="10" font-family="monospace">tool_result: (142KB) ...</text>
<text x="35" y="180" fill="#475569" font-size="10" font-family="monospace">tool_result: (290KB) ...</text>
<text x="155" y="202" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">合计 510KB → 超预算</text>

<!-- Arrow -->
<line x1="295" y1="163" x2="360" y2="163" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>

<!-- After -->
<text x="485" y="118" fill="#16a34a" font-size="12" font-weight="600" text-anchor="middle">压缩后</text>
<rect x="365" y="128" width="335" height="82" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="380" y="148" fill="#166534" font-size="10" font-family="monospace">tool_result: &lt;persisted-output&gt;</text>
<text x="395" y="164" fill="#166534" font-size="9">Full output: .task_outputs/tool-results/t1.txt</text>
<text x="395" y="178" fill="#166534" font-size="9">Preview: (前 2000 字符) ...</text>
<text x="532" y="202" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">合计 18KB → 正常</text>

<!-- 原理说明 -->
<rect x="20" y="214" width="680" height="64" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="234" fill="#1e3a5f" font-size="11" font-weight="600">怎么做</text>
<text x="85" y="234" fill="#475569" font-size="10">1. 统计最后一轮所有 tool_result 的总大小</text>
<text x="85" y="250" fill="#475569" font-size="10">2. 超过 200KB → 按大小排序,从最大的开始落盘到 .task_outputs/tool-results/</text>
<text x="85" y="266" fill="#475569" font-size="10">3. 上下文里只留 &lt;persisted-output&gt; 标记 + 前 2000 字符预览</text>

<!-- 变化摘要 -->
<rect x="20" y="290" width="680" height="36" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="35" y="312" fill="#166534" font-size="11">结果:信息没丢(磁盘有完整数据),上下文从 510KB 降到 ~18KB,0 次 API 调用</text>
</svg>
57
s08_context_compact/images/micro-compact.en.svg
Normal file
@@ -0,0 +1,57 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 300" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#ca8a04"/>
</marker>
</defs>

<rect width="720" height="300" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L2: microCompact — Replacing Old Results with Placeholders</text>

<!-- Pain Point -->
<rect x="20" y="54" width="680" height="36" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">Pain Point</text>
<text x="110" y="70" fill="#991b1b" font-size="11">The agent read 10 files in a row; the full contents of reads 1-7 still sit in context but are no longer useful</text>

<!-- Before -->
<text x="155" y="114" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">Before (all 10 tool_results intact)</text>
<rect x="20" y="122" width="310" height="95" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<rect x="30" y="130" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="138" fill="#94a3b8" font-size="8" font-family="monospace">Read file A: (full content, 3200 chars)...</text>
<rect x="30" y="145" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="153" fill="#94a3b8" font-size="8" font-family="monospace">Read file B: (full content, 1800 chars)...</text>
<rect x="30" y="160" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="168" fill="#94a3b8" font-size="8" font-family="monospace">Read file C: (full content, 4500 chars)...</text>
<rect x="30" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="38" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (full content, 2800 chars)</text>
<text x="175" y="212" fill="#ef4444" font-size="9" font-weight="600">7 old results waste ~25K chars</text>

<!-- Arrow -->
<line x1="335" y1="170" x2="375" y2="170" stroke="#ca8a04" stroke-width="2" marker-end="url(#arrow)"/>

<!-- After -->
<text x="535" y="114" fill="#ca8a04" font-size="12" font-weight="600" text-anchor="middle">After (only the latest 3 kept intact)</text>
<rect x="390" y="122" width="310" height="95" rx="6" fill="#fefce8" stroke="#ca8a04" stroke-width="1"/>
<rect x="400" y="130" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="138" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="145" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="153" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="160" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="168" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (full content, 2800 chars)</text>
<text x="545" y="212" fill="#ca8a04" font-size="9" font-weight="600">Only the latest 3 kept; the first 7 become placeholders</text>

<!-- How -->
<rect x="20" y="228" width="680" height="62" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="248" fill="#1e3a5f" font-size="11" font-weight="600">How (teaching version)</text>
<text x="155" y="248" fill="#475569" font-size="10">Iterate over the tool_result blocks, keep the latest 3 intact, and replace older ones with placeholders.</text>
<text x="35" y="264" fill="#1e3a5f" font-size="11" font-weight="600">Real CC</text>
<text x="95" y="264" fill="#475569" font-size="10">Clears old results via the API's cache_edits (without breaking the prompt cache prefix), only for COMPACTABLE_TOOLS:</text>
<text x="95" y="280" fill="#94a3b8" font-size="9">Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write. The teaching version uses text placeholders to simulate the same effect.</text>
</svg>
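The teaching-version behavior described in the diagram above (keep the latest 3 tool results intact, replace older ones with a placeholder) might look like the following Python sketch. The function name and the message shape are assumptions: it supposes messages whose `content` is either a plain string or a list of blocks, with tool results marked by `"type": "tool_result"`. The placeholder text and the keep-count of 3 come from the diagram. It does not model the `cache_edits` path attributed to real CC.

```python
KEEP_RECENT = 3  # number of most recent tool results kept intact (from the diagram)
PLACEHOLDER = "[Earlier result compacted. Re-run if needed.]"


def micro_compact(messages: list[dict]) -> list[dict]:
    """Replace all but the newest KEEP_RECENT tool_result blocks with a placeholder.

    Blocks are modified in place; the same message list is returned.
    """
    # Collect tool_result blocks in conversation order, skipping plain-string content.
    blocks = [
        block
        for msg in messages
        if isinstance(msg.get("content"), list)
        for block in msg["content"]
        if isinstance(block, dict) and block.get("type") == "tool_result"
    ]
    # Everything before the last KEEP_RECENT blocks is considered stale.
    for block in blocks[:-KEEP_RECENT]:
        block["content"] = PLACEHOLDER
    return messages
```

Because only the block's `content` is overwritten, any identifier fields on the block are left untouched, so each placeholder stays paired with the tool call that produced it.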
57
s08_context_compact/images/micro-compact.ja.svg
Normal file
@@ -0,0 +1,57 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 300" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#ca8a04"/>
</marker>
</defs>

<rect width="720" height="300" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L2: microCompact — 旧結果のプレースホルダー置換</text>

<!-- ペインポイント -->
<rect x="20" y="54" width="680" height="36" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">ペインポイント</text>
<text x="115" y="70" fill="#991b1b" font-size="11">Agent が連続で 10 ファイルを読み込み、1〜7 回目の完全なファイル内容がコンテキストに残ったまま、場所を占有しつつ既に不要</text>

<!-- 圧縮前 -->
<text x="155" y="114" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">圧縮前(10 件の tool_result がすべて完全)</text>
<rect x="20" y="122" width="310" height="95" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<rect x="30" y="130" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="138" fill="#94a3b8" font-size="8" font-family="monospace">Read file A: (完全な内容, 3200 文字)...</text>
<rect x="30" y="145" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="153" fill="#94a3b8" font-size="8" font-family="monospace">Read file B: (完全な内容, 1800 文字)...</text>
<rect x="30" y="160" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="168" fill="#94a3b8" font-size="8" font-family="monospace">Read file C: (完全な内容, 4500 文字)...</text>
<rect x="30" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="38" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完全な内容, 2800 文字)</text>
<text x="175" y="212" fill="#ef4444" font-size="9" font-weight="600">7 件の旧結果が ~25K 文字を無駄に占有</text>

<!-- 矢印 -->
<line x1="335" y1="170" x2="375" y2="170" stroke="#ca8a04" stroke-width="2" marker-end="url(#arrow)"/>

<!-- 圧縮後 -->
<text x="535" y="114" fill="#ca8a04" font-size="12" font-weight="600" text-anchor="middle">圧縮後(最新 3 件のみ完全保持)</text>
<rect x="390" y="122" width="310" height="95" rx="6" fill="#fefce8" stroke="#ca8a04" stroke-width="1"/>
<rect x="400" y="130" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="138" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="145" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="153" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="160" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="168" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完全な内容, 2800 文字)</text>
<text x="545" y="212" fill="#ca8a04" font-size="9" font-weight="600">最新 3 件のみ保持、前 7 件はプレースホルダー化</text>

<!-- 原理 -->
<rect x="20" y="228" width="680" height="62" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="248" fill="#1e3a5f" font-size="11" font-weight="600">方法(教育版)</text>
<text x="130" y="248" fill="#475569" font-size="10">tool_result を走査し、最新 3 件のみ完全保持、古いものはプレースホルダーに置換。</text>
<text x="35" y="264" fill="#1e3a5f" font-size="11" font-weight="600">実際の CC</text>
<text x="110" y="264" fill="#475569" font-size="10">API cache_edits で旧結果をクリア(prompt cache プレフィックスを破壊しない)、COMPACTABLE_TOOLS のみ対象:</text>
<text x="110" y="280" fill="#94a3b8" font-size="9">Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write。教育版はテキストプレースホルダーで同様の効果を模擬。</text>
</svg>
57
s08_context_compact/images/micro-compact.svg
Normal file
@@ -0,0 +1,57 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 300" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#ca8a04"/>
</marker>
</defs>

<rect width="720" height="300" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L2: microCompact — 旧结果占位替换</text>

<!-- 痛点 -->
<rect x="20" y="54" width="680" height="36" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">痛点</text>
<text x="75" y="70" fill="#991b1b" font-size="11">Agent 连续读了 10 个文件,第 1-7 次的完整文件内容还躺在上下文里,占着位置但早就没用了</text>

<!-- Before -->
<text x="155" y="114" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">压缩前(10 条 tool_result 全部完整)</text>
<rect x="20" y="122" width="310" height="95" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<rect x="30" y="130" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="138" fill="#94a3b8" font-size="8" font-family="monospace">Read file A: (完整内容, 3200 字符)...</text>
<rect x="30" y="145" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="153" fill="#94a3b8" font-size="8" font-family="monospace">Read file B: (完整内容, 1800 字符)...</text>
<rect x="30" y="160" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="168" fill="#94a3b8" font-size="8" font-family="monospace">Read file C: (完整内容, 4500 字符)...</text>
<rect x="30" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="38" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完整内容, 2800 字符)</text>
<text x="175" y="212" fill="#ef4444" font-size="9" font-weight="600">7 条旧结果白占 ~25K 字符</text>

<!-- Arrow -->
<line x1="335" y1="170" x2="375" y2="170" stroke="#ca8a04" stroke-width="2" marker-end="url(#arrow)"/>

<!-- After -->
<text x="535" y="114" fill="#ca8a04" font-size="12" font-weight="600" text-anchor="middle">压缩后(只保留最近 3 条完整)</text>
<rect x="390" y="122" width="310" height="95" rx="6" fill="#fefce8" stroke="#ca8a04" stroke-width="1"/>
<rect x="400" y="130" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="138" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="145" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="153" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="160" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="168" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完整内容, 2800 字符)</text>
<text x="545" y="212" fill="#ca8a04" font-size="9" font-weight="600">只保留最近 3 条,前 7 条变占位</text>

<!-- 原理 -->
<rect x="20" y="228" width="680" height="62" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="248" fill="#1e3a5f" font-size="11" font-weight="600">怎么做(教学版)</text>
<text x="115" y="248" fill="#475569" font-size="10">遍历 tool_result,只保留最近 3 条完整,更旧的替换为占位符。</text>
<text x="35" y="264" fill="#1e3a5f" font-size="11" font-weight="600">真实 CC</text>
<text x="95" y="264" fill="#475569" font-size="10">通过 API cache_edits 清除旧结果(不破坏 prompt cache 前缀),仅对 COMPACTABLE_TOOLS 生效:</text>
<text x="95" y="280" fill="#94a3b8" font-size="9">Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write。教学版用文本占位模拟同样效果。</text>
</svg>