Follow up PR #265: refine chapters, diagrams, and add S20 (#283)

* feat: s01-s14 docs quality overhaul — tool pipeline, single-agent, knowledge & resilience Rewrite code.py and README (zh/en/ja) for s01-s14, each chapter building incrementally on the previous. Key fixes across chapters: - s01-s04: agent loop, tool dispatch, permission pipeline, hooks - s05-s08: todo write, subagent, skill loading, context compact - s09-s11: memory system, system prompt assembly, error recovery - s12-s14: task graph, background tasks, cron scheduler All chapters CC source-verified. Code inherits fixes forward (PROMPT_SECTIONS, json.dumps cache, real-state context, can_start dep protection, etc.). * feat: s15-s19 docs quality overhaul — multi-agent platform: teams, protocols, autonomy, worktree, MCP tools Rewrite code.py and README (zh/en/ja) for s15-s19, the multi-agent platform chapters. Each chapter inherits all previous fixes and adds one mechanism: - s15: agent teams (TeamCreate, teammate threads, shared task list) - s16: team protocols (plan approval, shutdown handshake, consume_inbox) - s17: autonomous agents (idle polling, auto-claim, consume_lead_inbox) - s18: worktree isolation (git worktree, bind_task, cwd switching, safety) - s19: MCP tools (MCPClient, normalize_mcp_name, assemble_tool_pool, no cache) All appendix source code references verified against CC source. Config priority corrected: claude.ai < plugin < user < project < local. * fix: 5 regressions across s05-s19 — glob safety, todo validation, memory extraction, protocol types, dep crash - s05-s09: glob results now filter with is_relative_to(WORKDIR) (inherited from s02) - s06-s08: todo_write validates content/status required fields (inherited from s05) - s09: extract_memories uses pre-compression snapshot instead of compacted messages - s16: submit_plan docstring clarifies protocol-only (not code-level gate) - s17-s19: match_response restores type mismatch validation (from s16) - s17-s19: claim_task deps list handles missing dep files without crashing * fix: s12 Todo V2 logic reversal, s14/s15 cron range validation, s18/s19 worktree name validation - s12 README (zh/en/ja): fix Todo V2 direction — interactive defaults to Task, non-interactive/SDK defaults to TodoWrite. Fix env var name to CLAUDE_CODE_ENABLE_TASKS (not TODO_V2). - s14/s15: add _validate_cron_field with per-field range checks (minute 0-59, hour 0-23, dom 1-31, month 1-12, dow 0-6), step > 0, range lo <= hi. Replace old try/except validation that only caught exceptions. - s18/s19: add validate_worktree_name() to remove_worktree and keep_worktree, not just create_worktree. * fix: align s16-s19 teaching tool consistency * fix pr265 chapter diagrams * Add comprehensive s20 harness chapter * Fix chapter smoke test regressions * Clarify README tutorial track transition --------- Co-authored-by: Haoran <bill-billion@outlook.com>
2026-06-21 04:33:36 +08:00 · 2026-05-20 21:45:38 +08:00
parent c354cf7721
commit 1baf1aca5a
174 changed files with 35833 additions and 353 deletions
--- a/s08_context_compact/README.md
+++ b/s08_context_compact/README.md
@@ -0,0 +1,293 @@
+# s08: Context Compact — 上下文总会满，要有办法腾地方
+
+[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
+
+s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s20
+> *"上下文总会满, 要有办法腾地方"* — 四层压缩策略, 便宜的先跑贵的后跑。
+>
+> **Harness 层**: 压缩 — 干净的记忆, 无限的会话。
+
+---
+
+## 问题
+
+Agent 跑着跑着，不动了。
+
+手里有 bash、有 read、有 write，能力是够的。但它读了一个 1000 行的文件（~4000 token），又读了 30 个文件，跑了 20 条命令。每条命令的输出、每个文件的内容，全都堆在 `messages` 列表里。
+
+上下文窗口是有限的。满了之后，API 直接拒绝：`prompt_too_long`。
+
+不压缩，Agent 根本没法在大项目里干活。
+
+---
+
+## 解决方案
+
+![Compact Overview](images/compact-overview.svg)
+
+保留 s07 的 hook 结构、技能加载、子 Agent 等骨架，省略部分工具细节以聚焦压缩。核心变动：每轮 LLM 调用前插入三层预处理器（0 API），token 仍超阈值时触发 LLM 摘要（1 API），API 报错时应急裁剪。
+
+核心设计：便宜的先跑，贵的后跑。
+
+---
+
+## 工作原理
+
+![四层压缩管线](images/compaction-layers.svg)
+
+### L1: snip_compact — 裁掉无关的旧对话
+
+Agent 跑了 80 轮对话，`messages` 攒了 160 条。最前面的"帮我创建 hello.py"和当前工作几乎无关了，但全占着位置。
+
+消息数超过 50 条 → 保留头部 3 条（初始上下文）和尾部 47 条（当前工作），中间裁掉：
+
+```python
+def snip_compact(messages, max_messages=50):
+    if len(messages) <= max_messages:
+        return messages
+    keep_head, keep_tail = 3, max_messages - 3
+    snipped = len(messages) - keep_head - keep_tail
+    placeholder = {"role": "user",
+                   "content": f"[snipped {snipped} messages from conversation middle]"}
+    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
+```
+
+裁掉了整条消息，但剩下的消息里 `tool_result` 内容仍在累积——第 34 条消息里可能躺着 30KB 的旧文件内容。→ L2。
+
+### L2: micro_compact — 旧工具结果占位
+
+![旧结果占位](images/micro-compact.svg)
+
+Agent 连续读了 10 个文件。第 1-7 次的完整内容还躺在上下文里，早就不需要了，但占着大量空间。
+
+只保留最近 3 条 `tool_result` 的完整内容，更旧的替换为一行占位符：
+
+```python
+KEEP_RECENT_TOOL_RESULTS = 3
+
+def micro_compact(messages):
+    tool_results = collect_tool_result_blocks(messages)
+    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
+        return messages
+    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
+        if len(block.get("content", "")) > 120:
+            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
+    return messages
+```
+
+旧结果清掉了，但单条新结果可能就有 500KB——一个 `cat` 大文件的输出就能打满上下文。→ L3。
+
+### L3: tool_result_budget — 大结果落盘
+
+![大结果落盘](images/layer1-budget.svg)
+
+模型一次读了 5 个大文件，单条 user 消息里所有 `tool_result` 加起来 500KB。
+
+统计最后一条 user 消息里所有 `tool_result` 的总大小。超过 200KB → 按大小排序，从最大的开始落盘到 `.task_outputs/tool-results/`，上下文里只留 `<persisted-output>` 标记 + 前 2000 字符预览。模型看到标记后知道完整内容在磁盘上，需要时可以重新读。
+
+```python
+def tool_result_budget(messages, max_bytes=200_000):
+    last = messages[-1]
+    blocks = [(i, b) for i, b in enumerate(last["content"])
+              if b.get("type") == "tool_result"]
+    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
+    if total <= max_bytes:
+        return messages
+    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
+    for idx, block in ranked:
+        if total <= max_bytes:
+            break
+        block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
+        total = recalculate_total(blocks)
+    return messages
+```
+
+前三层都是纯文本/结构操作，0 API 调用，但也无法"理解"对话内容。上下文可能仍然太大。→ L4。
+
+### L4: compact_history — LLM 全量摘要
+
+![LLM 全量摘要](images/auto-compact.svg)
+
+前三层全跑完了，但在超大项目中连续工作 30 分钟后，token 仍然超过阈值。
+
+三步流程：
+
+1. **保存 transcript**：完整对话写入 `.transcripts/`，JSONL 格式。transcript 保留了可恢复记录，但模型的活跃上下文里只剩摘要。对模型当下推理来说，细节已经不在上下文中了。教学代码没有提供 transcript 检索工具。
+2. **LLM 生成摘要**：把对话历史发给 LLM，要求保留当前目标、重要发现、已改文件、剩余工作、用户约束等关键信息。
+3. **替换消息列表**：所有旧消息被替换为一条摘要。教学版只保留摘要；真实 Claude Code 会在 compact 后重新附加部分最近文件、计划、agent/skill/tool 等上下文。
+
+```python
+def compact_history(messages):
+    transcript_path = write_transcript(messages)  # 先保存完整对话
+    summary = summarize_history(messages)          # LLM 生成摘要
+    return [{"role": "user",
+             "content": f"[Compacted]\n\n{summary}"}]
+```
+
+**熔断器**：连续失败 3 次后停止重试，防止死循环浪费 API 调用。
+
+### 应急: reactive_compact
+
+有时候 API 还是返回 `prompt_too_long`（413），上下文增长速度快于压缩触发速度时。
+
+这时触发 **reactive_compact**：比 compact_history 更激进，从尾部回退，以字节级精度裁剪到 API 可接受的大小，只保留最后 5 条消息 + 摘要。
+
+```python
+def reactive_compact(messages):
+    transcript = write_transcript(messages)
+    summary = summarize_history(messages)
+    tail = messages[-5:]
+    return [{"role": "user",
+             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
+```
+
+reactive compact 有重试上限（默认 1 次）。再失败就抛出异常，不无限循环。完整的错误恢复逻辑留给 s11。
+
+### 合起来跑
+
+```python
+def agent_loop(messages):
+    reactive_retries = 0
+    while True:
+        # 三个预处理器（0 API 调用）
+        # 顺序：budget 先跑，确保大内容落盘后再做占位和裁剪
+        messages[:] = tool_result_budget(messages)    # L3: 大结果落盘
+        messages[:] = snip_compact(messages)          # L1: 裁中间
+        messages[:] = micro_compact(messages)         # L2: 旧结果占位
+
+        # 还不够？LLM 摘要（1 API 调用）
+        if estimate_token_count(messages) > THRESHOLD:
+            messages[:] = compact_history(messages)
+
+        try:
+            response = client.messages.create(...)
+        except PromptTooLongError:
+            if reactive_retries < MAX_REACTIVE_RETRIES:
+                messages[:] = reactive_compact(messages)  # 应急
+                reactive_retries += 1
+                continue
+            raise  # 超过重试上限，抛出异常
+        # ... 工具执行 ...
+
+        # compact 工具：模型主动调用时触发 compact_history
+        if block.name == "compact":
+            messages[:] = compact_history(messages)
+            results.append({..., "content": "[Compacted. History summarized.]"})
+            messages.append({"role": "user", "content": results})
+            break  # 结束当前 turn，用压缩后的上下文开始新一轮
+```
+
+**顺序不能换。** L3（budget）在 L2（micro）前面，因为 micro 会把旧的大 tool_result 替换成一行占位符，budget 必须在那之前把完整内容落盘。这也是为什么 CC 源码把 `applyToolResultBudget` 放在最前面。
+
+---
+
+## 相对 s07 的变更
+
+| 组件 | 之前 (s07) | 之后 (s08) |
+|------|-----------|-----------|
+| 上下文管理 | 无（上下文无限膨胀） | 四层压缩管线 + 应急 |
+| 新函数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |
+| 工具 | bash, read, write, edit, glob, todo_write, task, load_skill (8) | 8 + compact (9) |
+| 循环 | LLM 调用 → 工具执行 | 每轮前跑三层预处理器 + 阈值触发 compact_history |
+| 设计原则 | — | 便宜的先跑，贵的后跑 |
+
+---
+
+## 试一下
+
+```sh
+cd learn-claude-code
+python s08_context_compact/code.py
+```
+
+试试这些 prompt：
+
+1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`（连续读多个文件，观察 L2 压缩旧结果）
+2. `Read every file in s08_context_compact/`（一次性读大量内容，观察 L3 落盘）
+3. 反复对话 20+ 轮，观察是否出现 `[auto compact]` 或 `[reactive compact]`
+
+观察重点：每次工具执行后，旧 tool_result 是否被压缩？连续对话后 token 超阈值时，是否自动触发了摘要？
+
+---
+
+## 接下来
+
+上下文压缩让 Agent 能跑很久不会崩。但每次压缩后，用户之前告诉它的偏好、约束也跟着丢了。能不能让 Agent 有选择地记住重要的事？
+
+s09 Memory → 三个子系统：选择记什么、提取关键信息、整理巩固。跨压缩、跨会话。
+
+<details>
+<summary>深入 CC 源码</summary>
+
+> 以下基于 CC 源码 `compact.ts`、`autoCompact.ts`、`microCompact.ts`、`query.ts` 的分析。
+
+### 执行顺序对照
+
+教学版为了讲解方便按 L1/L2/L3/L4 编号，但实际执行顺序和编号不完全对应：
+
+| 维度 | 教学版 | Claude Code |
+|------|--------|-------------|
+| 执行顺序 | budget → snip → micro → auto | budget → snip → micro → collapse → auto（`query.ts:379-468`） |
+| snip_compact | 保留头 3 + 尾 47 | CC 仅主线程启用；实现不在开源仓库中（`HISTORY_SNIP` feature gate），但接口可见：`snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`，还暴露了 `SnipTool` 工具让模型主动调用。教学版的 3/47 是简化参数 |
+| micro_compact | 文本占位符替换 | 两条路径：time-based 直接清内容，cached 走 API `cache_edits`（legacy path 已移除） |
+| micro_compact 白名单 | 按位置（最近 3 条） | time-based 按时间阈值触发；cached 按计数触发（`microCompact.ts`） |
+| tool_result_budget | 200KB 字符 | 200,000 字符（`toolLimits.ts:49`） |
+| compact_history 阈值 | 字符数估算 | 精确 token：`contextWindow - maxOutputTokens - 13_000` |
+| 摘要要求 | 5 类信息 | 9 个部分 + `<analysis>`/`<summary>` 双标签 |
+| 压缩 prompt | 简单 prompt | 首尾双重防呆禁止调工具 |
+| PTL retry | 有（简化） | `truncateHeadForPTLRetry()` 按消息组回退（`compact.ts:243-290`） |
+| 后压缩恢复 | 无（教学版只保留摘要） | 自动重新读取最近文件、计划、agent/skill/tool 等 |
+| 熔断器 | 3 次 | 3 次（`autoCompact.ts:70`） |
+| reactive 重试 | 1 次 | CC 有更精细的分级重试 |
+
+### 执行顺序详解
+
+CC 源码 `query.ts` 中的真实顺序：
+
+1. `applyToolResultBudget`（L379）：先处理大结果，确保完整内容落盘
+2. `snipCompact`（L403）：裁中间消息
+3. `microcompact`（L414）：旧结果占位
+4. `contextCollapse`（L441）：独立的上下文管理系统（教学版无）
+5. `autoCompact`（L454）：LLM 全量摘要
+
+教学版的 budget → snip → micro 顺序与此一致。教学版没有 contextCollapse 机制。
+
+### 完整常量参考
+
+| 常量 | 值 | 源文件 |
+|------|-----|--------|
+| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
+| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
+| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
+| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
+| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
+| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
+| 时间 micro_compact 间隔 | 60 分钟 | `timeBasedMCConfig.ts` |
+| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |
+
+### contextCollapse 和 sessionMemoryCompact
+
+CC 源码中还有两个机制本教学版没有展开：
+
+- **contextCollapse**：独立的上下文管理系统，启用时抑制 proactive autocompact（`autoCompact.ts:215-222`），由 collapse 的 commit/blocking 流程接管上下文管理。但 manual `/compact` 和 reactive fallback 仍是独立路径，不受 contextCollapse 影响。
+- **sessionMemoryCompact**：compact_history 之前，CC 会先尝试用已有的 session memory（s09 会讲到）做轻量摘要，不调 LLM。这个机制等学完 s09 之后回头看会更清楚。
+
+### 压缩 prompt 长什么样？
+
+CC 的压缩 prompt 有两个硬性要求：
+
+1. **绝对禁止调用工具**：开头就是 `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.`，末尾还会再 REMINDER 一次
+2. **先分析再总结**：模型需要先在 `<analysis>` 标签里理清思路，然后在 `<summary>` 标签里输出正式摘要。analysis 在格式化时被剥离
+
+### 教学版的简化是刻意的
+
+- micro_compact 用文本占位 → 我们没有 API 层的 `cache_edits` 权限
+- token 用字符数估算 → 精确 tokenizer 不在教学范围内
+- 后压缩恢复省略 → 教学版只保留摘要，不自动重新附加文件
+- 两个辅助机制不展开 → 属于 10% 的细节
+
+核心设计思想，便宜的先跑贵的后跑，完整保留。
+
+</details>
+
+<!-- translation-sync: zh@v1, en@v1, ja@v1 -->