Follow up PR #265: refine chapters, diagrams, and add S20 (#283)

* feat: s01-s14 docs quality overhaul — tool pipeline, single-agent, knowledge & resilience

Rewrite code.py and README (zh/en/ja) for s01-s14, each chapter building
incrementally on the previous. Key fixes across chapters:

- s01-s04: agent loop, tool dispatch, permission pipeline, hooks
- s05-s08: todo write, subagent, skill loading, context compact
- s09-s11: memory system, system prompt assembly, error recovery
- s12-s14: task graph, background tasks, cron scheduler

All chapters CC source-verified. Code inherits fixes forward (PROMPT_SECTIONS,
json.dumps cache, real-state context, can_start dep protection, etc.).

* feat: s15-s19 docs quality overhaul — multi-agent platform: teams, protocols, autonomy, worktree, MCP tools

Rewrite code.py and README (zh/en/ja) for s15-s19, the multi-agent platform
chapters. Each chapter inherits all previous fixes and adds one mechanism:

- s15: agent teams (TeamCreate, teammate threads, shared task list)
- s16: team protocols (plan approval, shutdown handshake, consume_inbox)
- s17: autonomous agents (idle polling, auto-claim, consume_lead_inbox)
- s18: worktree isolation (git worktree, bind_task, cwd switching, safety)
- s19: MCP tools (MCPClient, normalize_mcp_name, assemble_tool_pool, no cache)

All appendix source code references verified against CC source. Config priority
corrected: claude.ai < plugin < user < project < local.

* fix: 5 regressions across s05-s19 — glob safety, todo validation, memory extraction, protocol types, dep crash

- s05-s09: glob results now filter with is_relative_to(WORKDIR) (inherited from s02)
- s06-s08: todo_write validates content/status required fields (inherited from s05)
- s09: extract_memories uses pre-compression snapshot instead of compacted messages
- s16: submit_plan docstring clarifies protocol-only (not code-level gate)
- s17-s19: match_response restores type mismatch validation (from s16)
- s17-s19: claim_task deps list handles missing dep files without crashing

* fix: s12 Todo V2 logic reversal, s14/s15 cron range validation, s18/s19 worktree name validation

- s12 README (zh/en/ja): fix Todo V2 direction — interactive defaults to Task,
  non-interactive/SDK defaults to TodoWrite. Fix env var name to
  CLAUDE_CODE_ENABLE_TASKS (not TODO_V2).
- s14/s15: add _validate_cron_field with per-field range checks (minute 0-59,
  hour 0-23, dom 1-31, month 1-12, dow 0-6), step > 0, range lo <= hi.
  Replace old try/except validation that only caught exceptions.
- s18/s19: add validate_worktree_name() to remove_worktree and keep_worktree,
  not just create_worktree.

* fix: align s16-s19 teaching tool consistency

* fix pr265 chapter diagrams

* Add comprehensive s20 harness chapter

* Fix chapter smoke test regressions

* Clarify README tutorial track transition

---------

Co-authored-by: Haoran <bill-billion@outlook.com>
gui-yue
2026-05-20 21:45:38 +08:00
committed by GitHub
parent c354cf7721
commit 1baf1aca5a
174 changed files with 35833 additions and 353 deletions


@@ -0,0 +1,277 @@
# s11: Error Recovery — Errors aren't the end, they're the start of a retry
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
s01 → ... → s09 → s10 → `s11` → [s12](../s12_task_system/) → s13 → ... → s20
> *"Errors aren't the end, they're the start of a retry"* — escalate tokens, compact context, switch models.
>
> **Harness layer**: Resilience — classify and recover when the main loop hits errors.
---
## The Problem
The Agent is running along and then errors out:
```
Error: 529 overloaded
```
The Agent crashes. It doesn't retry, doesn't switch models, doesn't reduce context — it just crashes.
In production, API errors are the norm. The three most common failure modes: **truncated output** (the model runs out of tokens mid-sentence), **context overflow** (still too long even after compaction), and **transient failures** (429 rate limiting / 529 overload). An Agent that doesn't handle errors is like a car that stalls at the slightest touch.
---
## Solution
![Error Recovery Overview](images/error-recovery-overview.en.svg)
The loop and prompt assembly from s10 are fully preserved. The only change: the LLM call is wrapped in try/except, with different recovery paths based on error type. After recovery, `continue` loops back to the top to call the LLM again.
The three most common recovery patterns are listed below. For transient failures the teaching version handles only 429/529; real systems also cover connection errors, timeouts, cloud-vendor credential caches, and more. CC has more than a dozen reason/transition codes; see the Deep Dive for the rest:
| Pattern | Trigger | Recovery Action |
|----------|---------|-----------------|
| Output truncated | `max_tokens` | Escalate 8K→64K / continuation prompt |
| Context overflow | `prompt_too_long` | Reactive compact → retry |
| Transient failure | 429 / 529 | Exponential backoff + jitter, fallback model on consecutive 529 |
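
The dispatch implied by the table can be sketched as a small classifier. This is only an illustration: the `Recovery` enum, the `classify` function, and its plain string/int inputs are hypothetical names, not part of the chapter's `code.py`.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    ESCALATE_OR_CONTINUE = "output truncated"
    REACTIVE_COMPACT = "context overflow"
    BACKOFF = "transient failure"
    GIVE_UP = "unhandled error"


def classify(stop_reason: Optional[str] = None,
             status_code: Optional[int] = None,
             prompt_too_long: bool = False) -> Optional[Recovery]:
    """Map a failure signal to a recovery path; None means nothing failed."""
    if prompt_too_long:
        return Recovery.REACTIVE_COMPACT
    if status_code in (429, 529):
        return Recovery.BACKOFF
    if status_code is not None:
        return Recovery.GIVE_UP  # any other API error: log and exit
    if stop_reason == "max_tokens":
        return Recovery.ESCALATE_OR_CONTINUE
    return None
```

In the real loop these signals arrive through different channels (an exception for the API errors, `stop_reason` for truncation), so the chapter handles them in separate places rather than in one function.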
---
## How It Works
### Path 1: Output Truncated
The model runs out of tokens mid-sentence — `max_tokens` is exhausted. The default 8000 tokens isn't enough for a complete response.
On the first occurrence, escalate `max_tokens` from 8K to 64K (8x the space) and retry the same request — the truncated output is NOT appended to messages, keeping the original request intact. If 64K is still not enough, save the truncated output and inject a continuation prompt telling the model to pick up where it left off, up to 3 times:
```python
# Excerpt from the body of the agent loop's `while True:`
if response.stop_reason == "max_tokens":
    # First escalation: don't append truncated output, retry same request
    if not state.has_escalated:
        max_tokens = ESCALATED_MAX_TOKENS
        state.has_escalated = True
        continue  # messages unchanged, same request with more tokens
    # 64K still truncated: save output + continuation prompt
    messages.append({"role": "assistant", "content": response.content})
    if state.recovery_count < MAX_RECOVERY_RETRIES:
        messages.append({"role": "user", "content":
            "Output token limit hit. Resume directly — "
            "no apology, no recap. Pick up mid-thought."})
        state.recovery_count += 1
        continue
    return  # still truncated after 3 continuations
# Normal: append after max_tokens check
messages.append({"role": "assistant", "content": response.content})
```
Escalation gets one chance; continuation gets up to 3. After that, exit — further continuations won't produce meaningful output.
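
The state that drives this branch can be sketched as a small dataclass. The field names below are taken from the snippets in this chapter; the defaults, the placeholder model name, and the `next_truncation_action` helper are assumptions added for illustration.

```python
from dataclasses import dataclass

MAX_RECOVERY_RETRIES = 3


@dataclass
class RecoveryState:
    has_escalated: bool = False
    recovery_count: int = 0
    has_attempted_reactive_compact: bool = False
    consecutive_529: int = 0
    current_model: str = "primary-model"  # placeholder, not a real model ID


def next_truncation_action(state: RecoveryState) -> str:
    """Hypothetical helper: decide what to do after a truncated response."""
    if not state.has_escalated:
        state.has_escalated = True
        return "escalate"      # retry the same request with a larger max_tokens
    if state.recovery_count < MAX_RECOVERY_RETRIES:
        state.recovery_count += 1
        return "continue"      # append output + continuation prompt
    return "stop"              # budget exhausted


state = RecoveryState()
actions = [next_truncation_action(state) for _ in range(6)]
print(actions)
```

Repeated truncation therefore yields one escalation, three continuations, and then a stop.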
### Path 2: Context Overflow
The LLM says "your context is too long" (`prompt_too_long`). All four compaction layers from s08 have already run, and it's still over the limit.
Trigger reactive compact, which is more aggressive than auto compact. The teaching version simulates compaction by keeping only the last 5 messages; real CC generates a compact summary via LLM and retries with the compacted message list. If the request is still over the limit after one compaction, the only option is to exit, because compacting again won't make it any smaller:
```python
except PromptTooLongError:
    if not state.has_attempted_reactive_compact:
        messages[:] = reactive_compact(messages)
        state.has_attempted_reactive_compact = True
        continue
    return  # Already compacted and still over limit — must exit
```
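
A minimal sketch of the "keep the last 5 messages" simulation described above might look like this. The `keep_last` parameter is an assumption, and the chapter's actual `reactive_compact` may differ; in particular, a production version would also need to keep `tool_use` / `tool_result` pairs together, which this sketch does not attempt.

```python
def reactive_compact(messages, keep_last=5):
    """Teaching-style simulation: drop everything but the most recent messages."""
    return messages[-keep_last:]


history = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"m{i}"}
           for i in range(8)]
compacted = reactive_compact(history)
print([m["content"] for m in compacted])
```

Because the caller assigns the result with `messages[:] = ...`, the list object shared with the rest of the loop is updated in place.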
### Path 3: Transient Failures
Network blips, 429 rate limiting, 529 overload — these aren't bugs, they're normal in distributed systems.
Both 429 and 529 use exponential backoff + jitter: wait about 0.5 seconds before the first retry, 1 second before the second, 2 seconds before the third, doubling up to a 32-second cap, for at most 10 attempts. Random jitter prevents concurrent requests from all retrying at the same instant. Three consecutive 529 overload errors → switch to the fallback model (if the `FALLBACK_MODEL_ID` environment variable is configured):
```python
def retry_delay(attempt, retry_after=None):
    if retry_after:
        return retry_after
    base = min(500 * (2 ** attempt), 32000) / 1000  # seconds
    return base + random.uniform(0, base * 0.25)

def with_retry(fn, state, max_retries=10):
    for attempt in range(max_retries):
        try:
            return fn()
        except (RateLimitError, OverloadedError) as e:
            delay = retry_delay(attempt)
            time.sleep(delay)
            # Only 529 (overloaded) counts toward the fallback switch
            if isinstance(e, OverloadedError):
                state.consecutive_529 += 1
                if state.consecutive_529 >= 3 and FALLBACK_MODEL:
                    state.current_model = FALLBACK_MODEL
    raise MaxRetriesExceeded()
```
Backoff formula (in milliseconds, with `attempt` counted from 0): `min(500 × 2^attempt, 32000) + random(0~25%)`. If the server returns a `Retry-After` header, that value takes priority.
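
To make the formula concrete, the sketch below tabulates the base delay for each of the 10 attempts. `BASE_DELAY_MS` matches the constant named in this chapter; `MAX_DELAY_MS`, `base_delay_s`, and the `rng` parameter are illustrative names introduced here.

```python
import random

BASE_DELAY_MS = 500
MAX_DELAY_MS = 32000  # assumed name for the 32-second cap


def base_delay_s(attempt):
    """Delay before jitter, in seconds; attempt is counted from 0."""
    return min(BASE_DELAY_MS * (2 ** attempt), MAX_DELAY_MS) / 1000


def retry_delay(attempt, retry_after=None, rng=random):
    if retry_after is not None:
        return retry_after  # a server-provided value wins
    base = base_delay_s(attempt)
    return base + rng.uniform(0, base * 0.25)


schedule = [base_delay_s(a) for a in range(10)]
print(schedule)       # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 32.0, 32.0, 32.0]
print(sum(schedule))  # 159.5
```

So a request that fails all 10 attempts waits roughly 160 seconds before jitter, which is worth keeping in mind when choosing `max_retries` for interactive use.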
### Putting It All Together
```python
def agent_loop(messages, context):
    system = get_system_prompt(context)
    state = RecoveryState()
    max_tokens = 8000
    while True:
        try:
            response = with_retry(
                lambda: client.messages.create(
                    model=state.current_model, system=system,
                    messages=messages, tools=TOOLS,
                    max_tokens=max_tokens),
                state)
        except Exception as e:
            if is_prompt_too_long_error(e):
                if not state.has_attempted_reactive_compact:
                    messages[:] = reactive_compact(messages)
                    state.has_attempted_reactive_compact = True
                    continue
                return
            log_error(e)
            return
        # max_tokens check BEFORE appending to messages
        if response.stop_reason == "max_tokens":
            if not state.has_escalated:
                max_tokens = 64000
                state.has_escalated = True
                continue  # retry same request, messages unchanged
            # save truncated output + continuation prompt (max 3 continuations)
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count >= MAX_RECOVERY_RETRIES:
                return  # still truncated after 3 continuations
            messages.append({"role": "user", "content": CONTINUATION_PROMPT})
            state.recovery_count += 1
            continue
        # Normal completion
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return
        # ... tool execution ...
The outer try/except catches API exceptions (prompt_too_long, etc.), `with_retry` handles transient errors (429/529), and `stop_reason` checks handle truncation. Three recovery mechanisms, each handling its own error type.
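
The truncation branch can be exercised without an API key by swapping the LLM call for a stub. The following is a simplified sketch, not the chapter's `code.py`: `run_loop`, `State`, and `always_truncated` are hypothetical, and the retry and compaction paths are left out.

```python
from dataclasses import dataclass
from types import SimpleNamespace

MAX_RECOVERY_RETRIES = 3


@dataclass
class State:
    has_escalated: bool = False
    recovery_count: int = 0


def run_loop(fake_llm, messages):
    """Truncation handling only; fake_llm stands in for the API call."""
    state, max_tokens, calls = State(), 8000, []
    while True:
        response = fake_llm(messages, max_tokens)
        calls.append(max_tokens)
        if response.stop_reason == "max_tokens":
            if not state.has_escalated:
                max_tokens, state.has_escalated = 64000, True
                continue  # same request, messages unchanged
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count >= MAX_RECOVERY_RETRIES:
                return calls  # give up after three continuations
            messages.append({"role": "user", "content": "Resume directly."})
            state.recovery_count += 1
            continue
        messages.append({"role": "assistant", "content": response.content})
        return calls


def always_truncated(messages, max_tokens):
    # Stub that reports truncation on every call
    return SimpleNamespace(stop_reason="max_tokens", content="partial")


msgs = [{"role": "user", "content": "write a long file"}]
calls = run_loop(always_truncated, msgs)
print(calls)      # [8000, 64000, 64000, 64000, 64000]
print(len(msgs))  # 8
```

With a stub that always truncates, the loop makes one 8K call, one escalated call, and three continuation calls, then exits instead of looping forever.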
---
## Changes from s10
| Component | Before (s10) | After (s11) |
|-----------|-------------|-------------|
| Error handling | None (crashes on any error) | Three recovery patterns + exponential backoff |
| New constants | — | ESCALATED_MAX_TOKENS=64000, MAX_RETRIES=10, BASE_DELAY_MS=500, FALLBACK_MODEL |
| New functions | — | with_retry, retry_delay, reactive_compact, is_prompt_too_long_error, RecoveryState |
| Tools | bash, read_file, write_file (3) | bash, read_file, write_file (3) — unchanged |
| Loop | Bare LLM call | Wrapped in try/except + continue retry |
---
## Try It
```sh
cd learn-claude-code
python s11_error_recovery/code.py
```
Try these prompts:
1. Ask the Agent to generate a very long piece of code, and observe whether it automatically continues after truncation (look for the `[max_tokens] escalating` log)
2. Read many files consecutively to bloat the context, and observe reactive compact
3. If you encounter 429/529, observe the exponential backoff log output
---
## What's Next
The Agent can now automatically recover from errors. But the tasks it handles are still one-shot — you give it a task, it finishes, it's done.
What if the Agent could manage a **task list** — with dependencies, persisted to disk, resumable across sessions? A TODO list is not a task system.
s12 Task System → Tasks form a dependency graph with state and persistence. This is the foundation for multi-Agent collaboration.
<details>
<summary>Deep Dive into CC Source</summary>
> The following is based on CC source code: `query.ts` (1729 lines), `services/api/withRetry.ts` (822 lines), `query/tokenBudget.ts` (93 lines), and `utils/tokenBudget.ts` (73 lines).
### 1. A Dozen-Plus Reason/Transition Codes (Not Just 3)
The teaching version covers 3 of the most common recovery patterns. CC actually has a dozen-plus reason/transition codes, evaluated after every LLM call:
| Reason/Transition | Teaching Version | CC Behavior |
|---|---|---|
| `completed` | Normal completion | Return result |
| `next_turn` | Normal tool call | Continue to next tool execution round |
| `max_output_tokens_escalate` | Path 1 | 8K→64K escalation |
| `max_output_tokens_recovery` | Path 1 continuation | Continuation prompt (up to 3 times) |
| `reactive_compact_retry` | Path 2 | Reactive compact → retry |
| `prompt_too_long` | Path 2 | Same as above |
| `collapse_drain_retry` | Not covered | Context collapse — commit staged content first |
| `model_error` | Not covered | Retry |
| `image_error` | Not covered | `ImageSizeError` / `ImageResizeError` handled specifically |
| `aborted_streaming` | Not covered | Streaming abort recovery |
| `aborted_tools` | Not covered | Tool abort |
| `stop_hook_blocking` | Not covered | Inject blocking error → model self-corrects |
| `stop_hook_prevented` | Not covered | Hooks prevent execution |
| `hook_stopped` | Not covered | Hook stopped execution |
| `token_budget_continuation` | Not covered | Continue when token usage < 90% |
| `blocking_limit` | Not covered | Blocking limit reached |
| `max_turns` | Not covered | Maximum turns reached |
The teaching version only expands on the first 6 rows (the most common); each of the rest has its own dedicated handling logic.
### 2. Precise Exponential Backoff Formula
CC's backoff delay (`withRetry.ts:530-548`):
```
delay = min(500 × 2^(attempt-1), 32000) + random(0~25%)
```
| Attempt | Base Delay | + Jitter |
|---------|-----------|----------|
| 1 | 500ms | 0-125ms |
| 2 | 1000ms | 0-250ms |
| 4 | 4000ms | 0-1000ms |
| 7+ | 32000ms (cap) | 0-8000ms |
If the server returns a `Retry-After` header, that value takes priority.
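
For reference, HTTP allows `Retry-After` to carry either a number of seconds or an HTTP date. The sketch below shows one way to turn the header into a delay using only the standard library; it is a general illustration, not CC's implementation, and `parse_retry_after` is a hypothetical helper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str],
                      now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay in seconds, or None if the header is absent or unparseable."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would pass the result as `retry_after` to the backoff function and fall back to the exponential schedule when it is `None`.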
### 3. Original CONTINUATION Prompt
CC's continuation prompt (`query.ts:1225-1227`):
```
Output token limit hit. Resume directly — no apology, no recap of what
you were doing. Pick up mid-thought if that is where the cut happened.
Break remaining work into smaller pieces.
```
Token budget nudge prompt (`tokenBudget.ts:72`):
```
Stopped at {pct}% of token target. Keep working — do not summarize.
```
### 4. Streaming Error Handling
In CC's streaming path, recoverable errors (413, max_tokens, media errors) are **withheld from display** during streaming (`query.ts:788-822`) — SDK consumers don't see them, only the recovery logic does. After streaming ends, the system determines whether recovery is needed.
### 5. 529 → Fallback Model Switch
After 3 consecutive 529 overload errors (`MAX_529_RETRIES = 3`), CC automatically switches to the fallback model (e.g., Opus → Sonnet). On switch, all pending messages and tool results are cleared, and the user sees "Switched to {model} due to high demand".
### 6. Diminishing Returns Detection
Token budget "continuations" aren't unlimited. When 3 consecutive continuations each add fewer than 500 tokens, the system determines "continuing won't produce meaningful output" and stops continuation (`tokenBudget.ts:60-62`).
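
The rule above can be sketched as a streak counter. The thresholds come from the description; the function and constant names are hypothetical and are not taken from `tokenBudget.ts`.

```python
MIN_INCREMENT = 500        # tokens a continuation must add to count as progress
MAX_LOW_YIELD_STREAK = 3   # consecutive low-yield continuations before stopping


def should_stop(increments):
    """increments: tokens produced by each continuation, in order."""
    streak = 0
    for inc in increments:
        streak = streak + 1 if inc < MIN_INCREMENT else 0
        if streak >= MAX_LOW_YIELD_STREAK:
            return True
    return False


print(should_stop([1200, 300, 200, 100]))  # True
print(should_stop([300, 200, 900, 100]))   # False
```

A single productive continuation resets the streak, so only sustained low output ends the run.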
</details>
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->


@@ -0,0 +1,277 @@
# s11: Error Recovery — エラーは終わりではなく、リトライの始まり
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
s01 → ... → s09 → s10 → `s11` → [s12](../s12_task_system/) → s13 → ... → s20
> *"エラーは終わりではなく、リトライの始まり"* — トークン拡張、コンテキスト圧縮、モデル切り替え。
>
> **Harness 層**: 耐障害性 — メインループのエラーを分類し復旧。
---
## 課題
Agent が動いている途中でエラーが出た:
```
Error: 529 overloaded
```
Agent がクラッシュした。リトライもしない、モデルも切り替えない、コンテキストも減らさない——そのままクラッシュ。
本番環境では API エラーが日常茶飯事。最も一般的な 3 つの障害パターン:**出力の切り詰め**(モデルが途中まで出力して token が尽きた)、**コンテキスト超過**(圧縮後も長すぎる)、**一時的障害**(429 レート制限 / 529 過負荷)。エラーを処理しない Agent は、一度触れただけで止まる車のようなものだ。
---
## 解決策
![Error Recovery Overview](images/error-recovery-overview.ja.svg)
s10 のループ、prompt 組み立てはすべてそのまま。唯一の変更点:LLM 呼び出しを try/except で包み、エラータイプに応じて異なる復旧パスに振り分ける。復旧後は `continue` でループ先頭に戻り、再度 LLM を呼び出す。
最も一般的な 3 つの復旧パターン(教学版は 429/529 のみ対応;実際のシステムは接続エラー、タイムアウト、クラウドベンダーの認証キャッシュ等もカバー。CC には実際 13 以上の reason code があるが、残りは Deep dive で解説):
| パターン | トリガー | 復旧アクション |
|----------|----------|---------------|
| 出力切り詰め | `max_tokens` | 8K→64K に拡張 / 続きのプロンプト注入 |
| コンテキスト超過 | `prompt_too_long` | reactive compact → リトライ |
| 一時的障害 | 429 / 529 | 指数バックオフ + ジッター、連続 529 でフォールバックモデルに切り替え可能 |
---
## 仕組み
### パス 1: 出力が切り詰められた
モデルが途中まで出力して、`max_tokens` に達した。デフォルトの 8000 token では完全な回答を出力しきれない。
初回発生時、`max_tokens` を 8K から 64K に拡張(8 倍の空間)し、同じリクエストをリトライする——この時、切り詰められた出力は messages に追加せず、元のリクエストをそのまま維持する。64K でも足りない場合にのみ、切り詰められた出力を保存し、続きのプロンプトを注入してモデルに先ほどの続きを出力させる。最大 3 回まで:
```python
if response.stop_reason == "max_tokens":
    # First escalation: don't append truncated output, retry same request
    if not state.has_escalated:
        max_tokens = ESCALATED_MAX_TOKENS
        state.has_escalated = True
        continue  # messages unchanged, same request with more tokens
    # 64K still truncated: save output + continuation prompt
    messages.append({"role": "assistant", "content": response.content})
    if state.recovery_count < MAX_RECOVERY_RETRIES:
        messages.append({"role": "user", "content":
            "Output token limit hit. Resume directly — "
            "no apology, no recap. Pick up mid-thought."})
        state.recovery_count += 1
        continue
    return  # still truncated after 3 continuations
# Normal: append after max_tokens check
messages.append({"role": "assistant", "content": response.content})
```
拡張は 1 回だけ、続きの出力は最大 3 回。超過したら終了——これ以上続けても実質的な出力は得られない。
### パス 2: コンテキスト超過
LLM が「コンテキストが長すぎる」と返す(`prompt_too_long`)。s08 の 4 層圧縮をすべて実行したのに、まだ超えている。
reactive compact をトリガー——auto compact よりも積極的。教学版は最後の 5 メッセージだけを残して圧縮をシミュレート;実際の CC は LLM で compact サマリを生成してからリトライする。圧縮後にリトライ。ただし、一度圧縮してもまだ超過している場合は終了するしかない——再度圧縮しても小さくはならない:
```python
except PromptTooLongError:
    if not state.has_attempted_reactive_compact:
        messages[:] = reactive_compact(messages)
        state.has_attempted_reactive_compact = True
        continue
    return  # 圧縮済みでも超過、終了するしかない
```
### パス 3: 一時的障害
ネットワークの揺らぎ、429 レート制限、529 過負荷——これらはバグではなく、分散システムの日常だ。
429 と 529 は統一して指数バックオフ + ジッターを使用:1 回目は 0.5 秒待機、2 回目は 1 秒、3 回目は 2 秒、最大 10 回。ランダムジッターを加えることで、並行リクエストが同時にリトライするのを防ぐ。3 回連続で 529 過負荷 → フォールバックモデルに切り替え(`FALLBACK_MODEL_ID` 環境変数が設定されている場合):
```python
def retry_delay(attempt, retry_after=None):
    if retry_after:
        return retry_after
    base = min(500 * (2 ** attempt), 32000) / 1000  # seconds
    return base + random.uniform(0, base * 0.25)

def with_retry(fn, state, max_retries=10):
    for attempt in range(max_retries):
        try:
            return fn()
        except (RateLimitError, OverloadedError) as e:
            delay = retry_delay(attempt)
            time.sleep(delay)
            # Only 529 (overloaded) counts toward the fallback switch
            if isinstance(e, OverloadedError):
                state.consecutive_529 += 1
                if state.consecutive_529 >= 3 and FALLBACK_MODEL:
                    state.current_model = FALLBACK_MODEL
    raise MaxRetriesExceeded()
```
バックオフの公式:`min(500 × 2^attempt, 32000) + random(0~25%)`。サーバーが `Retry-After` ヘッダーを返した場合、その値を優先して使用する。
### 統合して実行
```python
def agent_loop(messages, context):
    system = get_system_prompt(context)
    state = RecoveryState()
    max_tokens = 8000
    while True:
        try:
            response = with_retry(
                lambda: client.messages.create(
                    model=state.current_model, system=system,
                    messages=messages, tools=TOOLS,
                    max_tokens=max_tokens),
                state)
        except Exception as e:
            if is_prompt_too_long_error(e):
                if not state.has_attempted_reactive_compact:
                    messages[:] = reactive_compact(messages)
                    state.has_attempted_reactive_compact = True
                    continue
                return
            log_error(e)
            return
        # max_tokens check BEFORE appending to messages
        if response.stop_reason == "max_tokens":
            if not state.has_escalated:
                max_tokens = 64000
                state.has_escalated = True
                continue  # retry same request, messages unchanged
            # save truncated output + continuation prompt (max 3 continuations)
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count >= MAX_RECOVERY_RETRIES:
                return  # still truncated after 3 continuations
            messages.append({"role": "user", "content": CONTINUATION_PROMPT})
            state.recovery_count += 1
            continue
        # Normal completion
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return
        # ... tool execution ...
```
外側の try/except が API 例外(prompt_too_long 等)を捕捉し、`with_retry` が一時的エラー(429/529)を処理し、`stop_reason` のチェックが切り詰めを処理する。3 つの復旧メカニズムがそれぞれ異なるエラータイプを担当する。
---
## s10 からの変更点
| コンポーネント | 変更前 (s10) | 変更後 (s11) |
|---------------|-------------|-------------|
| エラー処理 | なし(エラーで即クラッシュ) | 3 つの復旧パターン + 指数バックオフ |
| 新規定数 | — | ESCALATED_MAX_TOKENS=64000, MAX_RETRIES=10, BASE_DELAY_MS=500, FALLBACK_MODEL |
| 新規関数 | — | with_retry, retry_delay, reactive_compact, is_prompt_too_long_error, RecoveryState |
| ツール | bash, read_file, write_file (3) | bash, read_file, write_file (3) — 変更なし |
| ループ | LLM を直接呼び出し | try/except で包み + continue でリトライ |
---
## 試してみる
```sh
cd learn-claude-code
python s11_error_recovery/code.py
```
以下の prompt を試してみよう:
1. Agent に長いコードを生成させ、切り詰め後に自動で続きが出力されるか観察する(`[max_tokens] escalating` ログを確認)
2. 連続して大量のファイルを読み込みコンテキストを肥大化させ、reactive compact の動作を観察する
3. 429/529 が発生した場合、指数バックオフのログ出力を観察する
---
## 次のステップ
Agent はエラーから自動的に復旧できるようになった。しかし、まだ処理するタスクは「使い捨て」だ——タスクを与えると実行し、終わる。
Agent に**タスクリスト**を管理させられないだろうか——依存関係があり、ディスクに永続化され、セッションをまたいで復旧できる?TODO リストはタスクシステムではない。
s12 Task System → タスクとは依存関係があり、状態があり、永続化されたグラフだ。これはマルチ Agent 協調の基盤となる。
<details>
<summary>CC ソースコード深掘り</summary>
> 以下は CC ソースコード `query.ts`(1729 行)、`services/api/withRetry.ts`(822 行)、`query/tokenBudget.ts`(93 行)、`utils/tokenBudget.ts`(73 行)の分析に基づく。
### 一、十数種の reason/transition(3 つだけではない)
教学版では最も一般的な 3 つの復旧パターンを解説した。CC には実際十数種の reason/transition があり、毎回の LLM 呼び出し後に判定される:
| reason/transition | 教学版の対応 | CC の動作 |
|---|---|---|
| `completed` | 正常終了 | 結果を返す |
| `next_turn` | 通常のツール呼び出し | 次のツール実行ラウンドへ |
| `max_output_tokens_escalate` | パス 1 | 8K→64K に拡張 |
| `max_output_tokens_recovery` | パス 1 続き出力 | 続きのプロンプト注入(最大 3 回) |
| `reactive_compact_retry` | パス 2 | reactive compact → リトライ |
| `prompt_too_long` | パス 2 | 同上 |
| `collapse_drain_retry` | 未展開 | context collapse 時にまず保留中の内容をコミット |
| `model_error` | 未展開 | リトライ |
| `image_error` | 未展開 | `ImageSizeError` / `ImageResizeError` の専用処理 |
| `aborted_streaming` | 未展開 | ストリーミング中断の復旧 |
| `aborted_tools` | 未展開 | ツール中断 |
| `stop_hook_blocking` | 未展開 | blocking error を注入 → モデルが自己修正 |
| `stop_hook_prevented` | 未展開 | hooks によるブロック |
| `hook_stopped` | 未展開 | hook による実行停止 |
| `token_budget_continuation` | 未展開 | token 使用量 < 90% の時に継続 |
| `blocking_limit` | 未展開 | ブロック制限 |
| `max_turns` | 未展開 | 最大ターン数に到達 |
教学版では最初の 6 行(最も一般的なもの)だけを展開した。残りはそれぞれ専用の処理ロジックを持つ。
### 二、指数バックオフの正確な公式
CC のバックオフ遅延(`withRetry.ts:530-548`):
```
delay = min(500 × 2^(attempt-1), 32000) + random(0~25%)
```
| 試行 | 基本遅延 | + ジッター |
|------|---------|-----------|
| 1 | 500ms | 0-125ms |
| 2 | 1000ms | 0-250ms |
| 4 | 4000ms | 0-1000ms |
| 7+ | 32000ms(上限) | 0-8000ms |
サーバーが `Retry-After` ヘッダーを返した場合、その値を優先して使用する。
### 三、CONTINUATION プロンプト原文
CC の続き出力プロンプト(`query.ts:1225-1227`):
```
Output token limit hit. Resume directly — no apology, no recap of what
you were doing. Pick up mid-thought if that is where the cut happened.
Break remaining work into smaller pieces.
```
Token budget のナッジプロンプト(`tokenBudget.ts:72`):
```
Stopped at {pct}% of token target. Keep working — do not summarize.
```
### 四、ストリーミングエラー処理
CC のストリーミングパスでは、復旧可能なエラー(413、max_tokens、media error)はストリーミング中**表示を保留される**(`query.ts:788-822`)——SDK コンシューマーには見えず、復旧ロジックだけが認識できる。ストリーミング終了後に復旧が必要かどうかを判断する。
### 五、529 → フォールバックモデル切り替え
3 回連続で 529 過負荷エラーが発生した後(`MAX_529_RETRIES = 3`)、CC は自動的にフォールバックモデルに切り替える(Opus → Sonnet)。切り替え時にすべての保留中のメッセージと tool 結果をクリアし、ユーザーに "Switched to {model} due to high demand" と表示する。
### 六、収穫逓減の検出
Token budget の「継続」は無限ではない。連続 3 回の continuation で token 増分が 500 未満の場合、システムは「続けても実質的な出力は得られない」と判断し、continuation を停止する(`tokenBudget.ts:60-62`)。
</details>
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->


@@ -0,0 +1,277 @@
# s11: Error Recovery — 错误不是结束,是重试的开始
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
s01 → ... → s09 → s10 → `s11` → [s12](../s12_task_system/) → s13 → ... → s20
> *"错误不是终点, 是重试的起点"* — 升级 token、压缩上下文、切换模型。
>
> **Harness 层**: 韧性 — 主循环遇到错误时分类并恢复。
---
## 问题
Agent 跑着跑着报错了:
```
Error: 529 overloaded
```
Agent 崩溃了。它没有重试,没有换模型,没有减少上下文——直接崩溃。
生产环境中 API 错误是常态。三种最常见的故障模式:**输出被截断**(模型话说一半 token 用完了)、**上下文超限**(压缩后还是太长)、**临时故障**(429 限流 / 529 过载)。一个不处理错误的 Agent 就像一个一碰就熄火的车。
---
## 解决方案
![Error Recovery Overview](images/error-recovery-overview.svg)
s10 的循环、prompt 组装全部保留。唯一的变动:LLM 调用包裹在 try/except 里,根据错误类型走不同的恢复路径。恢复后 `continue` 回到循环开头重新调用 LLM。
三种最常见的恢复模式(教学版只处理 429/529;真实系统还覆盖连接错误、超时、云厂商认证缓存等。CC 实际有 13+ reason code,其余见 Deep dive):
| 模式 | 触发 | 恢复动作 |
|------|------|---------|
| 输出截断 | `max_tokens` | 升级 8K→64K / 续写提示 |
| 上下文超限 | `prompt_too_long` | reactive compact → 重试 |
| 临时故障 | 429 / 529 | 指数退避 + 抖动,连续 529 可切换备用模型 |
---
## 工作原理
### 路径 1: 输出被截断
模型话说一半,`max_tokens` 用完了。默认 8000 token 不够它输出完整回答。
第一次发生时,直接把 `max_tokens` 从 8K 升级到 64K(8 倍空间),重试同一请求——此时不追加截断输出到 messages,保持原始请求不变。如果 64K 还是不够,才保存截断输出并注入续写提示让模型接着刚才的话继续说,最多 3 次:
```python
if response.stop_reason == "max_tokens":
    # First escalation: don't append truncated output, retry same request
    if not state.has_escalated:
        max_tokens = ESCALATED_MAX_TOKENS
        state.has_escalated = True
        continue  # messages unchanged, same request with more tokens
    # 64K still truncated: save output + continuation prompt
    messages.append({"role": "assistant", "content": response.content})
    if state.recovery_count < MAX_RECOVERY_RETRIES:
        messages.append({"role": "user", "content":
            "Output token limit hit. Resume directly — "
            "no apology, no recap. Pick up mid-thought."})
        state.recovery_count += 1
        continue
    return  # still truncated after 3 continuations
# Normal: append after max_tokens check
messages.append({"role": "assistant", "content": response.content})
```
升级只有一次机会,续写最多 3 次。超过就退出——继续续写也不会有实质产出。
### 路径 2: 上下文超限
LLM 说"你的上下文太长了"(`prompt_too_long`)。s08 的四层压缩全跑过了,还是超。
触发 reactive compact——比 auto compact 更激进。教学版只保留最后 5 条消息模拟压缩效果;真实实现会调用 LLM 生成 compact 摘要再重试。压缩后重试。但如果压缩过一次还是超限,只能退出——再压缩也不会变小:
```python
except PromptTooLongError:
    if not state.has_attempted_reactive_compact:
        messages[:] = reactive_compact(messages)
        state.has_attempted_reactive_compact = True
        continue
    return  # 压缩过了还是超限,只能退出
```
### 路径 3: 临时故障
网络抖动、429 限流、529 过载——这些不是 bug,是分布式系统的常态。
429 和 529 统一走指数退避 + 抖动:第一次等 0.5 秒,第二次等 1 秒,第三次等 2 秒,最多 10 次。加随机抖动让并发请求不在同一时刻重试。连续 3 次 529 过载 → 切换到备用模型(若配置了 `FALLBACK_MODEL_ID` 环境变量):
```python
def retry_delay(attempt, retry_after=None):
    if retry_after:
        return retry_after
    base = min(500 * (2 ** attempt), 32000) / 1000  # seconds
    return base + random.uniform(0, base * 0.25)

def with_retry(fn, state, max_retries=10):
    for attempt in range(max_retries):
        try:
            return fn()
        except (RateLimitError, OverloadedError) as e:
            delay = retry_delay(attempt)
            time.sleep(delay)
            # Only 529 (overloaded) counts toward the fallback switch
            if isinstance(e, OverloadedError):
                state.consecutive_529 += 1
                if state.consecutive_529 >= 3 and FALLBACK_MODEL:
                    state.current_model = FALLBACK_MODEL
    raise MaxRetriesExceeded()
```
退避公式:`min(500 × 2^attempt, 32000) + random(0~25%)`。如果服务器返回 `Retry-After` header,优先用那个值。
### 合起来跑
```python
def agent_loop(messages, context):
    system = get_system_prompt(context)
    state = RecoveryState()
    max_tokens = 8000
    while True:
        try:
            response = with_retry(
                lambda: client.messages.create(
                    model=state.current_model, system=system,
                    messages=messages, tools=TOOLS,
                    max_tokens=max_tokens),
                state)
        except Exception as e:
            if is_prompt_too_long_error(e):
                if not state.has_attempted_reactive_compact:
                    messages[:] = reactive_compact(messages)
                    state.has_attempted_reactive_compact = True
                    continue
                return
            log_error(e)
            return
        # max_tokens check BEFORE appending to messages
        if response.stop_reason == "max_tokens":
            if not state.has_escalated:
                max_tokens = 64000
                state.has_escalated = True
                continue  # retry same request, messages unchanged
            # save truncated output + continuation prompt (max 3 continuations)
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count >= MAX_RECOVERY_RETRIES:
                return  # still truncated after 3 continuations
            messages.append({"role": "user", "content": CONTINUATION_PROMPT})
            state.recovery_count += 1
            continue
        # Normal completion
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return
        # ... tool execution ...
```
外层 try/except 捕获 API 异常(prompt_too_long 等),`with_retry` 处理瞬态错误(429/529),`stop_reason` 检查处理截断。三种恢复机制各管各的错误类型。
---
## 相对 s10 的变更
| 组件 | 之前 (s10) | 之后 (s11) |
|------|-----------|-----------|
| 错误处理 | 无(一碰就崩溃) | 三种恢复模式 + 指数退避 |
| 新常量 | — | ESCALATED_MAX_TOKENS=64000, MAX_RETRIES=10, BASE_DELAY_MS=500, FALLBACK_MODEL |
| 新函数 | — | with_retry, retry_delay, reactive_compact, is_prompt_too_long_error, RecoveryState |
| 工具 | bash, read_file, write_file (3) | bash, read_file, write_file (3) — 不变 |
| 循环 | 裸调用 LLM | try/except 包裹 + continue 重试 |
---
## 试一下
```sh
cd learn-claude-code
python s11_error_recovery/code.py
```
试试这些 prompt:
1. 让 Agent 生成一段很长的代码,观察截断后是否自动续写(看 `[max_tokens] escalating` 日志)
2. 连续读取大量文件撑大上下文,观察 reactive compact
3. 如果遇到 429/529,观察指数退避的日志输出
---
## 接下来
Agent 现在能在错误中自动恢复了。但它处理的任务仍然是"一次性"的——你给它一个任务,它做完,结束。
能不能让 Agent 管理一个**任务列表**——有依赖关系、持久化到磁盘、跨会话能恢复?TODO 列表不是任务系统。
s12 Task System → 任务是有依赖、有状态、持久化的图。这是多 Agent 协作的基础。
<details>
<summary>深入 CC 源码</summary>
> 以下基于 CC 源码 `query.ts`(1729 行)、`services/api/withRetry.ts`(822 行)、`query/tokenBudget.ts`(93 行)、`utils/tokenBudget.ts`(73 行)的分析。
### 一、十几种 reason/transition(不只是 3 条)
教学版讲了 3 种最常见的恢复模式。CC 实际有十几种 reason/transition,每轮 LLM 调用后都会判断:
| reason/transition | 教学版对应 | CC 行为 |
|---|---|---|
| `completed` | 正常完成 | 返回结果 |
| `next_turn` | 正常工具调用 | 继续下一轮工具执行 |
| `max_output_tokens_escalate` | 路径 1 | 8K→64K 升级 |
| `max_output_tokens_recovery` | 路径 1 续写 | 续写提示(最多 3 次) |
| `reactive_compact_retry` | 路径 2 | reactive compact → 重试 |
| `prompt_too_long` | 路径 2 | 同上 |
| `collapse_drain_retry` | 未展开 | context collapse 先提交暂存 |
| `model_error` | 未展开 | 重试 |
| `image_error` | 未展开 | `ImageSizeError` / `ImageResizeError` 专门处理 |
| `aborted_streaming` | 未展开 | 流式中止恢复 |
| `aborted_tools` | 未展开 | 工具中止 |
| `stop_hook_blocking` | 未展开 | 注入 blocking error → 模型自纠 |
| `stop_hook_prevented` | 未展开 | hooks 阻止 |
| `hook_stopped` | 未展开 | hook 停止执行 |
| `token_budget_continuation` | 未展开 | token 用量 < 90% 时继续 |
| `blocking_limit` | 未展开 | 阻塞限制 |
| `max_turns` | 未展开 | 达到最大轮次 |
教学版只展开了前 6 行(最常见的),其余各有专门处理逻辑。
### 二、指数退避的精确公式
CC 的退避延迟(`withRetry.ts:530-548`):
```
delay = min(500 × 2^(attempt-1), 32000) + random(0~25%)
```
| 尝试 | 基础延迟 | + 抖动 |
|------|---------|--------|
| 1 | 500ms | 0-125ms |
| 2 | 1000ms | 0-250ms |
| 4 | 4000ms | 0-1000ms |
| 7+ | 32000ms(上限) | 0-8000ms |
如果服务器返回 `Retry-After` header,优先用那个值。
### 三、CONTINUATION 提示原文
CC 的续写提示(`query.ts:1225-1227`):
```
Output token limit hit. Resume directly — no apology, no recap of what
you were doing. Pick up mid-thought if that is where the cut happened.
Break remaining work into smaller pieces.
```
Token budget 的 nudge 提示(`tokenBudget.ts:72`):
```
Stopped at {pct}% of token target. Keep working — do not summarize.
```
### 四、流式错误处理
CC 的流式路径中可恢复的错误(413、max_tokens、media error)在 streaming 期间**被暂扣不展示**(`query.ts:788-822`)——SDK 消费者看不到,只有恢复逻辑能看到。等 streaming 结束后才判断是否需要恢复。
### 五、529 → Fallback Model 切换
连续 3 次 529 过载错误后(`MAX_529_RETRIES = 3`),CC 自动切换到 fallback model(如 Opus → Sonnet)。切换时清除所有 pending 消息和 tool 结果,给用户展示 "Switched to {model} due to high demand"。
### 六、Diminishing Returns 检测
Token budget 的"继续"不是无限的。当连续 3 次 continuation 且 token 增量 < 500 时,系统判断"继续也没有实质性产出",停止 continuation(`tokenBudget.ts:60-62`)。
</details>
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

s11_error_recovery/code.py Normal file

@@ -0,0 +1,361 @@
#!/usr/bin/env python3
"""
s11: Error Recovery — three recovery paths + exponential backoff.

Run: python s11_error_recovery/code.py
Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY

Changes from s10:
  - LLM call wrapped in try/except with three recovery paths
  - Path 1: max_tokens -> escalate 8K->64K (no append on first escalation),
            then continuation prompt (max 3)
  - Path 2: prompt_too_long -> reactive compact -> retry (once)
  - Path 3: 429/529 -> exponential backoff with jitter (max 10),
            fallback model on consecutive 529
  - with_retry wrapper for transient errors
  - RecoveryState tracks escalation / compact / 529 / model

ASCII flow:
  messages -> prompt assembly -> compress+load -> [try] LLM [except] -> tools -> loop
                                                    |          |
                                               stop_reason   error type
                                               max_tokens?   prompt_too_long? -> compact
                                               escalate /    429/529? -> backoff
                                               continue      other? -> log + exit
"""
import os, subprocess, time, random, json
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"):
    os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
MEMORY_DIR = WORKDIR / ".memory"
MEMORY_INDEX = MEMORY_DIR / "MEMORY.md"
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
PRIMARY_MODEL = os.environ["MODEL_ID"]
FALLBACK_MODEL = os.getenv("FALLBACK_MODEL_ID")
# ── Constants ──
ESCALATED_MAX_TOKENS = 64000
DEFAULT_MAX_TOKENS = 8000
MAX_RECOVERY_RETRIES = 3
MAX_RETRIES = 10
BASE_DELAY_MS = 500
MAX_CONSECUTIVE_529 = 3
CONTINUATION_PROMPT = (
"Output token limit hit. Resume directly — "
"no apology, no recap. Pick up mid-thought."
)
# ── Prompt Assembly (from s10, synced) ──
PROMPT_SECTIONS = {
    "identity": "You are a coding agent. Act, don't explain.",
    "tools": "Available tools: bash, read_file, write_file.",
    "workspace": f"Working directory: {WORKDIR}",
    "memory": "Relevant memories are injected below when available.",
}


def assemble_system_prompt(context: dict) -> str:
    sections = [PROMPT_SECTIONS["identity"],
                PROMPT_SECTIONS["tools"],
                PROMPT_SECTIONS["workspace"]]
    memories = context.get("memories", "")
    if memories:
        sections.append(f"Relevant memories:\n{memories}")
    return "\n\n".join(sections)


_last_context_key, _last_prompt = None, None


def get_system_prompt(context: dict) -> str:
    global _last_context_key, _last_prompt
    key = json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)
    if key == _last_context_key and _last_prompt:
        print(" \033[90m[cache hit] system prompt unchanged\033[0m")
        return _last_prompt
    _last_context_key = key
    _last_prompt = assemble_system_prompt(context)
    loaded = ["identity", "tools", "workspace"]
    if context.get("memories"):
        loaded.append("memory")
    print(f" \033[32m[assembled] sections: {', '.join(loaded)}\033[0m")
    return _last_prompt

# ── Tools (unchanged) ──
def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path


def run_bash(command: str) -> str:
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR,
                           capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"


def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines):
            lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e:
        return f"Error: {e}"


def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content)
        return f"Wrote {len(content)} bytes to {path}"
    except Exception as e:
        return f"Error: {e}"


TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object",
                      "properties": {"command": {"type": "string"}},
                      "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"},
                                     "limit": {"type": "integer"}},
                      "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"},
                                     "content": {"type": "string"}},
                      "required": ["path", "content"]}},
]

TOOL_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write}

# ── Error Recovery (s11 new) ──
class RecoveryState:
    """Track recovery attempts across the loop."""
    def __init__(self):
        self.has_escalated = False
        self.recovery_count = 0
        self.consecutive_529 = 0
        self.has_attempted_reactive_compact = False
        self.current_model = PRIMARY_MODEL


def retry_delay(attempt, retry_after=None):
    """Exponential backoff with jitter. Retry-After takes priority."""
    if retry_after:
        return retry_after
    # Delay in seconds: 0.5s doubling per attempt, capped at 32s
    base = min(BASE_DELAY_MS * (2 ** attempt), 32000) / 1000
    jitter = random.uniform(0, base * 0.25)
    return base + jitter


def with_retry(fn, state: RecoveryState):
    """Exponential backoff for transient errors (429/529).
    Non-transient errors are re-raised for the outer handler."""
    for attempt in range(MAX_RETRIES):
        try:
            result = fn()
            state.consecutive_529 = 0
            return result
        except Exception as e:
            name = type(e).__name__
            msg = str(e).lower()
            # 429 rate limit -> exponential backoff
            if "ratelimit" in name.lower() or "429" in msg:
                delay = retry_delay(attempt)
                print(f" \033[33m[429 rate limit] retry {attempt+1}/{MAX_RETRIES},"
                      f" wait {delay:.1f}s\033[0m")
                time.sleep(delay)
                continue
            # 529 overloaded -> exponential backoff + fallback model
            if "overloaded" in name.lower() or "529" in msg or "overloaded" in msg:
                state.consecutive_529 += 1
                if state.consecutive_529 >= MAX_CONSECUTIVE_529:
                    if FALLBACK_MODEL:
                        state.current_model = FALLBACK_MODEL
                        state.consecutive_529 = 0
                        print(f" \033[31m[529 x{MAX_CONSECUTIVE_529}]"
                              f" switching to {FALLBACK_MODEL}\033[0m")
                    else:
                        state.consecutive_529 = 0
                        print(f" \033[31m[529 x{MAX_CONSECUTIVE_529}]"
                              f" no FALLBACK_MODEL_ID configured, continuing retry\033[0m")
                delay = retry_delay(attempt)
                print(f" \033[33m[529 overloaded] retry {attempt+1}/{MAX_RETRIES},"
                      f" wait {delay:.1f}s\033[0m")
                time.sleep(delay)
                continue
            # Not transient -> re-raise for outer try/except
            raise
    raise RuntimeError(f"Max retries ({MAX_RETRIES}) exceeded")


def is_prompt_too_long_error(e: Exception) -> bool:
    """Check whether an API error indicates prompt/context too long."""
    msg = str(e).lower()
    return (("prompt" in msg and "long" in msg)
            or "prompt_is_too_long" in msg
            or "context_length_exceeded" in msg
            or "max_context_window" in msg)


def reactive_compact(messages: list) -> list:
    """Emergency compact: the teaching version keeps the last N messages.
    Real CC generates a compact summary via LLM, then retries with
    the compacted message list. Teaching version simplifies to tail
    retention since s08/s09 already cover LLM-based compact."""
    print(" \033[31m[reactive compact] trimming to last 5 messages\033[0m")
    tail = messages[-5:]
    return [{"role": "user",
             "content": "[Reactive compact] Earlier conversation trimmed. "
                        "Continue from where you left off."}, *tail]


# ── Context ──
def update_context(context: dict, messages: list) -> dict:
    """Derive context from real state: which tools exist, whether memory files exist."""
    memories = ""
    if MEMORY_INDEX.exists():
        content = MEMORY_INDEX.read_text().strip()
        if content:
            memories = content
    return {
        "enabled_tools": list(TOOL_HANDLERS.keys()),
        "workspace": str(WORKDIR),
        "memories": memories,
    }


# ── Agent Loop ──
def agent_loop(messages: list, context: dict):
    """Main loop with error recovery wrapping LLM calls."""
    system = get_system_prompt(context)
    state = RecoveryState()
    max_tokens = DEFAULT_MAX_TOKENS

    while True:
        # ── LLM call: with_retry handles 429/529, outer handles rest ──
        try:
            # state.current_model is read inside the lambda body (not bound as
            # a default argument) so that a fallback switch made by with_retry
            # takes effect on its very next retry.
            response = with_retry(
                lambda mt=max_tokens:
                    client.messages.create(
                        model=state.current_model, system=system,
                        messages=messages, tools=TOOLS, max_tokens=mt),
                state)
        except Exception as e:
            # Path 2: prompt_too_long -> reactive compact (once)
            if is_prompt_too_long_error(e):
                if not state.has_attempted_reactive_compact:
                    messages[:] = reactive_compact(messages)
                    state.has_attempted_reactive_compact = True
                    continue
                print(" \033[31m[unrecoverable] still too long after compact\033[0m")
                messages.append({"role": "assistant", "content": [
                    {"type": "text",
                     "text": "[Error] Context too large, cannot continue."}]})
                return
            # Unrecoverable
            name = type(e).__name__
            print(f" \033[31m[unrecoverable] {name}: {str(e)[:100]}\033[0m")
            messages.append({"role": "assistant", "content": [
                {"type": "text", "text": f"[Error] {name}: {str(e)[:200]}"}]})
            return

        # ── Path 1: max_tokens -> escalate or continue ──
        if response.stop_reason == "max_tokens":
            # First escalation: don't append truncated output, retry same request
            if not state.has_escalated:
                max_tokens = ESCALATED_MAX_TOKENS
                state.has_escalated = True
                print(f" \033[33m[max_tokens] escalating"
                      f" {DEFAULT_MAX_TOKENS} -> {ESCALATED_MAX_TOKENS}\033[0m")
                continue
            # 64K still truncated: save truncated output + continuation prompt
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count < MAX_RECOVERY_RETRIES:
                messages.append({"role": "user", "content": CONTINUATION_PROMPT})
                state.recovery_count += 1
                print(f" \033[33m[max_tokens] continuation"
                      f" {state.recovery_count}/{MAX_RECOVERY_RETRIES}\033[0m")
                continue
            print(" \033[31m[max_tokens] recovery limit reached\033[0m")
            return

        # Normal completion: append assistant response
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return

        # ── Tool execution ──
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            print(f"\033[36m> {block.name}\033[0m")
            handler = TOOL_HANDLERS.get(block.name)
            output = handler(**block.input) if handler else f"Unknown: {block.name}"
            print(str(output)[:200])
            results.append({"type": "tool_result",
                            "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})
        context = update_context(context, messages)
        system = get_system_prompt(context)


if __name__ == "__main__":
    print("s11: error recovery")
    print("Enter a question, press Enter to send. Type q to quit.\n")
    history = []
    context = update_context({}, [])
    while True:
        try:
            query = input("\033[36ms11 >> \033[0m")
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        history.append({"role": "user", "content": query})
        agent_loop(history, context)
        context = update_context(context, history)
        for block in history[-1]["content"]:
            if getattr(block, "type", None) == "text":
                print(block.text)
        print()


@@ -0,0 +1,98 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 440" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#dc2626"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
<linearGradient id="l1" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fef3c7"/><stop offset="100%" stop-color="#fde68a"/>
</linearGradient>
<linearGradient id="l2" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
</linearGradient>
<linearGradient id="l3" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
</linearGradient>
</defs>
<rect width="760" height="440" fill="#fafbfc" rx="8"/>
<!-- Title -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Error Recovery — try/except wrapping LLM calls, three recovery modes</text>
<!-- Legend -->
<rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">s10 retained</text>
<rect x="140" y="56" width="12" height="10" rx="2" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="158" y="66" fill="#d97706" font-size="10" font-weight="600">s11 new</text>
<!-- ===== s10 loop (compact) ===== -->
<rect x="30" y="92" width="80" height="40" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="70" y="116" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">messages</text>
<line x1="110" y1="112" x2="128" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<rect x="131" y="86" width="90" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="176" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">prompt assembly</text>
<text x="176" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s10)</text>
<line x1="221" y1="112" x2="239" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<rect x="242" y="86" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="292" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">compress + load</text>
<text x="292" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s08-s09)</text>
<line x1="342" y1="112" x2="360" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- LLM (wrapped in try/except) -->
<rect x="363" y="86" width="80" height="52" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
<text x="403" y="108" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">LLM</text>
<text x="403" y="122" fill="#dc2626" font-size="8" text-anchor="middle">try/except</text>
<line x1="443" y1="112" x2="461" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<rect x="464" y="86" width="110" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="519" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="519" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">bash · read · write</text>
<!-- Arrow: LLM → Recovery -->
<path d="M 403 138 L 403 178" fill="none" stroke="#dc2626" stroke-width="1.5" marker-end="url(#arrow-red)"/>
<text x="415" y="164" fill="#dc2626" font-size="9">error</text>
<!-- ===== Recovery Section ===== -->
<rect x="20" y="182" width="720" height="22" rx="4" fill="#f1f5f9"/>
<text x="55" y="197" fill="#64748b" font-size="11" font-weight="600">Error Recovery (classify, recover, retry LLM)</text>
<!-- Layer 1: max_tokens -->
<rect x="40" y="210" width="680" height="48" rx="7" fill="url(#l1)" stroke="#d97706" stroke-width="1.5"/>
<text x="60" y="230" fill="#92400e" font-size="12" font-weight="600">Path 1</text>
<text x="112" y="230" fill="#92400e" font-size="11" font-weight="700">max_tokens</text>
<text x="200" y="230" fill="#92400e" font-size="11">Output truncated → escalate 8K→64K (once) / continuation prompt (max 3)</text>
<text x="200" y="246" fill="#b45309" font-size="9">Trigger: stop_reason == "max_tokens" · Cost: 0-1 API · Recover then continue</text>
<!-- Layer 2: prompt_too_long -->
<rect x="40" y="266" width="680" height="48" rx="7" fill="url(#l2)" stroke="#ea580c" stroke-width="1.5"/>
<text x="60" y="286" fill="#9a3412" font-size="12" font-weight="600">Path 2</text>
<text x="112" y="286" fill="#9a3412" font-size="11" font-weight="700">prompt_too_long</text>
<text x="230" y="286" fill="#9a3412" font-size="11">Context overflow → reactive compact → retry (one chance)</text>
<text x="200" y="302" fill="#c2410c" font-size="9">Trigger: API returns 413 · Cost: 1 API · Still over after compact → exit</text>
<!-- Layer 3: 429/529 -->
<rect x="40" y="322" width="680" height="48" rx="7" fill="url(#l3)" stroke="#dc2626" stroke-width="1.5"/>
<text x="60" y="342" fill="#991b1b" font-size="12" font-weight="600">Path 3</text>
<text x="112" y="342" fill="#991b1b" font-size="11" font-weight="700">429/529</text>
<text x="170" y="342" fill="#991b1b" font-size="11">Transient failure → exponential backoff + jitter (max 10) / 3×529 → switch model</text>
<text x="200" y="358" fill="#b91c1c" font-size="9">Trigger: RateLimitError / OverloadedError · Formula: min(500×2^n, 32s) + jitter</text>
<!-- ===== Bottom notes ===== -->
<rect x="40" y="388" width="680" height="40" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<text x="60" y="406" fill="#475569" font-size="10">Three most common recovery modes. CC has 13+ reason codes (image_error, aborted_streaming, etc.), each with dedicated handling.</text>
<text x="60" y="422" fill="#94a3b8" font-size="9">All paths after recovery → continue back to LLM · Normal flow: tool results → messages → loop</text>
</svg>


@@ -0,0 +1,98 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 440" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#dc2626"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
<linearGradient id="l1" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fef3c7"/><stop offset="100%" stop-color="#fde68a"/>
</linearGradient>
<linearGradient id="l2" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
</linearGradient>
<linearGradient id="l3" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
</linearGradient>
</defs>
<rect width="760" height="440" fill="#fafbfc" rx="8"/>
<!-- Title -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Error Recovery — try/except で LLM 呼び出しをラップ、3 つの復旧モード</text>
<!-- Legend -->
<rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">s10 維持</text>
<rect x="140" y="56" width="12" height="10" rx="2" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="158" y="66" fill="#d97706" font-size="10" font-weight="600">s11 新規</text>
<!-- ===== s10 loop (compact) ===== -->
<rect x="30" y="92" width="80" height="40" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="70" y="116" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">messages</text>
<line x1="110" y1="112" x2="128" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<rect x="131" y="86" width="90" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="176" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">prompt assembly</text>
<text x="176" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s10)</text>
<line x1="221" y1="112" x2="239" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<rect x="242" y="86" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="292" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">compress + load</text>
<text x="292" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s08-s09)</text>
<line x1="342" y1="112" x2="360" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- LLM (wrapped in try/except) -->
<rect x="363" y="86" width="80" height="52" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
<text x="403" y="108" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">LLM</text>
<text x="403" y="122" fill="#dc2626" font-size="8" text-anchor="middle">try/except</text>
<line x1="443" y1="112" x2="461" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<rect x="464" y="86" width="110" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="519" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="519" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">bash · read · write</text>
<!-- Arrow: LLM → Recovery -->
<path d="M 403 138 L 403 178" fill="none" stroke="#dc2626" stroke-width="1.5" marker-end="url(#arrow-red)"/>
<text x="415" y="164" fill="#dc2626" font-size="9">エラー</text>
<!-- ===== Recovery Section ===== -->
<rect x="20" y="182" width="720" height="22" rx="4" fill="#f1f5f9"/>
<text x="55" y="197" fill="#64748b" font-size="11" font-weight="600">エラー復旧(分類処理、復旧後 LLM に戻りリトライ)</text>
<!-- Layer 1: max_tokens -->
<rect x="40" y="210" width="680" height="48" rx="7" fill="url(#l1)" stroke="#d97706" stroke-width="1.5"/>
<text x="60" y="230" fill="#92400e" font-size="12" font-weight="600">パス 1</text>
<text x="112" y="230" fill="#92400e" font-size="11" font-weight="700">max_tokens</text>
<text x="200" y="230" fill="#92400e" font-size="11">出力が途切れた → 8K→64K に拡張(1 回)/ 続行プロンプト(最大 3 回)</text>
<text x="200" y="246" fill="#b45309" font-size="9">トリガー: stop_reason == "max_tokens" · コスト: 0-1 API · 復旧後 continue</text>
<!-- Layer 2: prompt_too_long -->
<rect x="40" y="266" width="680" height="48" rx="7" fill="url(#l2)" stroke="#ea580c" stroke-width="1.5"/>
<text x="60" y="286" fill="#9a3412" font-size="12" font-weight="600">パス 2</text>
<text x="112" y="286" fill="#9a3412" font-size="11" font-weight="700">prompt_too_long</text>
<text x="230" y="286" fill="#9a3412" font-size="11">コンテキスト超過 → reactive compact → リトライ(1 回のみ)</text>
<text x="200" y="302" fill="#c2410c" font-size="9">トリガー: API が 413 返却 · コスト: 1 API · 圧縮後も超過 → 終了</text>
<!-- Layer 3: 429/529 -->
<rect x="40" y="322" width="680" height="48" rx="7" fill="url(#l3)" stroke="#dc2626" stroke-width="1.5"/>
<text x="60" y="342" fill="#991b1b" font-size="12" font-weight="600">パス 3</text>
<text x="112" y="342" fill="#991b1b" font-size="11" font-weight="700">429/529</text>
<text x="170" y="342" fill="#991b1b" font-size="11">一時障害 → 指数バックオフ + ジッター(最大 10 回)/ 3 回 529 → モデル切替</text>
<text x="200" y="358" fill="#b91c1c" font-size="9">トリガー: RateLimitError / OverloadedError · 式: min(500×2^n, 32s) + jitter</text>
<!-- ===== Bottom notes ===== -->
<rect x="40" y="388" width="680" height="40" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<text x="60" y="406" fill="#475569" font-size="10">最も一般的な 3 つの復旧モード。CC は実際に 13+ の reason code を持ち(image_error, aborted_streaming 等)、それぞれ専用の処理がある。</text>
<text x="60" y="422" fill="#94a3b8" font-size="9">全パス復旧後 → continue で LLM に戻る · 正常フロー: ツール結果 → messages → ループ</text>
</svg>


@@ -0,0 +1,98 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 440" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#dc2626"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
<linearGradient id="l1" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fef3c7"/><stop offset="100%" stop-color="#fde68a"/>
</linearGradient>
<linearGradient id="l2" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
</linearGradient>
<linearGradient id="l3" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
</linearGradient>
</defs>
<rect width="760" height="440" fill="#fafbfc" rx="8"/>
<!-- Title -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Error Recovery — try/except 包裹 LLM 调用,三种恢复模式</text>
<!-- Legend -->
<rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">s10 保留</text>
<rect x="140" y="56" width="12" height="10" rx="2" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="158" y="66" fill="#d97706" font-size="10" font-weight="600">s11 新增</text>
<!-- ===== s10 loop (compact) ===== -->
<rect x="30" y="92" width="80" height="40" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="70" y="116" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">messages</text>
<line x1="110" y1="112" x2="128" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<rect x="131" y="86" width="90" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="176" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">prompt assembly</text>
<text x="176" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s10)</text>
<line x1="221" y1="112" x2="239" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<rect x="242" y="86" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="292" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">compress + load</text>
<text x="292" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s08-s09)</text>
<line x1="342" y1="112" x2="360" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- LLM (wrapped in try/except) -->
<rect x="363" y="86" width="80" height="52" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
<text x="403" y="108" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">LLM</text>
<text x="403" y="122" fill="#dc2626" font-size="8" text-anchor="middle">try/except</text>
<line x1="443" y1="112" x2="461" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<rect x="464" y="86" width="110" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="519" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="519" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">bash · read · write</text>
<!-- Arrow: LLM → Recovery -->
<path d="M 403 138 L 403 178" fill="none" stroke="#dc2626" stroke-width="1.5" marker-end="url(#arrow-red)"/>
<text x="415" y="164" fill="#dc2626" font-size="9">报错</text>
<!-- ===== Recovery Section ===== -->
<rect x="20" y="182" width="720" height="22" rx="4" fill="#f1f5f9"/>
<text x="55" y="197" fill="#64748b" font-size="11" font-weight="600">错误恢复(分类处理,恢复后回到 LLM 重试)</text>
<!-- Layer 1: max_tokens -->
<rect x="40" y="210" width="680" height="48" rx="7" fill="url(#l1)" stroke="#d97706" stroke-width="1.5"/>
<text x="60" y="230" fill="#92400e" font-size="12" font-weight="600">路径 1</text>
<text x="112" y="230" fill="#92400e" font-size="11" font-weight="700">max_tokens</text>
<text x="200" y="230" fill="#92400e" font-size="11">输出被截断 → 升级 8K→64K(一次)/ 续写提示(最多 3 次)</text>
<text x="200" y="246" fill="#b45309" font-size="9">触发: stop_reason == "max_tokens" · 代价: 0-1 API · 恢复后 continue</text>
<!-- Layer 2: prompt_too_long -->
<rect x="40" y="266" width="680" height="48" rx="7" fill="url(#l2)" stroke="#ea580c" stroke-width="1.5"/>
<text x="60" y="286" fill="#9a3412" font-size="12" font-weight="600">路径 2</text>
<text x="112" y="286" fill="#9a3412" font-size="11" font-weight="700">prompt_too_long</text>
<text x="230" y="286" fill="#9a3412" font-size="11">上下文超限 → reactive compact → 重试(一次机会)</text>
<text x="200" y="302" fill="#c2410c" font-size="9">触发: API 返回 413 · 代价: 1 API · 压缩过还是超 → 退出</text>
<!-- Layer 3: 429/529 -->
<rect x="40" y="322" width="680" height="48" rx="7" fill="url(#l3)" stroke="#dc2626" stroke-width="1.5"/>
<text x="60" y="342" fill="#991b1b" font-size="12" font-weight="600">路径 3</text>
<text x="112" y="342" fill="#991b1b" font-size="11" font-weight="700">429/529</text>
<text x="170" y="342" fill="#991b1b" font-size="11">临时故障 → 指数退避 + 抖动(最多 10 次)/ 3 次 529 → 切换模型</text>
<text x="200" y="358" fill="#b91c1c" font-size="9">触发: RateLimitError / OverloadedError · 公式: min(500×2^n, 32s) + jitter</text>
<!-- ===== Bottom notes ===== -->
<rect x="40" y="388" width="680" height="40" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<text x="60" y="406" fill="#475569" font-size="10">三种最常见的恢复模式。CC 实际有 13+ reason code(image_error、aborted_streaming 等),各有专门处理。</text>
<text x="60" y="422" fill="#94a3b8" font-size="9">所有路径恢复后 → continue 回到 LLM · 正常流程: 工具结果 → messages → 循环</text>
</svg>
