mirror of https://github.com/shareAI-lab/analysis_claude_code.git (synced 2026-06-21 04:33:36 +08:00)

* feat: s01-s14 docs quality overhaul (tool pipeline, single-agent, knowledge & resilience)

  Rewrite code.py and README (zh/en/ja) for s01-s14, each chapter building incrementally on the previous. Key fixes across chapters:

  - s01-s04: agent loop, tool dispatch, permission pipeline, hooks
  - s05-s08: todo write, subagent, skill loading, context compact
  - s09-s11: memory system, system prompt assembly, error recovery
  - s12-s14: task graph, background tasks, cron scheduler

  All chapters CC source-verified. Code inherits fixes forward (PROMPT_SECTIONS, json.dumps cache, real-state context, can_start dep protection, etc.).

* feat: s15-s19 docs quality overhaul (multi-agent platform: teams, protocols, autonomy, worktree, MCP tools)

  Rewrite code.py and README (zh/en/ja) for s15-s19, the multi-agent platform chapters. Each chapter inherits all previous fixes and adds one mechanism:

  - s15: agent teams (TeamCreate, teammate threads, shared task list)
  - s16: team protocols (plan approval, shutdown handshake, consume_inbox)
  - s17: autonomous agents (idle polling, auto-claim, consume_lead_inbox)
  - s18: worktree isolation (git worktree, bind_task, cwd switching, safety)
  - s19: MCP tools (MCPClient, normalize_mcp_name, assemble_tool_pool, no cache)

  All appendix source code references verified against CC source. Config priority corrected: claude.ai < plugin < user < project < local.

* fix: 5 regressions across s05-s19 (glob safety, todo validation, memory extraction, protocol types, dep crash)

  - s05-s09: glob results now filter with is_relative_to(WORKDIR) (inherited from s02)
  - s06-s08: todo_write validates content/status required fields (inherited from s05)
  - s09: extract_memories uses pre-compression snapshot instead of compacted messages
  - s16: submit_plan docstring clarifies protocol-only (not code-level gate)
  - s17-s19: match_response restores type mismatch validation (from s16)
  - s17-s19: claim_task deps list handles missing dep files without crashing

* fix: s12 Todo V2 logic reversal, s14/s15 cron range validation, s18/s19 worktree name validation

  - s12 README (zh/en/ja): fix Todo V2 direction: interactive defaults to Task, non-interactive/SDK defaults to TodoWrite. Fix env var name to CLAUDE_CODE_ENABLE_TASKS (not TODO_V2).
  - s14/s15: add _validate_cron_field with per-field range checks (minute 0-59, hour 0-23, dom 1-31, month 1-12, dow 0-6), step > 0, range lo <= hi. Replace old try/except validation that only caught exceptions.
  - s18/s19: add validate_worktree_name() to remove_worktree and keep_worktree, not just create_worktree.

* fix: align s16-s19 teaching tool consistency
* fix pr265 chapter diagrams
* Add comprehensive s20 harness chapter
* Fix chapter smoke test regressions
* Clarify README tutorial track transition

---------

Co-authored-by: Haoran <bill-billion@outlook.com>

`s11_error_recovery/README.en.md` (new file, 277 lines)

# s11: Error Recovery (Errors Aren't the End, They're the Start of a Retry)

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → ... → s09 → s10 → `s11` → [s12](../s12_task_system/) → s13 → ... → s20

> *"Errors aren't the end, they're the start of a retry"*: escalate tokens, compact context, switch models.
>
> **Harness layer**: Resilience. Classify errors and recover when the main loop hits them.

---


## The Problem

The Agent is running along when it suddenly errors out:

```
Error: 529 overloaded
```

The Agent crashes. It doesn't retry, doesn't switch models, doesn't shrink the context; it just crashes.

In production, API errors are routine. The three most common failure modes are **truncated output** (the model runs out of output tokens mid-sentence), **context overflow** (the prompt is still too long even after compaction), and **transient failures** (429 rate limiting, 529 overload). An Agent that doesn't handle errors is like a car that stalls at the slightest bump.

---


## Solution

The loop and prompt assembly from s10 are fully preserved. The only change is that the LLM call is wrapped in try/except, and each error type gets its own recovery path. After recovering, `continue` jumps back to the top of the loop and calls the LLM again.

The three most common recovery patterns are listed below. For transient failures, the teaching version handles only 429 and 529. Real systems also cover connection errors, timeouts, stale cloud-provider credential caches, and more. CC itself has 13+ reason codes; the Deep Dive covers the rest.

| Pattern | Trigger | Recovery Action |
|---------|---------|-----------------|
| Output truncated | `max_tokens` | Escalate 8K→64K, then continuation prompt |
| Context overflow | `prompt_too_long` | Reactive compact → retry |
| Transient failure | 429 / 529 | Exponential backoff + jitter; fallback model after consecutive 529s |

---


## How It Works

### Path 1: Output Truncated

The model stops mid-sentence because it has used up `max_tokens`. The default 8000 tokens isn't enough for a complete response.

On the first occurrence, escalate `max_tokens` from 8K to 64K (8x the room) and retry the same request. The truncated output is NOT appended to `messages`, so the original request stays intact. Only if 64K still isn't enough is the truncated output saved and a continuation prompt injected. The prompt tells the model to pick up where it left off, up to 3 times:

```python
ESCALATED_MAX_TOKENS = 64000
MAX_RECOVERY_RETRIES = 3

# Inside the agent loop's `while True:`, right after the LLM call returns
if response.stop_reason == "max_tokens":
    # First escalation: don't append truncated output, retry same request
    if not state.has_escalated:
        max_tokens = ESCALATED_MAX_TOKENS
        state.has_escalated = True
        continue  # messages unchanged, same request with more tokens
    # 64K still truncated: save output + continuation prompt
    messages.append({"role": "assistant", "content": response.content})
    if state.recovery_count < MAX_RECOVERY_RETRIES:
        messages.append({"role": "user", "content":
            "Output token limit hit. Resume directly — "
            "no apology, no recap. Pick up mid-thought."})
        state.recovery_count += 1
        continue
    return  # still truncated after 3 continuations
# Normal: append after the max_tokens check
messages.append({"role": "assistant", "content": response.content})
```

Escalation gets one chance; continuation gets up to 3. After that, exit, because further continuations are unlikely to produce meaningful output.

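
The sketch below checks the escalation and continuation rules in isolation. It is minimal and self-contained, and it replays a run of truncated responses through the same decisions as the fragment above. `TruncationState` and `next_action` are hypothetical names for illustration, not code.py's API. The constants mirror the values in this section.

```python
from dataclasses import dataclass

DEFAULT_MAX_TOKENS = 8000
ESCALATED_MAX_TOKENS = 64000
MAX_RECOVERY_RETRIES = 3


@dataclass
class TruncationState:
    max_tokens: int = DEFAULT_MAX_TOKENS
    has_escalated: bool = False
    recovery_count: int = 0


def next_action(state: TruncationState, stop_reason: str) -> str:
    """Decide what the loop does after one LLM call (mirrors Path 1)."""
    if stop_reason != "max_tokens":
        return "append"            # normal: keep the response
    if not state.has_escalated:
        state.has_escalated = True
        state.max_tokens = ESCALATED_MAX_TOKENS
        return "escalate"          # same request, 8x the budget
    if state.recovery_count < MAX_RECOVERY_RETRIES:
        state.recovery_count += 1
        return "continue"          # save output + continuation prompt
    return "give_up"               # 1 escalation + 3 continuations used up


state = TruncationState()
actions = [next_action(state, "max_tokens") for _ in range(5)]
print(actions)           # ['escalate', 'continue', 'continue', 'continue', 'give_up']
print(state.max_tokens)  # 64000
```

The function returns an action name instead of mutating `messages` directly. That keeps the policy testable without an API key.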
### Path 2: Context Overflow

The API rejects the request as too long (`prompt_too_long`), even though all four compaction layers from s08 have already run.

Trigger a reactive compact, which is more aggressive than auto compact. The teaching version simulates compaction by keeping only the last 5 messages; real CC generates a compact summary with the LLM. Then retry with the compacted message list. If the request is still over the limit after one compaction, the only option is to exit, since compacting again won't make it any smaller:

```python
# Part of the loop's try/except around the LLM call
except Exception as e:
    if is_prompt_too_long_error(e):
        if not state.has_attempted_reactive_compact:
            messages[:] = reactive_compact(messages)
            state.has_attempted_reactive_compact = True
            continue
        return  # already compacted once and still over the limit: must exit
    raise  # not a context-overflow error
```

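
The teaching version's compaction keeps only the last 5 messages. Cutting a conversation at an arbitrary index can leave an orphaned `tool_result`, or start the list with an assistant turn; the API would likely reject either. Below is a minimal sketch of the tail-keeping idea with one extra guard against that. The guard is this sketch's own assumption, not necessarily what code.py does. Messages are assumed to be Anthropic-style `role`/`content` dicts.

```python
def _is_plain_user_turn(msg):
    """True for a user message that carries no tool_result blocks."""
    if msg["role"] != "user":
        return False
    content = msg["content"]
    if isinstance(content, str):
        return True
    return not any(
        isinstance(block, dict) and block.get("type") == "tool_result"
        for block in content
    )


def reactive_compact(messages, keep_last=5):
    """Teaching-style compaction: keep roughly the last `keep_last` messages."""
    start = max(len(messages) - keep_last, 0)
    # Move the cut point earlier until it lands on a plain user turn, so the
    # kept window never opens with an assistant turn or an orphaned tool_result.
    while start > 0 and not _is_plain_user_turn(messages[start]):
        start -= 1
    return messages[start:]


history = [
    {"role": "user", "content": "Refactor utils.py"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t1", "name": "read_file", "input": {"path": "utils.py"}}]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "file text"}]},
    {"role": "assistant", "content": "Read it. What next?"},
    {"role": "user", "content": "Now add tests"},
    {"role": "assistant", "content": "Done."},
    {"role": "user", "content": "Thanks, also update the README"},
]
print(len(reactive_compact(history, keep_last=3)))  # 3
print(len(reactive_compact(history, keep_last=4)))  # 7 (no safe cut point)
```

With `keep_last=4`, the naive cut would open on an assistant turn. Walking back past the `tool_result` ends at the first message, so nothing is dropped. In that case, a summary-based compaction like CC's is the only way to shrink the context.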
### Path 3: Transient Failures

Network blips, 429 rate limiting, 529 overload: these aren't bugs, they're normal in distributed systems.

Both 429 and 529 get exponential backoff plus jitter. The loop waits about 0.5 seconds before the first retry, 1 second before the second, and 2 seconds before the third, doubling up to a 32-second cap, for at most 10 attempts. Random jitter keeps concurrent requests from all retrying at the same instant. After three consecutive 529 overload errors, the loop switches to the fallback model (if the `FALLBACK_MODEL_ID` environment variable is set):

```python
import os
import random
import time

import anthropic

FALLBACK_MODEL = os.environ.get("FALLBACK_MODEL_ID")  # optional


class MaxRetriesExceeded(Exception):
    pass


def retry_delay(attempt, retry_after=None):
    """Seconds to wait before retry `attempt` (0-based)."""
    if retry_after:
        return retry_after  # server-provided Retry-After wins
    base = min(500 * (2 ** attempt), 32000) / 1000
    return base + random.uniform(0, base * 0.25)  # 0-25% jitter


def with_retry(fn, state, max_retries=10):
    for attempt in range(max_retries):
        try:
            return fn()
        except anthropic.APIStatusError as e:
            if e.status_code not in (429, 529):
                raise  # not transient: let the caller classify it
            if e.status_code == 529:
                state.consecutive_529 += 1
                if state.consecutive_529 >= 3 and FALLBACK_MODEL:
                    state.current_model = FALLBACK_MODEL
            header = e.response.headers.get("retry-after", "")
            time.sleep(retry_delay(attempt, int(header) if header.isdigit() else None))
    raise MaxRetriesExceeded()
```

Backoff formula: `min(500 × 2^attempt, 32000)` milliseconds plus 0-25% random jitter, with `attempt` counting from 0. If the server returns a `Retry-After` header, that value takes priority.

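
Real 429/529 errors are hard to trigger on demand. Here is an offline, self-contained version of the same retry policy that you can run and test. It makes three substitutions:

- A hypothetical `TransientAPIError` replaces the SDK's exceptions.
- The current model is passed into `fn`, so the fallback switch is visible.
- `sleep` is injectable, so tests don't actually wait.

The model names are placeholders.

```python
import random
import time


class TransientAPIError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class MaxRetriesExceeded(Exception):
    pass


class RetryState:
    def __init__(self, model):
        self.current_model = model
        self.consecutive_529 = 0


def retry_delay(attempt, retry_after=None):
    if retry_after:
        return retry_after
    base = min(500 * (2 ** attempt), 32000) / 1000
    return base + random.uniform(0, base * 0.25)


def with_retry(fn, state, max_retries=10, fallback_model=None, sleep=time.sleep):
    for attempt in range(max_retries):
        try:
            return fn(state.current_model)   # model is re-read on every attempt
        except TransientAPIError as e:
            if e.status_code not in (429, 529):
                raise                        # not transient: caller decides
            if e.status_code == 529:
                state.consecutive_529 += 1
                if state.consecutive_529 >= 3 and fallback_model:
                    state.current_model = fallback_model
            sleep(retry_delay(attempt))
    raise MaxRetriesExceeded()


# Usage: the primary model stays overloaded, the fallback answers.
calls, waits = [], []

def fake_create(model):
    calls.append(model)
    if model == "primary-model":
        raise TransientAPIError(529)
    return f"ok from {model}"

state = RetryState("primary-model")
result = with_retry(fake_create, state, fallback_model="fallback-model",
                    sleep=waits.append)
print(result)  # ok from fallback-model
print(calls)   # ['primary-model', 'primary-model', 'primary-model', 'fallback-model']
```

The recorded waits land in 0.5-0.625 s, 1-1.25 s, and 2-2.5 s, which matches the formula's base-plus-25% jitter window.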
### Putting It All Together

```python
def agent_loop(messages, context):
    system = get_system_prompt(context)
    state = RecoveryState()
    max_tokens = 8000

    while True:
        try:
            response = with_retry(
                lambda: client.messages.create(
                    model=state.current_model, system=system,
                    messages=messages, tools=TOOLS,
                    max_tokens=max_tokens),
                state)
        except Exception as e:
            if is_prompt_too_long_error(e):
                if not state.has_attempted_reactive_compact:
                    messages[:] = reactive_compact(messages)
                    state.has_attempted_reactive_compact = True
                    continue
                return
            log_error(e)
            return

        # max_tokens check BEFORE appending to messages
        if response.stop_reason == "max_tokens":
            if not state.has_escalated:
                max_tokens = ESCALATED_MAX_TOKENS  # 64000
                state.has_escalated = True
                continue  # retry same request, messages unchanged
            # save truncated output + continuation prompt (at most 3 times)
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count >= MAX_RECOVERY_RETRIES:
                return
            state.recovery_count += 1
            messages.append({"role": "user", "content": CONTINUATION_PROMPT})
            continue

        # Normal completion
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return
        # ... tool execution ...
```

Each of the three recovery mechanisms is responsible for its own error type:

- `with_retry` absorbs transient errors (429/529) and lets everything else through.
- The outer try/except catches what escapes: prompt_too_long, exhausted retries, and other API errors.
- The `stop_reason` check handles truncation.

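
The loop keeps all of its recovery bookkeeping in a `RecoveryState`. Its real definition lives in code.py. A minimal dataclass consistent with the fields used in the snippets above might look like this (the default model string is a placeholder):

```python
from dataclasses import dataclass


@dataclass
class RecoveryState:
    """Per-run recovery bookkeeping: one instance per agent_loop call."""
    current_model: str = "your-model-id"          # placeholder; set from config
    has_escalated: bool = False                   # Path 1: 8K -> 64K used?
    recovery_count: int = 0                       # Path 1: continuations so far
    has_attempted_reactive_compact: bool = False  # Path 2: compacted once?
    consecutive_529: int = 0                      # Path 3: overload streak


state = RecoveryState(current_model="primary-model")
print(state.has_escalated, state.recovery_count)  # False 0
```

Because one instance is created per `agent_loop` call, each limit applies per run rather than per process. The limits are one escalation, three continuations, and one reactive compact.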

---

## Changes from s10

| Component | Before (s10) | After (s11) |
|-----------|-------------|-------------|
| Error handling | None (crashes on any error) | Three recovery patterns + exponential backoff |
| New constants | (none) | ESCALATED_MAX_TOKENS=64000, MAX_RETRIES=10, BASE_DELAY_MS=500, FALLBACK_MODEL |
| New functions / classes | (none) | with_retry, retry_delay, reactive_compact, is_prompt_too_long_error, RecoveryState |
| Tools | bash, read_file, write_file (3) | bash, read_file, write_file (3), unchanged |
| Loop | Bare LLM call | Wrapped in try/except + `continue` retry |


---

## Try It

```sh
cd learn-claude-code
python s11_error_recovery/code.py
```

Try these prompts:

1. Ask the Agent to generate a very long piece of code, and check whether it continues automatically after truncation (look for the `[max_tokens] escalating` log line).
2. Read many files in a row to bloat the context, and watch reactive compact kick in.
3. If you hit a 429/529, watch the exponential backoff in the log output.


---

## What's Next

The Agent can now recover from errors on its own. But the tasks it handles are still one-shot: you give it a task, it finishes, and that's it.

What if the Agent could manage a **task list**, with dependencies, persisted to disk and resumable across sessions? A TODO list is not a task system.

s12 Task System → Tasks form a dependency graph with state and persistence. This is the foundation for multi-Agent collaboration.

<details>
<summary>Deep Dive into CC Source</summary>

> The following is based on CC source code: `query.ts` (1729 lines), `services/api/withRetry.ts` (822 lines), `query/tokenBudget.ts` (93 lines), and `utils/tokenBudget.ts` (73 lines).

### 1. A Dozen-Plus Reason/Transition Codes (Not Just 3)

The teaching version covers the 3 most common recovery patterns. CC actually has more than a dozen reason/transition codes, evaluated after every LLM call:

| Reason/Transition | Teaching Version | CC Behavior |
|---|---|---|
| `completed` | Normal completion | Return result |
| `next_turn` | Normal tool call | Continue to next tool execution round |
| `max_output_tokens_escalate` | Path 1 | 8K→64K escalation |
| `max_output_tokens_recovery` | Path 1 continuation | Continuation prompt (up to 3 times) |
| `reactive_compact_retry` | Path 2 | Reactive compact → retry |
| `prompt_too_long` | Path 2 | Same as above |
| `collapse_drain_retry` | Not covered | Context collapse: commit staged content first |
| `model_error` | Not covered | Retry |
| `image_error` | Not covered | `ImageSizeError` / `ImageResizeError` handled specifically |
| `aborted_streaming` | Not covered | Streaming abort recovery |
| `aborted_tools` | Not covered | Tool abort |
| `stop_hook_blocking` | Not covered | Inject blocking error → model self-corrects |
| `stop_hook_prevented` | Not covered | Hooks prevent execution |
| `hook_stopped` | Not covered | Hook stopped execution |
| `token_budget_continuation` | Not covered | Continue when token usage < 90% |
| `blocking_limit` | Not covered | Blocking limit reached |
| `max_turns` | Not covered | Maximum turns reached |

The teaching version covers only the first six rows, the most common cases. Each remaining code has its own dedicated handling logic.

### 2. Precise Exponential Backoff Formula

CC's backoff delay (`withRetry.ts:530-548`), with `attempt` counting from 1:

```
delay = min(500 × 2^(attempt-1), 32000) + random(0~25%)
```

| Attempt | Base Delay | + Jitter |
|---------|-----------|----------|
| 1 | 500ms | 0-125ms |
| 2 | 1000ms | 0-250ms |
| 3 | 2000ms | 0-500ms |
| 4 | 4000ms | 0-1000ms |
| 5 | 8000ms | 0-2000ms |
| 6 | 16000ms | 0-4000ms |
| 7+ | 32000ms (cap) | 0-8000ms |

If the server returns a `Retry-After` header, that value takes priority.

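
The table follows directly from the formula. This is a Python transliteration of the formula as stated above, not CC's TypeScript. The `jitter` flag exists only so the base values can be checked.

```python
import random

BASE_DELAY_MS = 500
MAX_DELAY_MS = 32000


def cc_style_delay_ms(attempt, jitter=True):
    """min(500 * 2^(attempt-1), 32000) plus 0-25% jitter; attempt starts at 1."""
    base = min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS)
    return base + (random.uniform(0, 0.25 * base) if jitter else 0)


for attempt in range(1, 9):
    base = cc_style_delay_ms(attempt, jitter=False)
    print(f"attempt {attempt}: {base}ms + 0-{base // 4}ms jitter")
```

The teaching version uses a 0-based `attempt` with `2 ** attempt`. It produces the same sequence of base delays; only the counting differs.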
### 3. Original CONTINUATION Prompt

CC's continuation prompt (`query.ts:1225-1227`):

```
Output token limit hit. Resume directly — no apology, no recap of what
you were doing. Pick up mid-thought if that is where the cut happened.
Break remaining work into smaller pieces.
```

Token budget nudge prompt (`tokenBudget.ts:72`):

```
Stopped at {pct}% of token target. Keep working — do not summarize.
```

### 4. Streaming Error Handling

In CC's streaming path, recoverable errors (413, max_tokens, media errors) are **withheld** while streaming (`query.ts:788-822`). SDK consumers never see them; only the recovery logic does. Once the stream ends, the system decides whether recovery is needed.

### 5. 529 → Fallback Model Switch

After 3 consecutive 529 overload errors (`MAX_529_RETRIES = 3`), CC automatically switches to the fallback model (e.g., Opus → Sonnet). On the switch, all pending messages and tool results are cleared, and the user sees "Switched to {model} due to high demand".

### 6. Diminishing Returns Detection

Token budget continuations aren't unlimited. When there are 3 consecutive continuations with a token increment below 500, the system concludes that continuing won't produce meaningful output and stops continuing (`tokenBudget.ts:60-62`).

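
Below is a minimal sketch of that rule. It reads "3 consecutive continuations with a token increment below 500" as three continuations in a row that each added fewer than 500 tokens. `DiminishingReturns` is a hypothetical helper; CC's actual bookkeeping in `tokenBudget.ts` may differ.

```python
from dataclasses import dataclass

MIN_USEFUL_DELTA = 500     # tokens added by one continuation
MAX_LOW_DELTA_STREAK = 3   # consecutive low-yield continuations allowed


@dataclass
class DiminishingReturns:
    low_delta_streak: int = 0

    def should_stop(self, tokens_added: int) -> bool:
        """Record one continuation's output size; True means stop continuing."""
        if tokens_added < MIN_USEFUL_DELTA:
            self.low_delta_streak += 1
        else:
            self.low_delta_streak = 0  # a productive continuation resets the streak
        return self.low_delta_streak >= MAX_LOW_DELTA_STREAK


detector = DiminishingReturns()
decisions = [detector.should_stop(n) for n in (1200, 300, 250, 900, 100, 80, 40)]
print(decisions)  # [False, False, False, False, False, False, True]
```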
</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

`s11_error_recovery/README.ja.md` (new file, 277 lines)

# s11: Error Recovery (Errors Aren't the End, They're the Start of a Retry)

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → ... → s09 → s10 → `s11` → [s12](../s12_task_system/) → s13 → ... → s20

> *"Errors aren't the end, they're the start of a retry"*: escalate tokens, compact context, switch models.
>
> **Harness layer**: Resilience. Classify errors in the main loop and recover from them.

---

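
## The Problem

The Agent is in the middle of running when an error comes back:

```
Error: 529 overloaded
```

The Agent crashed. It doesn't retry, doesn't switch models, doesn't shrink the context; it just crashes.

In production, API errors are an everyday occurrence. The three most common failure modes are **truncated output** (the model runs out of tokens partway through its output), **context overflow** (still too long even after compaction), and **transient failures** (429 rate limiting, 529 overload). An Agent that doesn't handle errors is like a car that stalls at the slightest touch.

---
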
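
## Solution

The loop and prompt assembly from s10 stay exactly as they were. The only change is that the LLM call is wrapped in try/except and routed to a different recovery path depending on the error type. After recovering, `continue` returns to the top of the loop and calls the LLM again.

The three most common recovery patterns are listed below. For transient failures, the teaching version handles only 429 and 529. Real systems also cover connection errors, timeouts, cloud-vendor credential caches, and more. CC actually has 13+ reason codes; the rest are explained in the Deep Dive.

| Pattern | Trigger | Recovery Action |
|---------|---------|-----------------|
| Output truncated | `max_tokens` | Escalate 8K→64K, then inject a continuation prompt |
| Context overflow | `prompt_too_long` | Reactive compact → retry |
| Transient failure | 429 / 529 | Exponential backoff + jitter; can switch to a fallback model after consecutive 529s |

---
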

## How It Works

### Path 1: Output Truncated

The model got partway through its output and hit `max_tokens`. The default 8000 tokens isn't enough to emit a complete answer.

On the first occurrence, escalate `max_tokens` from 8K to 64K (8x the room) and retry the same request. At this point the truncated output is not appended to `messages`, so the original request is kept as-is. Only if 64K is still not enough is the truncated output saved and a continuation prompt injected, so the model continues from where it stopped, up to 3 times:

```python
ESCALATED_MAX_TOKENS = 64000
MAX_RECOVERY_RETRIES = 3

# Inside the agent loop's `while True:`, right after the LLM call returns
if response.stop_reason == "max_tokens":
    # First escalation: don't append truncated output, retry same request
    if not state.has_escalated:
        max_tokens = ESCALATED_MAX_TOKENS
        state.has_escalated = True
        continue  # messages unchanged, same request with more tokens
    # 64K still truncated: save output + continuation prompt
    messages.append({"role": "assistant", "content": response.content})
    if state.recovery_count < MAX_RECOVERY_RETRIES:
        messages.append({"role": "user", "content":
            "Output token limit hit. Resume directly — "
            "no apology, no recap. Pick up mid-thought."})
        state.recovery_count += 1
        continue
    return  # still truncated after 3 continuations
# Normal: append after the max_tokens check
messages.append({"role": "assistant", "content": response.content})
```

Escalation happens only once, and continuation at most 3 times. Past that, exit, because continuing further is unlikely to yield substantive output.

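### Path 2: Context Overflow

The LLM reports that the context is too long (`prompt_too_long`). All four compaction layers from s08 have already run, and it is still over the limit.

Trigger a reactive compact, which is more aggressive than auto compact. The teaching version simulates compaction by keeping only the last 5 messages; real CC generates a compact summary with the LLM before retrying. Retry after compacting. But if the request is still over the limit after one compaction, the only option is to exit, because compacting again won't make it any smaller:

```python
# Part of the loop's try/except around the LLM call
except Exception as e:
    if is_prompt_too_long_error(e):
        if not state.has_attempted_reactive_compact:
            messages[:] = reactive_compact(messages)
            state.has_attempted_reactive_compact = True
            continue
        return  # already compacted and still over the limit: must exit
    raise  # not a context-overflow error
```
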
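### Path 3: Transient Failures

Network jitter, 429 rate limiting, 529 overload: these aren't bugs, they're everyday life in distributed systems.

429 and 529 are handled uniformly with exponential backoff plus jitter. The loop waits about 0.5 seconds the first time, 1 second the second time, and 2 seconds the third time, doubling up to a 32-second cap, for at most 10 attempts. Adding random jitter prevents concurrent requests from retrying at the same moment. After 3 consecutive 529 overloads, the loop switches to the fallback model (if the `FALLBACK_MODEL_ID` environment variable is set):

```python
import os
import random
import time

import anthropic

FALLBACK_MODEL = os.environ.get("FALLBACK_MODEL_ID")  # optional


class MaxRetriesExceeded(Exception):
    pass


def retry_delay(attempt, retry_after=None):
    """Seconds to wait before retry `attempt` (0-based)."""
    if retry_after:
        return retry_after  # server-provided Retry-After wins
    base = min(500 * (2 ** attempt), 32000) / 1000
    return base + random.uniform(0, base * 0.25)  # 0-25% jitter


def with_retry(fn, state, max_retries=10):
    for attempt in range(max_retries):
        try:
            return fn()
        except anthropic.APIStatusError as e:
            if e.status_code not in (429, 529):
                raise  # not transient: let the caller classify it
            if e.status_code == 529:
                state.consecutive_529 += 1
                if state.consecutive_529 >= 3 and FALLBACK_MODEL:
                    state.current_model = FALLBACK_MODEL
            header = e.response.headers.get("retry-after", "")
            time.sleep(retry_delay(attempt, int(header) if header.isdigit() else None))
    raise MaxRetriesExceeded()
```

Backoff formula: `min(500 × 2^attempt, 32000)` milliseconds plus 0-25% random jitter, with `attempt` counting from 0. If the server returns a `Retry-After` header, that value is used in preference.
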
### Putting It All Together

```python
def agent_loop(messages, context):
    system = get_system_prompt(context)
    state = RecoveryState()
    max_tokens = 8000

    while True:
        try:
            response = with_retry(
                lambda: client.messages.create(
                    model=state.current_model, system=system,
                    messages=messages, tools=TOOLS,
                    max_tokens=max_tokens),
                state)
        except Exception as e:
            if is_prompt_too_long_error(e):
                if not state.has_attempted_reactive_compact:
                    messages[:] = reactive_compact(messages)
                    state.has_attempted_reactive_compact = True
                    continue
                return
            log_error(e)
            return

        # max_tokens check BEFORE appending to messages
        if response.stop_reason == "max_tokens":
            if not state.has_escalated:
                max_tokens = ESCALATED_MAX_TOKENS  # 64000
                state.has_escalated = True
                continue  # retry same request, messages unchanged
            # save truncated output + continuation prompt (at most 3 times)
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count >= MAX_RECOVERY_RETRIES:
                return
            state.recovery_count += 1
            messages.append({"role": "user", "content": CONTINUATION_PROMPT})
            continue

        # Normal completion
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return
        # ... tool execution ...
```

Each of the three recovery mechanisms takes charge of a different error type:

- `with_retry` handles transient errors (429/529) and lets everything else through.
- The outer try/except catches API exceptions such as prompt_too_long.
- The `stop_reason` check handles truncation.

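
The division of labor above can be summarized as a single classification step. This hypothetical sketch maps a call's outcome (HTTP status, error message, `stop_reason`) to the path that handles it. It assumes the API reports an over-long prompt as a 400 whose message contains "prompt is too long"; treat that string match as an assumption, not a documented contract.

```python
from typing import Optional


def classify(status_code: Optional[int] = None, message: str = "",
             stop_reason: Optional[str] = None) -> str:
    """Map one LLM call's outcome to the recovery path that owns it."""
    if status_code in (429, 529):
        return "backoff"               # Path 3: absorbed inside with_retry
    if status_code == 400 and "prompt is too long" in message.lower():
        return "reactive_compact"      # Path 2: outer try/except
    if status_code is not None:
        return "fail"                  # any other API error: log and exit
    if stop_reason == "max_tokens":
        return "escalate_or_continue"  # Path 1: stop_reason check
    if stop_reason == "tool_use":
        return "run_tools"             # normal: execute tools, loop again
    return "done"                      # end_turn and similar: return


print(classify(529))                                                      # backoff
print(classify(400, "prompt is too long: 210000 tokens > 200000 maximum"))  # reactive_compact
print(classify(stop_reason="max_tokens"))                                 # escalate_or_continue
```

The function checks API errors before `stop_reason` because a failed call has no response, and therefore no stop reason, to inspect.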

---

## Changes from s10

| Component | Before (s10) | After (s11) |
|-----------|-------------|-------------|
| Error handling | None (crashes immediately on error) | Three recovery patterns + exponential backoff |
| New constants | (none) | ESCALATED_MAX_TOKENS=64000, MAX_RETRIES=10, BASE_DELAY_MS=500, FALLBACK_MODEL |
| New functions / classes | (none) | with_retry, retry_delay, reactive_compact, is_prompt_too_long_error, RecoveryState |
| Tools | bash, read_file, write_file (3) | bash, read_file, write_file (3), unchanged |
| Loop | Calls the LLM directly | Wrapped in try/except + `continue` retry |


---

## Try It

```sh
cd learn-claude-code
python s11_error_recovery/code.py
```

Try these prompts:

1. Have the Agent generate long code, and observe whether it automatically continues after truncation (check for the `[max_tokens] escalating` log).
2. Read a large number of files in a row to bloat the context, and observe reactive compact in action.
3. If a 429/529 occurs, observe the exponential backoff log output.


---

## Next Steps

The Agent can now recover from errors automatically. But the tasks it handles are still disposable: you give it a task, it runs, and it's done.

Could we let the Agent manage a **task list**, one with dependencies, persisted to disk, and recoverable across sessions? A TODO list is not a task system.

s12 Task System → A task is a graph with dependencies, state, and persistence. This is the foundation for multi-Agent collaboration.

<details>
<summary>Deep Dive into CC Source</summary>

> The following is based on an analysis of CC source code: `query.ts` (1729 lines), `services/api/withRetry.ts` (822 lines), `query/tokenBudget.ts` (93 lines), and `utils/tokenBudget.ts` (73 lines).

### 1. A Dozen-Plus Reason/Transition Codes (Not Just 3)

The teaching version explained the 3 most common recovery patterns. CC actually has more than a dozen reason/transition codes, evaluated after every LLM call:

| Reason/Transition | Teaching Version | CC Behavior |
|---|---|---|
| `completed` | Normal completion | Return result |
| `next_turn` | Normal tool call | Proceed to the next tool execution round |
| `max_output_tokens_escalate` | Path 1 | Escalate 8K→64K |
| `max_output_tokens_recovery` | Path 1 continuation | Inject continuation prompt (up to 3 times) |
| `reactive_compact_retry` | Path 2 | Reactive compact → retry |
| `prompt_too_long` | Path 2 | Same as above |
| `collapse_drain_retry` | Not covered | On context collapse, commit pending content first |
| `model_error` | Not covered | Retry |
| `image_error` | Not covered | Dedicated handling for `ImageSizeError` / `ImageResizeError` |
| `aborted_streaming` | Not covered | Recovery from streaming abort |
| `aborted_tools` | Not covered | Tool abort |
| `stop_hook_blocking` | Not covered | Inject blocking error → model self-corrects |
| `stop_hook_prevented` | Not covered | Blocked by hooks |
| `hook_stopped` | Not covered | Execution stopped by a hook |
| `token_budget_continuation` | Not covered | Continue when token usage < 90% |
| `blocking_limit` | Not covered | Blocking limit |
| `max_turns` | Not covered | Maximum number of turns reached |

The teaching version covers only the first six rows, the most common cases. Each remaining code has its own dedicated handling logic.

### 2. Precise Exponential Backoff Formula

CC's backoff delay (`withRetry.ts:530-548`), with `attempt` counting from 1:

```
delay = min(500 × 2^(attempt-1), 32000) + random(0~25%)
```

| Attempt | Base Delay | + Jitter |
|---------|-----------|----------|
| 1 | 500ms | 0-125ms |
| 2 | 1000ms | 0-250ms |
| 3 | 2000ms | 0-500ms |
| 4 | 4000ms | 0-1000ms |
| 5 | 8000ms | 0-2000ms |
| 6 | 16000ms | 0-4000ms |
| 7+ | 32000ms (cap) | 0-8000ms |

If the server returns a `Retry-After` header, that value is used in preference.

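
In HTTP, `Retry-After` comes in two forms: a number of seconds (`Retry-After: 30`) or an HTTP date. Below is a minimal sketch that turns either form into a wait in seconds, using only the standard library. `parse_retry_after` is a hypothetical helper, not CC's implementation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Seconds to wait from a Retry-After header value, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)                 # delay-seconds form, e.g. "30"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):          # unparsable (error type varies by Python version)
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # past dates mean "retry now"


print(parse_retry_after("30"))    # 30.0
print(parse_retry_after("soon"))  # None
```

Returning `None` for anything unusable lets the caller fall back to its own exponential backoff instead of crashing on a malformed header.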
### 3. Original CONTINUATION Prompt

CC's continuation prompt (`query.ts:1225-1227`):

```
Output token limit hit. Resume directly — no apology, no recap of what
you were doing. Pick up mid-thought if that is where the cut happened.
Break remaining work into smaller pieces.
```

Token budget nudge prompt (`tokenBudget.ts:72`):

```
Stopped at {pct}% of token target. Keep working — do not summarize.
```

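
The following is a minimal sketch of how that nudge might be produced. It combines the prompt text above with the "continue when token usage < 90%" rule from the reason table. `budget_nudge`, the rounding of `pct`, and the `token_target` parameter are assumptions for illustration.

```python
from typing import Optional

# Nudge text as quoted above (tokenBudget.ts:72)
NUDGE_TEMPLATE = "Stopped at {pct}% of token target. Keep working — do not summarize."
COMPLETION_THRESHOLD = 0.9  # "continue when token usage < 90%"


def budget_nudge(tokens_used: int, token_target: int) -> Optional[str]:
    """Return a nudge prompt if the model stopped well short of its target."""
    if token_target <= 0:
        return None
    fraction = tokens_used / token_target
    if fraction >= COMPLETION_THRESHOLD:
        return None  # close enough to the target: let the turn end
    return NUDGE_TEMPLATE.format(pct=round(fraction * 100))


print(budget_nudge(12000, 20000))  # Stopped at 60% of token target. Keep working — do not summarize.
print(budget_nudge(19000, 20000))  # None
```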
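### 4. Streaming Error Handling

In CC's streaming path, recoverable errors (413, max_tokens, media errors) are **withheld from display** during streaming (`query.ts:788-822`). SDK consumers cannot see them; only the recovery logic is aware of them. After streaming ends, the system decides whether recovery is needed.
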
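
The withhold-then-decide pattern can be sketched with a plain event list. The event dicts and `kind` labels here are hypothetical, not CC's stream types. The point is that recoverable errors are diverted to a side list instead of being shown, and the caller checks that list after the stream ends.

```python
RECOVERABLE_ERRORS = {"prompt_too_long", "max_output_tokens", "media_error"}


def is_recoverable(event):
    return event.get("type") == "error" and event.get("kind") in RECOVERABLE_ERRORS


def forward_stream(events, show, withheld):
    """Pass events to `show`, but hold recoverable errors back for recovery logic.

    Returns True when something was withheld, meaning recovery should run
    once the stream has ended.
    """
    for event in events:
        if is_recoverable(event):
            withheld.append(event)  # the SDK consumer never sees this
        else:
            show(event)
    return bool(withheld)


shown, held = [], []
events = [
    {"type": "text", "text": "Here is the plan"},
    {"type": "error", "kind": "max_output_tokens"},
    {"type": "error", "kind": "auth_failed"},
]
needs_recovery = forward_stream(events, shown.append, held)
print(needs_recovery, len(shown), held[0]["kind"])  # True 2 max_output_tokens
```

Unrecoverable errors such as `auth_failed` still reach the consumer immediately, because no recovery path could hide them anyway.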
### 5. 529 → Fallback Model Switch

After 3 consecutive 529 overload errors (`MAX_529_RETRIES = 3`), CC automatically switches to the fallback model (e.g., Opus → Sonnet). On the switch, all pending messages and tool results are cleared, and the user sees "Switched to {model} due to high demand".

### 6. Diminishing Returns Detection

Token budget continuations aren't unlimited. When there are 3 consecutive continuations with a token increment below 500, the system concludes that continuing won't yield substantive output and stops continuing (`tokenBudget.ts:60-62`).

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

`s11_error_recovery/README.md` (new file, 277 lines)

# s11: Error Recovery (Errors Aren't the End, They're the Start of a Retry)

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → ... → s09 → s10 → `s11` → [s12](../s12_task_system/) → s13 → ... → s20

> *"Errors aren't the end point, they're the starting point of a retry"*: escalate tokens, compact context, switch models.
>
> **Harness layer**: Resilience. When the main loop hits an error, classify it and recover.

---

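
## The Problem

The Agent is running along and then throws an error:

```
Error: 529 overloaded
```

The Agent crashed. It didn't retry, didn't switch models, didn't shrink the context; it just crashed.

In production, API errors are the norm. The three most common failure modes are **truncated output** (the model runs out of tokens halfway through what it was saying), **context overflow** (still too long after compaction), and **transient failures** (429 rate limiting, 529 overload). An Agent that doesn't handle errors is like a car that stalls at the slightest touch.

---
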
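
## Solution

The loop and prompt assembly from s10 are all kept. The only change is that the LLM call is wrapped in try/except, and each error type takes a different recovery path. After recovering, `continue` goes back to the start of the loop and calls the LLM again.

The three most common recovery patterns are listed below. For transient failures, the teaching version handles only 429 and 529. Real systems also cover connection errors, timeouts, cloud-vendor credential caches, and more. CC actually has 13+ reason codes; see the Deep Dive for the rest.

| Pattern | Trigger | Recovery Action |
|---------|---------|-----------------|
| Output truncated | `max_tokens` | Escalate 8K→64K, then continuation prompt |
| Context overflow | `prompt_too_long` | Reactive compact → retry |
| Transient failure | 429 / 529 | Exponential backoff + jitter; can switch to a fallback model after consecutive 529s |

---
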

## How It Works

### Path 1: Output Truncated

The model is halfway through what it was saying when `max_tokens` runs out. The default 8000 tokens isn't enough for it to output a complete answer.

The first time this happens, bump `max_tokens` straight from 8K to 64K (8x the room) and retry the same request. At this point the truncated output is not appended to `messages`, keeping the original request unchanged. Only if 64K is still not enough is the truncated output saved and a continuation prompt injected, so the model carries on from where it stopped, up to 3 times:

```python
ESCALATED_MAX_TOKENS = 64000
MAX_RECOVERY_RETRIES = 3

# Inside the agent loop's `while True:`, right after the LLM call returns
if response.stop_reason == "max_tokens":
    # First escalation: don't append truncated output, retry same request
    if not state.has_escalated:
        max_tokens = ESCALATED_MAX_TOKENS
        state.has_escalated = True
        continue  # messages unchanged, same request with more tokens
    # 64K still truncated: save output + continuation prompt
    messages.append({"role": "assistant", "content": response.content})
    if state.recovery_count < MAX_RECOVERY_RETRIES:
        messages.append({"role": "user", "content":
            "Output token limit hit. Resume directly — "
            "no apology, no recap. Pick up mid-thought."})
        state.recovery_count += 1
        continue
    return  # still truncated after 3 continuations
# Normal: append after the max_tokens check
messages.append({"role": "assistant", "content": response.content})
```

Escalation gets only one chance; continuation gets at most 3. Beyond that, exit, because further continuation is unlikely to produce anything substantive.

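### Path 2: Context Overflow

The LLM says "your context is too long" (`prompt_too_long`). All four compaction layers from s08 have run, and it is still over.

Trigger a reactive compact, which is more aggressive than auto compact. The teaching version simulates the effect of compaction by keeping only the last 5 messages; the real implementation calls the LLM to generate a compact summary and then retries. Retry after compacting. But if it is still over the limit after one compaction, the only option is to exit, because compacting again won't make it any smaller:

```python
# Part of the loop's try/except around the LLM call
except Exception as e:
    if is_prompt_too_long_error(e):
        if not state.has_attempted_reactive_compact:
            messages[:] = reactive_compact(messages)
            state.has_attempted_reactive_compact = True
            continue
        return  # already compacted and still over the limit: must exit
    raise  # not a context-overflow error
```
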
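### Path 3: Transient Failures

Network jitter, 429 rate limiting, 529 overload: these aren't bugs, they're the normal state of distributed systems.

429 and 529 both go through exponential backoff plus jitter. The loop waits about 0.5 seconds the first time, 1 second the second time, and 2 seconds the third time, doubling up to a 32-second cap, for at most 10 attempts. Random jitter keeps concurrent requests from retrying at the same instant. After 3 consecutive 529 overloads, the loop switches to the fallback model (if the `FALLBACK_MODEL_ID` environment variable is configured):

```python
import os
import random
import time

import anthropic

FALLBACK_MODEL = os.environ.get("FALLBACK_MODEL_ID")  # optional


class MaxRetriesExceeded(Exception):
    pass


def retry_delay(attempt, retry_after=None):
    """Seconds to wait before retry `attempt` (0-based)."""
    if retry_after:
        return retry_after  # server-provided Retry-After wins
    base = min(500 * (2 ** attempt), 32000) / 1000
    return base + random.uniform(0, base * 0.25)  # 0-25% jitter


def with_retry(fn, state, max_retries=10):
    for attempt in range(max_retries):
        try:
            return fn()
        except anthropic.APIStatusError as e:
            if e.status_code not in (429, 529):
                raise  # not transient: let the caller classify it
            if e.status_code == 529:
                state.consecutive_529 += 1
                if state.consecutive_529 >= 3 and FALLBACK_MODEL:
                    state.current_model = FALLBACK_MODEL
            header = e.response.headers.get("retry-after", "")
            time.sleep(retry_delay(attempt, int(header) if header.isdigit() else None))
    raise MaxRetriesExceeded()
```

Backoff formula: `min(500 × 2^attempt, 32000)` milliseconds plus 0-25% random jitter, with `attempt` counting from 0. If the server returns a `Retry-After` header, that value is used first.
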
### 合起来跑
|
||||
|
||||
```python
def agent_loop(messages, context):
    system = get_system_prompt(context)
    state = RecoveryState()
    max_tokens = 8000

    while True:
        try:
            response = with_retry(
                # state.current_model is read on every attempt,
                # so a fallback switch inside with_retry applies immediately
                lambda: client.messages.create(
                    model=state.current_model, system=system,
                    messages=messages, tools=TOOLS,
                    max_tokens=max_tokens),
                state)
        except Exception as e:
            if is_prompt_too_long_error(e):
                if not state.has_attempted_reactive_compact:
                    messages[:] = reactive_compact(messages)
                    state.has_attempted_reactive_compact = True
                    continue
                return  # still too long after one compact
            log_error(e)
            return

        # max_tokens check BEFORE appending to messages
        if response.stop_reason == "max_tokens":
            if not state.has_escalated:
                max_tokens = 64000
                state.has_escalated = True
                continue  # retry same request, messages unchanged
            # save truncated output + continuation prompt (at most 3 times)
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count >= 3:
                return
            messages.append({"role": "user", "content": CONTINUATION_PROMPT})
            state.recovery_count += 1
            continue

        # Normal completion
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return
        # ... tool execution ...
```

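The outer try/except catches the API exceptions that `with_retry` re-raises, such as prompt_too_long. `with_retry` absorbs transient errors (429/529), and the `stop_reason` check handles truncation. Each of the three recovery mechanisms owns its own class of error.
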
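The sketch below exercises Paths 1 and 2 without calling the API. It drives the same control flow as the loop above using a scripted fake client. `PromptTooLong`, `FakeResponse`, `run` and `fake_create` are hypothetical names used only for this illustration. 429/529 handling is left out because it lives inside `with_retry`.

```python
from dataclasses import dataclass


class PromptTooLong(Exception):
    """Hypothetical stand-in for the API's prompt-too-long error."""


@dataclass
class FakeResponse:
    stop_reason: str
    content: str


@dataclass
class RecoveryState:
    has_escalated: bool = False
    recovery_count: int = 0
    has_attempted_reactive_compact: bool = False


def run(create, messages, compact, continuation="Resume.", max_recovery=3):
    """Paths 1 and 2 only; create(messages, max_tokens) plays the LLM."""
    state = RecoveryState()
    max_tokens = 8000
    while True:
        try:
            resp = create(messages, max_tokens)
        except PromptTooLong:
            if state.has_attempted_reactive_compact:
                return "context too large"
            messages[:] = compact(messages)                  # Path 2: compact once, retry
            state.has_attempted_reactive_compact = True
            continue
        if resp.stop_reason == "max_tokens":
            if not state.has_escalated:                      # Path 1a: escalate, append nothing
                max_tokens, state.has_escalated = 64000, True
                continue
            messages.append({"role": "assistant", "content": resp.content})
            if state.recovery_count >= max_recovery:
                return "recovery limit reached"
            messages.append({"role": "user", "content": continuation})  # Path 1b
            state.recovery_count += 1
            continue
        messages.append({"role": "assistant", "content": resp.content})
        return resp.stop_reason


# Scripted outcomes: context overflow, truncated, truncated again, then done.
script = [PromptTooLong(), FakeResponse("max_tokens", "part1"),
          FakeResponse("max_tokens", "part2"), FakeResponse("end_turn", "done")]
calls = []


def fake_create(messages, max_tokens):
    calls.append(max_tokens)
    item = script.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


history = [{"role": "user", "content": f"m{i}"} for i in range(10)]
result = run(fake_create, history, compact=lambda msgs: msgs[-2:])
print(result, calls)                    # end_turn [8000, 8000, 64000, 64000]
print([m["content"] for m in history])  # ['m8', 'm9', 'part2', 'Resume.', 'done']
```

The first truncated output (`part1`) never reaches the history, because escalation retries the identical request with a larger budget. Only the second truncation is saved, together with the continuation prompt.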
---

## Changes from s10

| Component | Before (s10) | After (s11) |
|------|-----------|-----------|
| Error handling | None (crashes on the first error) | Three recovery modes + exponential backoff |
| New constants | (none) | ESCALATED_MAX_TOKENS=64000, MAX_RECOVERY_RETRIES=3, MAX_RETRIES=10, BASE_DELAY_MS=500, MAX_CONSECUTIVE_529=3, FALLBACK_MODEL |
| New functions / classes | (none) | with_retry, retry_delay, reactive_compact, is_prompt_too_long_error, RecoveryState (class) |
| Tools | bash, read_file, write_file (3) | bash, read_file, write_file (3), unchanged |
| Loop | Bare LLM call | Wrapped in try/except, with `continue` to retry |

---

## Try it

```sh
cd learn-claude-code
python s11_error_recovery/code.py
```

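Try these prompts:

1. Ask the agent to generate a very long piece of code, and check whether it continues automatically after truncation (look for the `[max_tokens] escalating` log line).
2. Have it read many files in a row to inflate the context, and watch for reactive compact.
3. If you hit a 429/529, watch the exponential-backoff log output.
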
---

## What's next

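The agent can now recover from errors on its own. The tasks it handles are still one-shot, though: you give it a task, it finishes, and that's the end.

Can the agent manage a **task list** instead, one with dependencies that is persisted to disk and resumable across sessions? A TODO list is not a task system.

s12 Task System → tasks form a persistent graph with dependencies and state. This is the foundation for multi-agent collaboration.
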
<details>
<summary>Deep dive: the CC source</summary>

> The following is based on an analysis of the CC source files `query.ts` (1729 lines), `services/api/withRetry.ts` (822 lines), `query/tokenBudget.ts` (93 lines) and `utils/tokenBudget.ts` (73 lines).

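### 1. More than a dozen reasons/transitions (not just 3)

The teaching version covers the 3 most common recovery modes. CC actually has more than a dozen reasons/transitions, and it decides which one applies after every LLM call:
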
| reason/transition | Teaching version | CC behavior |
|---|---|---|
| `completed` | Normal completion | Return the result |
| `next_turn` | Normal tool call | Continue with the next round of tool execution |
| `max_output_tokens_escalate` | Path 1 | Escalate 8K→64K |
| `max_output_tokens_recovery` | Path 1 continuation | Continuation prompt (at most 3 times) |
| `reactive_compact_retry` | Path 2 | Reactive compact → retry |
| `prompt_too_long` | Path 2 | Same as above |
| `collapse_drain_retry` | Not covered | Context collapse commits its staged collapses first |
| `model_error` | Not covered | Retry |
| `image_error` | Not covered | Dedicated handling for `ImageSizeError` / `ImageResizeError` |
| `aborted_streaming` | Not covered | Recovery from an aborted stream |
| `aborted_tools` | Not covered | Tool execution aborted |
| `stop_hook_blocking` | Not covered | Inject the blocking error → model self-corrects |
| `stop_hook_prevented` | Not covered | Blocked by hooks |
| `hook_stopped` | Not covered | A hook stopped execution |
| `token_budget_continuation` | Not covered | Keep going while token usage is below 90% |
| `blocking_limit` | Not covered | Blocking limit reached |
| `max_turns` | Not covered | Maximum number of turns reached |

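The teaching version only implements the rows that map to its own paths, which are the most common cases. Each of the remaining reasons has its own dedicated handling logic in CC.
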
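A compact way to see the per-call decision is a classifier that maps one call outcome to a label. The sketch below only covers the rows the teaching version implements and borrows their names from the table. `classify` and the `gave_up` label are hypothetical and are not CC reasons. 429/529 never reaches this point because `with_retry` absorbs those errors.

```python
from typing import Optional


def classify(stop_reason: Optional[str], error: Optional[str],
             has_escalated: bool, recovery_count: int,
             has_compacted: bool, max_recovery: int = 3) -> str:
    """Map one LLM-call outcome to a transition label (teaching paths only).

    "gave_up" is a made-up label for the teaching version's exit paths.
    """
    if error == "prompt_too_long":
        return "gave_up" if has_compacted else "reactive_compact_retry"
    if error is not None:
        return "gave_up"                   # teaching version logs and exits
    if stop_reason == "max_tokens":
        if not has_escalated:
            return "max_output_tokens_escalate"
        if recovery_count < max_recovery:
            return "max_output_tokens_recovery"
        return "gave_up"
    if stop_reason == "tool_use":
        return "next_turn"
    return "completed"
```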
### 2. The exact backoff formula

CC's backoff delay (`withRetry.ts:530-548`):

```
base_ms  = min(500 × 2^(attempt-1), 32000)      # attempt counts from 1
delay_ms = base_ms + random(0, 0.25 × base_ms)  # 0-25% jitter
```

| Attempt | Base delay | + Jitter |
|------|---------|--------|
| 1 | 500 ms | 0-125 ms |
| 2 | 1000 ms | 0-250 ms |
| 4 | 4000 ms | 0-1000 ms |
| 7+ | 32000 ms (cap) | 0-8000 ms |

If the server returns a `Retry-After` header, that value takes priority.

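The table can be regenerated directly from the formula. The sketch assumes attempts are numbered from 1, as in CC. The README's `retry_delay` numbers them from 0 and uses `2 ** attempt`, which yields the same schedule. `base_delay_ms` and `delay_ms` are illustrative names, not CC functions.

```python
import random
from typing import Optional

BASE_DELAY_MS = 500
MAX_DELAY_MS = 32_000


def base_delay_ms(attempt: int) -> int:
    """Base delay for a 1-indexed attempt: min(500 * 2^(attempt-1), 32000)."""
    return min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS)


def delay_ms(attempt: int, retry_after_s: Optional[float] = None, rng=random) -> float:
    """Full delay in ms; a Retry-After value (in seconds) overrides the schedule."""
    if retry_after_s is not None:
        return retry_after_s * 1000
    base = base_delay_ms(attempt)
    return base + rng.uniform(0, 0.25 * base)   # 0-25% jitter on top of the base


for attempt in range(1, 9):
    base = base_delay_ms(attempt)
    print(f"{attempt}: {base} ms + 0-{base // 4} ms jitter")
```

Attempts 3, 5 and 6, which the table skips, come out as 2000, 8000 and 16000 ms. From attempt 7 on, the base stays pinned at the 32 s cap.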
### 3. The continuation prompt, verbatim

CC's continuation prompt (`query.ts:1225-1227`):

```
Output token limit hit. Resume directly — no apology, no recap of what
you were doing. Pick up mid-thought if that is where the cut happened.
Break remaining work into smaller pieces.
```

The token-budget nudge prompt (`tokenBudget.ts:72`):

```
Stopped at {pct}% of token target. Keep working — do not summarize.
```

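### 4. Streaming error handling

On CC's streaming path, recoverable errors (413, max_tokens, media errors) are **withheld** while the stream is running (`query.ts:788-822`). SDK consumers never see them; only the recovery logic does. Whether recovery is needed is decided after streaming has finished.
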
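The withholding idea can be sketched as a consumer that forwards events but holds back recoverable errors until the stream ends. The `Event` shape and the `RECOVERABLE` labels are assumptions made for this illustration, and CC's real stream types differ. `consume_stream` is a hypothetical name.

```python
from dataclasses import dataclass

# Hypothetical labels for the recoverable cases named above (413, max_tokens, media errors).
RECOVERABLE = {"prompt_too_long", "max_output_tokens", "media_error"}


@dataclass
class Event:
    kind: str     # "text" or "error"
    detail: str


def consume_stream(events, emit):
    """Forward events to `emit`, but hold back recoverable errors.

    Returns the withheld errors so recovery logic can decide what to do
    once the stream has finished.
    """
    withheld = []
    for ev in events:
        if ev.kind == "error" and ev.detail in RECOVERABLE:
            withheld.append(ev)    # the consumer never sees this one
        else:
            emit(ev)
    return withheld


shown = []
stream = [Event("text", "Hello"), Event("text", "world"),
          Event("error", "max_output_tokens"), Event("error", "auth_failed")]
held = consume_stream(stream, shown.append)
print([e.detail for e in shown])   # ['Hello', 'world', 'auth_failed']
print([e.detail for e in held])    # ['max_output_tokens']
```

Unrecoverable errors still reach the consumer immediately. The withheld list is only inspected after the loop ends, which matches the "decide after streaming" order described above.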
### 5. 529 → fallback model switch

After 3 consecutive 529 overloaded errors (`MAX_529_RETRIES = 3`), CC automatically switches to the fallback model (for example, Opus → Sonnet). When it switches, it clears all pending messages and tool results and shows the user "Switched to {model} due to high demand".

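The switch only helps if the next attempt actually reads the new model. The sketch below uses a fake `create` that always fails with 529 for the primary model. It contrasts reading `state.current_model` at call time with freezing it as a lambda default argument. `OverloadedError`, `State`, `create` and `no_sleep` are hypothetical stand-ins. CC's clearing of pending messages and its user notice are not modeled.

```python
import time


class OverloadedError(Exception):
    """Hypothetical stand-in for an HTTP 529 response."""


class State:
    def __init__(self, model):
        self.current_model = model
        self.consecutive_529 = 0


def with_retry(fn, state, fallback, max_retries=10, threshold=3, sleep=time.sleep):
    for attempt in range(max_retries):
        try:
            result = fn()
            state.consecutive_529 = 0
            return result
        except OverloadedError:
            state.consecutive_529 += 1
            if state.consecutive_529 >= threshold and fallback:
                state.current_model = fallback    # switch after 3 in a row
                state.consecutive_529 = 0
            sleep(min(0.5 * 2 ** attempt, 32))
    raise RuntimeError(f"max retries ({max_retries}) exceeded")


calls = []


def create(model):
    calls.append(model)
    if model == "primary":
        raise OverloadedError()
    return f"ok from {model}"


def no_sleep(_seconds):
    pass


# Late binding: the lambda reads state.current_model on every attempt.
state = State("primary")
print(with_retry(lambda: create(state.current_model), state, "backup", sleep=no_sleep))
print(calls)   # ['primary', 'primary', 'primary', 'backup']

# Early binding: the default argument freezes the model when the lambda is built.
calls.clear()
frozen = State("primary")
try:
    with_retry(lambda mdl=frozen.current_model: create(mdl), frozen, "backup", sleep=no_sleep)
except RuntimeError as e:
    print(e, calls.count("primary"))   # max retries (10) exceeded 10
```

A default argument such as `mdl=state.current_model` is evaluated once, when the lambda is created. The fallback then only takes effect once the outer loop builds a new lambda, after all of `with_retry`'s attempts have already been spent on the overloaded model.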
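### 6. Diminishing-returns detection

Token-budget continuation is not unlimited. After 3 consecutive continuations, if the token increase is below 500, the system concludes that continuing is no longer producing anything substantive and stops continuing (`tokenBudget.ts:60-62`).
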
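The two stop conditions for token-budget continuation can be sketched as a single predicate. Exactly how CC counts "3 consecutive continuations" is an assumption here. The sketch stops once at least three continuations have happened and the latest one added fewer than 500 tokens. `should_continue` is a hypothetical name.

```python
def should_continue(tokens_used: int, budget: int, continuations: int,
                    last_delta: int, target_ratio: float = 0.9,
                    max_low_yield_runs: int = 3, min_delta: int = 500) -> bool:
    """Decide whether to nudge the model to keep working.

    Assumed reading of the rules described above:
    - stop once usage reaches 90% of the token budget;
    - stop when there have been 3+ continuations and the most recent
      one added fewer than 500 tokens (diminishing returns).
    """
    if tokens_used >= target_ratio * budget:
        return False
    if continuations >= max_low_yield_runs and last_delta < min_delta:
        return False
    return True
```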
</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
361
s11_error_recovery/code.py
Normal file
@@ -0,0 +1,361 @@
#!/usr/bin/env python3
"""
s11: Error Recovery — three recovery paths + exponential backoff.

Run: python s11_error_recovery/code.py
Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY

Changes from s10:
- LLM call wrapped in try/except with three recovery paths
- Path 1: max_tokens -> escalate 8K->64K (no append on first escalation),
  then continuation prompt (max 3)
- Path 2: prompt_too_long -> reactive compact -> retry (once)
- Path 3: 429/529 -> exponential backoff with jitter (max 10),
  fallback model on consecutive 529
- with_retry wrapper for transient errors
- RecoveryState tracks escalation / compact / 529 / model

ASCII flow:
  messages -> prompt assembly -> compress+load -> [try] LLM [except] -> tools -> loop
                                                    |          |
                                               stop_reason   error type
                                               max_tokens?   prompt_too_long? -> compact
                                               escalate /    429/529? -> backoff
                                               continue      other? -> log + exit
"""

import os, subprocess, time, random, json
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"):
    os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
MEMORY_DIR = WORKDIR / ".memory"
MEMORY_INDEX = MEMORY_DIR / "MEMORY.md"
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
PRIMARY_MODEL = os.environ["MODEL_ID"]
FALLBACK_MODEL = os.getenv("FALLBACK_MODEL_ID")

# ── Constants ──

ESCALATED_MAX_TOKENS = 64000
DEFAULT_MAX_TOKENS = 8000
MAX_RECOVERY_RETRIES = 3
MAX_RETRIES = 10
BASE_DELAY_MS = 500
MAX_CONSECUTIVE_529 = 3
CONTINUATION_PROMPT = (
    "Output token limit hit. Resume directly — "
    "no apology, no recap. Pick up mid-thought."
)

# ── Prompt Assembly (from s10, synced) ──

PROMPT_SECTIONS = {
    "identity": "You are a coding agent. Act, don't explain.",
    "tools": "Available tools: bash, read_file, write_file.",
    "workspace": f"Working directory: {WORKDIR}",
    "memory": "Relevant memories are injected below when available.",
}


def assemble_system_prompt(context: dict) -> str:
    sections = [PROMPT_SECTIONS["identity"],
                PROMPT_SECTIONS["tools"],
                PROMPT_SECTIONS["workspace"]]
    memories = context.get("memories", "")
    if memories:
        sections.append(f"Relevant memories:\n{memories}")
    return "\n\n".join(sections)


_last_context_key, _last_prompt = None, None


def get_system_prompt(context: dict) -> str:
    global _last_context_key, _last_prompt
    key = json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)
    if key == _last_context_key and _last_prompt:
        print(" \033[90m[cache hit] system prompt unchanged\033[0m")
        return _last_prompt
    _last_context_key = key
    _last_prompt = assemble_system_prompt(context)

    loaded = ["identity", "tools", "workspace"]
    if context.get("memories"):
        loaded.append("memory")
    print(f" \033[32m[assembled] sections: {', '.join(loaded)}\033[0m")
    return _last_prompt


# ── Tools (unchanged) ──

def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path


def run_bash(command: str) -> str:
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR,
                           capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"


def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines):
            lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e:
        return f"Error: {e}"


def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content)
        return f"Wrote {len(content)} bytes to {path}"
    except Exception as e:
        return f"Error: {e}"


TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object",
                      "properties": {"command": {"type": "string"}},
                      "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"},
                                     "limit": {"type": "integer"}},
                      "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"},
                                     "content": {"type": "string"}},
                      "required": ["path", "content"]}},
]

TOOL_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write}


# ── Error Recovery (s11 new) ──

class RecoveryState:
    """Track recovery attempts across the loop."""
    def __init__(self):
        self.has_escalated = False
        self.recovery_count = 0
        self.consecutive_529 = 0
        self.has_attempted_reactive_compact = False
        self.current_model = PRIMARY_MODEL


def retry_delay(attempt, retry_after=None):
    """Exponential backoff with jitter. Retry-After takes priority."""
    if retry_after:
        return retry_after
    base = min(BASE_DELAY_MS * (2 ** attempt), 32000) / 1000
    jitter = random.uniform(0, base * 0.25)
    return base + jitter


def with_retry(fn, state: RecoveryState):
    """Exponential backoff for transient errors (429/529).
    Non-transient errors are re-raised for the outer handler."""
    for attempt in range(MAX_RETRIES):
        try:
            result = fn()
            state.consecutive_529 = 0
            return result
        except Exception as e:
            name = type(e).__name__
            msg = str(e).lower()

            # 429 rate limit -> exponential backoff
            if "ratelimit" in name.lower() or "429" in msg:
                delay = retry_delay(attempt)
                print(f" \033[33m[429 rate limit] retry {attempt+1}/{MAX_RETRIES},"
                      f" wait {delay:.1f}s\033[0m")
                time.sleep(delay)
                continue

            # 529 overloaded -> exponential backoff + fallback model
            if "overloaded" in name.lower() or "529" in msg or "overloaded" in msg:
                state.consecutive_529 += 1
                if state.consecutive_529 >= MAX_CONSECUTIVE_529:
                    if FALLBACK_MODEL:
                        state.current_model = FALLBACK_MODEL
                        state.consecutive_529 = 0
                        print(f" \033[31m[529 x{MAX_CONSECUTIVE_529}]"
                              f" switching to {FALLBACK_MODEL}\033[0m")
                    else:
                        state.consecutive_529 = 0
                        print(f" \033[31m[529 x{MAX_CONSECUTIVE_529}]"
                              f" no FALLBACK_MODEL_ID configured, continuing retry\033[0m")
                delay = retry_delay(attempt)
                print(f" \033[33m[529 overloaded] retry {attempt+1}/{MAX_RETRIES},"
                      f" wait {delay:.1f}s\033[0m")
                time.sleep(delay)
                continue

            # Not transient -> re-raise for outer try/except
            raise
    raise RuntimeError(f"Max retries ({MAX_RETRIES}) exceeded")


def is_prompt_too_long_error(e: Exception) -> bool:
    """Check whether an API error indicates prompt/context too long."""
    msg = str(e).lower()
    return (("prompt" in msg and "long" in msg)
            or "prompt_is_too_long" in msg
            or "context_length_exceeded" in msg
            or "max_context_window" in msg)


def reactive_compact(messages: list) -> list:
    """Emergency compact — teaching version keeps last N messages.
    Real CC generates a compact summary via LLM, then retries with
    the compacted message list. Teaching version simplifies to tail
    retention since s08/s09 already cover LLM-based compact."""
    print(" \033[31m[reactive compact] trimming to last 5 messages\033[0m")
    tail = messages[-5:]
    # A tool_result whose matching tool_use was trimmed away would be rejected
    # by the API, so drop a leading tool-result message from the tail.
    if tail and tail[0]["role"] == "user" and isinstance(tail[0]["content"], list):
        tail = tail[1:]
    return [{"role": "user",
             "content": "[Reactive compact] Earlier conversation trimmed. "
                        "Continue from where you left off."}, *tail]


# ── Context ──

def update_context(context: dict, messages: list) -> dict:
    """Derive context from real state: which tools exist, whether memory files exist."""
    memories = ""
    if MEMORY_INDEX.exists():
        content = MEMORY_INDEX.read_text().strip()
        if content:
            memories = content
    return {
        "enabled_tools": list(TOOL_HANDLERS.keys()),
        "workspace": str(WORKDIR),
        "memories": memories,
    }


# ── Agent Loop ──

def agent_loop(messages: list, context: dict):
    """Main loop with error recovery wrapping LLM calls."""
    system = get_system_prompt(context)
    state = RecoveryState()
    max_tokens = DEFAULT_MAX_TOKENS

    while True:
        # ── LLM call: with_retry handles 429/529, outer handles rest ──
        try:
            # Read state.current_model inside the lambda (not as a default
            # argument) so a fallback switch made by with_retry applies to
            # the very next retry attempt.
            response = with_retry(
                lambda mt=max_tokens: client.messages.create(
                    model=state.current_model, system=system, messages=messages,
                    tools=TOOLS, max_tokens=mt),
                state)
        except Exception as e:
            # Path 2: prompt_too_long -> reactive compact (once)
            if is_prompt_too_long_error(e):
                if not state.has_attempted_reactive_compact:
                    messages[:] = reactive_compact(messages)
                    state.has_attempted_reactive_compact = True
                    continue
                print(" \033[31m[unrecoverable] still too long after compact\033[0m")
                messages.append({"role": "assistant", "content": [
                    {"type": "text",
                     "text": "[Error] Context too large, cannot continue."}]})
                return

            # Unrecoverable
            name = type(e).__name__
            print(f" \033[31m[unrecoverable] {name}: {str(e)[:100]}\033[0m")
            messages.append({"role": "assistant", "content": [
                {"type": "text", "text": f"[Error] {name}: {str(e)[:200]}"}]})
            return

        # ── Path 1: max_tokens -> escalate or continue ──
        if response.stop_reason == "max_tokens":
            # First escalation: don't append truncated output, retry same request
            if not state.has_escalated:
                max_tokens = ESCALATED_MAX_TOKENS
                state.has_escalated = True
                print(f" \033[33m[max_tokens] escalating"
                      f" {DEFAULT_MAX_TOKENS} -> {ESCALATED_MAX_TOKENS}\033[0m")
                continue
            # 64K still truncated: save truncated output + continuation prompt
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count < MAX_RECOVERY_RETRIES:
                messages.append({"role": "user", "content": CONTINUATION_PROMPT})
                state.recovery_count += 1
                print(f" \033[33m[max_tokens] continuation"
                      f" {state.recovery_count}/{MAX_RECOVERY_RETRIES}\033[0m")
                continue
            print(" \033[31m[max_tokens] recovery limit reached\033[0m")
            return

        # Normal completion: append assistant response
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return

        # ── Tool execution ──
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            print(f"\033[36m> {block.name}\033[0m")
            handler = TOOL_HANDLERS.get(block.name)
            output = handler(**block.input) if handler else f"Unknown: {block.name}"
            print(str(output)[:200])
            results.append({"type": "tool_result",
                            "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})

        context = update_context(context, messages)
        system = get_system_prompt(context)


if __name__ == "__main__":
    print("s11: error recovery")
    print("Enter a question, press Enter to send. Type q to quit.\n")
    history = []
    context = update_context({}, [])
    while True:
        try:
            query = input("\033[36ms11 >> \033[0m")
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        history.append({"role": "user", "content": query})
        agent_loop(history, context)
        context = update_context(context, history)
        for block in history[-1]["content"]:
            # Error messages are plain dicts; API responses are SDK block objects.
            if isinstance(block, dict):
                if block.get("type") == "text":
                    print(block["text"])
            elif getattr(block, "type", None) == "text":
                print(block.text)
        print()
98
s11_error_recovery/images/error-recovery-overview.en.svg
Normal file
@@ -0,0 +1,98 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 440" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#dc2626"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
<linearGradient id="l1" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fef3c7"/><stop offset="100%" stop-color="#fde68a"/>
</linearGradient>
<linearGradient id="l2" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
</linearGradient>
<linearGradient id="l3" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
</linearGradient>
</defs>

<rect width="760" height="440" fill="#fafbfc" rx="8"/>

<!-- Title -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Error Recovery — try/except wrapping LLM calls, three recovery modes</text>

<!-- Legend -->
<rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">s10 retained</text>
<rect x="140" y="56" width="12" height="10" rx="2" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="158" y="66" fill="#d97706" font-size="10" font-weight="600">s11 new</text>

<!-- ===== s10 loop (compact) ===== -->
<rect x="30" y="92" width="80" height="40" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="70" y="116" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">messages</text>

<line x1="110" y1="112" x2="128" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<rect x="131" y="86" width="90" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="176" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">prompt assembly</text>
<text x="176" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s10)</text>

<line x1="221" y1="112" x2="239" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<rect x="242" y="86" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="292" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">compress + load</text>
<text x="292" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s08-s09)</text>

<line x1="342" y1="112" x2="360" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- LLM (wrapped in try/except) -->
<rect x="363" y="86" width="80" height="52" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
<text x="403" y="108" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">LLM</text>
<text x="403" y="122" fill="#dc2626" font-size="8" text-anchor="middle">try/except</text>

<line x1="443" y1="112" x2="461" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<rect x="464" y="86" width="110" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="519" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="519" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">bash · read · write</text>

<!-- Arrow: LLM → Recovery -->
<path d="M 403 138 L 403 178" fill="none" stroke="#dc2626" stroke-width="1.5" marker-end="url(#arrow-red)"/>
<text x="415" y="164" fill="#dc2626" font-size="9">error</text>

<!-- ===== Recovery Section ===== -->
<rect x="20" y="182" width="720" height="22" rx="4" fill="#f1f5f9"/>
<text x="55" y="197" fill="#64748b" font-size="11" font-weight="600">Error Recovery (classify, recover, retry LLM)</text>

<!-- Layer 1: max_tokens -->
<rect x="40" y="210" width="680" height="48" rx="7" fill="url(#l1)" stroke="#d97706" stroke-width="1.5"/>
<text x="60" y="230" fill="#92400e" font-size="12" font-weight="600">Path 1</text>
<text x="112" y="230" fill="#92400e" font-size="11" font-weight="700">max_tokens</text>
<text x="200" y="230" fill="#92400e" font-size="11">Output truncated → escalate 8K→64K (once) / continuation prompt (max 3)</text>
<text x="200" y="246" fill="#b45309" font-size="9">Trigger: stop_reason == "max_tokens" · Cost: 0-1 API · Recover then continue</text>

<!-- Layer 2: prompt_too_long -->
<rect x="40" y="266" width="680" height="48" rx="7" fill="url(#l2)" stroke="#ea580c" stroke-width="1.5"/>
<text x="60" y="286" fill="#9a3412" font-size="12" font-weight="600">Path 2</text>
<text x="112" y="286" fill="#9a3412" font-size="11" font-weight="700">prompt_too_long</text>
<text x="230" y="286" fill="#9a3412" font-size="11">Context overflow → reactive compact → retry (one chance)</text>
<text x="200" y="302" fill="#c2410c" font-size="9">Trigger: API returns 413 · Cost: 1 API · Still over after compact → exit</text>

<!-- Layer 3: 429/529 -->
<rect x="40" y="322" width="680" height="48" rx="7" fill="url(#l3)" stroke="#dc2626" stroke-width="1.5"/>
<text x="60" y="342" fill="#991b1b" font-size="12" font-weight="600">Path 3</text>
<text x="112" y="342" fill="#991b1b" font-size="11" font-weight="700">429/529</text>
<text x="170" y="342" fill="#991b1b" font-size="11">Transient failure → exponential backoff + jitter (max 10) / 3×529 → switch model</text>
<text x="200" y="358" fill="#b91c1c" font-size="9">Trigger: RateLimitError / OverloadedError · Formula: min(500×2^n, 32s) + jitter</text>

<!-- ===== Bottom notes ===== -->
<rect x="40" y="388" width="680" height="40" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<text x="60" y="406" fill="#475569" font-size="10">Three most common recovery modes. CC has 13+ reason codes (image_error, aborted_streaming, etc.), each with dedicated handling.</text>
<text x="60" y="422" fill="#94a3b8" font-size="9">All paths after recovery → continue back to LLM · Normal flow: tool results → messages → loop</text>
</svg>
|
98
s11_error_recovery/images/error-recovery-overview.ja.svg
Normal file
@@ -0,0 +1,98 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 440" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#dc2626"/>
|
||||
</linearGradient>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
|
||||
</marker>
|
||||
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
|
||||
</marker>
|
||||
<linearGradient id="l1" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#fef3c7"/><stop offset="100%" stop-color="#fde68a"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="l2" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="l3" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
|
||||
</linearGradient>
|
||||
</defs>
|
||||
|
||||
<rect width="760" height="440" fill="#fafbfc" rx="8"/>
|
||||
|
||||
<!-- Title -->
|
||||
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
|
||||
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Error Recovery — try/except で LLM 呼び出しをラップ、3 つの復旧モード</text>
|
||||
|
||||
<!-- Legend -->
|
||||
<rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">s10 維持</text>
|
||||
<rect x="140" y="56" width="12" height="10" rx="2" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="158" y="66" fill="#d97706" font-size="10" font-weight="600">s11 新規</text>
|
||||
|
||||
<!-- ===== s10 loop (compact) ===== -->
|
||||
<rect x="30" y="92" width="80" height="40" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="70" y="116" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">messages</text>
|
||||
|
||||
<line x1="110" y1="112" x2="128" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="131" y="86" width="90" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="176" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">prompt assembly</text>
|
||||
<text x="176" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s10)</text>
|
||||
|
||||
<line x1="221" y1="112" x2="239" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="242" y="86" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="292" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">compress + load</text>
|
||||
<text x="292" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s08-s09)</text>
|
||||
|
||||
<line x1="342" y1="112" x2="360" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
|
||||
|
||||
<!-- LLM (wrapped in try/except) -->
|
||||
<rect x="363" y="86" width="80" height="52" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
|
||||
<text x="403" y="108" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">LLM</text>
|
||||
<text x="403" y="122" fill="#dc2626" font-size="8" text-anchor="middle">try/except</text>
|
||||
|
||||
<line x1="443" y1="112" x2="461" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
|
||||
|
||||
<rect x="464" y="86" width="110" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="519" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
|
||||
<text x="519" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">bash · read · write</text>
|
||||
|
||||
<!-- Arrow: LLM → Recovery -->
|
||||
<path d="M 403 138 L 403 178" fill="none" stroke="#dc2626" stroke-width="1.5" marker-end="url(#arrow-red)"/>
|
||||
<text x="415" y="164" fill="#dc2626" font-size="9">エラー</text>
|
||||
|
||||
<!-- ===== Recovery Section ===== -->
|
||||
<rect x="20" y="182" width="720" height="22" rx="4" fill="#f1f5f9"/>
|
||||
<text x="55" y="197" fill="#64748b" font-size="11" font-weight="600">エラー復旧(分類処理、復旧後 LLM に戻りリトライ)</text>
|
||||
|
||||
<!-- Layer 1: max_tokens -->
|
||||
<rect x="40" y="210" width="680" height="48" rx="7" fill="url(#l1)" stroke="#d97706" stroke-width="1.5"/>
|
||||
<text x="60" y="230" fill="#92400e" font-size="12" font-weight="600">Path 1</text>
<text x="112" y="230" fill="#92400e" font-size="11" font-weight="700">max_tokens</text>
<text x="200" y="230" fill="#92400e" font-size="11">Output truncated → escalate 8K→64K (once) / continuation prompt (up to 3×)</text>
<text x="200" y="246" fill="#b45309" font-size="9">Trigger: stop_reason == "max_tokens" · Cost: 0-1 API calls · continue after recovery</text>

<!-- Layer 2: prompt_too_long -->
<rect x="40" y="266" width="680" height="48" rx="7" fill="url(#l2)" stroke="#ea580c" stroke-width="1.5"/>
<text x="60" y="286" fill="#9a3412" font-size="12" font-weight="600">Path 2</text>
<text x="112" y="286" fill="#9a3412" font-size="11" font-weight="700">prompt_too_long</text>
<text x="230" y="286" fill="#9a3412" font-size="11">Context limit exceeded → reactive compact → retry (once only)</text>
<text x="200" y="302" fill="#c2410c" font-size="9">Trigger: API returns 413 · Cost: 1 API call · still over the limit after compact → exit</text>

<!-- Layer 3: 429/529 -->
<rect x="40" y="322" width="680" height="48" rx="7" fill="url(#l3)" stroke="#dc2626" stroke-width="1.5"/>
<text x="60" y="342" fill="#991b1b" font-size="12" font-weight="600">Path 3</text>
<text x="112" y="342" fill="#991b1b" font-size="11" font-weight="700">429/529</text>
<text x="170" y="342" fill="#991b1b" font-size="11">Transient failure → exponential backoff + jitter (max 10) / 3× 529 → switch model</text>
<text x="200" y="358" fill="#b91c1c" font-size="9">Trigger: RateLimitError / OverloadedError · Formula: min(500ms×2^n, 32s) + jitter</text>

<!-- ===== Bottom notes ===== -->
<rect x="40" y="388" width="680" height="40" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<text x="60" y="406" fill="#475569" font-size="10">The 3 most common modes. Real CC has 13+ reason codes (image_error, aborted_streaming, etc.), each with dedicated handling.</text>
<text x="60" y="422" fill="#94a3b8" font-size="9">Every path continues back to the LLM after recovery · Normal flow: tool results → messages → loop</text>
</svg>
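The 429/529 path in the diagram above specifies the retry delay as min(500ms×2^n, 32s) + jitter, with at most 10 retries and a model switch after three 529s. The Python sketch below is one way to compute that schedule. The names `retry_delay_ms`, `should_switch_model` and `retry_schedule` are hypothetical, and the jitter size (up to 25% of the capped delay) is an assumption, since the diagram does not give it.

```python
import random
from typing import Optional

BASE_DELAY_MS = 500       # diagram: 500 ms base delay
MAX_DELAY_MS = 32_000     # diagram: delay capped at 32 s
MAX_RETRIES = 10          # diagram: at most 10 retries
JITTER_FRACTION = 0.25    # assumption: jitter adds up to 25% of the capped delay
FALLBACK_AFTER_529S = 3   # diagram: 3 consecutive 529s trigger a model switch


def retry_delay_ms(attempt: int, rng: Optional[random.Random] = None) -> float:
    """Delay before retry `attempt` (0-based): min(500ms * 2^n, 32s) + jitter."""
    if rng is None:
        rng = random.Random()
    base = min(BASE_DELAY_MS * (2 ** attempt), MAX_DELAY_MS)
    # Random jitter spreads out clients that failed at the same moment.
    return base + rng.uniform(0, JITTER_FRACTION * base)


def should_switch_model(consecutive_529s: int) -> bool:
    """Hypothetical helper: True once enough consecutive 529 (overloaded) errors pile up."""
    return consecutive_529s >= FALLBACK_AFTER_529S


def retry_schedule(rng: Optional[random.Random] = None) -> list[float]:
    """All MAX_RETRIES delays in order, handy for logging or tests."""
    if rng is None:
        rng = random.Random()
    return [retry_delay_ms(n, rng) for n in range(MAX_RETRIES)]
```

With a 500 ms base, the cap is reached at the seventh retry (n = 6, 500 × 64 = 32,000 ms), so the last four retries all wait roughly 32 s plus jitter.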
98
s11_error_recovery/images/error-recovery-overview.svg
Normal file
@@ -0,0 +1,98 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 440" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#dc2626"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
<linearGradient id="l1" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fef3c7"/><stop offset="100%" stop-color="#fde68a"/>
</linearGradient>
<linearGradient id="l2" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
</linearGradient>
<linearGradient id="l3" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
</linearGradient>
</defs>

<rect width="760" height="440" fill="#fafbfc" rx="8"/>

<!-- Title -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Error Recovery: try/except around the LLM call, three recovery modes</text>

<!-- Legend -->
<rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">from s10</text>
<rect x="140" y="56" width="12" height="10" rx="2" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="158" y="66" fill="#d97706" font-size="10" font-weight="600">new in s11</text>

<!-- ===== s10 loop (compact) ===== -->
<rect x="30" y="92" width="80" height="40" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="70" y="116" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">messages</text>

<line x1="110" y1="112" x2="128" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<rect x="131" y="86" width="90" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="176" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">prompt assembly</text>
<text x="176" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s10)</text>

<line x1="221" y1="112" x2="239" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<rect x="242" y="86" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="292" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">compress + load</text>
<text x="292" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">(s08-s09)</text>

<line x1="342" y1="112" x2="360" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- LLM (wrapped in try/except) -->
<rect x="363" y="86" width="80" height="52" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
<text x="403" y="108" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">LLM</text>
<text x="403" y="122" fill="#dc2626" font-size="8" text-anchor="middle">try/except</text>

<line x1="443" y1="112" x2="461" y2="112" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<rect x="464" y="86" width="110" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="519" y="108" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="519" y="122" fill="#94a3b8" font-size="8" text-anchor="middle">bash · read · write</text>

<!-- Arrow: LLM → Recovery -->
<path d="M 403 138 L 403 178" fill="none" stroke="#dc2626" stroke-width="1.5" marker-end="url(#arrow-red)"/>
<text x="415" y="164" fill="#dc2626" font-size="9">error</text>

<!-- ===== Recovery Section ===== -->
<rect x="20" y="182" width="720" height="22" rx="4" fill="#f1f5f9"/>
<text x="55" y="197" fill="#64748b" font-size="11" font-weight="600">Error recovery (classify the failure, recover, then retry the LLM call)</text>

<!-- Layer 1: max_tokens -->
<rect x="40" y="210" width="680" height="48" rx="7" fill="url(#l1)" stroke="#d97706" stroke-width="1.5"/>
<text x="60" y="230" fill="#92400e" font-size="12" font-weight="600">Path 1</text>
<text x="112" y="230" fill="#92400e" font-size="11" font-weight="700">max_tokens</text>
<text x="200" y="230" fill="#92400e" font-size="11">Output truncated → escalate 8K→64K (once) / continuation prompt (up to 3×)</text>
<text x="200" y="246" fill="#b45309" font-size="9">Trigger: stop_reason == "max_tokens" · Cost: 0-1 API calls · continue after recovery</text>

<!-- Layer 2: prompt_too_long -->
<rect x="40" y="266" width="680" height="48" rx="7" fill="url(#l2)" stroke="#ea580c" stroke-width="1.5"/>
<text x="60" y="286" fill="#9a3412" font-size="12" font-weight="600">Path 2</text>
<text x="112" y="286" fill="#9a3412" font-size="11" font-weight="700">prompt_too_long</text>
<text x="230" y="286" fill="#9a3412" font-size="11">Context limit exceeded → reactive compact → retry (one chance)</text>
<text x="200" y="302" fill="#c2410c" font-size="9">Trigger: API returns 413 · Cost: 1 API call · still over the limit after compact → exit</text>

<!-- Layer 3: 429/529 -->
<rect x="40" y="322" width="680" height="48" rx="7" fill="url(#l3)" stroke="#dc2626" stroke-width="1.5"/>
<text x="60" y="342" fill="#991b1b" font-size="12" font-weight="600">Path 3</text>
<text x="112" y="342" fill="#991b1b" font-size="11" font-weight="700">429/529</text>
<text x="170" y="342" fill="#991b1b" font-size="11">Transient failure → exponential backoff + jitter (max 10) / 3× 529 → switch model</text>
<text x="200" y="358" fill="#b91c1c" font-size="9">Trigger: RateLimitError / OverloadedError · Formula: min(500ms×2^n, 32s) + jitter</text>

<!-- ===== Bottom notes ===== -->
<rect x="40" y="388" width="680" height="40" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<text x="60" y="406" fill="#475569" font-size="10">The 3 most common modes. Real CC has 13+ reason codes (image_error, aborted_streaming, etc.), each with dedicated handling.</text>
<text x="60" y="422" fill="#94a3b8" font-size="9">Every path continues back to the LLM after recovery · Normal flow: tool results → messages → loop</text>
</svg>
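Paths 1 and 2 in this diagram are each bounded: the 8K→64K escalation fires once, continuation prompts fire at most 3 times, and reactive compact gets exactly one chance before the loop exits. The Python sketch below models those budgets as a small per-turn state object. `RecoveryState` and `next_action` are hypothetical names, and reading "8K/64K" as 8,000/64,000 tokens is an assumption. The sketch only decides the next action; it does not call any API.

```python
from dataclasses import dataclass

DEFAULT_MAX_TOKENS = 8_000      # assumption: "8K" read as 8,000 output tokens
ESCALATED_MAX_TOKENS = 64_000   # assumption: "64K" read as 64,000 output tokens
MAX_CONTINUATIONS = 3           # diagram: up to 3 continuation prompts


@dataclass
class RecoveryState:
    """Hypothetical per-turn bookkeeping so each recovery path fires a bounded number of times."""
    max_tokens: int = DEFAULT_MAX_TOKENS
    escalated: bool = False
    continuations: int = 0
    reactive_compacted: bool = False


def next_action(state: RecoveryState, reason: str) -> str:
    """Map a failure reason to one of: 'retry', 'continue_prompt', 'compact_retry', 'exit'."""
    if reason == "max_tokens":
        if not state.escalated:
            # Path 1a: raise the output budget once and re-issue the same request.
            state.escalated = True
            state.max_tokens = ESCALATED_MAX_TOKENS
            return "retry"
        if state.continuations < MAX_CONTINUATIONS:
            # Path 1b: keep the partial output and ask the model to continue.
            state.continuations += 1
            return "continue_prompt"
        return "exit"
    if reason == "prompt_too_long":
        if not state.reactive_compacted:
            # Path 2: compact the context once, then retry.
            state.reactive_compacted = True
            return "compact_retry"
        return "exit"  # already compacted and still too long
    # 429/529 belong to the backoff layer; other reason codes are out of scope here.
    return "exit"
```

Keeping the counters in one object makes the "give up" condition explicit, so a pathological response cannot send the loop into endless retries.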