* feat: s01-s14 docs quality overhaul — tool pipeline, single-agent, knowledge & resilience

Rewrite code.py and README (zh/en/ja) for s01-s14, each chapter building incrementally on the previous. Key fixes across chapters:
- s01-s04: agent loop, tool dispatch, permission pipeline, hooks
- s05-s08: todo write, subagent, skill loading, context compact
- s09-s11: memory system, system prompt assembly, error recovery
- s12-s14: task graph, background tasks, cron scheduler

All chapters verified against the CC (Claude Code) source. Code inherits fixes forward (PROMPT_SECTIONS, json.dumps cache, real-state context, can_start dep protection, etc.).

* feat: s15-s19 docs quality overhaul — multi-agent platform: teams, protocols, autonomy, worktree, MCP tools

Rewrite code.py and README (zh/en/ja) for s15-s19, the multi-agent platform chapters. Each chapter inherits all previous fixes and adds one mechanism:
- s15: agent teams (TeamCreate, teammate threads, shared task list)
- s16: team protocols (plan approval, shutdown handshake, consume_inbox)
- s17: autonomous agents (idle polling, auto-claim, consume_lead_inbox)
- s18: worktree isolation (git worktree, bind_task, cwd switching, safety)
- s19: MCP tools (MCPClient, normalize_mcp_name, assemble_tool_pool, no cache)

All appendix source code references verified against CC source. Config priority corrected: claude.ai < plugin < user < project < local.
* fix: 5 regressions across s05-s19 — glob safety, todo validation, memory extraction, protocol types, dep crash

- s05-s09: glob results now filter with is_relative_to(WORKDIR) (inherited from s02)
- s06-s08: todo_write validates content/status required fields (inherited from s05)
- s09: extract_memories uses pre-compression snapshot instead of compacted messages
- s16: submit_plan docstring clarifies protocol-only (not code-level gate)
- s17-s19: match_response restores type mismatch validation (from s16)
- s17-s19: claim_task deps list handles missing dep files without crashing

* fix: s12 Todo V2 logic reversal, s14/s15 cron range validation, s18/s19 worktree name validation

- s12 README (zh/en/ja): fix Todo V2 direction — interactive defaults to Task, non-interactive/SDK defaults to TodoWrite. Fix env var name to CLAUDE_CODE_ENABLE_TASKS (not TODO_V2).
- s14/s15: add _validate_cron_field with per-field range checks (minute 0-59, hour 0-23, dom 1-31, month 1-12, dow 0-6), step > 0, range lo <= hi. Replace old try/except validation that only caught exceptions.
- s18/s19: add validate_worktree_name() to remove_worktree and keep_worktree, not just create_worktree.

* fix: align s16-s19 teaching tool consistency

* fix pr265 chapter diagrams

* Add comprehensive s20 harness chapter

* Fix chapter smoke test regressions

* Clarify README tutorial track transition

---------

Co-authored-by: Haoran <bill-billion@outlook.com>
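The s19 commit names `normalize_mcp_name` and `assemble_tool_pool` but does not show them. Below is a minimal sketch of what they might look like. It rests on several assumptions:
- Names follow the `mcp__<server>__<tool>` convention that Claude Code uses for MCP tools.
- Normalization replaces every character outside `[A-Za-z0-9_-]` with `_`.
- Built-in tools win on a name collision.

The helper `mcp_tool_name` and the dict-based tool format are hypothetical, not the repository's code. The pool is rebuilt on every call, which matches the "no cache" note.

```python
import re

# Assumed normalization rule: anything outside [A-Za-z0-9_-] becomes "_".
_INVALID_NAME_CHARS = re.compile(r"[^A-Za-z0-9_-]")


def normalize_mcp_name(name: str) -> str:
    """Make a server or tool name safe to embed in a tool name."""
    return _INVALID_NAME_CHARS.sub("_", name)


def mcp_tool_name(server: str, tool: str) -> str:
    """Hypothetical helper: build a namespaced name like mcp__github__create_issue."""
    return f"mcp__{normalize_mcp_name(server)}__{normalize_mcp_name(tool)}"


def assemble_tool_pool(builtin_tools: list, mcp_tools: dict) -> list:
    """Merge built-in and MCP tools into one list, rebuilt on every call (no cache).

    builtin_tools: [{"name": ..., ...}, ...]
    mcp_tools: {server_name: [{"name": ..., ...}, ...]}
    Built-in tools are inserted first, so they win on a name collision.
    """
    pool = {tool["name"]: tool for tool in builtin_tools}
    for server, tools in mcp_tools.items():
        for tool in tools:
            name = mcp_tool_name(server, tool["name"])
            # setdefault keeps the first tool registered under a given name.
            pool.setdefault(name, {**tool, "name": name})
    return list(pool.values())
```

Normalization can map two distinct server names (for example `a.b` and `a_b`) to the same tool name. This sketch keeps the first one it sees, which is a design choice of the sketch and not a documented behavior.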
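The s05-s09 regression fix filters glob results with `is_relative_to(WORKDIR)`. Below is a minimal sketch of that containment check. It assumes each match is resolved first (symlinks followed, `..` collapsed) and that results come back relative to the working directory. `safe_glob` and `keep_inside_workdir` are hypothetical names. `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def keep_inside_workdir(candidates, workdir):
    """Drop any path that resolves outside workdir (e.g. via '..' or a symlink)."""
    root = Path(workdir).resolve()
    kept = []
    for candidate in candidates:
        resolved = Path(candidate).resolve()  # follows symlinks, collapses ".."
        if resolved.is_relative_to(root):
            kept.append(resolved)
    return kept


def safe_glob(pattern, workdir):
    """Glob under workdir and return sorted paths relative to it."""
    root = Path(workdir).resolve()
    matches = keep_inside_workdir(root.glob(pattern), root)
    return sorted(str(path.relative_to(root)) for path in matches)
```

The check runs on the resolved path rather than the raw pattern. That way a symlink inside the working directory that points elsewhere is still rejected.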
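The s06-s08 fix makes `todo_write` reject items that lack `content` or `status`. Below is a minimal sketch, under these assumptions:
- Each item is a dict.
- The allowed statuses are `pending`, `in_progress` and `completed`.
- `validate_todos` is a hypothetical name.

It returns error strings instead of raising, so a tool handler could send the problems back to the model as an ordinary tool result.

```python
VALID_STATUSES = {"pending", "in_progress", "completed"}  # assumed status set


def validate_todos(items):
    """Return a list of problems; an empty list means the todo list is valid."""
    if not isinstance(items, list):
        return ["todos must be a list"]
    errors = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"item {index}: expected an object")
            continue
        content = item.get("content")
        if not isinstance(content, str) or not content.strip():
            errors.append(f"item {index}: 'content' is required")
        status = item.get("status")
        # Check the type first so unhashable values never reach the set lookup.
        if not isinstance(status, str) or status not in VALID_STATUSES:
            errors.append(
                f"item {index}: 'status' must be one of "
                f"{sorted(VALID_STATUSES)}, got {status!r}"
            )
    return errors
```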
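The s14/s15 commit lists the rules `_validate_cron_field` enforces:
- a range per field,
- a step greater than zero,
- `lo <= hi` for ranges.

Below is a minimal sketch of those rules. It assumes standard 5-field expressions built from `*`, single numbers, `a-b` ranges, `/step` suffixes and comma lists. The signature, the `FIELD_RANGES` table and `validate_cron` are hypothetical. Each rule is checked explicitly, unlike the old try/except approach. So a value like `60` in the minute field fails even though it parses as an integer.

```python
# Allowed (min, max) per field, in cron order. The bounds come from the commit
# message; the table itself is a hypothetical helper.
FIELD_RANGES = {
    "minute": (0, 59),
    "hour": (0, 23),
    "dom": (1, 31),
    "month": (1, 12),
    "dow": (0, 6),
}


def _validate_cron_field(value: str, name: str) -> None:
    """Raise ValueError if one cron field is malformed or out of range."""
    low, high = FIELD_RANGES[name]
    for part in value.split(","):
        base, slash, step = part.partition("/")
        # A "/step" suffix must be a positive integer.
        if slash and (not step.isdecimal() or int(step) <= 0):
            raise ValueError(f"{name}: step must be > 0 in {part!r}")
        if base == "*":
            continue
        start_text, dash, end_text = base.partition("-")
        if not start_text.isdecimal() or (dash and not end_text.isdecimal()):
            raise ValueError(f"{name}: expected a number or a-b range, got {part!r}")
        start = int(start_text)
        end = int(end_text) if dash else start
        if not (low <= start <= high and low <= end <= high):
            raise ValueError(f"{name}: {part!r} is outside {low}-{high}")
        if start > end:
            raise ValueError(f"{name}: range start > end in {part!r}")


def validate_cron(expr: str) -> None:
    """Validate a 5-field expression: minute hour dom month dow."""
    fields = expr.split()
    if len(fields) != len(FIELD_RANGES):
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    for value, name in zip(fields, FIELD_RANGES):
        _validate_cron_field(value, name)
```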
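The s18/s19 fix makes `remove_worktree` and `keep_worktree` use the same `validate_worktree_name()` check as `create_worktree`. The commit does not state the naming rule, so the one below is an assumption: 1 to 64 characters from `[A-Za-z0-9._-]`, excluding `.` and `..`. The `worktree_path` helper and the `.worktrees` root in the usage are also hypothetical. The sketch shows one idea: every entry point that turns a name into a path validates first, so a name like `../../etc` never reaches a filesystem call.

```python
import re
from pathlib import Path

# Assumed rule: 1-64 characters of letters, digits, dot, underscore or dash.
_WORKTREE_NAME = re.compile(r"[A-Za-z0-9._-]{1,64}")


def validate_worktree_name(name: str) -> str:
    """Return name unchanged if it is safe to use as a single directory name."""
    if not isinstance(name, str) or not _WORKTREE_NAME.fullmatch(name):
        raise ValueError(f"invalid worktree name: {name!r}")
    if name in {".", ".."}:
        raise ValueError(f"reserved worktree name: {name!r}")
    return name


def worktree_path(root: Path, name: str) -> Path:
    """Hypothetical single choke point shared by create, remove and keep."""
    return root / validate_worktree_name(name)
```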
.gitignore
@@ -195,6 +195,12 @@ cython_debug/
 .task_outputs/
 .tasks/
 .teams/
+.mailboxes/
+.worktrees/
+.scheduled_tasks.json
+
+# Accidental root npm lockfile; web/package-lock.json is tracked.
+/package-lock.json
 
 # Ruff stuff:
 .ruff_cache/
README-ja.md
@@ -72,9 +72,9 @@ Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions
 
 - **ツールの実装。** Agent に手を与える。ファイル読み書き、シェル実行、API 呼び出し、ブラウザ制御、データベースクエリ。各ツールは Agent が環境内で取れる行動。原子的で、組み合わせ可能で、記述が明確であるように設計する。
 
-- **知識のキュレーション。** Agent にドメイン専門性を与える。製品ドキュメント、アーキテクチャ決定記録、スタイルガイド、規制要件。オンデマンドで読み込み(s05)、前もって詰め込まない。Agent は何が利用可能か知った上で、必要なものを自ら取得すべき。
+- **知識のキュレーション。** Agent にドメイン専門性を与える。製品ドキュメント、アーキテクチャ決定記録、スタイルガイド、規制要件。オンデマンドで読み込み(s07)、前もって詰め込まない。Agent は何が利用可能か知った上で、必要なものを自ら取得すべき。
 
-- **コンテキストの管理。** Agent にクリーンな記憶を与える。サブ Agent 隔離(s04)がノイズの漏洩を防ぐ。コンテキスト圧縮(s06)が履歴の氾濫を防ぐ。タスクシステム(s07)が目標を単一の会話を超えて永続化する。
+- **コンテキストの管理。** Agent にクリーンな記憶を与える。サブ Agent 隔離(s06)がノイズの漏洩を防ぐ。コンテキスト圧縮(s08)が履歴の氾濫を防ぐ。タスクシステム(s12)が目標を単一の会話を超えて永続化する。
 
 - **権限の制御。** Agent に境界を与える。ファイルアクセスのサンドボックス化。破壊的操作への承認要求。Agent と外部システム間の信頼境界の実施。安全工学と Harness 工学の交差点。
 
@@ -106,7 +106,7 @@ Claude Code = 一つの agent loop
 
 これがすべてだ。これが全アーキテクチャ。すべてのコンポーネントは Harness メカニズム -- Agent が住む世界の一部。Agent そのものは? Claude だ。モデル。Anthropic が人類の推論とコードの全幅で訓練した。Harness が Claude を賢くしたのではない。Claude は元々賢い。Harness が Claude に手と目とワークスペースを与えた。
 
-これが Claude Code が理想的な教材である理由だ:**モデルを信頼し、工学的努力を Harness に集中させるとどうなるかを示している。** このリポジトリの各セッション(s01-s12)は Claude Code アーキテクチャから一つの Harness メカニズムをリバースエンジニアリングする。終了時には、Claude Code の仕組みだけでなく、あらゆるドメインのあらゆる Agent に適用される Harness 工学の普遍的原則を理解している。
+これが Claude Code が理想的な教材である理由だ:**モデルを信頼し、工学的努力を Harness に集中させるとどうなるかを示している。** このリポジトリの各セッション(s01-s20)は Claude Code アーキテクチャの Harness メカニズムを段階的に分解し、最後に組み直す。終了時には、Claude Code の仕組みだけでなく、あらゆるドメインのあらゆる Agent に適用される Harness 工学の普遍的原則を理解している。
 
 教訓は「Claude Code をコピーせよ」ではない。教訓は:**最高の Agent プロダクトは、自分の仕事が Harness であって Intelligence ではないと理解しているエンジニアが作る。**
 
@@ -159,32 +159,48 @@ Claude Code = 一つの agent loop
 Agent を特定ドメインで効果的にする Harness -- の作り方を教える。
 ```
 
-**12 の段階的セッション、シンプルなループから分離された自律実行まで。**
+**20 の段階的セッション、シンプルなループから完全な Harness まで。**
 **各セッションは 1 つの Harness メカニズムを追加する。各メカニズムには 1 つのモットーがある。**
 
 > **s01** *"One loop & Bash is all you need"* — 1つのツール + 1つのループ = エージェント
 >
 > **s02** *"ツールを足すなら、ハンドラーを1つ足すだけ"* — ループは変わらない。新ツールは dispatch map に登録するだけ
 >
-> **s03** *"計画のないエージェントは行き当たりばったり"* — まずステップを書き出し、それから実行
+> **s03** *"まず境界を決め、それから自由を与える"* — 実行してよいか、止めるか、ユーザーに聞くかを判断する
 >
-> **s04** *"大きなタスクを分割し、各サブタスクにクリーンなコンテキストを"* — サブエージェントは独立した messages[] を使い、メイン会話を汚さない
+> **s04** *"ループの外にフックし、ループは書き換えない"* — メインループを変えずに拡張できる入口を作る
 >
-> **s05** *"必要な知識を、必要な時に読み込む"* — system prompt ではなく tool_result で注入
+> **s05** *"計画のないエージェントは行き当たりばったり"* — まずステップを書き出し、それから実行
 >
-> **s06** *"コンテキストはいつか溢れる、空ける手段が要る"* — 3層圧縮で無限セッションを実現
+> **s06** *"大きなタスクを分割し、各サブタスクにクリーンなコンテキストを"* — サブ Agent が作業し、結果だけを持ち帰る
 >
-> **s07** *"大きな目標を小タスクに分解し、順序付けし、ディスクに記録する"* — ファイルベースのタスクグラフ、マルチエージェント協調の基盤
+> **s07** *"必要な知識を、必要な時に読み込む"* — スキルはまず一覧だけ、必要な時に展開する
 >
-> **s08** *"遅い操作はバックグラウンドへ、エージェントは次を考え続ける"* — デーモンスレッドがコマンド実行、完了後に通知を注入
+> **s08** *"コンテキストはいつか溢れる、空ける手段が要る"* — 4層圧縮、安い方から先に実行
 >
-> **s09** *"一人で終わらないなら、チームメイトに任せる"* — 永続チームメイト + 非同期メールボックス
+> **s09** *"覚えるべきことを覚え、忘れるべきことを忘れる"* — 3つのサブシステム:選択、抽出、整理
 >
-> **s10** *"チームメイト間には統一の通信ルールが必要"* — 1つの request-response パターンが全交渉を駆動
+> **s10** *"プロンプトは実行時に組み立てる、ハードコードではない"* — セクション分割 + オンデマンド連結
 >
-> **s11** *"チームメイトが自らボードを見て、仕事を取る"* — リーダーが逐一割り振る必要はない
+> **s11** *"エラーは終わりではない、リトライの始まりだ"* — 失敗したら再試行し、空きを作り、別の道を試す
 >
-> **s12** *"各自のディレクトリで作業し、互いに干渉しない"* — タスクは目標を管理、worktree はディレクトリを管理、IDで紐付け
+> **s12** *"大きな目標を小タスクに分解し、順序付けし、ディスクに記録する"* — ファイルベースのタスクグラフ、マルチエージェント協調の基盤
+>
+> **s13** *"遅い操作はバックグラウンドへ、エージェントは次を考え続ける"* — バックグラウンドスレッドがコマンド実行、完了後に通知を注入
+>
+> **s14** *"スケジュールで発火、人間の起動は不要"* — 時間になったら自動でタスクを動かす
+>
+> **s15** *"一人で終わらないなら、チームメイトに任せる"* — 永続チームメイト + 非同期メールボックス
+>
+> **s16** *"チームメイト間には統一の通信ルールが必要"* — 固定のリクエスト-返信形式で連携する
+>
+> **s17** *"チームメイトが自らボードを見て、仕事を取る"* — リーダーが逐一割り振る必要はない
+>
+> **s18** *"各自のディレクトリで作業し、互いに干渉しない"* — タスクは目標を管理、worktree はディレクトリを管理、IDで紐付け
+>
+> **s19** *"能力不足? MCP でプラグイン"* — 外部ツールを同じツールプールに接続する
+>
+> **s20** *"仕組みは多く、ループは一つ"* — すべての仕組みを 1 つの Harness に戻す
 
 ---
 
@@ -217,6 +233,35 @@ def agent_loop(messages):
 
 各セッションはこのループの上に 1 つの Harness メカニズムを重ねる -- ループ自体は変わらない。ループは Agent のもの。メカニズムは Harness のもの。
 
+## バージョン状況
+
+このリポジトリには現在、2 つのチュートリアルトラックが共存している:
+
+- **現行トラック:ルート直下の `s01-s20`**
+  ルート直下の `s01_*` から `s20_*` までが新しい正規版であり、現在推奨する読書経路。各セッションには中国語原文、英語/日本語訳、実行可能な `code.py`、必要に応じた図が含まれる。
+- **旧版移行トラック:`docs/`、`agents/`、現在の `web/`**
+  これらは旧 12 セッション版を保持している。既存読者、旧リンク、Web プラットフォームのために移行期間中は一時的に残している。
+
+新しく読む場合は、ルート直下の `s01_agent_loop/` から `s20_comprehensive/` までを読む。旧リンクや現在の Web アプリから入った場合は、旧 12 セッション版を読んでいる可能性が高い。旧版と現行版のセッション番号は常に一致しないため、番号を混同しないこと。
+
+### 旧版から現行版への対応
+
+| 旧 12 セッション版 | 現行 20 セッション版 | トピック |
+|---|---|---|
+| 旧 s01 | 現行 s01 | Agent Loop |
+| 旧 s02 | 現行 s02 | Tool Use |
+| 旧 s03 | 現行 s05 | TodoWrite |
+| 旧 s04 | 現行 s06 | Subagent |
+| 旧 s05 | 現行 s07 | Skill Loading |
+| 旧 s06 | 現行 s08 | Context Compact |
+| 旧 s07 | 現行 s12 | Task System |
+| 旧 s08 | 現行 s13 | Background Tasks |
+| 旧 s09 | 現行 s15 | Agent Teams |
+| 旧 s10 | 現行 s16 | Team Protocols |
+| 旧 s11 | 現行 s17 | Autonomous Agents |
+| 旧 s12 | 現行 s18 | Worktree Isolation |
+| 現行版のみ | s03、s04、s09、s10、s11、s14、s19、s20 | Permission、Hooks、Memory、System Prompt、Error Recovery、Cron、MCP、Comprehensive Agent |
+
 ## スコープ (重要)
 
 このリポジトリは Harness 工学の 0->1 学習プロジェクト -- Agent モデルを囲む環境の構築を学ぶ。
@@ -232,20 +277,30 @@ def agent_loop(messages):
 
 ## クイックスタート
 
+### 現行 20 セッション版
+
 ```sh
 git clone https://github.com/shareAI-lab/learn-claude-code
 cd learn-claude-code
 pip install -r requirements.txt
 cp .env.example .env # .env を編集して ANTHROPIC_API_KEY を入力
-
-python agents/s01_agent_loop.py # ここから開始
-python agents/s12_worktree_task_isolation.py # 全セッションの到達点
-python agents/s_full.py # 総括: 全メカニズム統合
+python s01_agent_loop/code.py # ここから開始 — 1ループ + bash
+python s08_context_compact/code.py # コンテキスト圧縮(複雑章)
+python s20_comprehensive/code.py # 終点: 全メカニズムを 1 つのループへ
 ```
 
+### 旧 12 セッション移行版
+
+```sh
+python agents/s01_agent_loop.py
+python agents/s12_worktree_task_isolation.py
+python agents/s_full.py
+```
+
 ### Web プラットフォーム
 
 インタラクティブな可視化、ステップスルーアニメーション、ソースビューア、各セッションのドキュメント。
+現在の Web プラットフォームはまだ `docs/` の旧 12 セッション版を表示する。現行 20 セッション版はルート直下の `s01-s20` を読む。
 
 ```sh
 cd web && npm install && npm run dev # http://localhost:3000
@@ -253,73 +308,100 @@ cd web && npm install && npm run dev # http://localhost:3000
 
 ## 学習パス
 
-```
-フェーズ1: ループ フェーズ2: 計画と知識
-================== ==============================
-s01 エージェントループ [1] s03 TodoWrite [5]
-while + stop_reason TodoManager + nag リマインダー
-| |
-+-> s02 Tool Use [4] s04 サブエージェント [5]
-dispatch map: name->handler 子ごとに新しい messages[]
-|
-s05 Skills [5]
-SKILL.md を tool_result で注入
-|
-s06 Context Compact [5]
-3層コンテキスト圧縮
+主線:動ける → 複雑な仕事ができる → 記憶して回復できる → 長く動ける → 協作できる → 拡張して統合する
 
-フェーズ3: 永続化 フェーズ4: チーム
-================== =====================
-s07 タスクシステム [8] s09 エージェントチーム [9]
-ファイルベース CRUD + 依存グラフ チームメイト + JSONL メールボックス
-| |
-s08 バックグラウンドタスク [6] s10 チームプロトコル [12]
-デーモンスレッド + 通知キュー シャットダウン + プラン承認 FSM
-|
-s11 自律エージェント [14]
-アイドルサイクル + 自動クレーム
-|
-s12 Worktree 分離 [16]
-タスク調整 + 必要時の分離実行レーン
+```mermaid
+flowchart TD
+%% カードスタイル
+classDef stage1 fill:#E3F2FD,stroke:#1976D2,stroke-width:2px,color:#0D47A1,rx:12,ry:12,text-align:left
+classDef stage2 fill:#E8F5E9,stroke:#388E3C,stroke-width:2px,color:#1B5E20,rx:12,ry:12,text-align:left
+classDef stage3 fill:#FFF3E0,stroke:#F57C00,stroke-width:2px,color:#E65100,rx:12,ry:12,text-align:left
+classDef stage4 fill:#FCE4EC,stroke:#C2185b,stroke-width:2px,color:#880E4F,rx:12,ry:12,text-align:left
+classDef stage5 fill:#F3E5F5,stroke:#7B1FA2,stroke-width:2px,color:#4A148C,rx:12,ry:12,text-align:left
+classDef stage6 fill:#E0F7FA,stroke:#0097A7,stroke-width:2px,color:#006064,rx:12,ry:12,text-align:left
 
-[N] = ツール数
+%% 背景スタイル
+classDef groupBox fill:#F8F9FA,stroke:#CED4DA,stroke-width:2px,stroke-dasharray: 5 5,rx:15,ry:15,color:#495057
+
+%% 第1層:1-3段階
+subgraph Phase1 ["🌱 段階 1-3:基礎能力の構築(単純から複雑へ)"]
+direction LR
+S1["<b>第1段階:Agent が動ける</b><br/>━━━━━━━━━━━━━<br/><b>s01 Agent Loop</b><br/>└─ 1つのループ + bash<br/><br/><b>s02 Tool Use</b><br/>└─ 1つのツールから複数へ<br/><br/><b>s03 Permission</b><br/>└─ 実行してよいか判断する<br/><br/><b>s04 Hooks</b><br/>└─ ツール前後に拡張入口を作る"]:::stage1
+
+S2["<b>第2段階:複雑な仕事をこなす</b><br/>━━━━━━━━━━━━━<br/><b>s05 TodoWrite</b><br/>└─ 先に計画し、それから実行<br/><br/><b>s06 Subagent</b><br/>└─ サブ Agent が結果を返す<br/><br/><b>s08 Context Compact</b><br/>└─ 長いコンテキストに空きを作る"]:::stage2
+
+S3["<b>第3段階:記憶して回復する</b><br/>━━━━━━━━━━━━━<br/><b>s09 Memory</b><br/>└─ 覚えるべきことを覚える<br/><br/><b>s10 System Prompt</b><br/>└─ 実行時に組み立てる<br/><br/><b>s11 Error Recovery</b><br/>└─ 再試行し、別の道へ"]:::stage3
+
+S1 ==> S2 ==> S3
+end
+
+%% 第2層:4-6段階
+subgraph Phase2 ["🚀 段階 4-6:高次能力の進化(長期実行、協作、統合)"]
+direction LR
+S4["<b>第4段階:長く動くタスク</b><br/>━━━━━━━━━━━━━<br/><b>s12 Task System</b><br/>└─ タスクと依存関係を保存<br/><br/><b>s13 Background Tasks</b><br/>└─ 遅い作業をバックグラウンドへ<br/><br/><b>s14 Cron Scheduler</b><br/>└─ 時間で自動実行"]:::stage4
+
+S5["<b>第5段階:複数 Agent の協作</b><br/>━━━━━━━━━━━━━<br/><b>s15 Agent Teams</b><br/>└─ チームメイト + メールボックス<br/><br/><b>s16 Team Protocols</b><br/>└─ 固定のリクエスト-返信形式<br/><br/><b>s17 Autonomous Agents</b><br/>└─ ボードを見て仕事を取る<br/><br/><b>s18 Worktree Isolation</b><br/>└─ 別ディレクトリで作業"]:::stage5
+
+S6["<b>第6段階:外部能力と統合</b><br/>━━━━━━━━━━━━━<br/><b>s07 Skill Loading</b><br/>└─ スキルを必要時に展開<br/><br/><b>s19 MCP Plugin</b><br/>└─ 外部ツールを同じプールへ<br/><br/><b>s20 Comprehensive Agent</b><br/>└─ すべてを1つのループへ"]:::stage6
+
+S4 ==> S5 ==> S6
+end
+
+%% 2つの層を接続
+Phase1 ===> Phase2
+
+class Phase1,Phase2 groupBox
 ```
 
+## 全セッション
+
+| セッション | トピック | キーコンセプト |
+|---|---|---|
+| [s01](./s01_agent_loop/) | Agent Loop | `messages` / `while True` / `stop_reason` |
+| [s02](./s02_tool_use/) | Tool Use | `TOOL_HANDLERS` / dispatch map / 並行性 |
+| [s03](./s03_permission/) | Permission | `PermissionRule` / 承認パイプライン |
+| [s04](./s04_hooks/) | Hooks | `PreToolUse` / `PostToolUse` / 拡張ポイント |
+| [s05](./s05_todo_write/) | TodoWrite | `TodoItem` / 計画してから実行 |
+| [s06](./s06_subagent/) | Subagent | `fresh messages[]` / コンテキスト分離 |
+| [s07](./s07_skill_loading/) | Skill Loading | `SkillManifest` / オンデマンド注入 |
+| [s08](./s08_context_compact/) | Context Compact | snip / micro / budget / auto 4層圧縮 |
+| [s09](./s09_memory/) | Memory | selection / extraction / consolidation |
+| [s10](./s10_system_prompt/) | System Prompt | ランタイム組立 / セクション連結 |
+| [s11](./s11_error_recovery/) | Error Recovery | token 拡張 / fallback モデル / リトライ戦略 |
+| [s12](./s12_task_system/) | Task System | `TaskRecord` / `blockedBy` / ディスク永続化 |
+| [s13](./s13_background_tasks/) | Background Tasks | スレッド実行 / 通知キュー |
+| [s14](./s14_cron_scheduler/) | Cron Scheduler | 永続スケジューリング / セッション限定トリガー |
+| [s15](./s15_agent_teams/) | Agent Teams | `MessageBus` / 受信箱 / 権限バブリング |
+| [s16](./s16_team_protocols/) | Team Protocols | シャットダウンハンドシェイク / プラン承認 |
+| [s17](./s17_autonomous_agents/) | Autonomous Agents | アイドルサイクル / 自動クレーム |
+| [s18](./s18_worktree_isolation/) | Worktree Isolation | `WorktreeRecord` / タスク-ディレクトリ紐付け |
+| [s19](./s19_mcp_plugin/) | MCP Plugin | マルチトランスポート / チャネルルーティング / ツールプール組み立て |
+| [s20](./s20_comprehensive/) | Comprehensive Agent | すべての仕組みを 1 つのループへ |
+
 ## プロジェクト構成
 
 ```
 learn-claude-code/
-|
-|-- agents/ # Python リファレンス実装 (s01-s12 + s_full 総括)
-|-- docs/{en,zh,ja}/ # メンタルモデル優先のドキュメント (3言語)
-|-- web/ # インタラクティブ学習プラットフォーム (Next.js)
-|-- skills/ # s05 の Skill ファイル
-+-- .github/workflows/ci.yml # CI: 型チェック + ビルド
+  s01_agent_loop/ # セッションごとに1フォルダ
+    README.md # 中国語ソース(完全なナラティブ)
+    README.en.md # 英語訳
+    README.ja.md # 日本語訳
+    code.py # 単体実行可能なコード
+    images/ # SVG ダイアグラム
+  s02_tool_use/
+  ...
+  s19_mcp_plugin/
+  s20_comprehensive/ # 終点セッション
+  agents/ # 旧 12 セッションの実行可能コピー + s_full.py
+  skills/ # s07 で使用するスキルファイル
+  docs/ # 旧 12 セッション文書、移行期間中は保持
+  web/ # 現在は docs/ の旧版内容を生成・表示
+  tests/
 ```
 
-## ドキュメント
-
-メンタルモデル優先: 問題、解決策、ASCII図、最小限のコード。
-[English](./docs/en/) | [中文](./docs/zh/) | [日本語](./docs/ja/)
-
-| セッション | トピック | モットー |
-|-----------|---------|---------|
-| [s01](./docs/ja/s01-the-agent-loop.md) | エージェントループ | *One loop & Bash is all you need* |
-| [s02](./docs/ja/s02-tool-use.md) | Tool Use | *ツールを足すなら、ハンドラーを1つ足すだけ* |
-| [s03](./docs/ja/s03-todo-write.md) | TodoWrite | *計画のないエージェントは行き当たりばったり* |
-| [s04](./docs/ja/s04-subagent.md) | サブエージェント | *大きなタスクを分割し、各サブタスクにクリーンなコンテキストを* |
-| [s05](./docs/ja/s05-skill-loading.md) | Skills | *必要な知識を、必要な時に読み込む* |
-| [s06](./docs/ja/s06-context-compact.md) | Context Compact | *コンテキストはいつか溢れる、空ける手段が要る* |
-| [s07](./docs/ja/s07-task-system.md) | タスクシステム | *大きな目標を小タスクに分解し、順序付けし、ディスクに記録する* |
-| [s08](./docs/ja/s08-background-tasks.md) | バックグラウンドタスク | *遅い操作はバックグラウンドへ、エージェントは次を考え続ける* |
-| [s09](./docs/ja/s09-agent-teams.md) | エージェントチーム | *一人で終わらないなら、チームメイトに任せる* |
-| [s10](./docs/ja/s10-team-protocols.md) | チームプロトコル | *チームメイト間には統一の通信ルールが必要* |
-| [s11](./docs/ja/s11-autonomous-agents.md) | 自律エージェント | *チームメイトが自らボードを見て、仕事を取る* |
-| [s12](./docs/ja/s12-worktree-task-isolation.md) | Worktree + タスク分離 | *各自のディレクトリで作業し、互いに干渉しない* |
-
 ## 次のステップ -- 理解から出荷へ
 
-12 セッションを終えれば、Harness 工学の内部構造を完全に理解している。その知識を活かす 2 つの方法:
+20 セッションを終えれば、Harness 工学の内部構造を完全に理解している。その知識を活かす 2 つの方法:
 
 ### Kode Agent CLI -- オープンソース Coding Agent CLI
 
README-zh.md
@@ -72,9 +72,9 @@ Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions
 
 - **实现工具。** 给 agent 一双手。文件读写、Shell 执行、API 调用、浏览器控制、数据库查询。每个工具都是 agent 在环境中可以采取的一个行动。设计它们时要原子化、可组合、描述清晰。
 
-- **策划知识。** 给 agent 领域专长。产品文档、架构决策记录、风格指南、合规要求。按需加载(s05),不要前置塞入。Agent 应该知道有什么可用,然后自己拉取所需。
+- **策划知识。** 给 agent 领域专长。产品文档、架构决策记录、风格指南、合规要求。按需加载(s07),不要前置塞入。Agent 应该知道有什么可用,然后自己拉取所需。
 
-- **管理上下文。** 给 agent 干净的记忆。子 agent 隔离(s04)防止噪声泄露。上下文压缩(s06)防止历史淹没。任务系统(s07)让目标持久化到单次对话之外。
+- **管理上下文。** 给 agent 干净的记忆。子 agent 隔离(s06)防止噪声泄露。上下文压缩(s08)防止历史淹没。任务系统(s12)让目标持久化到单次对话之外。
 
 - **控制权限。** 给 agent 边界。沙箱化文件访问。对破坏性操作要求审批。在 agent 和外部系统之间实施信任边界。这是安全工程与 harness 工程的交汇点。
 
@@ -106,7 +106,7 @@ Claude Code = 一个 agent loop
 
 就这些。这就是全部架构。每一个组件都是 harness 机制 -- 为 agent 构建的栖居世界的一部分。Agent 本身呢?是 Claude。一个模型。由 Anthropic 在人类推理和代码的全部广度上训练而成。Harness 没有让 Claude 变聪明。Claude 本来就聪明。Harness 给了 Claude 双手、双眼和一个工作空间。
 
-这就是 Claude Code 作为教学标本的意义:**它展示了当你信任模型、把工程精力集中在 harness 上时会发生什么。** 本仓库的每一个课程(s01-s12)都在逆向工程 Claude Code 架构中的一个 harness 机制。学完之后,你理解的不只是 Claude Code 怎么工作,而是适用于任何领域、任何 agent 的 harness 工程通用原则。
+这就是 Claude Code 作为教学标本的意义:**它展示了当你信任模型、把工程精力集中在 harness 上时会发生什么。** 本仓库的课程(s01-s20)逐步拆解并重组 Claude Code 架构中的 harness 机制。学完之后,你理解的不只是 Claude Code 怎么工作,而是适用于任何领域、任何 agent 的 harness 工程通用原则。
 
 启示不是 "复制 Claude Code"。启示是:**最好的 agent 产品,出自那些明白自己的工作是 harness 而非 intelligence 的工程师之手。**
 
@@ -159,32 +159,48 @@ Claude Code = 一个 agent loop
 让 agent 在特定领域高效工作的 harness。
 ```
 
-**12 个递进式课程, 从简单循环到隔离化的自治执行。**
+**20 个递进式课程, 从简单循环到完整 Harness。**
 **每个课程添加一个 harness 机制。每个机制有一句格言。**
 
 > **s01** *"One loop & Bash is all you need"* — 一个工具 + 一个循环 = 一个 Agent
 >
 > **s02** *"加一个工具, 只加一个 handler"* — 循环不用动, 新工具注册进 dispatch map 就行
 >
-> **s03** *"没有计划的 agent 走哪算哪"* — 先列步骤再动手, 完成率翻倍
+> **s03** *"先划边界, 再给自由"* — 先判断操作能不能做,要不要问用户
 >
-> **s04** *"大任务拆小, 每个小任务干净的上下文"* — Subagent 用独立 messages[], 不污染主对话
+> **s04** *"挂在循环上, 不写进循环里"* — 在工具前后留插口,不改主循环也能扩展
 >
-> **s05** *"用到什么知识, 临时加载什么知识"* — 通过 tool_result 注入, 不塞 system prompt
+> **s05** *"没有计划的 agent 走哪算哪"* — 先列步骤再动手, 完成率翻倍
 >
-> **s06** *"上下文总会满, 要有办法腾地方"* — 三层压缩策略, 换来无限会话
+> **s06** *"大任务拆小, 每个小任务干净的上下文"* — 子 Agent 自己干活,只把结果带回来
 >
-> **s07** *"大目标要拆成小任务, 排好序, 记在磁盘上"* — 文件持久化的任务图, 为多 agent 协作打基础
+> **s07** *"用到时再加载, 别全塞 prompt 里"* — 技能先列目录,用到时再展开
 >
-> **s08** *"慢操作丢后台, agent 继续想下一步"* — 后台线程跑命令, 完成后注入通知
+> **s08** *"上下文总会满, 要有办法腾地方"* — 四层压缩策略, 便宜的先跑贵的后跑
 >
-> **s09** *"任务太大一个人干不完, 要能分给队友"* — 持久化队友 + 异步邮箱
+> **s09** *"记住该记的, 忘掉该忘的"* — 三个子系统: 筛选、提取、整理
 >
-> **s10** *"队友之间要有统一的沟通规矩"* — 一个 request-response 模式驱动所有协商
+> **s10** *"prompt 是组装出来的, 不是写死的"* — 分段 + 按需拼接
 >
-> **s11** *"队友自己看看板, 有活就认领"* — 不需要领导逐个分配, 自组织
+> **s11** *"错误不是终点, 是重试的起点"* — 出错时会重试、腾空间、换路子
 >
-> **s12** *"各干各的目录, 互不干扰"* — 任务管目标, worktree 管目录, 按 ID 绑定
+> **s12** *"大目标拆成小任务, 排好序, 持久化"* — 文件持久化的任务图, 多 agent 协作的基础
+>
+> **s13** *"慢操作丢后台, agent 继续思考"* — 后台线程跑命令, 完成后注入通知
+>
+> **s14** *"定时触发, 不需要人推"* — 按时间自动触发任务
+>
+> **s15** *"一个搞不定, 组队来"* — 持久化队友 + 异步邮箱
+>
+> **s16** *"队友之间要有约定"* — 用固定的请求-回复格式沟通
+>
+> **s17** *"队友自己看板, 有活就认领"* — 不需要领导逐个分配, 自组织
+>
+> **s18** *"各干各的目录, 互不干扰"* — 任务管目标, worktree 管目录, 按 ID 绑定
+>
+> **s19** *"能力不够? 插上 MCP"* — 把外部工具接进同一个工具池
+>
+> **s20** *"机制很多,循环一个"* — 前面所有机制回到一个完整 harness
 
 ---
 
@@ -217,6 +233,35 @@ def agent_loop(messages):
 
 每个课程在这个循环之上叠加一个 harness 机制 -- 循环本身始终不变。循环属于 agent。机制属于 harness。
 
+## 版本说明
+
+本仓库现在同时保留两条教程线:
+
+- **新版主线:根目录 `s01-s20`**
+  根目录下的 `s01_*` 到 `s20_*` 是新的主版本,也是当前推荐阅读路径。每章包含完整叙事 README、英文/日文译本、可运行的 `code.py`,以及必要的图示。
+- **旧版过渡:`docs/`、`agents/`、当前 `web/`**
+  这些仍保留旧 12 章体系,暂时用于已有读者、旧链接和 Web 平台过渡。
+
+新读者请从根目录 `s01_agent_loop/` 读到 `s20_comprehensive/`。如果你是从旧链接或当前 Web 平台进入,大概率看到的是旧 12 章版本。旧版章节号和新版不完全一致,不要混用章节号。
+
+### 旧版到新版的对应关系
+
+| 旧 12 章版本 | 新 20 章版本 | 主题 |
+|---|---|---|
+| 旧 s01 | 新 s01 | Agent Loop |
+| 旧 s02 | 新 s02 | Tool Use |
+| 旧 s03 | 新 s05 | TodoWrite |
+| 旧 s04 | 新 s06 | Subagent |
+| 旧 s05 | 新 s07 | Skill Loading |
+| 旧 s06 | 新 s08 | Context Compact |
+| 旧 s07 | 新 s12 | Task System |
+| 旧 s08 | 新 s13 | Background Tasks |
+| 旧 s09 | 新 s15 | Agent Teams |
+| 旧 s10 | 新 s16 | Team Protocols |
+| 旧 s11 | 新 s17 | Autonomous Agents |
+| 旧 s12 | 新 s18 | Worktree Isolation |
+| 新版新增 | s03、s04、s09、s10、s11、s14、s19、s20 | Permission、Hooks、Memory、System Prompt、Error Recovery、Cron、MCP、Comprehensive Agent |
+
 ## 范围说明 (重要)
 
 本仓库是一个 0->1 的 harness 工程学习项目 -- 构建围绕 agent 模型的工作环境。
@@ -232,20 +277,30 @@ def agent_loop(messages):
 
 ## 快速开始
 
+### 新版 20 章主线
+
 ```sh
 git clone https://github.com/shareAI-lab/learn-claude-code
 cd learn-claude-code
 pip install -r requirements.txt
 cp .env.example .env # 编辑 .env 填入你的 ANTHROPIC_API_KEY
-
-python agents/s01_agent_loop.py # 从这里开始
-python agents/s12_worktree_task_isolation.py # 完整递进终点
-python agents/s_full.py # 总纲: 全部机制合一
+python s01_agent_loop/code.py # 起点 — 一个循环 + bash
+python s08_context_compact/code.py # 上下文压缩(复杂章)
+python s20_comprehensive/code.py # 终点章: 全部机制归到一个循环
 ```
 
+### 旧版 12 章过渡线
+
+```sh
+python agents/s01_agent_loop.py
+python agents/s12_worktree_task_isolation.py
+python agents/s_full.py
+```
+
 ### Web 平台
 
 交互式可视化、分步动画、源码查看器, 以及每个课程的文档。
+当前 Web 平台仍读取 `docs/` 中的旧 12 章内容。新版 20 章请直接阅读根目录 `s01-s20`。
 
 ```sh
 cd web && npm install && npm run dev # http://localhost:3000
@@ -253,73 +308,101 @@ cd web && npm install && npm run dev # http://localhost:3000
 
 ## 学习路径
 
-```
-第一阶段: 循环 第二阶段: 规划与知识
-================== ==============================
-s01 Agent Loop [1] s03 TodoWrite [5]
-while + stop_reason TodoManager + nag 提醒
-| |
-+-> s02 Tool Use [4] s04 Subagent [5]
-dispatch map: name->handler 每个 Subagent 独立 messages[]
-|
-s05 Skills [5]
-SKILL.md 通过 tool_result 注入
-|
-s06 Context Compact [5]
-三层 Context Compact
+主线:能动手 → 能做复杂任务 → 能记住和恢复 → 能长期运行 → 能协作 → 能扩展并合体
 
-第三阶段: 持久化 第四阶段: 团队
-================== =====================
-s07 Task System [8] s09 Agent Teams [9]
-文件持久化 CRUD + 依赖图 队友 + JSONL 邮箱
-| |
-s08 Background Tasks [6] s10 Team Protocols [12]
-守护线程 + 通知队列 关机 + 计划审批 FSM
-|
-s11 Autonomous Agents [14]
-空闲轮询 + 自动认领
-|
-s12 Worktree Isolation [16]
-Task 协调 + 按需隔离执行通道
+```mermaid
+flowchart TD
+%% 统一定义卡片样式:加入 text-align:left 保证列表不会居中乱飘
+classDef stage1 fill:#E3F2FD,stroke:#1976D2,stroke-width:2px,color:#0D47A1,rx:12,ry:12,text-align:left
+classDef stage2 fill:#E8F5E9,stroke:#388E3C,stroke-width:2px,color:#1B5E20,rx:12,ry:12,text-align:left
+classDef stage3 fill:#FFF3E0,stroke:#F57C00,stroke-width:2px,color:#E65100,rx:12,ry:12,text-align:left
+classDef stage4 fill:#FCE4EC,stroke:#C2185b,stroke-width:2px,color:#880E4F,rx:12,ry:12,text-align:left
+classDef stage5 fill:#F3E5F5,stroke:#7B1FA2,stroke-width:2px,color:#4A148C,rx:12,ry:12,text-align:left
+classDef stage6 fill:#E0F7FA,stroke:#0097A7,stroke-width:2px,color:#006064,rx:12,ry:12,text-align:left
+
+%% 背景框样式
+classDef groupBox fill:#F8F9FA,stroke:#CED4DA,stroke-width:2px,stroke-dasharray: 5 5,rx:15,ry:15,color:#495057
+
+%% 第一层:1-3阶段
+subgraph Phase1 ["🌱 阶段 1-3:基础能力构建(从简单到复杂)"]
+direction LR
+S1["<b>第一阶段:让 Agent 能动手</b><br/>━━━━━━━━━━━━━<br/><b>s01 Agent Loop</b><br/>└─ 一个循环 + bash<br/><br/><b>s02 Tool Use</b><br/>└─ 单个到多个工具<br/><br/><b>s03 Permission</b><br/>└─ 判断能不能做<br/><br/><b>s04 Hooks</b><br/>└─ 工具前后留扩展插口"]:::stage1
 
-[N] = 工具数量
+S2["<b>第二阶段:做复杂任务</b><br/>━━━━━━━━━━━━━<br/><b>s05 TodoWrite</b><br/>└─ 先列计划,再执行<br/><br/><b>s06 Subagent</b><br/>└─ 子节点干活带回结果<br/><br/><b>s08 Context Compact</b><br/>└─ 长下文腾空间"]:::stage2
+
+S3["<b>第三阶段:记住和恢复</b><br/>━━━━━━━━━━━━━<br/><b>s09 Memory</b><br/>└─ 该记记,该忘忘<br/><br/><b>s10 System Prompt</b><br/>└─ 运行时组装<br/><br/><b>s11 Error Recovery</b><br/>└─ 重试换路子"]:::stage3
+
+S1 ==> S2 ==> S3
+end
+
+%% 第二层:4-6阶段
+subgraph Phase2 ["🚀 阶段 4-6:高阶能力进化(长期、协作与融合)"]
+direction LR
+S4["<b>第四阶段:让任务长期运行</b><br/>━━━━━━━━━━━━━<br/><b>s12 Task System</b><br/>└─ 任务落盘记依赖<br/><br/><b>s13 Background Tasks</b><br/>└─ 慢操作丢后台<br/><br/><b>s14 Cron Scheduler</b><br/>└─ 按时自动触发"]:::stage4
+
+S5["<b>第五阶段:让多个 Agent 协作</b><br/>━━━━━━━━━━━━━<br/><b>s15 Agent Teams</b><br/>└─ 队友 + 邮箱通信<br/><br/><b>s16 Team Protocols</b><br/>└─ 固定收发格式<br/><br/><b>s17 Autonomous Agents</b><br/>└─ 自己看板认领活<br/><br/><b>s18 Worktree Isolation</b><br/>└─ 隔离目录"]:::stage5
+
+S6["<b>第六阶段:接外部能力合体</b><br/>━━━━━━━━━━━━━<br/><b>s07 Skill Loading</b><br/>└─ 技能按需展开<br/><br/><b>s19 MCP Plugin</b><br/>└─ 外部接进工具池<br/><br/><b>s20 Comprehensive Agent</b><br/>└─ 全机制回单循环"]:::stage6
+
+S4 ==> S5 ==> S6
+end
+
+%% 将两个模块连接起来,形成 Z 字形阅读流
+Phase1 ===> Phase2
+
+%% 应用背景样式
+class Phase1,Phase2 groupBox
 ```
 
+## 全部章节
+
+| 章节 | 主题 | 关键概念 |
+|---|---|---|
+| [s01](./s01_agent_loop/) | Agent Loop | `messages` / `while True` / `stop_reason` |
+| [s02](./s02_tool_use/) | Tool Use | `TOOL_HANDLERS` / dispatch map / 并发 |
+| [s03](./s03_permission/) | Permission | `PermissionRule` / 审批管线 |
+| [s04](./s04_hooks/) | Hooks | `PreToolUse` / `PostToolUse` / 扩展点 |
+| [s05](./s05_todo_write/) | TodoWrite | `TodoItem` / 先计划后执行 |
+| [s06](./s06_subagent/) | Subagent | `fresh messages[]` / 上下文隔离 |
+| [s07](./s07_skill_loading/) | Skill Loading | `SkillManifest` / 按需注入 |
+| [s08](./s08_context_compact/) | Context Compact | snip / micro / budget / auto 四层压缩 |
+| [s09](./s09_memory/) | Memory | selection / extraction / consolidation |
+| [s10](./s10_system_prompt/) | System Prompt | 运行时组装 / 分段拼接 |
+| [s11](./s11_error_recovery/) | Error Recovery | token 升级 / fallback 模型 / 重试策略 |
+| [s12](./s12_task_system/) | Task System | `TaskRecord` / `blockedBy` / 磁盘持久化 |
+| [s13](./s13_background_tasks/) | Background Tasks | 线程执行 / 通知队列 |
+| [s14](./s14_cron_scheduler/) | Cron Scheduler | 持久化调度 / 会话级触发 |
+| [s15](./s15_agent_teams/) | Agent Teams | `MessageBus` / 收件箱 / 权限冒泡 |
+| [s16](./s16_team_protocols/) | Team Protocols | 关机握手 / 计划审批 |
+| [s17](./s17_autonomous_agents/) | Autonomous Agents | 空闲循环 / 自动认领 |
+| [s18](./s18_worktree_isolation/) | Worktree Isolation | `WorktreeRecord` / 任务-目录绑定 |
+| [s19](./s19_mcp_plugin/) | MCP Plugin | 多传输 / 通道路由 / 工具池组装 |
+| [s20](./s20_comprehensive/) | Comprehensive Agent | 全部机制归到一个循环 |
+
 ## 项目结构
 
 ```
 learn-claude-code/
-|
-|-- agents/ # Python 参考实现 (s01-s12 + s_full 总纲)
-|-- docs/{en,zh,ja}/ # 心智模型优先的文档 (3 种语言)
-|-- web/ # 交互式学习平台 (Next.js)
-|-- skills/ # s05 的 Skill 文件
-+-- .github/workflows/ci.yml # CI: 类型检查 + 构建
+  s01_agent_loop/ # 每章一个文件夹
+    README.md # 中文源文档(完整叙事)
+    README.en.md # 英文译本
+    README.ja.md # 日文译本
+    code.py # 独立可运行代码
+    images/ # SVG 流程图
+  s02_tool_use/
+  ...
+  s19_mcp_plugin/
+  s20_comprehensive/ # 终点章
+  agents/ # 旧 12 章可运行副本 + s_full.py
+  skills/ # s07 使用的 skill 文件
+  docs/ # 旧 12 章文档,过渡期保留
+  web/ # 当前仍基于 docs/ 旧版内容生成
+  tests/
 ```
 
-## 文档
-
-心智模型优先: 问题、方案、ASCII 图、最小化代码。
-[English](./docs/en/) | [中文](./docs/zh/) | [日本語](./docs/ja/)
-
-| 课程 | 主题 | 格言 |
-|------|------|------|
-| [s01](./docs/zh/s01-the-agent-loop.md) | Agent Loop | *One loop & Bash is all you need* |
-| [s02](./docs/zh/s02-tool-use.md) | Tool Use | *加一个工具, 只加一个 handler* |
-| [s03](./docs/zh/s03-todo-write.md) | TodoWrite | *没有计划的 agent 走哪算哪* |
-| [s04](./docs/zh/s04-subagent.md) | Subagent | *大任务拆小, 每个小任务干净的上下文* |
-| [s05](./docs/zh/s05-skill-loading.md) | Skills | *用到什么知识, 临时加载什么知识* |
-| [s06](./docs/zh/s06-context-compact.md) | Context Compact | *上下文总会满, 要有办法腾地方* |
-| [s07](./docs/zh/s07-task-system.md) | Task System | *大目标要拆成小任务, 排好序, 记在磁盘上* |
-| [s08](./docs/zh/s08-background-tasks.md) | Background Tasks | *慢操作丢后台, agent 继续想下一步* |
-| [s09](./docs/zh/s09-agent-teams.md) | Agent Teams | *任务太大一个人干不完, 要能分给队友* |
-| [s10](./docs/zh/s10-team-protocols.md) | Team Protocols | *队友之间要有统一的沟通规矩* |
-| [s11](./docs/zh/s11-autonomous-agents.md) | Autonomous Agents | *队友自己看看板, 有活就认领* |
-| [s12](./docs/zh/s12-worktree-task-isolation.md) | Worktree + Task Isolation | *各干各的目录, 互不干扰* |
-
 ## 学完之后 -- 从理解到落地
 
-12 个课程走完, 你已经从内到外理解了 harness 工程的运作原理。两种方式把知识变成产品:
+20 个课程走完, 你已经从内到外理解了 harness 工程的运作原理。两种方式把知识变成产品:
 
 ### Kode Agent CLI -- 开源 Coding Agent CLI
 
README.md
@@ -1,53 +1,54 @@
|
||||
[English](./README.md) | [中文](./README-zh.md) | [日本語](./README-ja.md)
|
||||
# Learn Claude Code -- Harness Engineering for Real Agents
|
||||
|
||||
<a href="https://trendshift.io/repositories/19746" target="_blank"><img src="https://trendshift.io/api/badge/repositories/19746" alt="shareAI-lab%2Flearn-claude-code | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
## Agency Comes from the Model. An Agent Product = Model + Harness.

Before we talk about code, let's get one thing straight.
Before we write any code, one thing needs to be clear.

**Agency -- the ability to perceive, reason, and act -- comes from model training, not from external code orchestration.** But a working agent product needs both the model and the harness. The model is the driver, the harness is the vehicle. This repo teaches you how to build the vehicle.
**Agency -- the capacity to perceive, reason, and act -- comes from model training, not from external code orchestration.** But a working agent product needs both the model and the harness. The model is the driver. The harness is the vehicle. This repository teaches you how to build the vehicle.

### Where Agency Comes From

At the core of every agent is a neural network -- a Transformer, an RNN, a learned function -- that has been trained, through billions of gradient updates on action-sequence data, to perceive an environment, reason about goals, and take actions. Agency is never granted by the surrounding code. It is learned by the model during training.
At the core of every agent is a neural network -- a Transformer, an RNN, a trained function -- shaped by billions of gradient updates on sequences of perception, reasoning, and action. Agency was never bestowed by the surrounding code. It was learned during training.

Humans are the best example: a biological neural network, shaped by millions of years of evolutionary training, that perceives the world through senses, reasons through a brain, and acts through a body. When DeepMind, OpenAI, or Anthropic say "agent," the core of what they mean is always the same thing: **a model that has learned to act, plus the infrastructure that lets it operate in a specific environment.**
Humans are the original proof. A biological neural network, refined by millions of years of evolutionary pressure, perceives the world through senses, reasons through a brain, and acts through a body. When DeepMind, OpenAI, or Anthropic say "agent," they all mean the same core thing: **a model that learned to act through training, plus the infrastructure that lets it operate in a specific environment.**

The proof is written in history:
The historical record is unambiguous:

- **2013 -- DeepMind DQN plays Atari.** A single neural network, receiving only raw pixels and game scores, learned to play 7 Atari 2600 games -- surpassing all prior algorithms and beating human experts on 3 of them. By 2015, the same architecture scaled to [49 games and matched professional human testers](https://www.nature.com/articles/nature14236), published in *Nature*. No game-specific rules. No decision trees. One model, learning from experience. That model was the agent.
- **2013 -- DeepMind DQN plays Atari.** A single neural network, receiving only raw pixels and game scores, learned 7 Atari 2600 games -- surpassing prior algorithms and beating human experts in 3 of them. By 2015, it scaled to [49 games at professional tester level](https://www.nature.com/articles/nature14236), published in *Nature*. No game-specific rules. One model, learning from experience.

- **2019 -- OpenAI Five conquers Dota 2.** Five neural networks, having played [45,000 years of Dota 2](https://openai.com/index/openai-five-defeats-dota-2-world-champions/) against themselves in 10 months, defeated **OG** -- the reigning TI8 world champions -- 2-0 on a San Francisco livestream. In a subsequent public arena, the AI won 99.4% of 42,729 games against all comers. No scripted strategies. No meta-programmed team coordination. The models learned teamwork, tactics, and real-time adaptation entirely through self-play.
- **2019 -- OpenAI Five conquers Dota 2.** Five neural networks played [45,000 years of Dota 2 against themselves](https://openai.com/index/openai-five-defeats-dota-2-world-champions/) over 10 months, then defeated **OG** -- the TI8 world champions -- 2-0 in a live match. In the public arena, the AI won 99.4% of 42,729 games. No scripted strategies. The models learned teamwork through self-play.

- **2019 -- DeepMind AlphaStar masters StarCraft II.** AlphaStar [beat professional players 10-1](https://deepmind.google/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii/) in a closed-door match, and later achieved [Grandmaster status](https://www.nature.com/articles/d41586-019-03298-6) on European servers -- top 0.15% of 90,000 players. A game with imperfect information, real-time decisions, and a combinatorial action space that dwarfs chess and Go. The agent? A model. Trained. Not scripted.
- **2019 -- DeepMind AlphaStar masters StarCraft II.** AlphaStar [beat a professional player 10-1](https://deepmind.google/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii/) in closed matches, then reached [Grandmaster rank](https://www.nature.com/articles/d41586-019-03298-6) on the European server -- top 0.15% of 90,000 players. An incomplete-information, real-time game with a combinatorial action space far exceeding chess or Go.

- **2019 -- Tencent Jueyu dominates Honor of Kings.** Tencent AI Lab's "Jueyu" [defeated KPL professional players](https://www.jiemian.com/article/3371171.html) in a full 5v5 match at the World Champion Cup. In 1v1 mode, pros won only [1 out of 15 games and never survived past 8 minutes](https://developer.aliyun.com/article/851058). Training intensity: one day equaled 440 human years. By 2021, Jueyu surpassed KPL pros across the full hero pool. No handcrafted matchup tables. No scripted compositions. A model that learned the entire game from scratch through self-play.
- **2019 -- Tencent Jueyu dominates Honor of Kings.** Tencent AI Lab's "Jueyu" system [defeated KPL professional players in full 5v5](https://www.jiemian.com/article/3371171.html) at the World Champion Cup semifinal. In 1v1 mode, pros [won just 1 out of 15 matches, lasting under 8 minutes at best](https://developer.aliyun.com/article/851058). Training intensity: one day equaled 440 human years. A model that learned the entire game from scratch through self-play.

- **2024-2025 -- LLM agents reshape software engineering.** Claude, GPT, Gemini -- large language models trained on the entirety of human code and reasoning -- are deployed as coding agents. They read codebases, write implementations, debug failures, coordinate in teams. The architecture is identical to every agent before them: a trained model, placed in an environment, given tools to perceive and act. The only difference is the scale of what they've learned and the generality of the tasks they solve.
- **2024-2025 -- LLM agents reshape software engineering.** Claude, GPT, Gemini -- large language models trained on the full breadth of human code and reasoning -- are deployed as coding agents. They read codebases, write implementations, debug failures, and coordinate as teams. The architecture is identical to every previous agent: a trained model, placed in an environment, given tools for perception and action.

Every one of these milestones points to the same fact: **agency -- the ability to perceive, reason, and act -- is trained, not coded.** But every agent also needed an environment to operate in: the Atari emulator, the Dota 2 client, the StarCraft II engine, the IDE and terminal. The model provides intelligence. The environment provides the action space. Together they form a complete agent.
Every milestone points to the same fact: **Agency -- the ability to perceive, reason, and act -- is trained, not coded.** But every agent also needs an environment to operate in: an Atari emulator, the Dota 2 client, the StarCraft II engine, an IDE and a terminal. The model supplies the intelligence. The environment supplies the action space. Together they form a complete agent.

### What an Agent Is NOT

The word "agent" has been hijacked by an entire cottage industry of prompt plumbing.
The word "agent" has been hijacked by an entire prompt-plumbing industry.

Drag-and-drop workflow builders. No-code "AI agent" platforms. Prompt-chain orchestration libraries. They all share the same delusion: that wiring together LLM API calls with if-else branches, node graphs, and hardcoded routing logic constitutes "building an agent."
Drag-and-drop workflow builders. No-code "AI Agent" platforms. Prompt-chain orchestration libraries. They share a single delusion: that stringing LLM API calls together with if-else branches, node graphs, and hardcoded routing logic constitutes "building an agent."

It doesn't. What they build is a Rube Goldberg machine -- an over-engineered, brittle pipeline of procedural rules, with an LLM wedged in as a glorified text-completion node. That is not an agent. That is a shell script with delusions of grandeur.
It does not. What they produce are Rube Goldberg machines -- over-engineered, brittle, procedural rule pipelines with an LLM wedged in as a glorified text-completion node. That is not an agent. That is a shell script with grandiose pretensions.

**Prompt plumbing "agents" are the fantasy of programmers who don't train models.** They attempt to brute-force intelligence by stacking procedural logic -- massive rule trees, node graphs, chain-of-prompt waterfalls -- and praying that enough glue code will somehow emergently produce autonomous behavior. It won't. You cannot engineer your way to agency. Agency is learned, not programmed.
You cannot brute-force intelligence by stacking procedural logic -- sprawling rule trees, node graphs, chained prompt waterfalls -- and praying that enough glue code will spontaneously produce autonomous behavior. It will not. You cannot engineer agency into existence. Agency is learned, not coded.

Those systems are dead on arrival: fragile, unscalable, fundamentally incapable of generalization. They are the modern resurrection of GOFAI (Good Old-Fashioned AI) -- the symbolic rule systems the field abandoned decades ago, now spray-painted with an LLM veneer. Different packaging, same dead end.
### The Mindshift: From "Building Agents" to Building Harnesses

### The Mind Shift: From "Developing Agents" to Developing Harnesses
When someone says "I am building an agent," they can only mean one of two things:

When someone says "I'm developing an agent," they can only mean one of two things:
**1. Training a model.** Adjusting weights through reinforcement learning, fine-tuning, RLHF, or another gradient-based method. Collecting trajectory data -- real-world sequences of perception, reasoning, and action in a target domain -- and using it to shape the model's behavior. This is what DeepMind, OpenAI, Tencent AI Lab, and Anthropic do.

**1. Training the model.** Adjusting weights through reinforcement learning, fine-tuning, RLHF, or other gradient-based methods. Collecting task-process data -- the actual sequences of perception, reasoning, and action in real domains -- and using it to shape the model's behavior. This is what DeepMind, OpenAI, Tencent AI Lab, and Anthropic do. This is agent development in the truest sense.
**2. Building a harness.** Writing the code that gives a model an operational environment. This is what most of us do, and it is the core of this repository.

**2. Building the harness.** Writing the code that gives the model an environment to operate in. This is what most of us do, and it is the focus of this repository.

A harness is everything the agent needs to function in a specific domain:
A harness is everything an agent needs to work in a specific domain:
```
Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions
@@ -56,83 +57,55 @@ Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions
Knowledge: product docs, domain references, API specs, style guides
Observation: git diff, error logs, browser state, sensor data
Action: CLI commands, API calls, UI interactions
Permissions: sandboxing, approval workflows, trust boundaries
Permissions: sandbox isolation, approval workflows, trust boundaries
```
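To make the Permissions row concrete, here is a minimal sketch of two checks a harness might run before a tool executes: keep file paths inside the workspace, and classify shell commands as allow, ask (needs approval), or deny. `WORKDIR`, the pattern lists, `safe_path`, and `check_command` are all hypothetical; real permission governance is rule-based and far richer than substring matching.

```python
from pathlib import Path

# Hypothetical workspace root; a real harness would take this from configuration.
WORKDIR = Path.cwd().resolve()

# Illustrative patterns only; production rules would be structured and configurable.
DENY_PATTERNS = ("rm -rf /", "sudo ", "mkfs", "shutdown")
ASK_PATTERNS = ("rm ", "git push", "git reset --hard")


def safe_path(p: str) -> Path:
    """Resolve p against WORKDIR and refuse anything that escapes the sandbox."""
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"Path escapes workspace: {p}")
    return path


def check_command(command: str) -> str:
    """Classify a shell command as 'deny', 'ask', or 'allow' (deny wins first)."""
    if any(pattern in command for pattern in DENY_PATTERNS):
        return "deny"
    if any(pattern in command for pattern in ASK_PATTERNS):
        return "ask"
    return "allow"


print(check_command("ls -la"))           # allow
print(check_command("git push origin"))  # ask
print(check_command("sudo reboot"))      # deny
```

Substring matching is deliberately crude here (it would also deny `rm -rf /tmp/x`); the point is the shape: a check that runs before the tool does and returns a decision the loop can act on.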
The model decides. The harness executes. The model reasons. The harness provides context. The model is the driver. The harness is the vehicle.

**A coding agent's harness is its IDE, terminal, and filesystem access.** A farm agent's harness is its sensor array, irrigation controls, and weather data feeds. A hotel agent's harness is its booking system, guest communication channels, and facility management APIs. The agent -- the intelligence, the decision-maker -- is always the model. The harness changes per domain. The agent generalizes across them.

This repo teaches you to build vehicles. Vehicles for coding. But the design patterns generalize to any domain: farm management, hotel operations, manufacturing, logistics, healthcare, education, scientific research. Anywhere a task needs to be perceived, reasoned about, and acted upon -- an agent needs a harness.
This repository teaches you to build the vehicle. A vehicle for coding. But the design patterns generalize to any domain.

### What Harness Engineers Actually Do

If you are reading this repository, you are likely a harness engineer -- and that is a powerful thing to be. Here is your real job:
If you are reading this repository, you are most likely a harness engineer. Here is what the job actually entails:

- **Implement tools.** Give the agent hands. File read/write, shell execution, API calls, browser control, database queries. Each tool is an action the agent can take in its environment. Design them to be atomic, composable, and well-described.
- **Implement tools.** Give the agent hands. File read/write, shell execution, API calls, browser control, database queries. Each tool is one action the agent can take in its environment. Design them to be atomic, composable, and clearly described.

- **Curate knowledge.** Give the agent domain expertise. Product documentation, architectural decision records, style guides, regulatory requirements. Load them on-demand (s05), not upfront. The agent should know what's available and pull what it needs.
- **Curate knowledge.** Give the agent domain expertise. Product documentation, architecture decision records, style guides, compliance requirements. Load on demand, not upfront.

- **Manage context.** Give the agent clean memory. Subagent isolation (s04) prevents noise from leaking. Context compression (s06) prevents history from overwhelming. Task systems (s07) persist goals beyond any single conversation.
- **Manage context.** Give the agent clean memory. Subagent isolation prevents noise leakage. Context compaction prevents history from drowning the present. Task systems let goals persist beyond a single conversation.

- **Control permissions.** Give the agent boundaries. Sandbox file access. Require approval for destructive operations. Enforce trust boundaries between the agent and external systems. This is where safety engineering meets harness engineering.
- **Control permissions.** Give the agent boundaries. Sandbox file access. Require approval for destructive operations. Enforce trust boundaries between the agent and external systems.

- **Collect task-process data.** Every action sequence the agent executes in your harness is training signal. The perception-reasoning-action traces from real deployments are the raw material for fine-tuning the next generation of agent models. Your harness doesn't just serve the agent -- it can help improve the agent.
- **Collect trajectory data.** Every action sequence the agent executes in your harness is training signal. Real deployment trajectories are the raw material for fine-tuning the next generation of agent models.

You are not writing the intelligence. You are building the world the intelligence inhabits. The quality of that world -- how clearly the agent can perceive, how precisely it can act, how rich its available knowledge is -- directly determines how effectively the intelligence can express itself.
You are not writing intelligence. You are building the world that intelligence inhabits. The quality of that world directly determines how effectively the intelligence can express itself.

**Build great harnesses. The agent will do the rest.**
**Build the harness well. The model will do the rest.**
### Why Claude Code -- A Masterclass in Harness Engineering
### Why Claude Code

Why does this repository dissect Claude Code specifically?
Because Claude Code is the most elegant, most complete agent harness implementation we have seen. Not because of any clever trick, but because of what it *does not* do: it does not try to be the agent. It does not impose rigid workflows. It does not substitute hand-crafted decision trees for the model's own judgment. It gives the model tools, knowledge, context management, and permission boundaries -- then gets out of the way.

Because Claude Code is the most elegant and fully-realized agent harness we have seen. Not because of any single clever trick, but because of what it *doesn't* do: it doesn't try to be the agent. It doesn't impose rigid workflows. It doesn't second-guess the model with elaborate decision trees. It provides the model with tools, knowledge, context management, and permission boundaries -- then gets out of the way.

Look at what Claude Code actually is, stripped to its essence:
Strip Claude Code down to its essence:

```
Claude Code = one agent loop
+ tools (bash, read, write, edit, glob, grep, browser...)
+ on-demand skill loading
+ context compression
+ context compaction
+ subagent spawning
+ task system with dependency graph
+ team coordination with async mailboxes
+ worktree isolation for parallel execution
+ task system with dependency graphs
+ async mailbox team coordination
+ worktree-isolated parallel execution
+ permission governance
+ hooks extension system
+ memory persistence
+ MCP external capability routing
```
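The "on-demand skill loading" line can be sketched as a two-level scheme: a cheap index of names and one-line descriptions sits in the system prompt, and a skill's full body enters the conversation only as a `tool_result` when the model asks for it. `Skill`, `SKILLS`, `skill_index`, and `load_skill` are hypothetical names, and the result dict only mirrors the shape of an Anthropic `tool_result` content block.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # one line, always visible to the model
    body: str         # full instructions, sent only when requested


# Hypothetical in-memory registry; a harness would typically load these from skill files.
SKILLS = {
    "pdf": Skill("pdf", "Extract text and tables from PDF files.",
                 "1. Check the file exists.\n2. Extract page text.\n3. Rebuild tables."),
    "code-review": Skill("code-review", "Review a diff before committing.",
                         "1. Read the diff.\n2. Flag risky changes.\n3. Summarize findings."),
}


def skill_index() -> str:
    """Cheap catalog for the system prompt: names and one-line descriptions only."""
    return "\n".join(f"- {s.name}: {s.description}" for s in SKILLS.values())


def load_skill(name: str, tool_use_id: str) -> dict:
    """Tool handler: return the full skill body as a tool_result block."""
    skill = SKILLS.get(name)
    content = skill.body if skill else f"Unknown skill: {name}"
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}


print(skill_index())
print(load_skill("pdf", "toolu_01")["content"])
```

The design choice is about cost: every skill's description is paid for on every turn, while a body is paid for only in the turns after the model requests it.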
That's it. That's the entire architecture. Every component is a harness mechanism -- a piece of the world built for the agent to inhabit. The agent itself? It's Claude. A model. Trained by Anthropic on the full breadth of human reasoning and code. The harness doesn't make Claude smart. Claude is already smart. The harness gives Claude hands, eyes, and a workspace.
That is it. The agent itself? Claude. A model. Trained by Anthropic on the full breadth of human reasoning and code. The harness did not make Claude smart. Claude was already smart. The harness gave Claude hands, eyes, and a workspace.

This is why Claude Code is the ideal teaching subject: **it demonstrates what happens when you trust the model and focus your engineering on the harness.** Every session in this repository (s01-s12) reverse-engineers one harness mechanism from Claude Code's architecture. By the end, you understand not just how Claude Code works, but the universal principles of harness engineering that apply to any agent in any domain.

The lesson is not "copy Claude Code." The lesson is: **the best agent products are built by engineers who understand that their job is harness, not intelligence.**

---
## The Vision: Fill the Universe with Real Agents

This is not just about coding agents.

Every domain where humans perform complex, multi-step, judgment-intensive work is a domain where agents can operate -- given the right harness. The patterns in this repository are universal:

```
Estate management agent = model + property sensors + maintenance tools + tenant comms
Agricultural agent = model + soil/weather data + irrigation controls + crop knowledge
Hotel operations agent = model + booking system + guest channels + facility APIs
Medical research agent = model + literature search + lab instruments + protocol docs
Manufacturing agent = model + production line sensors + quality controls + logistics
Education agent = model + curriculum knowledge + student progress + assessment tools
```

The loop is always the same. The tools change. The knowledge changes. The permissions change. The agent -- the model -- generalizes.

Every harness engineer reading this repository is learning patterns that apply far beyond software engineering. You are learning to build the infrastructure for an intelligent, automated future. Every well-designed harness deployed in a real domain is one more place where an agent can perceive, reason, and act.

First we fill the workshops. Then the farms, the hospitals, the factories. Then the cities. Then the planet.

**Bash is all you need. Real agents are all the universe needs.**
The takeaway is not "copy Claude Code." The takeaway is: **the best agent products come from engineers who understand that their job is the harness, not the intelligence.**

---

@@ -151,43 +124,13 @@ First we fill the workshops. Then
loop back -----------------> messages[]


That's the minimal loop. Every AI agent needs this loop.
The MODEL decides when to call tools and when to stop.
The CODE just executes what the model asks for.
This repo teaches you to build what surrounds this loop --
The model decides when to call tools and when to stop.
The code just executes what the model asks for.
This repo teaches you to build everything around this loop --
the harness that makes the agent effective in a specific domain.
```
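The loop in the diagram can be sketched in a few lines of Python. To keep it runnable without an API key, `fake_model` is a hypothetical stand-in that returns dicts loosely shaped like Anthropic Messages API responses (`stop_reason`, `tool_use`, `tool_result`); a live version would mainly swap it for a real client call. `run_bash` runs commands with `shell=True` and no sandbox, so treat it as a teaching tool only.

```python
import subprocess


def run_bash(command: str) -> str:
    """The one tool: run a shell command and return its combined output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=30)
    return (result.stdout + result.stderr).strip() or "(no output)"


def fake_model(messages: list) -> dict:
    """Hypothetical stand-in for an LLM call: requests one tool call, then stops."""
    last = messages[-1]
    if last["role"] == "user" and isinstance(last["content"], str):
        return {"stop_reason": "tool_use",
                "content": [{"type": "tool_use", "id": "call_1", "name": "bash",
                             "input": {"command": "echo hello"}}]}
    return {"stop_reason": "end_turn", "content": [{"type": "text", "text": "Done."}]}


def agent_loop(messages: list, model=fake_model) -> list:
    while True:
        response = model(messages)
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            return messages  # the model, not the harness, decided to stop
        results = []
        for block in response["content"]:
            if block["type"] == "tool_use":
                output = run_bash(block["input"]["command"])
                results.append({"type": "tool_result", "tool_use_id": block["id"],
                                "content": output})
        messages.append({"role": "user", "content": results})


history = agent_loop([{"role": "user", "content": "Say hello from the shell."}])
print(history[2]["content"][0]["content"])  # hello
print(history[-1]["content"][0]["text"])    # Done.
```

Note that the harness never decides when to stop: it keeps executing tool calls and feeding results back until the model returns something other than `tool_use`.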
**12 progressive sessions, from a simple loop to isolated autonomous execution.**
**Each session adds one harness mechanism. Each mechanism has one motto.**

> **s01** *"One loop & Bash is all you need"* — one tool + one loop = an agent
>
> **s02** *"Adding a tool means adding one handler"* — the loop stays the same; new tools register into the dispatch map
>
> **s03** *"An agent without a plan drifts"* — list the steps first, then execute; completion doubles
>
> **s04** *"Break big tasks down; each subtask gets a clean context"* — subagents use independent messages[], keeping the main conversation clean
>
> **s05** *"Load knowledge when you need it, not upfront"* — inject via tool_result, not the system prompt
>
> **s06** *"Context will fill up; you need a way to make room"* — three-layer compression strategy for infinite sessions
>
> **s07** *"Break big goals into small tasks, order them, persist to disk"* — a file-based task graph with dependencies, laying the foundation for multi-agent collaboration
>
> **s08** *"Run slow operations in the background; the agent keeps thinking"* — daemon threads run commands, inject notifications on completion
>
> **s09** *"When the task is too big for one, delegate to teammates"* — persistent teammates + async mailboxes
>
> **s10** *"Teammates need shared communication rules"* — one request-response pattern drives all negotiation
>
> **s11** *"Teammates scan the board and claim tasks themselves"* — no need for the lead to assign each one
>
> **s12** *"Each works in its own directory, no interference"* — tasks manage goals, worktrees manage directories, bound by ID
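The s07 line above describes a file-based task graph with dependencies. A minimal sketch, assuming one JSON file per task and a hypothetical `blockedBy` list of task IDs: a task may start only when every dependency file exists and is marked completed, so a missing dependency blocks the task instead of crashing it. `TASKS_DIR` and all function names are illustrative.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical storage location: one JSON file per task.
TASKS_DIR = Path(tempfile.mkdtemp(prefix="tasks_"))


def _path(task_id: int) -> Path:
    return TASKS_DIR / f"task_{task_id}.json"


def save_task(task: dict) -> None:
    _path(task["id"]).write_text(json.dumps(task))


def create_task(task_id: int, subject: str, blocked_by=()) -> dict:
    task = {"id": task_id, "subject": subject, "status": "pending",
            "blockedBy": list(blocked_by)}
    save_task(task)
    return task


def load_task(task_id: int):
    path = _path(task_id)
    return json.loads(path.read_text()) if path.exists() else None


def can_start(task_id: int) -> bool:
    """A pending task may start only if every dependency exists and is completed."""
    task = load_task(task_id)
    if task is None or task["status"] != "pending":
        return False
    for dep_id in task["blockedBy"]:
        dep = load_task(dep_id)
        if dep is None or dep["status"] != "completed":  # missing dep file blocks, never crashes
            return False
    return True


def complete(task_id: int) -> None:
    task = load_task(task_id)
    task["status"] = "completed"
    save_task(task)


create_task(1, "Design schema")
create_task(2, "Write migrations", blocked_by=[1])
print(can_start(2))  # False: task 1 is still pending
complete(1)
print(can_start(2))  # True
```

Because state lives on disk rather than in `messages[]`, the graph survives context compaction and can later be shared by several agents.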
---

## The Core Pattern
## Core Pattern

```python
def agent_loop(messages):
@@ -214,140 +157,284 @@ def agent_loop(messages):
messages.append({"role": "user", "content": results})
```
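Inside the loop, each `tool_use` block is routed to a handler by name, which is what lets a new tool arrive without touching the loop (s02). A minimal sketch of such a dispatch map, with a hypothetical `tool` decorator and `dispatch` helper; the returned dict follows the shape of an Anthropic `tool_result` content block, and the file tools here have no sandboxing.

```python
from pathlib import Path

TOOL_HANDLERS = {}  # tool name -> handler; the loop only ever looks things up here


def tool(name: str):
    """Hypothetical decorator: adding a tool means registering one handler."""
    def register(fn):
        TOOL_HANDLERS[name] = fn
        return fn
    return register


@tool("read_file")
def read_file(path: str) -> str:
    return Path(path).read_text()


@tool("write_file")
def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"Wrote {len(content)} chars to {path}"


def dispatch(block: dict) -> dict:
    """Route one tool_use block to its handler and wrap the output as a tool_result."""
    handler = TOOL_HANDLERS.get(block["name"])
    if handler is None:
        output = f"Unknown tool: {block['name']}"
    else:
        try:
            output = handler(**block["input"])
        except Exception as exc:  # report failures to the model instead of crashing the loop
            output = f"Error: {exc}"
    return {"type": "tool_result", "tool_use_id": block["id"], "content": output}


print(dispatch({"id": "call_1", "name": "search_web", "input": {}})["content"])
# Unknown tool: search_web
```

Returning errors as text rather than raising keeps the loop alive and gives the model a chance to correct its own call.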
Every session layers one harness mechanism on top of this loop -- without changing the loop itself. The loop belongs to the agent. The mechanisms belong to the harness.
Every lesson layers one harness mechanism on top of this loop -- the loop itself never changes. The loop belongs to the agent. The mechanisms belong to the harness.
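As one example of a mechanism that sits outside the loop, here is a single, simplified compaction layer: before each model call, older `tool_result` bodies are replaced with a short placeholder while the most recent ones stay verbatim. This is a sketch under assumptions (`KEEP_RECENT`, the 100-character threshold, and the placeholder text are hypothetical), not the multi-layer compaction strategy the compaction chapter covers.

```python
KEEP_RECENT = 3  # hypothetical: how many recent tool results stay verbatim
PLACEHOLDER = "[earlier tool output cleared to save context]"


def micro_compact(messages: list) -> list:
    """Shrink old tool_result bodies in place; agent_loop itself stays unchanged."""
    tool_results = [
        block
        for msg in messages
        if msg["role"] == "user" and isinstance(msg["content"], list)
        for block in msg["content"]
        if block.get("type") == "tool_result"
    ]
    old = tool_results[:max(len(tool_results) - KEEP_RECENT, 0)]
    for block in old:
        if len(str(block["content"])) > 100:  # short results are cheap; leave them alone
            block["content"] = PLACEHOLDER
    return messages


history = [{"role": "user", "content": "Inspect the logs."}]
for i in range(5):
    history.append({"role": "assistant", "content": [
        {"type": "tool_use", "id": f"call_{i}", "name": "bash", "input": {"command": "cat log"}}]})
    history.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": f"call_{i}", "content": "x" * 500}]})

micro_compact(history)  # in a harness, this would run right before each model call
```

Only the bodies change: every `tool_use`/`tool_result` pair keeps its ID, so the conversation stays structurally valid.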
## Scope (Important)
---

This repository is a 0->1 learning project for harness engineering -- building the environment that surrounds an agent model.
It intentionally simplifies or omits several production mechanisms:
## Version Status

- Full event/hook buses (for example PreToolUse, SessionStart/End, ConfigChange).
s12 includes only a minimal append-only lifecycle event stream for teaching.
- Rule-based permission governance and trust workflows
- Session lifecycle controls (resume/fork) and advanced worktree lifecycle controls
- Full MCP runtime details (transport/OAuth/resource subscribe/polling)
This repository currently contains two tutorial tracks:

Treat the team JSONL mailbox protocol in this repo as a teaching implementation, not a claim about any specific production internals.
- **Current track: root-level `s01-s20`**
The root-level `s01_*` ... `s20_*` folders are the new canonical version. Each chapter contains a full narrative README, translations, runnable `code.py`, and diagrams where needed.
- **Legacy transition track: `docs/`, `agents/`, and the current `web/` app**
These still preserve the older 12-lesson version. They are kept temporarily for existing readers, old links, and the web platform while the new 20-lesson track settles.

If you are starting now, read the root-level `s01_agent_loop/` through `s20_comprehensive/` chapters. If you are following an older link or using the current web app, you are likely reading the legacy 12-lesson track. The legacy and current chapter numbers do not always match, so avoid mixing chapter numbers across tracks.

### Legacy-to-Current Mapping

| Legacy 12-lesson track | Current 20-lesson track | Topic |
|---|---|---|
| old s01 | new s01 | Agent Loop |
| old s02 | new s02 | Tool Use |
| old s03 | new s05 | TodoWrite |
| old s04 | new s06 | Subagent |
| old s05 | new s07 | Skill Loading |
| old s06 | new s08 | Context Compact |
| old s07 | new s12 | Task System |
| old s08 | new s13 | Background Tasks |
| old s09 | new s15 | Agent Teams |
| old s10 | new s16 | Team Protocols |
| old s11 | new s17 | Autonomous Agents |
| old s12 | new s18 | Worktree Isolation |
| new only | s03, s04, s09, s10, s11, s14, s19, s20 | Permission, Hooks, Memory, System Prompt, Error Recovery, Cron, MCP, Comprehensive Agent |

---
## Scope

This repository is a 0-to-1 harness engineering learning project: it teaches how to build the working environment around an agent model. To keep the learning path clear, some production mechanisms are intentionally simplified or omitted:

- Full event / hook bus behavior, such as `PreToolUse`, `SessionStart/End`, and `ConfigChange`.
The teaching code uses minimal lifecycle events where needed.
- Rule-based permission governance and full trust workflows.
- Session lifecycle controls such as resume/fork, plus more complete worktree lifecycle handling.
- Full MCP runtime details such as transport, OAuth, resource subscription, and polling.

The JSONL mailbox protocol in this repository is a teaching implementation, not a claim about the internals of any specific production system.

---
## 20 Progressive Lessons

**Each lesson adds one harness mechanism. Each mechanism has a motto.**

> **s01** *"One loop & Bash is all you need"* — one tool + one loop = one agent
>
> **s02** *"Adding a tool means adding one handler"* — the loop stays untouched; new tools register into the dispatch map
>
> **s03** *"Set boundaries first, then grant freedom"* — check what can run, what must stop, and what needs approval
>
> **s04** *"Hook around the loop, never rewrite the loop"* — add extension points without changing the main loop
>
> **s05** *"An agent without a plan drifts"* — list the steps before starting; completion rate doubles
>
> **s06** *"Big tasks split small, each subtask gets clean context"* — subagents do the side work and bring back only the result
>
> **s07** *"Load knowledge on demand, not upfront"* — list skills first, expand them only when needed
>
> **s08** *"Context always fills up -- have a way to make room"* — multi-layer compaction strategies buy you infinite sessions
>
> **s09** *"Remember what matters, forget what doesn't"* — three subsystems: selection, extraction, consolidation
>
> **s10** *"Prompts are assembled at runtime, not hardcoded"* — section-based concatenation, loaded on demand
>
> **s11** *"Errors aren't the end, they're the start of a retry"* — retry, make room, or take another path when things fail
>
> **s12** *"Big goals break into small tasks, ordered, persisted to disk"* — a file-backed task graph that lays the groundwork for multi-agent coordination
>
> **s13** *"Slow ops go background, agent keeps thinking"* — background threads run commands; notifications inject on completion
>
> **s14** *"Fire on schedule, no human kick needed"* — trigger tasks automatically by time
>
> **s15** *"Too big for one agent -- delegate to teammates"* — persistent teammates + async mailboxes
>
> **s16** *"Teammates need shared communication rules"* — use a fixed request-reply format for coordination
>
> **s17** *"Teammates check the board, claim work themselves"* — no leader assigning one by one; self-organizing
>
> **s18** *"Each works in its own directory, no interference"* — tasks own goals, worktrees own directories, bound by ID
>
> **s19** *"Not enough capability? Plug in more via MCP"* — connect external tools into the same tool pool
>
> **s20** *"Many mechanisms, one loop"* — all previous mechanisms return to one complete harness
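The s15 and s16 lines describe persistent teammates talking through async mailboxes with a fixed request-reply format. A minimal sketch, assuming one append-only JSONL file per teammate; `consume_inbox`, `match_response`, `send_request`, and the `shutdown_request` / `shutdown_response` types are illustrative names. Like the repository's own mailbox, this is a teaching implementation, and it skips file locking, so concurrent writers could race.

```python
import json
import tempfile
import uuid
from pathlib import Path

# Hypothetical mailbox directory: one append-only JSONL file per teammate.
INBOX_DIR = Path(tempfile.mkdtemp(prefix="inbox_"))


def send(to: str, sender: str, content: str, msg_type: str = "message", request_id=None) -> None:
    """Append one JSON object per line to the recipient's inbox."""
    msg = {"from": sender, "type": msg_type, "content": content, "request_id": request_id}
    with open(INBOX_DIR / f"{to}.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(msg) + "\n")


def consume_inbox(name: str) -> list:
    """Read every pending message, then truncate the file (no locking here)."""
    path = INBOX_DIR / f"{name}.jsonl"
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    path.write_text("", encoding="utf-8")
    return [json.loads(line) for line in lines if line.strip()]


def send_request(to: str, sender: str, content: str, msg_type: str) -> str:
    """Start a request-reply exchange; the reply must echo this request_id."""
    request_id = uuid.uuid4().hex[:8]
    send(to, sender, content, msg_type, request_id)
    return request_id


def match_response(msg: dict, request_id: str, expected_type: str) -> bool:
    """Accept a reply only if both the request_id and the message type match."""
    return msg.get("request_id") == request_id and msg.get("type") == expected_type


rid = send_request("alice", "lead", "Please wrap up and shut down.", "shutdown_request")
for msg in consume_inbox("alice"):
    send(msg["from"], "alice", "Shutting down.", "shutdown_response", msg["request_id"])
reply = consume_inbox("lead")[0]
print(match_response(reply, rid, "shutdown_response"))  # True
```

Checking the type as well as the ID means a stray reply to a different kind of request cannot be mistaken for the one the lead is waiting on.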
---

## Learning Path

Main line: act → handle complex work → remember and recover → run long tasks → collaborate → extend and assemble.

```mermaid
flowchart TD
%% Card styles
classDef stage1 fill:#E3F2FD,stroke:#1976D2,stroke-width:2px,color:#0D47A1,rx:12,ry:12,text-align:left
classDef stage2 fill:#E8F5E9,stroke:#388E3C,stroke-width:2px,color:#1B5E20,rx:12,ry:12,text-align:left
classDef stage3 fill:#FFF3E0,stroke:#F57C00,stroke-width:2px,color:#E65100,rx:12,ry:12,text-align:left
classDef stage4 fill:#FCE4EC,stroke:#C2185b,stroke-width:2px,color:#880E4F,rx:12,ry:12,text-align:left
classDef stage5 fill:#F3E5F5,stroke:#7B1FA2,stroke-width:2px,color:#4A148C,rx:12,ry:12,text-align:left
classDef stage6 fill:#E0F7FA,stroke:#0097A7,stroke-width:2px,color:#006064,rx:12,ry:12,text-align:left

%% Group style
classDef groupBox fill:#F8F9FA,stroke:#CED4DA,stroke-width:2px,stroke-dasharray: 5 5,rx:15,ry:15,color:#495057

%% Layer 1: stages 1-3
subgraph Phase1 ["🌱 Stages 1-3: Core capabilities (simple to complex)"]
direction LR
S1["<b>1. Let the Agent act</b><br/>━━━━━━━━━━━━━<br/><b>s01 Agent Loop</b><br/>└─ one loop + bash<br/><br/><b>s02 Tool Use</b><br/>└─ one tool to many tools<br/><br/><b>s03 Permission</b><br/>└─ decide what can run<br/><br/><b>s04 Hooks</b><br/>└─ extension points around tools"]:::stage1
|
||||
|
||||
S2["<b>2. Handle complex work</b><br/>━━━━━━━━━━━━━<br/><b>s05 TodoWrite</b><br/>└─ plan first, then execute<br/><br/><b>s06 Subagent</b><br/>└─ side work, result back<br/><br/><b>s08 Context Compact</b><br/>└─ make room in long context"]:::stage2
|
||||
|
||||
S3["<b>3. Remember and recover</b><br/>━━━━━━━━━━━━━<br/><b>s09 Memory</b><br/>└─ remember what matters<br/><br/><b>s10 System Prompt</b><br/>└─ assemble at runtime<br/><br/><b>s11 Error Recovery</b><br/>└─ retry or change path"]:::stage3
|
||||
|
||||
S1 ==> S2 ==> S3
|
||||
end
|
||||
|
||||
%% Layer 2: stages 4-6
|
||||
subgraph Phase2 ["🚀 Stages 4-6: Advanced capabilities (long-running, collaboration, integration)"]
|
||||
direction LR
|
||||
S4["<b>4. Run long tasks</b><br/>━━━━━━━━━━━━━<br/><b>s12 Task System</b><br/>└─ persist tasks and deps<br/><br/><b>s13 Background Tasks</b><br/>└─ send slow work background<br/><br/><b>s14 Cron Scheduler</b><br/>└─ trigger by time"]:::stage4
|
||||
|
||||
S5["<b>5. Coordinate many Agents</b><br/>━━━━━━━━━━━━━<br/><b>s15 Agent Teams</b><br/>└─ teammates + mailboxes<br/><br/><b>s16 Team Protocols</b><br/>└─ fixed request-reply format<br/><br/><b>s17 Autonomous Agents</b><br/>└─ claim work from the board<br/><br/><b>s18 Worktree Isolation</b><br/>└─ separate directories"]:::stage5
|
||||
|
||||
S6["<b>6. Extend and assemble</b><br/>━━━━━━━━━━━━━<br/><b>s07 Skill Loading</b><br/>└─ expand skills on demand<br/><br/><b>s19 MCP Plugin</b><br/>└─ external tools, one pool<br/><br/><b>s20 Comprehensive Agent</b><br/>└─ all mechanisms, one loop"]:::stage6
|
||||
|
||||
S4 ==> S5 ==> S6
|
||||
end
|
||||
|
||||
%% Connect the two layers
|
||||
Phase1 ===> Phase2
|
||||
|
||||
class Phase1,Phase2 groupBox
|
||||
```
|
||||
|
||||
---

## All Chapters

| Chapter | Topic | Key Concepts |
|---|---|---|
| [s01](./s01_agent_loop/) | Agent Loop | `messages` / `while True` / `stop_reason` |
| [s02](./s02_tool_use/) | Tool Use | `TOOL_HANDLERS` / dispatch map / concurrency |
| [s03](./s03_permission/) | Permission System | `PermissionRule` / approval pipeline |
| [s04](./s04_hooks/) | Hook System | `PreToolUse` / `PostToolUse` / extension points |
| [s05](./s05_todo_write/) | TodoWrite | `TodoItem` / plan-then-execute |
| [s06](./s06_subagent/) | Subagent | `fresh messages[]` / context isolation |
| [s07](./s07_skill_loading/) | Skill Loading | `SkillManifest` / on-demand injection |
| [s08](./s08_context_compact/) | Context Compact | `snipCompact` / `microCompact` / `toolResultBudget` / `autoCompact` |
| [s09](./s09_memory/) | Memory System | selection / extraction / consolidation |
| [s10](./s10_system_prompt/) | System Prompt | runtime assembly / section concatenation |
| [s11](./s11_error_recovery/) | Error Recovery | token escalation / fallback model / retry strategies |
| [s12](./s12_task_system/) | Task System | `TaskRecord` / `blockedBy` / disk persistence |
| [s13](./s13_background_tasks/) | Background Tasks | threaded execution / notification queue |
| [s14](./s14_cron_scheduler/) | Cron Scheduler | durable scheduling / session-scoped triggers |
| [s15](./s15_agent_teams/) | Agent Teams | `MessageBus` / inbox / permission bubbling |
| [s16](./s16_team_protocols/) | Team Protocols | shutdown handshake / plan approval |
| [s17](./s17_autonomous_agents/) | Autonomous Agents | idle cycle / auto-claim / self-organization |
| [s18](./s18_worktree_isolation/) | Worktree Isolation | `WorktreeRecord` / task-directory binding |
| [s19](./s19_mcp_plugin/) | MCP Plugin | multi-transport / channel routing / tool pool assembly |
| [s20](./s20_comprehensive/) | Comprehensive Agent | all mechanisms around one loop |

---

## How to Read

Each chapter is a folder. Open one and you will find:

```
s08_context_compact/
  README.md      # full narrative with inline code (Chinese source)
  README.en.md   # English translation
  README.ja.md   # Japanese translation
  code.py        # standalone runnable implementation
  images/        # SVG diagrams (where needed)
```

Read the chapter README (`README.en.md` for English) for the core idea, then work through `code.py`. Complex chapters have `<details>` folds for deep dives; open them when you want to go deeper. Simple chapters have 0-1 diagrams; complex chapters have more.

Read from s01 through s20 in order. Each chapter assumes you've read the previous ones and ends with a hook into the next.

---

## Quick Start

### Current 20-Lesson Track

```sh
git clone https://github.com/shareAI-lab/learn-claude-code
cd learn-claude-code
pip install -r requirements.txt
cp .env.example .env    # Edit .env: set ANTHROPIC_API_KEY (and MODEL_ID)

python s01_agent_loop/code.py       # Start here -- one loop + bash
python s08_context_compact/code.py  # Context compaction (complex)
python s20_comprehensive/code.py    # Endpoint: all mechanisms in one loop
```

### Legacy 12-Lesson Track

```sh
python agents/s01_agent_loop.py                # Start here
python agents/s12_worktree_task_isolation.py   # Full progression endpoint
python agents/s_full.py                        # Capstone: all mechanisms combined
```

### Web Platform

Interactive visualizations, step-through diagrams, source viewer, and documentation.
The current web app still renders the legacy `docs/` s01-s12 track. Use the root-level folders for the new s01-s20 track.

```sh
cd web && npm install && npm run dev   # http://localhost:3000
```

---

## Legacy Learning Path (s01-s12)

```
Phase 1: THE LOOP                    Phase 2: PLANNING & KNOWLEDGE
==================                   ==============================
s01  The Agent Loop          [1]     s03  TodoWrite               [5]
     while + stop_reason                  TodoManager + nag reminder
     |                                    |
     +-> s02  Tool Use        [4]    s04  Subagents               [5]
         dispatch map: name->handler      fresh messages[] per child
                                          |
                                     s05  Skills                  [5]
                                          SKILL.md via tool_result
                                          |
                                     s06  Context Compact         [5]
                                          3-layer compression

Phase 3: PERSISTENCE                 Phase 4: TEAMS
==================                   =====================
s07  Tasks                   [8]     s09  Agent Teams             [9]
     file-based CRUD + deps graph         teammates + JSONL mailboxes
     |                                    |
s08  Background Tasks        [6]     s10  Team Protocols          [12]
     daemon threads + notify queue        shutdown + plan approval FSM
                                          |
                                     s11  Autonomous Agents       [14]
                                          idle cycle + auto-claim
                                          |
                                     s12  Worktree Isolation      [16]
                                          task coordination + optional isolated execution lanes

                                     [N] = number of tools
```

## Project Structure

```
learn-claude-code/
  s01_agent_loop/            # one folder per chapter
    README.md                # Chinese source (complete narrative)
    README.en.md             # English translation
    README.ja.md             # Japanese translation
    code.py                  # standalone runnable code
    images/                  # SVG diagrams
  s02_tool_use/
  ...
  s19_mcp_plugin/
  s20_comprehensive/         # endpoint chapter
  agents/                    # legacy: 12 runnable Python copies (s01-s12) + s_full.py capstone
  skills/                    # skill files used by s07
  docs/                      # legacy 12-lesson docs (en/zh/ja), kept during transition
  web/                       # interactive learning platform (Next.js); currently renders the legacy docs/ track
  tests/
  .github/workflows/ci.yml   # CI: typecheck + build
```

---

## Documentation

Mental-model-first: problem, solution, ASCII diagram, minimal code. These docs cover the legacy 12-lesson track.
Available in [English](./docs/en/) | [中文](./docs/zh/) | [日本語](./docs/ja/).

| Session | Topic | Motto |
|---------|-------|-------|
| [s01](./docs/en/s01-the-agent-loop.md) | The Agent Loop | *One loop & Bash is all you need* |
| [s02](./docs/en/s02-tool-use.md) | Tool Use | *Adding a tool means adding one handler* |
| [s03](./docs/en/s03-todo-write.md) | TodoWrite | *An agent without a plan drifts* |
| [s04](./docs/en/s04-subagent.md) | Subagents | *Break big tasks down; each subtask gets a clean context* |
| [s05](./docs/en/s05-skill-loading.md) | Skills | *Load knowledge when you need it, not upfront* |
| [s06](./docs/en/s06-context-compact.md) | Context Compact | *Context will fill up; you need a way to make room* |
| [s07](./docs/en/s07-task-system.md) | Tasks | *Break big goals into small tasks, order them, persist to disk* |
| [s08](./docs/en/s08-background-tasks.md) | Background Tasks | *Run slow operations in the background; the agent keeps thinking* |
| [s09](./docs/en/s09-agent-teams.md) | Agent Teams | *When the task is too big for one, delegate to teammates* |
| [s10](./docs/en/s10-team-protocols.md) | Team Protocols | *Teammates need shared communication rules* |
| [s11](./docs/en/s11-autonomous-agents.md) | Autonomous Agents | *Teammates scan the board and claim tasks themselves* |
| [s12](./docs/en/s12-worktree-task-isolation.md) | Worktree + Task Isolation | *Each works in its own directory, no interference* |

## What's Next -- from understanding to shipping

After 20 lessons, you understand harness engineering from the inside out. Two paths turn that knowledge into a product:

### Kode Agent CLI -- Open-Source Coding Agent CLI

> `npm i -g @shareai-lab/kode`

Skill and LSP support, Windows compatible, works with GLM / MiniMax / DeepSeek and other open models. Install and go.

GitHub: **[shareAI-lab/Kode-Agent](https://github.com/shareAI-lab/Kode-Agent)**

### Kode Agent SDK -- Embed Agent Capabilities in Your Application

The official Claude Code Agent SDK communicates with a full CLI process under the hood, so each concurrent user means a separate terminal process. Kode SDK is a standalone library with no per-user process overhead. Embed it in backends, browser extensions, embedded devices, or any runtime.

GitHub: **[shareAI-lab/kode-agent-sdk](https://github.com/shareAI-lab/kode-agent-sdk)**

---

## Sister Tutorial: From Passive Sessions to Always-On Assistants

The harness taught in this repository is the **use-and-discard** kind: open a terminal, give the agent a task, close it when done, and the next session starts fresh. Claude Code works this way.

But [OpenClaw](https://github.com/openclaw/openclaw) proves another possibility: on the same agent core, two additional harness mechanisms turn an agent from "poke it and it moves" into "wakes itself every 30 seconds to look for work":

- **Heartbeat** -- every 30 seconds the harness sends the agent a message, letting it check for pending work. Nothing to do? Keep sleeping. Something appeared? Act immediately.
- **Cron** -- the agent can schedule its own future tasks, which fire automatically when the time arrives.
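
The heartbeat mechanism is easy to picture as a timed polling loop. The sketch below is a hypothetical illustration, not code from OpenClaw or claw0: `check_for_work` and `handle` are stand-in callbacks (a real harness would send the agent a message and let the model decide what to do), and the 30-second default only mirrors the interval described above.

```python
import threading
from typing import Callable, Optional


def heartbeat_loop(
    check_for_work: Callable[[], Optional[str]],
    handle: Callable[[str], None],
    stop: threading.Event,
    interval: float = 30.0,
) -> int:
    """Wake every `interval` seconds and act only when there is pending work.

    Hypothetical sketch. Returns how many ticks found work.
    """
    handled = 0
    while not stop.is_set():
        work = check_for_work()      # "anything to do?"
        if work is not None:
            handle(work)             # something appeared: act immediately
            handled += 1
        # Sleep until the next tick, but wake early if someone sets `stop`.
        stop.wait(interval)
    return handled


if __name__ == "__main__":
    queue = ["summarize inbox"]
    done = []
    stop = threading.Event()

    def check_for_work():
        return queue.pop(0) if queue else None

    def handle(item):
        done.append(item)
        stop.set()  # end the demo after the first piece of work

    heartbeat_loop(check_for_work, handle, stop, interval=0.01)
    print(done)
```

Using `Event.wait` instead of `time.sleep` is a deliberate choice here: the loop still sleeps between ticks, but a shutdown request wakes it right away.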

Add multi-channel IM routing (WhatsApp / Telegram / Slack / Discord and 13+ other platforms), persistent context memory, and a Soul personality system, and the agent turns from a disposable tool into an always-on personal AI assistant.

**[claw0](https://github.com/shareAI-lab/claw0)** is our sister teaching repository, breaking down these harness mechanisms from scratch:

```
claw agent = agent core + heartbeat + cron + IM chat + memory + soul
```

```
learn-claude-code                   claw0
(agent harness internals:           (always-on harness:
 loop, tools, planning,              heartbeat, cron, IM channels,
 teams, worktree isolation)          memory, Soul personality)
```

## About
<img width="260" src="https://github.com/user-attachments/assets/fe8b852b-97da-4061-a467-9694906b5edf" /><br>

Scan with WeChat to follow us,
or follow on X: [shareAI-Lab](https://x.com/baicai003)

## License

MIT

---

**Agency comes from the model. The harness gives agency a place to land. Build the harness well, and the model will do the rest.**

**Bash is all you need. Real agents are all the universe needs.**

**This is not "copy the source code." This is "grasp the key designs and build it yourself."**

@@ -25,7 +25,7 @@

One exit condition controls the entire flow. The loop keeps running until the model stops calling tools.

## How It Works

1. The user prompt becomes the first message.


@@ -26,7 +26,7 @@
| [ ] task A |
| [>] task B <- doing |
| [x] task C |
+-----------------------+
+----------- ------------+
|
if rounds_since_todo >= 3:
inject <reminder> into tool_result

s01_agent_loop/README.en.md
@@ -0,0 +1,207 @@

# s01: The Agent Loop — One Loop Is All You Need

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

`s01` → [s02](../s02_tool_use/) → s03 → s04 → ... → s20

> *"One loop & Bash is all you need"* — One tool + one loop = one Agent.
>
> **Harness Layer**: The Loop — the first bridge between the model and the real world.

---

## The Problem

You ask the model: "List the files in my directory and run XXX.py."

The model can output a bash command, but once it finishes outputting, it stops: it won't execute the command on its own, and it won't keep reasoning based on the result.

You could run the command manually, paste the output back into the chat, and let the model continue. When the next command comes out, you run it again and paste the result back.

On every round-trip, you are the middle layer. This chapter automates that layer away.

---

## The Solution

A `while True` loop: keep going when the model calls a tool, stop when it doesn't. The entire process hinges on two signals:

| Signal | Meaning | Loop Action |
|--------|---------|-------------|
| `stop_reason == "tool_use"` | Model raises hand: "I need a tool" | Execute → feed result back → continue |
| `stop_reason != "tool_use"` | Model says: "I'm done" | Exit loop |
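
The two-signal rule fits in a tiny pure function. This sketch assumes the Messages API's documented `stop_reason` values `"end_turn"`, `"max_tokens"`, `"stop_sequence"`, and `"tool_use"`; the function name `loop_action` is hypothetical. Note that the teaching loop treats every non-`tool_use` reason as "done", including `max_tokens` (a truncated reply).

```python
def loop_action(stop_reason: str) -> str:
    """Map a response's stop_reason to what the s01 loop does next."""
    if stop_reason == "tool_use":
        # The model asked for a tool: run it, feed the result back, loop again.
        return "continue"
    # Anything else ("end_turn", "max_tokens", "stop_sequence", ...) ends the
    # teaching loop. Later chapters handle some of these more carefully.
    return "exit"


for reason in ["tool_use", "end_turn", "max_tokens", "stop_sequence"]:
    print(f"{reason:>13} -> {loop_action(reason)}")
```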

---

## How It Works

Let's translate this process into code. Step by step:

**Step 1**: Start with the user's question as the first message.

```python
messages = [{"role": "user", "content": query}]
```

**Step 2**: Send the messages and tool definitions to the LLM.

```python
response = client.messages.create(
    model=MODEL, system=SYSTEM, messages=messages,
    tools=TOOLS, max_tokens=8000,
)
```

**Step 3**: Append the model's response and check whether it called a tool. No tool call → done.

```python
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason != "tool_use":
    return
```

**Step 4**: Execute the tool the model requested and collect the results.

```python
results = []
for block in response.content:
    if block.type == "tool_use":
        output = run_bash(block.input["command"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": output,
        })
```
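
Steps 2 and 4 call a `run_bash` helper that this section doesn't show. The chapter's `code.py` has its own version; the sketch below is a hypothetical stand-in that assumes three things: commands run through the system shell in the current directory, a timeout stops runaway commands, and long output is truncated before it goes back into `messages`. The 120-second timeout and 50,000-character cap are illustrative values, not numbers taken from the source.

```python
import subprocess


def run_bash(command: str, timeout: int = 120, max_chars: int = 50_000) -> str:
    """Run a shell command and return its combined output as a string.

    Hypothetical helper for the s01 loop; the limits are illustrative.
    """
    try:
        proc = subprocess.run(
            command,
            shell=True,           # the model writes full shell syntax (pipes, &&, >)
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"Error: command timed out after {timeout}s"
    output = (proc.stdout + proc.stderr).strip()
    if not output:
        output = f"(no output, exit code {proc.returncode})"
    # Keep tool results bounded so one noisy command can't flood the context.
    return output[:max_chars]
```

Returning errors as text instead of raising lets the model read the failure on its next turn and try something else.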

**Step 5**: Append the tool results as a new message and go back to Step 2.

```python
messages.append({"role": "user", "content": results})
```

Assembled into a complete function:

```python
def agent_loop(messages):
    while True:
        # Ask the model what to do next, given the full history and the tool list.
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        # No tool call means the model considers the task done.
        if response.stop_reason != "tool_use":
            return

        # Run every requested tool and collect one tool_result per call.
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_bash(block.input["command"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        # Feed the results back as the next user turn, then loop.
        messages.append({"role": "user", "content": results})
```

Under 30 lines — that's the minimal runnable agent harness kernel. It's not intelligence itself, but the smallest runtime framework that lets the model keep acting. The model decides (whether to call a tool, which one), the harness executes (if called, run it, feed the result back). The next 18 chapters all add mechanisms on top of this loop. The loop itself never changes.
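
To watch the loop's control flow without an API key, you can drive the same structure with a scripted stand-in for the client. Everything here is a hypothetical test double: `FakeClient` replays one `tool_use` response and then an `end_turn` response, and the blocks are `SimpleNamespace` objects that only mimic the attributes the loop reads (`type`, `id`, `input`, `text`). The real SDK returns typed objects with more fields.

```python
from types import SimpleNamespace as NS

MODEL, SYSTEM, TOOLS = "fake-model", "You are a test agent.", []


class FakeClient:
    """Scripted stand-in for the SDK client: replays canned responses."""

    def __init__(self, responses):
        self._responses = iter(responses)
        self.messages = self  # so client.messages.create(...) works
        self.calls = 0

    def create(self, **kwargs):
        self.calls += 1
        return next(self._responses)


client = FakeClient([
    NS(stop_reason="tool_use",
       content=[NS(type="tool_use", id="toolu_1", input={"command": "ls"})]),
    NS(stop_reason="end_turn",
       content=[NS(type="text", text="There are two files.")]),
])


def run_bash(command):
    return "a.py\nb.py"  # canned output instead of a real shell


def agent_loop(messages):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return
        results = []
        for block in response.content:
            if block.type == "tool_use":
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_bash(block.input["command"]),
                })
        messages.append({"role": "user", "content": results})


history = [{"role": "user", "content": "What files are here?"}]
agent_loop(history)
print([m["role"] for m in history])
```

The printed role sequence is `['user', 'assistant', 'user', 'assistant']`: two model calls with one tool round-trip in between.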

---

## Try It

> **Teaching demo notice**: The code executes shell commands generated by the model. Run it in a temporary test directory to avoid affecting your project files. s03 covers the real permission system.

**Setup** (first run):

```sh
pip install -r requirements.txt
cp .env.example .env
# Edit .env, fill in ANTHROPIC_API_KEY and MODEL_ID
```

**Run**:

```sh
python s01_agent_loop/code.py
```

Try these prompts:

1. `Create a file called hello.py that prints "Hello, World!"`
2. `List all Python files in this directory`
3. `What is the current git branch?`

What to watch for: when does the model call a tool (the loop continues), and when does it not (the loop ends)?

---

## What's Next

Right now the model only has bash: reading files requires `cat`, writing files requires `echo ... >`, and finding files requires `find`. It's clumsy and error-prone.

→ s02 Tool Use: What happens when we give it 5 proper tools? Will the model call multiple tools at once? Will parallel tool executions step on each other?

<details>
<summary>Dive into CC Source Code</summary>

> The following is based on a review of the CC source code `src/query.ts` (1729 lines). There are two core differences. First, CC doesn't rely on the `stop_reason` field to decide whether to continue the loop; instead it checks whether the content contains `tool_use` blocks, because `stop_reason` is unreliable in streaming responses. Second, CC has more exit paths and recovery strategies for production-grade protection.

**The 30-line `while True` from the teaching version IS the core of CC's 1729 lines.** Everything below is a protection mechanism layered on top of that core.

<details>
<summary>1. Loop Structure Differences</summary>

The teaching version checks `response.stop_reason`. CC doesn't use it as the sole signal for loop continuation: in streaming responses, `stop_reason` may not have updated yet even though `tool_use` blocks are already present. CC uses a `needsFollowUp` flag instead: while receiving streamed messages (`query.ts:830-834`), it sets the flag to `true` whenever a `tool_use` block is detected. `QueryEngine.ts` captures the real `stop_reason` from `message_delta` for other logic, but the query loop itself relies on `needsFollowUp`.

```typescript
// query.ts:554-558
// stop_reason === 'tool_use' is unreliable.
// Set during streaming whenever a tool_use block arrives.
let needsFollowUp = false
```
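
In Python terms, the `needsFollowUp` idea amounts to deciding from the content blocks themselves instead of trusting `stop_reason`. This is an illustration, not a port of `query.ts`: the event dictionaries are a simplified assumption loosely modeled on streaming events such as `content_block_start`, and only the flag logic is the point.

```python
from typing import Iterable, Mapping


def needs_follow_up(events: Iterable[Mapping]) -> bool:
    """Return True if any streamed content block is a tool_use block.

    The flag flips as soon as a tool_use block starts, regardless of what
    stop_reason reports (or hasn't reported yet).
    """
    follow_up = False
    for event in events:
        if (event.get("type") == "content_block_start"
                and event.get("content_block", {}).get("type") == "tool_use"):
            follow_up = True
    return follow_up


# A simplified stream in which stop_reason has not been reported as tool_use.
stream = [
    {"type": "content_block_start", "content_block": {"type": "text"}},
    {"type": "content_block_start",
     "content_block": {"type": "tool_use", "name": "bash"}},
    {"type": "message_delta", "delta": {"stop_reason": None}},
]
print(needs_follow_up(stream))
```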

</details>

<details>
<summary>2. State Object — 10 Fields (Teaching Version Only Uses messages)</summary>

| # | Field | Purpose | Chapter |
|---|-------|---------|---------|
| 1 | `messages` | Message array for the current iteration | s01 |
| 2 | `toolUseContext` | Tool, signal, and permission context | s02 |
| 3 | `autoCompactTracking` | Compaction state tracking | s08 |
| 4 | `maxOutputTokensRecoveryCount` | Token recovery attempt count (max 3) | s11 |
| 5 | `hasAttemptedReactiveCompact` | Whether reactive compaction was attempted this round | s08 |
| 6 | `maxOutputTokensOverride` | 8K→64K upgrade override | s11 |
| 7 | `pendingToolUseSummary` | Background Haiku-generated tool use summary | s08 |
| 8 | `stopHookActive` | Whether the stop hook produced a blocking error | s04 |
| 9 | `turnCount` | Turn count (for maxTurns check) | s01 |
| 10 | `transition` | Last continue reason | s11 |

> Note: `taskBudgetRemaining` (`query.ts:291`) is a loop-local variable, not on State. The source comment explicitly says "Loop-local (not on State)".
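
If you want to mirror that State shape while following later chapters, a Python dataclass is a convenient container. This is a hypothetical sketch: the field names are snake_case versions of the table above, the types and defaults are assumptions (CC's actual TypeScript types are richer), and `task_budget_remaining` is deliberately left out because the source keeps it loop-local.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LoopState:
    """Hypothetical Python mirror of the 10 State fields listed above."""
    messages: list = field(default_factory=list)          # s01
    tool_use_context: dict = field(default_factory=dict)  # s02: tools, signal, permissions
    auto_compact_tracking: Optional[dict] = None           # s08
    max_output_tokens_recovery_count: int = 0              # s11 (max 3 in CC)
    has_attempted_reactive_compact: bool = False           # s08
    max_output_tokens_override: Optional[int] = None       # s11: e.g. 8K -> 64K
    pending_tool_use_summary: Optional[Any] = None         # s08
    stop_hook_active: bool = False                         # s04
    turn_count: int = 0                                    # s01: for the maxTurns check
    transition: Optional[str] = None                       # s11: last continue reason


state = LoopState(messages=[{"role": "user", "content": "hi"}])
state.turn_count += 1
print(state.turn_count, len(state.messages))
```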

</details>

<details>
<summary>3. Multiple Exit and Continue Paths</summary>

The teaching version has only 1 exit path (model doesn't call a tool → done). The production version has multiple exit and continue paths, covering blocking limit, prompt too long, model error, abort, hook stop, max turns, token budget continuation, reactive compact retry, and more. Each scenario has a corresponding recovery or exit strategy.
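
A first step toward those extra exits is a turn limit plus an explicit reason for why the loop stopped. This is an illustration, not CC's logic: `max_turns`, the `step` callback, and the reason strings `"completed"` and `"max_turns"` are hypothetical names. The point is the shape: every exit path returns a labeled reason instead of silently falling out of `while True`.

```python
from typing import Callable, Tuple


def run_loop(step: Callable[[int], bool], max_turns: int = 10) -> Tuple[str, int]:
    """Drive a loop where step(turn) returns True if the model called a tool.

    Returns (exit_reason, turns_used). Hypothetical helper for illustration.
    """
    turn = 0
    while True:
        if turn >= max_turns:
            return "max_turns", turn      # guard: stop a model that never finishes
        turn += 1
        called_tool = step(turn)
        if not called_tool:
            return "completed", turn      # the original s01 exit path


# A model that wants tools for 3 turns, then answers.
print(run_loop(lambda t: t <= 3))
# A model that never stops calling tools.
print(run_loop(lambda t: True, max_turns=5))
```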

</details>

<details>
<summary>4. Streaming Tool Execution and QueryEngine</summary>

CC's `StreamingToolExecutor` (`query.ts:561`) allows tools to begin parallel execution while the model is still generating (concurrency-safe tools run in parallel, others run exclusively). `QueryEngine.ts` adds additional protections for cost overruns, structured output validation failures, and more. The teaching version doesn't implement these — the goal is conceptual clarity, not peak performance.
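
The "concurrency-safe tools run in parallel, others run exclusively" rule can be approximated without streaming. This sketch is a simplification, not `StreamingToolExecutor`: it assumes each call is a `(name, is_safe, fn)` tuple, groups consecutive safe calls into one batch that runs on a thread pool, runs each unsafe call alone, and keeps results in the model's original order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

ToolCall = Tuple[str, bool, Callable[[], str]]  # (name, is_concurrency_safe, run)


def partition(calls: List[ToolCall]) -> List[List[ToolCall]]:
    """Group consecutive safe calls; every unsafe call gets its own batch."""
    batches: List[List[ToolCall]] = []
    for call in calls:
        safe = call[1]
        if safe and batches and all(c[1] for c in batches[-1]):
            batches[-1].append(call)
        else:
            batches.append([call])
    return batches


def execute(calls: List[ToolCall]) -> List[str]:
    """Run batches one after another; calls inside a safe batch run in parallel."""
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for batch in partition(calls):
            # map() preserves input order, so results line up with the calls.
            results.extend(pool.map(lambda c: c[2](), batch))
    return results


calls = [
    ("read_a", True, lambda: "A"),
    ("read_b", True, lambda: "B"),
    ("write_c", False, lambda: "C"),
    ("read_d", True, lambda: "D"),
]
print([[name for name, _, _ in batch] for batch in partition(calls)])
print(execute(calls))
```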

</details>

**In one sentence**: The core of query.ts's 1729 lines is a 30-line `while True`. All the complex fields and exit paths are protection mechanisms. Understand the core loop first, and everything that follows unfolds naturally.

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

s01_agent_loop/README.ja.md
@@ -0,0 +1,207 @@

# s01: Agent Loop — One Loop Is Enough

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

`s01` → [s02](../s02_tool_use/) → s03 → s04 → ... → s20

> *"One loop & Bash is all you need"* — one tool + one loop = one Agent.
>
> **Harness layer**: The loop — the first bridge between the model and the real world.

---

## The Problem

Suppose you ask the model: "List the files in the directory and run XXX.py."

The model can output a bash command, but once it finishes outputting, it stops: it neither executes the command itself nor keeps reasoning from the result.

If you run the command by hand and paste the output into the chat, the model can continue. When the next command comes out, you run it and paste the result in again.

On every round-trip, you are the middle layer. Automating this is the goal of this chapter.

---

## Solution

A single `while True` loop: if the model calls a tool, the loop continues; if it doesn't, the loop stops. The whole thing runs on just two signals:

| Signal | Meaning | Loop Action |
|--------|---------|-------------|
| `stop_reason == "tool_use"` | The model raises its hand: "I need a tool" | Execute → return the result → continue |
| `stop_reason != "tool_use"` | The model declares: "Done" | Exit loop |

---

## How It Works

Let's convert this process into code, step by step:

**Step 1**: Set the user's question as the first message.

```python
messages = [{"role": "user", "content": query}]
```

**Step 2**: Send the messages and the tool definitions together to the LLM.

```python
response = client.messages.create(
    model=MODEL, system=SYSTEM, messages=messages,
    tools=TOOLS, max_tokens=8000,
)
```

**Step 3**: Append the model's response and check whether it called a tool. No call → done.

```python
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason != "tool_use":
    return
```

**Step 4**: Execute the tool the model requested and collect the results.

```python
results = []
for block in response.content:
    if block.type == "tool_use":
        output = run_bash(block.input["command"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": output,
        })
```

**Step 5**: Append the tool results as a new message and go back to Step 2.

```python
messages.append({"role": "user", "content": results})
```
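
Step 5 only works if every `tool_use` block in the assistant turn gets a `tool_result` with the matching `tool_use_id`; the Messages API returns an error when a follow-up turn leaves a tool call unanswered. A small check like the hypothetical `unanswered_tool_calls` below (not part of the chapter's code) can catch that before the next request. It assumes plain-dict content blocks; SDK objects would need attribute access instead.

```python
def unanswered_tool_calls(assistant_content, user_content):
    """Return tool_use ids from the assistant turn that have no tool_result."""
    called = [b["id"] for b in assistant_content if b.get("type") == "tool_use"]
    answered = {
        b["tool_use_id"] for b in user_content if b.get("type") == "tool_result"
    }
    return [tool_id for tool_id in called if tool_id not in answered]


assistant_turn = [
    {"type": "text", "text": "Let me look."},
    {"type": "tool_use", "id": "toolu_1", "name": "bash", "input": {"command": "ls"}},
    {"type": "tool_use", "id": "toolu_2", "name": "bash", "input": {"command": "pwd"}},
]
user_turn = [{"type": "tool_result", "tool_use_id": "toolu_1", "content": "a.py"}]
print(unanswered_tool_calls(assistant_turn, user_turn))
```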
|
||||
|
||||
完全な関数に組み立てる:
|
||||
|
||||
```python
|
||||
def agent_loop(messages):
|
||||
while True:
|
||||
response = client.messages.create(
|
||||
model=MODEL, system=SYSTEM, messages=messages,
|
||||
tools=TOOLS, max_tokens=8000,
|
||||
)
|
||||
messages.append({"role": "assistant", "content": response.content})
|
||||
|
||||
if response.stop_reason != "tool_use":
|
||||
return
|
||||
|
||||
results = []
|
||||
for block in response.content:
|
||||
if block.type == "tool_use":
|
||||
output = run_bash(block.input["command"])
|
||||
results.append({
|
||||
"type": "tool_result",
|
||||
"tool_use_id": block.id,
|
||||
"content": output,
|
||||
})
|
||||
messages.append({"role": "user", "content": results})
|
||||
```
|
||||
|
||||
30 行未満 — これが最小実行可能な agent harness のカーネルだ。これは知能そのものではなく、モデルが継続的に行動できるための最小ランタイムフレームワーク。モデルが決定し(ツールを呼ぶか、どれを呼ぶか)、harness が実行する(呼ばれたら実行し、結果を戻す)。次の 18 章はすべてこのループの上に仕組みを積み重ねていく。ループ自体は永遠に変わらない。
|
||||
|
||||
---
|
||||
|
||||
## 試してみよう
|
||||
|
||||
> **教育デモの注意**: このコードはモデルが生成したシェルコマンドを実行します。プロジェクトファイルへの影響を避けるため、一時テストディレクトリで実行してください。s03 で本格的な権限システムを説明します。
|
||||
|
||||
**準備**(初回のみ):
|
||||
|
||||
```sh
|
||||
pip install -r requirements.txt
|
||||
cp .env.example .env
|
||||
# .env を編集し、ANTHROPIC_API_KEY と MODEL_ID を入力
|
||||
```
|
||||
|
||||
**実行**:
|
||||
|
||||
```sh
|
||||
python s01_agent_loop/code.py
|
||||
```
|
||||
|
||||
以下のプロンプトを試してみよう:
|
||||
|
||||
1. `Create a file called hello.py that prints "Hello, World!"`
|
||||
2. `List all Python files in this directory`
|
||||
3. `What is the current git branch?`
|
||||
|
||||
観察のポイント:モデルがツールを呼び出すとき(ループ継続)、呼び出さないとき(ループ終了)の違い。
|
||||
|
||||
---
|
||||
|
||||
## 次へ
|
||||
|
||||
現在、モデルが持っているのは bash だけだ — ファイルを読むには `cat`、書くには `echo ... >`、探すには `find`。不便でエラーも起きやすい。
|
||||
|
||||
→ s02 Tool Use:5 つの本格的なツールを与えたらどうなる? モデルは複数のツールを同時に呼び出すか? 並列実行で競合は起きないか?
|
||||
|
||||
<details>
|
||||
<summary>CC ソースコードを深掘り</summary>
|
||||
|
||||
> 以下は CC ソースコード `src/query.ts`(1729 行)の検証に基づく。核心的な違いは二つ:CC はループ継続の判断に `stop_reason` フィールドを頼らず、コンテンツに `tool_use` ブロックが含まれるかをチェックする(ストリーミングレスポンスでは `stop_reason` が信頼できないため)。CC には本番環境向けのより多くの終了パスとリカバリ戦略がある。
|
||||
|
||||
**教育版の 30 行 `while True` が CC の 1729 行の核心。** 以下の各項目は、すべてその核心の上に積み重ねられた保護機構である。
|
||||
|
||||
<details>
|
||||
<summary>一、ループ構造の違い</summary>
|
||||
|
||||
教育版は `response.stop_reason` をチェックする。CC はこれをループ継続の唯一の根拠として使わない — ストリーミングレスポンスでは、`stop_reason` がまだ更新されていなくても、コンテンツに既に `tool_use` ブロックが含まれている可能性がある。CC は `needsFollowUp` フラグを使用する:ストリーミングメッセージの受信時(`query.ts:830-834`)に、`tool_use` ブロックが検出されると `true` に設定される。`QueryEngine.ts` は `message_delta` から実際の `stop_reason` を取得して他の処理に利用するが、query loop 自体は `needsFollowUp` に依存する。
|
||||
|
||||
```typescript
|
||||
// query.ts:554-558
|
||||
// stop_reason === 'tool_use' is unreliable.
|
||||
// Set during streaming whenever a tool_use block arrives.
|
||||
let needsFollowUp = false
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>二、State オブジェクト 10 フィールド(教育版は messages のみ使用)</summary>
|
||||
|
||||
| # | フィールド | 用途 | 対応章 |
|
||||
|---|-----------|------|--------|
|
||||
| 1 | `messages` | 現在のイテレーションのメッセージ配列 | s01 |
|
||||
| 2 | `toolUseContext` | ツール、シグナル、権限コンテキスト | s02 |
|
||||
| 3 | `autoCompactTracking` | 圧縮状態の追跡 | s08 |
|
||||
| 4 | `maxOutputTokensRecoveryCount` | トークンリカバリ試行回数(上限 3) | s11 |
|
||||
| 5 | `hasAttemptedReactiveCompact` | 今回のラウンドでリアクティブ圧縮を試みたか | s08 |
|
||||
| 6 | `maxOutputTokensOverride` | 8K→64K へのアップグレード上書き | s11 |
|
||||
| 7 | `pendingToolUseSummary` | バックグラウンド Haiku 生成のツール使用要約 | s08 |
|
||||
| 8 | `stopHookActive` | 停止フックがブロッキングエラーを発生させたか | s04 |
|
||||
| 9 | `turnCount` | ターン数(maxTurns チェック用) | s01 |
|
||||
| 10 | `transition` | 前回の継続理由 | s11 |
|
||||
|
||||
> 注:`taskBudgetRemaining`(`query.ts:291`)は loop-local のローカル変数であり、State には含まれない。ソースコメントには明確に "Loop-local (not on State)" と書かれている。
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>三、複数の終了パスと継続パス</summary>
|
||||
|
||||
教育版には 1 つの終了パスしかない(モデルがツールを呼ばなければ終了)。本番版には複数の終了・継続パスがあり、blocking limit、prompt too long、model error、abort、hook stop、max turns、token budget continuation、reactive compact retry など多くのシナリオをカバーしている。各シナリオには対応するリカバリまたは終了戦略がある。

</details>

<details>
<summary>四、ストリーミングツール実行と QueryEngine</summary>

CC の `StreamingToolExecutor`(`query.ts:561`)は、モデルがまだ生成中にツールの実行を開始できる(concurrency-safe なツールは並列、それ以外は排他実行)。`QueryEngine.ts` はさらに、コスト超過や構造化出力の検証失敗などの保護を追加する。教育版はこれらを実装しない — 目標は概念の明確さであり、極限のパフォーマンスではない。

</details>

**一言で**: query.ts の 1729 行の核心は 30 行の `while True`。複雑なフィールドや終了パスはすべて保護機構だ。まず核心のループを理解すれば、その後のすべては自然に理解できる。

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
s01_agent_loop/README.md
@@ -0,0 +1,207 @@
# s01: Agent Loop — 一个循环就够了

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

`s01` → [s02](../s02_tool_use/) → s03 → s04 → ... → s20
> *"One loop & Bash is all you need"* — 一个工具 + 一个循环 = 一个 Agent。
>
> **Harness 层**: 循环 — 模型与真实世界的第一道连接。

---

## 问题

你向大模型提了一个问题:“帮我看看当前目录下有哪些文件,然后执行 XXX.py”。

模型能输出一条 bash 命令,但输出完了就停了,它不会自己跑,也不会看到结果后继续推理。

你可以手动跑一遍,把输出粘贴回对话框,让它接着干。下一个命令出来,你再跑一遍、再贴回去。

每一个来回,你都在做中间层。而把它自动化,就是这一章要做的事。

---

## 解决方案

一个 `while True` 循环,模型调用工具就继续,不调用就停。整个过程只有两个信号:

| 信号 | 含义 | 循环动作 |
|------|------|---------|
| `stop_reason == "tool_use"` | 模型举手说"我要用工具" | 执行 → 结果喂回去 → 继续 |
| `stop_reason != "tool_use"` | 模型说"我做完了" | 退出循环 |

---

## 工作原理

将这个过程翻译成代码。分步来看:

**第 1 步**:把用户的问题作为第一条消息。

```python
messages = [{"role": "user", "content": query}]
```

**第 2 步**:将消息和工具定义一起发给 LLM。

```python
response = client.messages.create(
    model=MODEL, system=SYSTEM, messages=messages,
    tools=TOOLS, max_tokens=8000,
)
```

**第 3 步**:追加模型回答,检查它是否调了工具。没调 → 结束。

```python
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason != "tool_use":
    return
```

**第 4 步**:执行模型要求的工具,收集结果。

```python
results = []
for block in response.content:
    if block.type == "tool_use":
        output = run_bash(block.input["command"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": output,
        })
```

**第 5 步**:把工具结果作为新消息追加,回到第 2 步。

```python
messages.append({"role": "user", "content": results})
```

组装为一个完整函数:

```python
def agent_loop(messages):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return

        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_bash(block.input["command"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})
```
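
想在没有 API key 的情况下验证上面这个循环的两条信号,可以用一个假的 client 替身跑一遍。下面是一个假设性的最小示例:`FakeMessages` 是为演示编造的类,只模仿 `client.messages.create` 返回值中被循环用到的两个字段(`content`、`stop_reason`);`run_bash` 换成了不执行任何命令的桩函数。循环体与上面的 `agent_loop` 相同。

```python
from types import SimpleNamespace

MODEL, SYSTEM, TOOLS = "fake-model", "You are a coding agent.", []


class FakeMessages:
    """Hypothetical stand-in for client.messages: replays scripted responses."""

    def __init__(self, scripted):
        self.scripted = list(scripted)
        self.calls = 0

    def create(self, **kwargs):
        self.calls += 1
        return self.scripted.pop(0)


def run_bash(command):
    return f"(stub) ran: {command}"  # no real shell execution in this demo


tool_call = SimpleNamespace(type="tool_use", id="toolu_1", input={"command": "ls"})
final_text = SimpleNamespace(type="text", text="There are 3 files.")
client = SimpleNamespace(messages=FakeMessages([
    SimpleNamespace(stop_reason="tool_use", content=[tool_call]),
    SimpleNamespace(stop_reason="end_turn", content=[final_text]),
]))


def agent_loop(messages):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_bash(block.input["command"])
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})


history = [{"role": "user", "content": "What files are here?"}]
agent_loop(history)
print(client.messages.calls)         # 2
print([m["role"] for m in history])  # ['user', 'assistant', 'user', 'assistant']
```

第一次调用返回 `tool_use`,循环执行工具并把结果喂回去;第二次返回 `end_turn`,循环退出。`history` 最终是 user → assistant → user(tool_result)→ assistant 四条消息。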

不到 30 行,这就是最小可运行的 agent harness 内核。它不是智能本身,而是让模型能持续行动的最小运行框架:模型负责决策(要不要调工具、调哪个),harness 负责执行(调了就跑、结果喂回去)。后续所有章节都在这个循环上叠加机制,循环本身始终不变。

---

## 试一下

> **教学 demo 提示**:代码会执行模型生成的 shell 命令。建议在一个临时测试目录中运行,避免影响你的项目文件。s03 会讲真正的权限系统。

**准备**(首次运行):

```sh
pip install -r requirements.txt
cp .env.example .env
# 编辑 .env,填入 ANTHROPIC_API_KEY 和 MODEL_ID
```

**运行**:

```sh
python s01_agent_loop/code.py
```

试试这些 prompt:

1. `Create a file called hello.py that prints "Hello, World!"`
2. `List all Python files in this directory`
3. `What is the current git branch?`

观察重点:模型什么时候调用工具(循环继续),什么时候不调用(循环结束)?

---

## 接下来

现在模型手里只有 bash 一个工具,读文件要 `cat`,写文件要 `echo ... >`,找个文件要 `find`,又丑又容易出错。

s02 Tool Use → 给它 5 个真正的工具,会发生什么?模型会不会一次调用多个工具?几个工具同时跑会不会互相踩?

<details>
<summary>深入 CC 源码</summary>

> 以下内容基于 CC 源码 `src/query.ts`(1729 行)的核查。核心差异有两个:一是 CC 的 query loop 不靠 `stop_reason` 字段决定是否继续,而是检查内容里有没有 `tool_use` 块(因为流式响应中 `stop_reason` 不可靠);二是 CC 有更多的退出路径和恢复策略,提供生产级保护。

**教学版的 30 行 `while True` 就是 CC 1729 行的核心。** 下面每一项都是在这个核心上叠加的保护机制。

<details>
<summary>一、循环结构差异</summary>

教学版检查 `response.stop_reason`。CC 不把它作为循环继续的唯一依据——流式响应中 `stop_reason` 可能还没更新但内容里已经有 `tool_use` 块了。CC 用 `needsFollowUp` 标志:接收到流式消息时(`query.ts:830-834`),只要检测到 `tool_use` 块就设为 `true`;`QueryEngine.ts` 会从 `message_delta` 捕获真实 `stop_reason` 用于其他逻辑,但 query loop 本身靠 `needsFollowUp` 决定是否继续。

```typescript
// query.ts:554-558
// stop_reason === 'tool_use' is unreliable.
// Set during streaming whenever a tool_use block arrives.
let needsFollowUp = false
```

</details>

<details>
<summary>二、State 对象 10 字段(教学版只用 messages)</summary>

| # | 字段 | 用途 | 对应章节 |
|---|------|------|---------|
| 1 | `messages` | 当前迭代的消息数组 | s01 |
| 2 | `toolUseContext` | 工具、信号、权限上下文 | s02 |
| 3 | `autoCompactTracking` | 压缩状态追踪 | s08 |
| 4 | `maxOutputTokensRecoveryCount` | token 恢复尝试次数(上限 3) | s11 |
| 5 | `hasAttemptedReactiveCompact` | 本轮是否已尝试响应式压缩 | s08 |
| 6 | `maxOutputTokensOverride` | 8K→64K 的升级覆盖 | s11 |
| 7 | `pendingToolUseSummary` | 后台 Haiku 生成的 tool use 摘要 | s08 |
| 8 | `stopHookActive` | 停止钩子是否产生阻塞错误 | s04 |
| 9 | `turnCount` | 轮次计数(maxTurns 检查) | s01 |
| 10 | `transition` | 上一次继续原因 | s11 |

> 注:`taskBudgetRemaining`(`query.ts:291`)是 loop-local 局部变量,不在 State 上。源码注释明确写了 "Loop-local (not on State)"。
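
若想在教学版里借鉴这种做法,可以把分散的循环变量收拢到一个状态对象里,每次继续时整体替换。下面是一个假设性的 Python 草图:只挑了上表中的几个字段并改写成 snake_case,类型、默认值和 `next_iteration` 函数都是为演示而假设的,并非 CC 源码中的定义;`task_budget_remaining` 按上面的注释保留为循环内的局部变量。

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class LoopState:
    """Hypothetical subset of the State fields in the table above (snake_case)."""
    messages: list = field(default_factory=list)
    turn_count: int = 0
    max_output_tokens_recovery_count: int = 0
    has_attempted_reactive_compact: bool = False
    transition: Optional[str] = None  # why the previous iteration continued


def next_iteration(state: LoopState, new_messages: list, reason: str) -> LoopState:
    """Build the next iteration's state in one place instead of mutating locals."""
    return replace(
        state,
        messages=state.messages + new_messages,
        turn_count=state.turn_count + 1,
        transition=reason,
    )


state = LoopState(messages=[{"role": "user", "content": "hi"}])
task_budget_remaining = 10  # loop-local on purpose: NOT a LoopState field
state = next_iteration(state, [{"role": "assistant", "content": "..."}], "tool_use")
print(state.turn_count, state.transition, len(state.messages))  # 1 tool_use 2
```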

</details>

<details>
<summary>三、多条退出和继续路径</summary>

教学版只有 1 条退出路径(模型不调工具就结束)。生产版有多条退出和继续路径,覆盖 blocking limit、prompt too long、model error、abort、hook stop、max turns、token budget continuation、reactive compact retry 等场景。每种场景都有对应的恢复或退出策略。

</details>

<details>
<summary>四、流式工具执行和 QueryEngine</summary>

CC 的 `StreamingToolExecutor`(`query.ts:561`)让工具在模型还在生成时就开始执行:concurrency-safe 的工具可以并发执行,其余工具独占执行。`QueryEngine.ts` 额外加了费用超限、结构化输出验证失败等保护。教学版不实现这些——目标是概念清晰,不是性能极致。

</details>

**一句话**:1729 行的 query.ts 核心就是 30 行 `while True`。所有复杂字段和退出路径都是保护机制。先理解核心循环,后面的一切自然展开。

</details>

<!-- translation-sync: zh@v1, en@v0, ja@v0 -->
s01_agent_loop/code.py
@@ -0,0 +1,137 @@
#!/usr/bin/env python3
"""
s01_agent_loop.py - The Agent Loop

The entire secret of an AI coding agent in one pattern:

    while stop_reason == "tool_use":
        response = LLM(messages, tools)
        execute tools
        append results

    +----------+      +-------+      +---------+
    |   User   | ---> |  LLM  | ---> |  Tool   |
    |  prompt  |      |       |      | execute |
    +----------+      +---+---+      +----+----+
                          ^               |
                          |  tool_result  |
                          +---------------+
                          (loop continues)

This is the core loop: feed tool results back to the model
until the model decides to stop. Production agents layer
policy, hooks, and lifecycle controls on top.

Usage:
    pip install anthropic python-dotenv
    ANTHROPIC_API_KEY=... MODEL_ID=... python s01_agent_loop/code.py
"""

import os
import subprocess

try:
    import readline
    # macOS libedit mishandles backspace with CJK (e.g. Chinese) input; these four lines fix it
    readline.parse_and_bind('set bind-tty-special-chars off')
    readline.parse_and_bind('set input-meta on')
    readline.parse_and_bind('set output-meta on')
    readline.parse_and_bind('set convert-meta off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)

if os.getenv("ANTHROPIC_BASE_URL"):
    os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]

SYSTEM = f"You are a coding agent at {os.getcwd()}. Use bash to solve tasks. Act, don't explain."

# ── Tool definition: just bash ────────────────────────────
TOOLS = [{
    "name": "bash",
    "description": "Run a shell command.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}]


# ── Tool execution ────────────────────────────────────────
def run_bash(command: str) -> str:
    dangerous = ["rm -rf /", "sudo", "shutdown", "reboot", "> /dev/"]
    if any(d in command for d in dangerous):
        return "Error: Dangerous command blocked"
    try:
        r = subprocess.run(command, shell=True, cwd=os.getcwd(),
                           capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"
    except (FileNotFoundError, OSError) as e:
        return f"Error: {e}"


# ── The core pattern: a while loop that calls tools until the model stops ──
def agent_loop(messages: list):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )

        # Append assistant turn
        messages.append({"role": "assistant", "content": response.content})

        # If the model didn't call a tool, we're done
        if response.stop_reason != "tool_use":
            return

        # Execute each tool call, collect results
        results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"\033[33m$ {block.input['command']}\033[0m")
                output = run_bash(block.input["command"])
                print(output[:200])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })

        # Feed tool results back, loop continues
        messages.append({"role": "user", "content": results})


# ── Entry point ──────────────────────────────────────────
if __name__ == "__main__":
    print("s01: Agent Loop")
    print("输入问题,回车发送。输入 q 退出。\n")

    history = []
    while True:
        try:
            query = input("\033[36ms01 >> \033[0m")
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        history.append({"role": "user", "content": query})
        agent_loop(history)
        # Print the model's final text response
        response_content = history[-1]["content"]
        if isinstance(response_content, list):
            for block in response_content:
                if getattr(block, "type", None) == "text":
                    print(block.text)
        print()
s01_agent_loop/images/agent-loop.en.svg
@@ -0,0 +1,86 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 530" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>

<!-- Background -->
<rect width="720" height="530" fill="#fafbfc" rx="8"/>

<!-- Title -->
<rect x="0" y="0" width="720" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="720" height="8" fill="url(#header)"/>
<text x="360" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Agent Loop — A while Loop Drives the Entire Agent</text>

<!-- ===== User Input ===== -->
<rect x="60" y="80" width="160" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="2"/>
<text x="140" y="102" fill="#1e3a5f" font-size="13" font-weight="600" text-anchor="middle">User Query</text>
<text x="140" y="120" fill="#64748b" font-size="11" text-anchor="middle">"Create hello.py for me"</text>

<!-- Arrow: User → Messages -->
<line x1="140" y1="132" x2="140" y2="162" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>

<!-- ===== Messages ===== -->
<rect x="60" y="164" width="160" height="48" rx="8" fill="#f8fafc" stroke="#94a3b8" stroke-width="1.5" stroke-dasharray="4,2"/>
<text x="140" y="185" fill="#334155" font-size="12" font-weight="500" text-anchor="middle">messages[]</text>
<text x="140" y="201" fill="#94a3b8" font-size="10" text-anchor="middle">Accumulated message list</text>

<!-- Arrow: Messages → LLM -->
<line x1="220" y1="188" x2="288" y2="188" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== LLM ===== -->
<rect x="290" y="140" width="180" height="96" rx="8" fill="#fff" stroke="#2563eb" stroke-width="2"/>
<text x="380" y="166" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
<line x1="310" y1="176" x2="450" y2="176" stroke="#e2e8f0" stroke-width="1"/>
<text x="380" y="194" fill="#475569" font-size="11" text-anchor="middle">Model reads message history</text>
<text x="380" y="210" fill="#475569" font-size="11" text-anchor="middle">Decision: Need a tool?</text>
<text x="380" y="228" fill="#64748b" font-size="10" text-anchor="middle">Returns stop_reason signal</text>

<!-- Arrow: LLM → Decision (down) -->
<line x1="380" y1="236" x2="380" y2="276" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== Decision Diamond ===== -->
<polygon points="380,280 470,316 380,352 290,316" fill="#fff8f0" stroke="#d97706" stroke-width="2"/>
<text x="380" y="312" fill="#92400e" font-size="12" font-weight="600" text-anchor="middle">stop_reason</text>
<text x="380" y="326" fill="#92400e" font-size="10" text-anchor="middle">== "tool_use"?</text>

<!-- Arrow: No → End (right) -->
<line x1="470" y1="316" x2="540" y2="316" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
<text x="505" y="308" fill="#16a34a" font-size="10" font-weight="600" text-anchor="middle">No</text>

<!-- End Node -->
<rect x="542" y="294" width="130" height="44" rx="22" fill="#dcfce7" stroke="#16a34a" stroke-width="2"/>
<text x="607" y="313" fill="#166534" font-size="12" font-weight="600" text-anchor="middle">Return Result</text>
<text x="607" y="329" fill="#166534" font-size="10" text-anchor="middle">Loop Ends</text>

<!-- Arrow: Yes → Tool Execution (down) -->
<line x1="380" y1="352" x2="380" y2="392" stroke="#d97706" stroke-width="2" marker-end="url(#arrow)"/>
<text x="395" y="376" fill="#d97706" font-size="10" font-weight="600">Yes</text>

<!-- ===== Tool Execution ===== -->
<rect x="290" y="394" width="180" height="48" rx="8" fill="#fff7ed" stroke="#d97706" stroke-width="1.5"/>
<text x="380" y="415" fill="#92400e" font-size="12" font-weight="600" text-anchor="middle">Execute Tool Call</text>
<text x="380" y="432" fill="#92400e" font-size="10" text-anchor="middle">run_bash(command)</text>

<!-- Arrow: Tool result → Append to messages (loop back) -->
<path d="M 290 418 L 40 418 L 40 188 L 58 188" fill="none" stroke="#d97706" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="18" y="375" fill="#92400e" font-size="10" font-weight="500" transform="rotate(-90, 18, 375)">Append tool_result to messages</text>

<!-- Legend -->
<rect x="60" y="462" width="600" height="48" rx="6" fill="#f1f5f9"/>
<text x="80" y="482" fill="#334155" font-size="10">Core: a</text>
<text x="113" y="482" fill="#1e3a5f" font-size="10" font-weight="700" font-family="monospace">while True</text>
<text x="186" y="482" fill="#334155" font-size="10">loop. Model calls tool → Execute → Feed back → Ask again. No tool call → Stop.</text>
<text x="80" y="500" fill="#64748b" font-size="10">All subsequent chapters layer mechanisms on top of this loop.</text>
</svg>
s01_agent_loop/images/agent-loop.ja.svg
@@ -0,0 +1,86 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 530" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>

<!-- 背景 -->
<rect width="720" height="530" fill="#fafbfc" rx="8"/>

<!-- タイトル -->
<rect x="0" y="0" width="720" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="720" height="8" fill="url(#header)"/>
<text x="360" y="31" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Agent Loop — 一つの while ループで Agent 全体を駆動</text>

<!-- ===== ユーザー入力 ===== -->
<rect x="60" y="80" width="160" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="2"/>
<text x="140" y="102" fill="#1e3a5f" font-size="13" font-weight="600" text-anchor="middle">ユーザーの質問</text>
<text x="140" y="120" fill="#64748b" font-size="11" text-anchor="middle">"hello.py を作って"</text>

<!-- 矢印:ユーザー → メッセージリスト -->
<line x1="140" y1="132" x2="140" y2="162" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>

<!-- ===== メッセージリスト ===== -->
<rect x="60" y="164" width="160" height="48" rx="8" fill="#f8fafc" stroke="#94a3b8" stroke-width="1.5" stroke-dasharray="4,2"/>
<text x="140" y="185" fill="#334155" font-size="12" font-weight="500" text-anchor="middle">messages[]</text>
<text x="140" y="201" fill="#94a3b8" font-size="10" text-anchor="middle">累積メッセージリスト</text>

<!-- 矢印:メッセージ → LLM -->
<line x1="220" y1="188" x2="288" y2="188" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== LLM ===== -->
<rect x="290" y="140" width="180" height="96" rx="8" fill="#fff" stroke="#2563eb" stroke-width="2"/>
<text x="380" y="166" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">大規模言語モデル (LLM)</text>
<line x1="310" y1="176" x2="450" y2="176" stroke="#e2e8f0" stroke-width="1"/>
<text x="380" y="194" fill="#475569" font-size="11" text-anchor="middle">モデルがメッセージ履歴を読む</text>
<text x="380" y="210" fill="#475569" font-size="11" text-anchor="middle">判断:ツールが必要か?</text>
<text x="380" y="228" fill="#64748b" font-size="10" text-anchor="middle">stop_reason シグナルを返す</text>

<!-- 矢印:LLM → 判定(下) -->
<line x1="380" y1="236" x2="380" y2="276" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== 判定ダイヤモンド ===== -->
<polygon points="380,280 470,316 380,352 290,316" fill="#fff8f0" stroke="#d97706" stroke-width="2"/>
<text x="380" y="312" fill="#92400e" font-size="12" font-weight="600" text-anchor="middle">stop_reason</text>
<text x="380" y="326" fill="#92400e" font-size="10" text-anchor="middle">== "tool_use"?</text>

<!-- 矢印:いいえ → 終了(右) -->
<line x1="470" y1="316" x2="540" y2="316" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
<text x="505" y="308" fill="#16a34a" font-size="10" font-weight="600" text-anchor="middle">No</text>

<!-- 終了ノード -->
<rect x="542" y="294" width="130" height="44" rx="22" fill="#dcfce7" stroke="#16a34a" stroke-width="2"/>
<text x="607" y="313" fill="#166534" font-size="12" font-weight="600" text-anchor="middle">結果を返す</text>
<text x="607" y="329" fill="#166534" font-size="10" text-anchor="middle">ループ終了</text>

<!-- 矢印:はい → ツール実行(下) -->
<line x1="380" y1="352" x2="380" y2="392" stroke="#d97706" stroke-width="2" marker-end="url(#arrow)"/>
<text x="395" y="376" fill="#d97706" font-size="10" font-weight="600">Yes</text>

<!-- ===== ツール実行 ===== -->
<rect x="290" y="394" width="180" height="48" rx="8" fill="#fff7ed" stroke="#d97706" stroke-width="1.5"/>
<text x="380" y="415" fill="#92400e" font-size="12" font-weight="600" text-anchor="middle">ツール呼び出しを実行</text>
<text x="380" y="432" fill="#92400e" font-size="10" text-anchor="middle">run_bash(command)</text>

<!-- 矢印:ツール結果 → メッセージに追加(ループバック) -->
<path d="M 290 418 L 40 418 L 40 188 L 58 188" fill="none" stroke="#d97706" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="18" y="375" fill="#92400e" font-size="10" font-weight="500" transform="rotate(-90, 18, 375)">tool_result を messages に追加</text>

<!-- 凡例 -->
<rect x="60" y="462" width="600" height="48" rx="6" fill="#f1f5f9"/>
<text x="80" y="482" fill="#334155" font-size="10">核心:一つの</text>
<text x="138" y="482" fill="#1e3a5f" font-size="10" font-weight="700" font-family="monospace">while True</text>
<text x="210" y="482" fill="#334155" font-size="10">ループ。ツール呼出 → 実行 → 結果を戻す → 再度問う。ツールなし → 停止。</text>
<text x="80" y="500" fill="#64748b" font-size="10">以降の全章がこのループの上に仕組みを積み重ねる。</text>
</svg>
s01_agent_loop/images/agent-loop.svg
@@ -0,0 +1,86 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 530" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>

<!-- 背景 -->
<rect width="720" height="530" fill="#fafbfc" rx="8"/>

<!-- 标题 -->
<rect x="0" y="0" width="720" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="720" height="8" fill="url(#header)"/>
<text x="360" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Agent Loop — 一个 while 循环驱动整个 Agent</text>

<!-- ===== 用户输入 ===== -->
<rect x="60" y="80" width="160" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="2"/>
<text x="140" y="102" fill="#1e3a5f" font-size="13" font-weight="600" text-anchor="middle">用户提问</text>
<text x="140" y="120" fill="#64748b" font-size="11" text-anchor="middle">"帮我创建 hello.py"</text>

<!-- 箭头:用户 → 消息列表 -->
<line x1="140" y1="132" x2="140" y2="162" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>

<!-- ===== 消息列表 ===== -->
<rect x="60" y="164" width="160" height="48" rx="8" fill="#f8fafc" stroke="#94a3b8" stroke-width="1.5" stroke-dasharray="4,2"/>
<text x="140" y="185" fill="#334155" font-size="12" font-weight="500" text-anchor="middle">messages[]</text>
<text x="140" y="201" fill="#94a3b8" font-size="10" text-anchor="middle">累积式消息列表</text>

<!-- 箭头:消息列表 → LLM -->
<line x1="220" y1="188" x2="288" y2="188" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== LLM ===== -->
<rect x="290" y="140" width="180" height="96" rx="8" fill="#fff" stroke="#2563eb" stroke-width="2"/>
<text x="380" y="166" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">大模型 (LLM)</text>
<line x1="310" y1="176" x2="450" y2="176" stroke="#e2e8f0" stroke-width="1"/>
<text x="380" y="194" fill="#475569" font-size="11" text-anchor="middle">模型阅读消息历史</text>
<text x="380" y="210" fill="#475569" font-size="11" text-anchor="middle">判断:需要工具吗?</text>
<text x="380" y="228" fill="#64748b" font-size="10" text-anchor="middle">返回 stop_reason 信号</text>

<!-- 箭头:LLM → 判断(向下) -->
<line x1="380" y1="236" x2="380" y2="276" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== 判断菱形 ===== -->
<polygon points="380,280 470,316 380,352 290,316" fill="#fff8f0" stroke="#d97706" stroke-width="2"/>
<text x="380" y="312" fill="#92400e" font-size="12" font-weight="600" text-anchor="middle">stop_reason</text>
<text x="380" y="326" fill="#92400e" font-size="10" text-anchor="middle">== "tool_use"?</text>

<!-- 箭头:否 → 结束(向右) -->
<line x1="470" y1="316" x2="540" y2="316" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
<text x="505" y="308" fill="#16a34a" font-size="10" font-weight="600" text-anchor="middle">否</text>

<!-- 结束节点 -->
<rect x="542" y="294" width="130" height="44" rx="22" fill="#dcfce7" stroke="#16a34a" stroke-width="2"/>
<text x="607" y="313" fill="#166534" font-size="12" font-weight="600" text-anchor="middle">返回结果</text>
<text x="607" y="329" fill="#166534" font-size="10" text-anchor="middle">循环结束</text>

<!-- 箭头:是 → 工具执行(向下) -->
<line x1="380" y1="352" x2="380" y2="392" stroke="#d97706" stroke-width="2" marker-end="url(#arrow)"/>
<text x="395" y="376" fill="#d97706" font-size="10" font-weight="600">是</text>

<!-- ===== 工具执行 ===== -->
<rect x="290" y="394" width="180" height="48" rx="8" fill="#fff7ed" stroke="#d97706" stroke-width="1.5"/>
<text x="380" y="415" fill="#92400e" font-size="12" font-weight="600" text-anchor="middle">执行工具调用</text>
<text x="380" y="432" fill="#92400e" font-size="10" text-anchor="middle">run_bash(command)</text>

<!-- 箭头:工具结果 → 追加到消息列表(向左弯回去) -->
<path d="M 290 418 L 40 418 L 40 188 L 58 188" fill="none" stroke="#d97706" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="18" y="375" fill="#92400e" font-size="10" font-weight="500" transform="rotate(-90, 18, 375)">追加 tool_result 到 messages</text>

<!-- 图例 -->
<rect x="60" y="462" width="600" height="48" rx="6" fill="#f1f5f9"/>
<text x="80" y="482" fill="#334155" font-size="10">核心:一个</text>
<text x="148" y="482" fill="#1e3a5f" font-size="10" font-weight="700" font-family="monospace">while True</text>
<text x="220" y="482" fill="#334155" font-size="10">循环。模型调工具 → 执行 → 喂回 → 再问。不调工具就停。</text>
<text x="80" y="500" fill="#64748b" font-size="10">后续所有章节都在这个循环上叠加机制。</text>
</svg>
s02_tool_use/README.en.md
@@ -0,0 +1,222 @@
# s02: Tool Use — Add a Tool, Add Just One Line

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → `s02` → [s03](../s03_permission/) → s04 → ... → s20
> *"Add a tool, add just one handler"* — The loop stays the same. Register the new tool in the dispatch map and you're done.
>
> **Harness Layer**: Tool Dispatch — Expanding the model's reach.

---

## Only One Tool: Bash

The s01 Agent has only one tool: bash. To read a file, `cat`; to write, `echo "..." > file.py`; to edit, `sed`.

The model thinks "read this file" but has to spell out `cat path/to/file`. That extra layer of translation wastes tokens and invites errors.

---

## Overview: Tool Dispatch

The s01 loop is fully preserved: the LLM call, the `stop_reason` check, and the message appends are unchanged. The only change is the one line that executes a tool: the hardcoded `run_bash()` call becomes a dispatch lookup, `TOOL_HANDLERS[block.name](**block.input)`.

Adding a tool to the Agent requires just two things:

1. **Define the tool**: Add one entry to the `TOOLS` array
2. **Register the handler**: Add one mapping in the `TOOL_HANDLERS` dict

---

## From 1 Tool to 5 Tools

s01 had only bash:

```python
TOOLS = [{"name": "bash", ...}]

def run_bash(command): ...
```

s02 expands to 5 tools, each independently defined:

```python
TOOLS = [
    {"name": "bash", "description": "Run a shell command.", ...},
    {"name": "read_file", "description": "Read file contents.", ...},
    {"name": "write_file", "description": "Write content to file.", ...},
    {"name": "edit_file", "description": "Replace text in file once.", ...},
    {"name": "glob", "description": "Find files by pattern.", ...},
]
```

Each tool has its own implementation function:

```python
def run_read(path, limit=None):
    lines = safe_path(path).read_text().splitlines()
    if limit:
        lines = lines[:limit]
    return "\n".join(lines)

def run_write(path, content):
    safe_path(path).write_text(content)
    return f"Wrote {len(content)} bytes to {path}"

def run_edit(path, old_text, new_text):
    text = safe_path(path).read_text()
    if old_text not in text:
        return "Error: text not found"
    safe_path(path).write_text(text.replace(old_text, new_text, 1))
    return f"Edited {path}"

def run_glob(pattern):
    import glob as g
    return "\n".join(g.glob(pattern, root_dir=WORKDIR))
```
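
The functions above call `safe_path`, which this excerpt does not define. Below is a minimal sketch of what such a guard could look like. It assumes the guard should resolve a relative path against `WORKDIR` and reject anything that escapes it, including `..` segments, absolute paths, and symlinks. Only the names `WORKDIR` and `safe_path` come from the snippet; the body is an assumption, not the repository's code, and the temporary `WORKDIR` exists only for the demo.

```python
import tempfile
from pathlib import Path

WORKDIR = Path(tempfile.mkdtemp()).resolve()  # assumption: a sandbox directory for the demo


def safe_path(p: str) -> Path:
    """Resolve p relative to WORKDIR and refuse paths that escape it."""
    path = (WORKDIR / p).resolve()  # collapses ".." and follows symlinks
    if not path.is_relative_to(WORKDIR):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"Path escapes workspace: {p}")
    return path


print(safe_path("src/app.py").relative_to(WORKDIR))  # stays inside WORKDIR
for bad in ("../../etc/passwd", "/etc/passwd"):
    try:
        safe_path(bad)
    except ValueError as e:
        print("blocked:", e)
```

Resolving before checking matters: a plain string prefix test on the unresolved path would accept `WORKDIR/../secret`.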

---

## Tool Dispatch

```python
TOOL_HANDLERS = {
    "bash": run_bash,
    "read_file": run_read,
    "write_file": run_write,
    "edit_file": run_edit,
    "glob": run_glob,
}

# Only one line changed in the loop — from hardcoded run_bash to dispatch lookup:
for block in response.content:
    if block.type == "tool_use":
        handler = TOOL_HANDLERS[block.name]  # lookup
        output = handler(**block.input)      # call
        results.append(...)
```
|
||||
|
||||
Adding a tool = one entry in `TOOLS` array + one line in `TOOL_HANDLERS` dict. The loop stays the same.
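
As a sketch of the two-step recipe, here is a hypothetical sixth tool, `count_lines`, registered the same way, plus a hypothetical `dispatch` helper; neither exists in `code.py`. Unlike the loop in `code.py`, which calls `handler(**block.input)` directly and lets a `TypeError` escape when the model omits a required argument, this helper turns that case into an error string the model can read. The handler skips `safe_path` only to stay short.

```python
import tempfile
from pathlib import Path

WORKDIR = Path(tempfile.mkdtemp())  # throwaway workspace for this sketch

# Step 1: define the tool (the JSON Schema tells the model which arguments exist)
TOOLS = [
    {"name": "count_lines", "description": "Count the lines in a file.",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"}},
                      "required": ["path"]}},
]


def run_count_lines(path: str) -> str:
    try:
        # A real tool would go through safe_path; omitted here for brevity
        return str(len((WORKDIR / path).read_text().splitlines()))
    except OSError as e:
        return f"Error: {e}"


# Step 2: register the handler under the same name
TOOL_HANDLERS = {"count_lines": run_count_lines}

# Every declared tool should have a handler, and vice versa
assert {t["name"] for t in TOOLS} == set(TOOL_HANDLERS)


def dispatch(name: str, args: dict) -> str:
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Unknown tool: {name}"
    try:
        return handler(**args)
    except TypeError as e:  # missing/unexpected arguments (or a TypeError inside the handler)
        return f"Error: bad arguments for {name}: {e}"


(WORKDIR / "a.txt").write_text("one\ntwo\n")
print(dispatch("count_lines", {"path": "a.txt"}))  # 2
print(dispatch("count_lines", {}))                 # Error: bad arguments for count_lines: ...
print(dispatch("delete_all", {}))                  # Unknown tool: delete_all
```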

---

## Multiple Tool Calls

The model often returns several tool_use blocks in a single response, for example when asked to "read a.py and b.py, then list all .py files".

The teaching version executes them one by one in the original `response.content` order. CC's approach is more involved: it slices the original order into consecutive batches, runs the concurrency-safe tools within a batch in parallel, and executes the batches strictly in sequence (see the appendix).
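
A sketch of what the teaching loop does with several tool_use blocks. `types.SimpleNamespace` objects stand in for the SDK's content blocks, and the fake handlers are hypothetical. All results go back in one user message, in the original order, each tagged with its `tool_use_id`:

```python
from types import SimpleNamespace

calls = []  # records execution order


def fake_read(path):
    calls.append(f"read {path}")
    return f"<contents of {path}>"


def fake_glob(pattern):
    calls.append(f"glob {pattern}")
    return "a.py\nb.py"


TOOL_HANDLERS = {"read_file": fake_read, "glob": fake_glob}

# Stand-in for response.content: one text block plus three tool_use blocks
content = [
    SimpleNamespace(type="text", text="Let me look."),
    SimpleNamespace(type="tool_use", id="tu_1", name="read_file", input={"path": "a.py"}),
    SimpleNamespace(type="tool_use", id="tu_2", name="read_file", input={"path": "b.py"}),
    SimpleNamespace(type="tool_use", id="tu_3", name="glob", input={"pattern": "*.py"}),
]

results = []
for block in content:                      # original order, one at a time
    if block.type == "tool_use":
        handler = TOOL_HANDLERS.get(block.name)
        output = handler(**block.input) if handler else f"Unknown: {block.name}"
        results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})

message = {"role": "user", "content": results}  # one message carries every result
print(calls)  # ['read a.py', 'read b.py', 'glob *.py']
```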

---

## Quick Reference

| Concept | One-Liner |
|---------|-----------|
| TOOL_HANDLERS | Tool name → handler function dict. Add a tool = add one mapping line |
| Tool Definition | JSON schema telling the model "what I can do" |
| Multiple tool calls | The model may return several tool_use blocks at once; the teaching version executes them in original order |
| Loop Unchanged | s01's `while True` loop keeps its structure; only the tool-execution line changes |

---

## Changes from s01

| Component | Before (s01) | After (s02) |
|-----------|-------------|-------------|
| Tool count | 1 (bash) | 5 (+read, write, edit, glob) |
| Tool execution | Hardcoded `run_bash()` | TOOL_HANDLERS dispatch lookup |
| Path safety | None | safe_path validation (file tools only) |
| Loop | `while True` + `stop_reason` | Same structure as s01 |

---

## Try It

```sh
cd learn-claude-code
python s02_tool_use/code.py
```

Try these prompts:

1. `Read the file README.md and tell me what this project is about`
2. `Create a file called test.py that prints "hello", then read it back`
3. `Find all Python files in this directory`
4. `Read both README.md and requirements.txt, then create a summary file`

What to watch for: when does the model call just one tool, and when does it call several at once? Are multiple tool calls executed in the correct order?

---

## What's Next

The Agent now has 5 specialized tools. File tools are protected by `safe_path`, but bash is guarded only by s01's naive substring blocklist: the literal `rm -rf /` is blocked, yet a variant such as `rm -r -f /` still runs.

→ s03 Permission: add a gate before tool execution. Is this operation safe? Does it need user approval?

<details>

<summary>Dive into CC Source Code</summary>

> The following is based on a review of the CC source files `Tool.ts`, `tools.ts`, `toolOrchestration.ts`, `toolExecution.ts`, and `StreamingToolExecutor.ts`.

### 1. Tool Definition Approach

**Teaching version**: `TOOLS` array + `TOOL_HANDLERS` dict. Definition and implementation are separate.

**CC**: Each tool is an independent object created by `buildTool()`, bundling its schema, validation, permissions, and execution. `getAllBaseTools()` aggregates all tools.

The teaching version's separation is clearer for teaching: readers see at a glance that adding a tool means two definitions.

### 2. Concurrency Safety: isConcurrencySafe()

The teaching version executes tools one by one in original order, with no concurrency. CC calls `isConcurrencySafe(input)` to decide whether a call may run concurrently. This is not simply "read-only vs. write": the answer depends on the specific input.

| | isReadOnly | isConcurrencySafe |
|---|---|---|
| FileRead | true | true |
| Glob | true | true |
| Bash `ls` | true | **true** ← key difference: same tool as `rm`, different input |
| Bash `rm` | false | false |
| TaskCreate | false | **true** ← modifies state but can be concurrent (introduced in s12) |

CC's Bash tool's `isConcurrencySafe` equals `isReadOnly`: read-only commands can run concurrently, write commands cannot. TaskCreate modifies task files, but each call writes a different file, so it can run concurrently.
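
A rough Python sketch of an input-dependent check for the five teaching tools. It is not CC's implementation: `is_concurrency_safe`, the read-only allowlist, and the operator check are hypothetical and deliberately crude, and anything they cannot classify is treated as unsafe.

```python
import shlex

READ_ONLY_COMMANDS = {"ls", "cat", "head", "tail", "wc", "grep", "pwd"}  # hypothetical allowlist


def is_read_only_bash(command: str) -> bool:
    # Any control operator, pipe, redirection, or substitution disqualifies the command
    if any(op in command for op in (";", "&", "|", ">", "<", "`", "$(")):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return bool(argv) and argv[0] in READ_ONLY_COMMANDS


def is_concurrency_safe(name: str, tool_input: dict) -> bool:
    if name in ("read_file", "glob"):
        return True
    if name == "bash":
        # Same idea as the text: for bash, concurrency safety == read-only-ness
        return is_read_only_bash(tool_input.get("command", ""))
    return False  # write_file, edit_file, unknown tools: run serially


print(is_concurrency_safe("bash", {"command": "ls -la"}))        # True
print(is_concurrency_safe("bash", {"command": "rm x"}))          # False
print(is_concurrency_safe("bash", {"command": "ls > out.txt"}))  # False
```

Erring toward "unsafe" only costs speed (the call runs serially), while a false "safe" could let two writers race, which is why the default branch returns `False`.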

### 3. Partition Algorithm

CC's `partitionToolCalls()` (`toolOrchestration.ts:91-115`) doesn't split calls into two groups; it batches them **by consecutive runs**:

```
[read A, read B, glob *.py, bash "rm x", read C]
→ batch1(concurrent): [read A, read B, glob *.py]
→ batch2(serial): [bash "rm x"]
→ batch3(concurrent): [read C]
```

Consecutive concurrency-safe calls are grouped into the same batch and executed concurrently (`toolOrchestration.ts:152-176`, subject to a concurrency limit). When a non-concurrency-safe call is encountered, a new batch starts and runs serially. Batches execute strictly in order.
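
A Python sketch of the same idea, not a port of `partitionToolCalls()`. The names `partition_calls` and `run_batches` and the `max_workers=4` limit are assumptions. In this sketch each unsafe call gets a batch of its own, and `ThreadPoolExecutor.map` runs a safe batch concurrently while still returning results in the original order.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_calls(calls, is_safe):
    """Group calls into consecutive batches: runs of safe calls share a batch,
    every unsafe call gets a batch of its own."""
    batches = []
    for call in calls:
        safe = is_safe(call)
        if safe and batches and batches[-1]["safe"]:
            batches[-1]["calls"].append(call)
        else:
            batches.append({"safe": safe, "calls": [call]})
    return batches


def run_batches(batches, run_one, max_workers=4):  # max_workers: illustrative limit
    results = []
    for batch in batches:                            # batches run strictly in order
        if batch["safe"] and len(batch["calls"]) > 1:
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                results.extend(pool.map(run_one, batch["calls"]))  # map keeps input order
        else:
            results.extend(run_one(c) for c in batch["calls"])
    return results


calls = ["read A", "read B", "glob *.py", 'bash "rm x"', "read C"]
batches = partition_calls(calls, is_safe=lambda c: not c.startswith("bash"))
print([b["calls"] for b in batches])
# [['read A', 'read B', 'glob *.py'], ['bash "rm x"'], ['read C']]
print(run_batches(batches, run_one=str.upper))
```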

### 4. Validation Pipeline

Each tool call in CC passes through a strict 5-step pipeline (`toolExecution.ts`): four checks, then execution.

1. **Zod schema validation** (`614-680`; the teaching version declares a JSON Schema instead): parameter type/structure check
2. **Tool-level validateInput()** (`682-733`): parameter value validation (e.g., is the path within the working directory?)
3. **PreToolUse hooks** (`800-862`, covered in s04): hooks can return messages, modify input, or block execution
4. **Permission check** (`921-931`, the core topic of s03): canUseTool + checkPermissions → allow/deny/ask
5. **Execute tool.call()** (`1207-1222`)

The teaching version omits Zod (its JSON Schema is sent to the model but not enforced locally) and omits validateInput (helpers such as `safe_path` play that role). The permission check and hook stages are kept as concepts and arrive in s03 and s04.
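
A minimal sketch of the five stages in Python. Every name here (`Tool`, `check_schema`, `run_tool`, the hook and permission callables) is hypothetical. `check_schema` covers only `required`, known property names, and a few `type` values, not full JSON Schema or Zod, and an `"ask"` decision is simply reported because there is no approval UI.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict}


@dataclass
class Tool:
    name: str
    schema: dict                     # JSON-Schema-like: {"properties": ..., "required": [...]}
    call: Callable[..., str]
    validate_input: Optional[Callable[[dict], Optional[str]]] = None  # returns an error or None


def check_schema(schema: dict, args: dict) -> Optional[str]:
    """Step 1: a tiny subset of JSON Schema (required keys, known keys, simple types)."""
    for key in schema.get("required", []):
        if key not in args:
            return f"missing required field '{key}'"
    for key, value in args.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            return f"unexpected field '{key}'"
        if not isinstance(value, TYPES[spec["type"]]):
            return f"field '{key}' should be {spec['type']}"
    return None


def run_tool(tool, args, hooks=(), can_use_tool=lambda tool, args: "allow"):
    if err := check_schema(tool.schema, args):                      # 1. schema/structure
        return f"InputValidationError: {err}"
    if tool.validate_input and (err := tool.validate_input(args)):  # 2. value checks
        return f"Error: {err}"
    for hook in hooks:                                              # 3. PreToolUse-style hooks
        verdict = hook(tool.name, args) or {}
        if verdict.get("block"):
            return f"Blocked by hook: {verdict.get('reason', '')}"
        args = verdict.get("args", args)  # hooks may rewrite input (not re-validated here)
    decision = can_use_tool(tool, args)                             # 4. permission check
    if decision != "allow":               # "deny" or "ask"; no approval UI in this sketch
        return f"Permission {decision}: {tool.name}"
    return tool.call(**args)                                        # 5. execute


def no_dotdot(args):
    return "path must stay inside the workspace" if ".." in args["path"] else None


read = Tool("read_file",
            {"properties": {"path": {"type": "string"}}, "required": ["path"]},
            call=lambda path: f"<contents of {path}>",
            validate_input=no_dotdot)

print(run_tool(read, {}))                         # InputValidationError: missing required field 'path'
print(run_tool(read, {"path": "../etc/passwd"}))  # Error: path must stay inside the workspace
print(run_tool(read, {"path": "a.py"}, can_use_tool=lambda t, a: "ask"))  # Permission ask: read_file
print(run_tool(read, {"path": "a.py"}))           # <contents of a.py>
```

The ordering matters: cheap structural checks run first, so hooks and permission logic only ever see well-formed input.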

### 5. Streaming Tool Execution

CC's `StreamingToolExecutor` (`StreamingToolExecutor.ts`) starts tools while the model is still generating, without waiting for the response to finish. A `read_file` call might complete while the model is still outputting "Let me analyze". The teaching version doesn't implement this, in line with s01's goal of conceptual clarity rather than peak performance.
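
A toy model of the early-start idea. It does not use the Anthropic SDK's streaming API: `fake_stream` yields finished blocks as plain dicts (a hypothetical shape), `time.sleep` stands in for generation time, and concurrency safety is ignored, so a real executor would still have to respect the batching rules from section 3. Each tool is submitted to a thread pool as soon as its block arrives, and results are collected in the original order.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_stream():
    """Stand-in for a model response that arrives block by block."""
    yield {"type": "tool_use", "id": "tu_1", "name": "read_file", "input": {"path": "a.py"}}
    time.sleep(0.2)                     # the model keeps "typing" after the first tool call
    yield {"type": "text", "text": "Let me analyze..."}
    time.sleep(0.2)
    yield {"type": "tool_use", "id": "tu_2", "name": "read_file", "input": {"path": "b.py"}}


log = []  # event order, appended from both the main thread and worker threads


def run_tool(block):
    log.append(f"start {block['id']}")
    return f"<contents of {block['input']['path']}>"


def consume(stream):
    pending = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for block in stream:
            if block["type"] == "tool_use":
                pending.append((block["id"], pool.submit(run_tool, block)))  # start immediately
            else:
                log.append("text received")
        log.append("stream finished")
        return [(tool_id, fut.result()) for tool_id, fut in pending]  # original order


results = consume(fake_stream())
print(results)
print(log)
```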

### 6. Tool Result Persistence

Each tool has a `maxResultSizeChars` field. Results exceeding this threshold are persisted to disk, and the model sees a preview plus the file path. FileRead is the exception: its threshold is set to `Infinity`, so its output is never persisted. Otherwise, a FileRead result over the threshold would be persisted, and the model's next read of that persisted file would trigger another persistence, producing an infinite loop (read file → persist → re-read → re-persist → ...).
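
A minimal sketch of the spill-to-disk rule. The limits, the preview length, the `finalize_result` name, and the spill directory are all hypothetical (CC's actual values are not given here). The point is that a `math.inf` limit, like FileRead's `Infinity`, makes the size check always pass, so re-reading a spilled file can never spill again.

```python
import math
import tempfile
from pathlib import Path

SPILL_DIR = Path(tempfile.mkdtemp())  # hypothetical place for oversized results
PREVIEW_CHARS = 200                   # hypothetical preview length
MAX_RESULT_CHARS = {"bash": 30_000, "read_file": math.inf}  # hypothetical per-tool limits


def finalize_result(tool_name: str, call_id: str, output: str) -> str:
    """Return what the model sees: the full output, or a preview plus a file path."""
    limit = MAX_RESULT_CHARS.get(tool_name, 30_000)
    if len(output) <= limit:          # always true when limit is math.inf
        return output
    spill = SPILL_DIR / f"{tool_name}-{call_id}.txt"
    spill.write_text(output)
    return (f"{output[:PREVIEW_CHARS]}\n"
            f"... [{len(output)} chars total; full output saved to {spill}]")


big = "x" * 50_000
print(finalize_result("bash", "tu_1", big)[-90:])     # tail of the preview message
print(len(finalize_result("read_file", "tu_2", big)))  # 50000: never spilled
```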

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
222
s02_tool_use/README.ja.md
Normal file
@@ -0,0 +1,222 @@
# s02: Tool Use: One More Tool, Just One More Line

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → `s02` → [s03](../s03_permission/) → s04 → ... → s20
> *"Add one tool, add one handler"*: the loop stays as-is; just register the new tool in the dispatch map.
>
> **Harness layer**: tool dispatch, extending the range of what the model can reach.

---

## Only One Tool: bash

The s01 Agent has just one tool, bash. Reading a file means `cat`, writing one means `echo "..." > file.py`, and editing one means `sed`.

The model thinks "I want to read this file" but has to assemble `cat path/to/file`. That adds a layer of translation, wastes tokens, and invites mistakes.

---

## Overview: Tool Dispatch

The s01 loop is fully preserved (LLM call, `stop_reason` check, message append). The only change is the one line that executes a tool: the hardcoded `run_bash()` call is replaced by a dispatch lookup, `TOOL_HANDLERS[block.name](**block.input)`.

Adding a tool to the Agent takes just two steps:

1. **Define the tool**: add one entry to the `TOOLS` array
2. **Register the handler**: add one mapping to the `TOOL_HANDLERS` dict

---

## From 1 Tool to 5 Tools

s01 had only bash:

```python
TOOLS = [{"name": "bash", ...}]

def run_bash(command): ...
```

s02 grows to 5 tools, each defined independently:

```python
TOOLS = [
    {"name": "bash", "description": "Run a shell command.", ...},
    {"name": "read_file", "description": "Read file contents.", ...},
    {"name": "write_file", "description": "Write content to file.", ...},
    {"name": "edit_file", "description": "Replace text in file once.", ...},
    {"name": "glob", "description": "Find files by pattern.", ...},
]
```

Each tool has its own implementation function:

```python
def run_read(path, limit=None):
    lines = safe_path(path).read_text().splitlines()
    if limit:
        lines = lines[:limit]
    return "\n".join(lines)

def run_write(path, content):
    file_path = safe_path(path)
    file_path.parent.mkdir(parents=True, exist_ok=True)  # create missing parent dirs
    file_path.write_text(content)
    return f"Wrote {len(content)} bytes to {path}"

def run_edit(path, old_text, new_text):
    text = safe_path(path).read_text()
    if old_text not in text:
        return "Error: text not found"
    safe_path(path).write_text(text.replace(old_text, new_text, 1))
    return f"Edited {path}"

def run_glob(pattern):
    import glob as g
    matches = g.glob(pattern, root_dir=WORKDIR)
    # Drop matches that resolve outside WORKDIR (e.g. "../*")
    return "\n".join(m for m in matches if (WORKDIR / m).resolve().is_relative_to(WORKDIR))
```

---

## Tool Dispatch

```python
TOOL_HANDLERS = {
    "bash": run_bash,
    "read_file": run_read,
    "write_file": run_write,
    "edit_file": run_edit,
    "glob": run_glob,
}

# Only one step changed in the loop: hardcoded run_bash became a dispatch lookup
for block in response.content:
    if block.type == "tool_use":
        handler = TOOL_HANDLERS.get(block.name)  # lookup (None for an unknown tool name)
        output = handler(**block.input) if handler else f"Unknown: {block.name}"  # call
        results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
```

Adding a tool = one entry in the `TOOLS` array + one line in the `TOOL_HANDLERS` dict. The loop does not change.

---

## Multiple Tool Calls

The model often returns several tool_use blocks at once, for example for "read a.py and b.py, then list all .py files".

The teaching version executes them one by one in the original `response.content` order. CC's approach is more involved: it splits the original order into consecutive batches, runs the concurrency-safe tools within a batch in parallel, and executes the batches strictly in sequence (see the appendix).

---

## Quick Reference

| Concept | In a Nutshell |
|------|--------|
| TOOL_HANDLERS | Dict mapping tool name → handler function. Adding a tool = adding one mapping line |
| Tool definition | JSON schema that tells the model "what I can do" |
| Multiple tool calls | The model may return several tool_use blocks at once; the teaching version executes them one by one in original order |
| Loop unchanged | s01's `while True` loop keeps its structure; only the tool-execution line changes |

---

## Changes from s01

| Component | Before (s01) | After (s02) |
|--------------|-------------|-------------|
| Tool count | 1 (bash) | 5 (+read, write, edit, glob) |
| Tool execution | Hardcoded `run_bash()` | TOOL_HANDLERS dispatch lookup |
| Path safety | None | safe_path validation (file tools only) |
| Loop | `while True` + `stop_reason` | Same structure as s01 |

---

## Try It

```sh
cd learn-claude-code
python s02_tool_use/code.py
```

Try the following prompts:

1. `Read the file README.md and tell me what this project is about`
2. `Create a file called test.py that prints "hello", then read it back`
3. `Find all Python files in this directory`
4. `Read both README.md and requirements.txt, then create a summary file`

What to watch for: how does the model's behavior differ between calling a single tool and calling several at once? Are multiple tool calls executed in the correct order?

---

## What's Next

The Agent now has 5 specialized tools. File tools are protected by `safe_path`, but bash is guarded only by s01's naive substring blocklist: the literal `rm -rf /` is blocked, yet a variant such as `rm -r -f /` still runs.

→ s03 Permission: add a gate before tool execution. Is this operation safe? Does it need user approval?

<details>

<summary>Dive into CC Source Code</summary>

> The following is based on a review of the CC source files `Tool.ts`, `tools.ts`, `toolOrchestration.ts`, `toolExecution.ts`, and `StreamingToolExecutor.ts`.

### 1. Tool Definition Approach

**Teaching version**: `TOOLS` array + `TOOL_HANDLERS` dict. Definition and implementation are separate.

**CC**: Each tool is an independent object created by `buildTool()`, bundling its schema, validation, permissions, and execution. `getAllBaseTools()` aggregates all tools.

The teaching version's separation suits teaching: readers see at a glance that adding a tool means two definitions.

### 2. Concurrency Safety: isConcurrencySafe()

The teaching version executes tools one by one in original order, with no parallelism. CC calls `isConcurrencySafe(input)` to decide whether a call may run in parallel. This is not simply "read-only vs. write": the answer depends on the specific input.

| | isReadOnly | isConcurrencySafe |
|---|---|---|
| FileRead | true | true |
| Glob | true | true |
| Bash `ls` | true | **true** ← key difference: same tool as `rm`, different input |
| Bash `rm` | false | false |
| TaskCreate | false | **true** ← changes state but can run in parallel (introduced in s12) |

CC's Bash tool's `isConcurrencySafe` is the same as its `isReadOnly`: read-only commands can run in parallel, write commands cannot. TaskCreate modifies task files, but each call writes a different file, so it can run in parallel.

### 3. Partition Algorithm

CC's `partitionToolCalls()` (`toolOrchestration.ts:91-115`) does not split calls into two groups; it **batches tool calls by consecutive runs**:

```
[read A, read B, glob *.py, bash "rm x", read C]
→ batch1(concurrent): [read A, read B, glob *.py]
→ batch2(serial): [bash "rm x"]
→ batch3(concurrent): [read C]
```

Consecutive concurrency-safe calls are grouped into the same batch and truly run in parallel (`toolOrchestration.ts:152-176`, with a concurrency limit). When a non-concurrency-safe call is encountered, a new batch starts and runs serially. Batches execute strictly in order.

### 4. Validation Pipeline

Each tool call in CC passes through a strict 5-step pipeline (`toolExecution.ts`): four checks, then execution.

1. **Zod schema validation** (`614-680`; the teaching version declares a JSON Schema instead): parameter type/structure check
2. **Tool-level validateInput()** (`682-733`): parameter value validation (e.g., is the path inside the working directory?)
3. **PreToolUse hooks** (`800-862`, covered in detail in s04): hooks can return messages, modify input, or block execution
4. **Permission check** (`921-931`, the core of s03): canUseTool + checkPermissions → allow/deny/ask
5. **Execute tool.call()** (`1207-1222`)

The teaching version omits Zod (its JSON Schema is sent to the model but not enforced locally) and omits validateInput (helpers such as `safe_path` play that role). The permission check and hook stages are kept as concepts and arrive in s03 and s04.

### 5. Streaming Tool Execution

CC's `StreamingToolExecutor` (`StreamingToolExecutor.ts`) starts tools while the model is still generating, without waiting for the model to finish. A `read_file` call might complete while the model is still outputting "Let me analyze". The teaching version does not implement this, sharing s01's goal of conceptual clarity rather than peak performance.

### 6. Tool Result Persistence

Each tool has a `maxResultSizeChars` field. Results above this threshold are saved to disk, and the model is shown a preview plus the file path. FileRead is the exception: its threshold is set to `Infinity`, so its output is never persisted. Otherwise, a FileRead result over the threshold would be persisted, and the model's next read of that persisted file would trigger another persistence, producing an infinite loop (read file → persist → re-read → re-persist → ...).

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
222
s02_tool_use/README.md
Normal file
@@ -0,0 +1,222 @@
# s02: Tool Use: One More Tool, Just One More Line

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → `s02` → [s03](../s03_permission/) → s04 → ... → s20
> *"Add a tool, add just a handler"*: the loop doesn't change; just register the new tool in the dispatch map.
>
> **Harness layer**: tool dispatch, extending the boundary of what the model can reach.

---

## Only One Tool: bash

The s01 Agent has only one tool, bash. Reading a file takes `cat`, writing one takes `echo "..." > file.py`, and editing one takes `sed`.

The model is thinking "read this file" but has to spell out `cat path/to/file`. That extra layer of translation wastes tokens and is easy to get wrong.

---

## The Big Picture: Tool Dispatch

The s01 loop is kept in full (LLM call, `stop_reason` check, message append). The only change is the one line that executes a tool: the hardcoded `run_bash()` call is replaced by a dispatch lookup, `TOOL_HANDLERS[block.name](**block.input)`.

Giving the Agent a new tool takes just two steps:

1. **Define the tool**: add one description to the `TOOLS` array
2. **Register the handler**: add one mapping to the `TOOL_HANDLERS` dict

---

## From 1 Tool to 5 Tools

s01 has only bash:

```python
TOOLS = [{"name": "bash", ...}]

def run_bash(command): ...
```

s02 goes up to 5, each tool defined independently:

```python
TOOLS = [
    {"name": "bash", "description": "Run a shell command.", ...},
    {"name": "read_file", "description": "Read file contents.", ...},
    {"name": "write_file", "description": "Write content to file.", ...},
    {"name": "edit_file", "description": "Replace text in file once.", ...},
    {"name": "glob", "description": "Find files by pattern.", ...},
]
```

Each tool has its own implementation function:

```python
def run_read(path, limit=None):
    lines = safe_path(path).read_text().splitlines()
    if limit:
        lines = lines[:limit]
    return "\n".join(lines)

def run_write(path, content):
    file_path = safe_path(path)
    file_path.parent.mkdir(parents=True, exist_ok=True)  # create missing parent dirs
    file_path.write_text(content)
    return f"Wrote {len(content)} bytes to {path}"

def run_edit(path, old_text, new_text):
    text = safe_path(path).read_text()
    if old_text not in text:
        return "Error: text not found"
    safe_path(path).write_text(text.replace(old_text, new_text, 1))
    return f"Edited {path}"

def run_glob(pattern):
    import glob as g
    matches = g.glob(pattern, root_dir=WORKDIR)
    # Drop matches that resolve outside WORKDIR (e.g. "../*")
    return "\n".join(m for m in matches if (WORKDIR / m).resolve().is_relative_to(WORKDIR))
```

---

## Tool Dispatch

```python
TOOL_HANDLERS = {
    "bash": run_bash,
    "read_file": run_read,
    "write_file": run_write,
    "edit_file": run_edit,
    "glob": run_glob,
}

# Only one step changed in the loop: from hardcoded run_bash to a table lookup
for block in response.content:
    if block.type == "tool_use":
        handler = TOOL_HANDLERS.get(block.name)  # lookup (None for an unknown tool name)
        output = handler(**block.input) if handler else f"Unknown: {block.name}"  # call
        results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
```

Adding a tool = one entry in the `TOOLS` array + one line in the `TOOL_HANDLERS` dict. The loop does not change.

---

## Multiple Tool Calls

The model often returns several tool_use blocks at once, for example: "read a.py and b.py, then list all .py files".

The teaching version executes them one by one in the original `response.content` order. CC's approach is more involved: it cuts the original order into consecutive batches, runs the concurrency-safe tools within a batch in parallel, and executes the batches strictly in sequence (see the appendix).

---

## Quick Reference

| Concept | In One Sentence |
|------|--------|
| TOOL_HANDLERS | Dict mapping tool name → handler function. Adding a tool = adding one mapping line |
| Tool definition | The JSON schema that tells the model "what I can do" |
| Multiple tool calls | The model can return several tool_use blocks at once; the teaching version executes them one by one in original order |
| Loop unchanged | s01's `while True` loop keeps its structure; only the tool-execution line changes |

---

## Changes from s01

| Component | Before (s01) | After (s02) |
|------|-----------|-----------|
| Tool count | 1 (bash) | 5 (+read, write, edit, glob) |
| Tool execution | Hardcoded `run_bash()` | TOOL_HANDLERS dispatch lookup |
| Path safety | None | safe_path validation (file tools only) |
| Loop | `while True` + `stop_reason` | Same structure as s01 |

---

## Try It

```sh
cd learn-claude-code
python s02_tool_use/code.py
```

Try these prompts:

1. `Read the file README.md and tell me what this project is about`
2. `Create a file called test.py that prints "hello", then read it back`
3. `Find all Python files in this directory`
4. `Read both README.md and requirements.txt, then create a summary file`

What to watch for: when does the model call just one tool, and when does it call several at once? Are the order and results of multiple tool calls correct?

---

## What's Next

The Agent now has 5 specialized tools. File tools are protected by `safe_path`, but bash is guarded only by s01's naive substring blocklist: the literal `rm -rf /` is blocked, yet a variant such as `rm -r -f /` still runs.

s03 Permission → add a gate before tool execution: is this operation safe? Does it need user approval?
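
<details>

<summary>Dive into CC Source Code</summary>

> The following is based on a review of the CC source files `Tool.ts`, `tools.ts`, `toolOrchestration.ts`, `toolExecution.ts`, and `StreamingToolExecutor.ts`.

### 1. Tool Definition Approach

**Teaching version**: `TOOLS` array + `TOOL_HANDLERS` dict. Definition and implementation are kept separate.

**CC**: Each tool is an independent object created by `buildTool()`, bundling its schema, validation, permissions, and execution. `getAllBaseTools()` gathers all the tools.

The teaching version's separation is clearer for teaching: readers see at a glance that adding a tool means two definitions.

### 2. Concurrency Safety: isConcurrencySafe()

The teaching version executes tools one by one in original order, with no concurrency. CC uses `isConcurrencySafe(input)` to decide whether a call can run concurrently. Note that this is not a simple "read-only vs. write" split: the answer depends on the specific input.

| | isReadOnly | isConcurrencySafe |
|---|---|---|
| FileRead | true | true |
| Glob | true | true |
| Bash `ls` | true | **true** ← key difference: same tool as `rm`, different input |
| Bash `rm` | false | false |
| TaskCreate | false | **true** ← changes state but can run concurrently (TaskCreate is introduced in s12) |

CC's Bash tool's `isConcurrencySafe` equals its `isReadOnly`: read-only commands can run concurrently, write commands cannot. TaskCreate does modify task files, but each call writes a different file, so it can run concurrently.

### 3. Partition Algorithm

CC's `partitionToolCalls()` (`toolOrchestration.ts:91-115`) does not split calls into two groups; it **batches tool calls by consecutive runs**:

```
[read A, read B, glob *.py, bash "rm x", read C]
→ batch1(concurrent): [read A, read B, glob *.py]
→ batch2(serial): [bash "rm x"]
→ batch3(concurrent): [read C]
```

Consecutive concurrency-safe calls go into the same batch, and calls within that batch truly run concurrently (`toolOrchestration.ts:152-176`, with a concurrency limit). A non-concurrency-safe call starts a new batch that runs serially. Batches execute strictly in order.

### 4. Validation Pipeline

Each tool call in CC goes through a strict 5-step pipeline (`toolExecution.ts`): four checks, then execution.

1. **Zod schema validation** (`614-680`; the teaching version declares a JSON Schema instead): parameter type/structure check
2. **Tool-level validateInput()** (`682-733`): parameter value validation (e.g., is the path inside the workspace?)
3. **PreToolUse hooks** (`800-862`, covered in detail in s04): hooks can return messages, modify input, or block execution
4. **Permission check** (`921-931`, the core content of s03): canUseTool + checkPermissions → allow/deny/ask
5. **Execute tool.call()** (`1207-1222`)

The teaching version omits Zod (its JSON Schema is sent to the model but not enforced locally) and omits validateInput (helpers such as `safe_path` play that role). The permission check and hook stages are kept as concepts and arrive in s03 and s04.

### 5. Streaming Tool Execution

CC's `StreamingToolExecutor` (`StreamingToolExecutor.ts`) lets tools start while the model is still generating, without waiting for the model to finish. A `read_file` call may finish while the model is still outputting "Let me analyze". The teaching version does not implement this; like s01, it aims for conceptual clarity rather than maximum performance.

### 6. Tool Result Persistence

Each tool has a `maxResultSizeChars` field. When a result exceeds this value it is written to disk, and the model sees a preview plus the file path. FileRead is special: its threshold is set to `Infinity`, so its output is never written to disk again. Otherwise, a FileRead result over the threshold would be written to disk, and the model's next read of that file would trigger another write, producing an infinite loop (read file → write to disk → re-read → write again → ...).

</details>

<!-- translation-sync: zh@v1, en@v0, ja@v0 -->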
189
s02_tool_use/code.py
Normal file
@@ -0,0 +1,189 @@
#!/usr/bin/env python3
"""
s02: Tool Use: adds 4 tools plus a dispatch map on top of s01.

Run: python s02_tool_use/code.py
Requires: pip install anthropic python-dotenv, plus ANTHROPIC_API_KEY and MODEL_ID set in .env

This file = all of s01's code + the following additions:
  + run_read / run_write / run_edit / run_glob: four tool implementations
  + TOOL_HANDLERS dispatch map (replaces the hardcoded run_bash call from s01)
  + safe_path workspace path validation

The loop itself (agent_loop) keeps s01's structure; only the tool-execution step changes.
"""

import os, subprocess
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
    readline.parse_and_bind('set input-meta on')
    readline.parse_and_bind('set output-meta on')
    readline.parse_and_bind('set convert-meta off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"):
    os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]

SYSTEM = f"You are a coding agent at {WORKDIR}. Use tools to solve tasks. Act, don't explain."


# ═══════════════════════════════════════════════════════════
# FROM s01 (unchanged)
# ═══════════════════════════════════════════════════════════

def run_bash(command: str) -> str:
    dangerous = ["rm -rf /", "sudo", "shutdown", "reboot", "> /dev/"]
    if any(d in command for d in dangerous):
        return "Error: Dangerous command blocked"
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR,
                           capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"
    except (FileNotFoundError, OSError) as e:
        return f"Error: {e}"


# ═══════════════════════════════════════════════════════════
# NEW in s02: 4 new tools
# ═══════════════════════════════════════════════════════════

def safe_path(p: str) -> Path:
    # Resolve ".." and symlinks, then refuse anything outside WORKDIR
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path


def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines):
            # Truncate and tell the model how many lines were left out
            lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e:
        return f"Error: {e}"


def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path)
        file_path.parent.mkdir(parents=True, exist_ok=True)  # create missing parent dirs
        file_path.write_text(content)
        return f"Wrote {len(content)} bytes to {path}"
    except Exception as e:
        return f"Error: {e}"


def run_edit(path: str, old_text: str, new_text: str) -> str:
    try:
        file_path = safe_path(path)
        text = file_path.read_text()
        if old_text not in text:
            return f"Error: text not found in {path}"
        file_path.write_text(text.replace(old_text, new_text, 1))  # first occurrence only
        return f"Edited {path}"
    except Exception as e:
        return f"Error: {e}"


def run_glob(pattern: str) -> str:
    import glob as g
    try:
        results = []
        for match in g.glob(pattern, root_dir=WORKDIR):
            # Drop matches that resolve outside WORKDIR (e.g. "../*")
            if (WORKDIR / match).resolve().is_relative_to(WORKDIR):
                results.append(match)
        return "\n".join(results) if results else "(no matches)"
    except Exception as e:
        return f"Error: {e}"


# ═══════════════════════════════════════════════════════════
# NEW in s02: tool definitions (s01 had only bash; now there are 5)
# ═══════════════════════════════════════════════════════════

TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
]

# ═══════════════════════════════════════════════════════════
# NEW in s02: tool dispatch map (s01 hardcoded run_bash; now it is a table lookup)
# ═══════════════════════════════════════════════════════════

TOOL_HANDLERS = {
    "bash": run_bash, "read_file": run_read, "write_file": run_write,
    "edit_file": run_edit, "glob": run_glob,
}


# ═══════════════════════════════════════════════════════════
# agent_loop: same structure as s01; only the tool-execution step changed
# s01: output = run_bash(block.input["command"])
# s02: output = TOOL_HANDLERS[block.name](**block.input)
# ═══════════════════════════════════════════════════════════

def agent_loop(messages: list):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return

        results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"\033[33m> {block.name}\033[0m")
                handler = TOOL_HANDLERS.get(block.name)
                output = handler(**block.input) if handler else f"Unknown: {block.name}"
                print(str(output)[:200])
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})

        messages.append({"role": "user", "content": results})


if __name__ == "__main__":
    print("s02: Tool Use (s01 plus 4 new tools)")
    print("Type a question and press Enter to send. Type q to quit.\n")

    history = []
    while True:
        try:
            query = input("\033[36ms02 >> \033[0m")
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        history.append({"role": "user", "content": query})
        agent_loop(history)
        for block in history[-1]["content"]:
            if getattr(block, "type", None) == "text":
                print(block.text)
        print()
108
s02_tool_use/images/concurrency-comparison.en.svg
Normal file
@@ -0,0 +1,108 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 500" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="teach" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#fef3c7"/><stop offset="100%" stop-color="#fde68a"/>
|
||||
</linearGradient>
|
||||
<linearGradient id="cc" x1="0" y1="0" x2="0" y2="1">
|
||||
<stop offset="0%" stop-color="#dcfce7"/><stop offset="100%" stop-color="#bbf7d0"/>
|
||||
</linearGradient>
|
||||
<marker id="arrow-g" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="5" markerHeight="5" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
|
||||
</marker>
|
||||
</defs>
|
||||
|
||||
<rect width="760" height="500" fill="#fafbfc" rx="8"/>
|
||||
<rect x="0" y="0" width="760" height="36" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="28" width="760" height="8" fill="url(#header)"/>
|
||||
<text x="380" y="24" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Tool Concurrency — Teaching Version vs Claude Code</text>
|
||||
|
||||
<!-- Input tool blocks -->
|
||||
<rect x="180" y="52" width="400" height="28" rx="14" fill="#f1f5f9" stroke="#94a3b8" stroke-width="1"/>
|
||||
<text x="380" y="71" fill="#475569" font-size="11" font-weight="600" text-anchor="middle">Model returns 5 tool calls at once</text>
|
||||
|
||||
<rect x="38" y="92" width="100" height="36" rx="4" fill="#dbeafe" stroke="#93c5fd" stroke-width="1"/>
|
||||
<text x="88" y="114" fill="#1e40af" font-size="10" font-weight="600" text-anchor="middle">read A.py</text>
|
||||
|
||||
<rect x="148" y="92" width="100" height="36" rx="4" fill="#dbeafe" stroke="#93c5fd" stroke-width="1"/>
|
||||
<text x="198" y="114" fill="#1e40af" font-size="10" font-weight="600" text-anchor="middle">glob *.py</text>
|
||||
|
||||
<rect x="258" y="92" width="110" height="36" rx="4" fill="#fef3c7" stroke="#fbbf24" stroke-width="1"/>
|
||||
<text x="313" y="114" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">bash "ls -la"</text>
|
||||
|
||||
<rect x="378" y="92" width="100" height="36" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="1"/>
|
||||
<text x="428" y="114" fill="#991b1b" font-size="10" font-weight="600" text-anchor="middle">write B.py</text>
|
||||
|
||||
<rect x="488" y="92" width="100" height="36" rx="4" fill="#dbeafe" stroke="#93c5fd" stroke-width="1"/>
|
||||
<text x="538" y="114" fill="#1e40af" font-size="10" font-weight="600" text-anchor="middle">read C.py</text>
|
||||
|
||||
<!-- LEFT: Teaching Version -->
|
||||
<rect x="20" y="156" width="350" height="230" rx="8" fill="url(#teach)" stroke="#d97706" stroke-width="1.5"/>
|
||||
<text x="195" y="180" fill="#92400e" font-size="13" font-weight="700" text-anchor="middle">Teaching: Original Order, One by One</text>
|
||||
|
||||
<rect x="35" y="192" width="320" height="46" rx="4" fill="#fff" stroke="#fbbf24" stroke-width="0.5"/>
|
||||
<text x="46" y="209" fill="#92400e" font-size="9" font-family="monospace">for block in response.content:</text>
<text x="46" y="224" fill="#92400e" font-size="9" font-family="monospace"> TOOL_HANDLERS[name](**input)</text>
<text x="195" y="258" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">Result: 5 serial calls, no batches</text>
<rect x="45" y="270" width="300" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="284" fill="#92400e" font-size="8" font-weight="600">1. read A.py</text>
<rect x="45" y="294" width="300" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="308" fill="#92400e" font-size="8" font-weight="600">2. glob *.py</text>
<rect x="45" y="318" width="300" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="332" fill="#92400e" font-size="8" font-weight="600">3. bash "ls -la"</text>
<rect x="45" y="342" width="145" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="356" fill="#92400e" font-size="8" font-weight="600">4. write B.py</text>
<rect x="200" y="342" width="145" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="215" y="356" fill="#92400e" font-size="8" font-weight="600">5. read C.py</text>
<text x="195" y="378" fill="#dc2626" font-size="8" font-weight="600" text-anchor="middle">Teaching focus: tool dispatch first; concurrency omitted</text>
<!-- RIGHT: Claude Code -->
<rect x="390" y="156" width="350" height="230" rx="8" fill="url(#cc)" stroke="#16a34a" stroke-width="1.5"/>
<text x="565" y="180" fill="#166534" font-size="13" font-weight="700" text-anchor="middle">Claude Code: isConcurrencySafe(input)</text>
<rect x="405" y="192" width="320" height="38" rx="4" fill="#fff" stroke="#86efac" stroke-width="0.5"/>
<text x="416" y="207" fill="#166534" font-size="9" font-family="monospace">Each tool call judged individually:</text>
<text x="416" y="222" fill="#166534" font-size="9" font-family="monospace">tool.isConcurrencySafe(parsedInput) → bool</text>
<text x="565" y="250" fill="#166534" font-size="10" font-weight="600" text-anchor="middle">Result: 3 batches (consecutive safe calls grouped)</text>
<rect x="400" y="258" width="155" height="50" rx="4" fill="#dcfce7" stroke="#86efac" stroke-width="1"/>
<text x="477" y="276" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">Batch 1</text>
<text x="477" y="289" fill="#166534" font-size="8" text-anchor="middle">Concurrent</text>
<text x="477" y="302" fill="#166534" font-size="7" text-anchor="middle">read A · glob · bash "ls"</text>
<line x1="560" y1="283" x2="575" y2="283" stroke="#16a34a" stroke-width="1" marker-end="url(#arrow-g)"/>
<rect x="580" y="258" width="65" height="50" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="1"/>
<text x="612" y="276" fill="#991b1b" font-size="8" font-weight="600" text-anchor="middle">Batch 2</text>
<text x="612" y="289" fill="#991b1b" font-size="8" text-anchor="middle">Serial</text>
<text x="612" y="302" fill="#991b1b" font-size="7" text-anchor="middle">write B</text>
<line x1="650" y1="283" x2="665" y2="283" stroke="#16a34a" stroke-width="1" marker-end="url(#arrow-g)"/>
<rect x="670" y="258" width="55" height="50" rx="4" fill="#dcfce7" stroke="#86efac" stroke-width="1"/>
<text x="697" y="276" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">Batch 3</text>
<text x="697" y="289" fill="#166534" font-size="8" text-anchor="middle">Concurrent</text>
<text x="697" y="302" fill="#166534" font-size="7" text-anchor="middle">read C</text>
<text x="565" y="332" fill="#16a34a" font-size="8" font-weight="600" text-anchor="middle">bash "ls -la" is read-only and follows safe calls, so it joins Batch 1</text>
<text x="565" y="366" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">✓ Safety is judged per input, not hardcoded per tool name</text>
<text x="565" y="380" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">✓ Original order preserved; only safe consecutive calls run together</text>
<!-- Bottom Summary -->
<rect x="20" y="402" width="720" height="82" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="40" y="424" fill="#1e3a5f" font-size="12" font-weight="600">Key Difference</text>
<text x="40" y="444" fill="#475569" font-size="10">• Teaching: executes response.content in original order, one tool call at a time; no concurrency or batching</text>
<text x="40" y="460" fill="#475569" font-size="10">• CC: checks isConcurrencySafe(input), then groups consecutive safe calls into one batch</text>
<text x="40" y="476" fill="#475569" font-size="10">• Takeaway: teaching focuses on dispatch; CC adds safe concurrency while preserving call order</text>
</svg>
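The right-hand panel groups the five calls into batches: consecutive concurrency-safe calls share one batch, and a call that is not safe breaks the run. Below is a minimal Python sketch of that grouping. `ToolCall`, `Batch`, `partition_tool_calls`, `is_concurrency_safe`, `SHELL_OPERATORS`, and `READ_ONLY_COMMANDS` are hypothetical names, and the safety rule (reads and globs are always safe; bash is safe only for an allow-listed command with no shell operators) is an illustrative assumption, not Claude Code's actual logic. The sketch also assumes that every unsafe call forms a batch of its own.

```python
from dataclasses import dataclass, field
from typing import Callable

SHELL_OPERATORS = set(";&|<>`$")           # hypothetical: any of these makes bash unsafe
READ_ONLY_COMMANDS = {"ls", "cat", "pwd"}  # hypothetical allow-list of read-only commands


@dataclass
class ToolCall:
    # Stand-in for a tool_use block: tool name plus parsed input.
    name: str
    input: dict


@dataclass
class Batch:
    concurrency_safe: bool
    calls: list[ToolCall] = field(default_factory=list)


def is_concurrency_safe(call: ToolCall) -> bool:
    """Judge each call by its input, not just by its tool name."""
    if call.name in ("read_file", "glob"):
        return True
    if call.name == "bash":
        command = call.input.get("command", "")
        if any(ch in SHELL_OPERATORS for ch in command):
            return False  # chained or redirected commands might write
        parts = command.split()
        return bool(parts) and parts[0] in READ_ONLY_COMMANDS
    return False  # writes, edits, and unknown tools run alone


def partition_tool_calls(
    calls: list[ToolCall],
    is_safe: Callable[[ToolCall], bool],
) -> list[Batch]:
    """Merge consecutive safe calls into one batch; each unsafe call gets its own."""
    batches: list[Batch] = []
    for call in calls:
        safe = is_safe(call)
        if safe and batches and batches[-1].concurrency_safe:
            batches[-1].calls.append(call)  # extend the current safe run
        else:
            batches.append(Batch(concurrency_safe=safe, calls=[call]))
    return batches


calls = [
    ToolCall("read_file", {"path": "A.py"}),
    ToolCall("glob", {"pattern": "*.py"}),
    ToolCall("bash", {"command": "ls -la"}),
    ToolCall("write_file", {"path": "B.py", "content": "..."}),
    ToolCall("read_file", {"path": "C.py"}),
]
for batch in partition_tool_calls(calls, is_concurrency_safe):
    print(batch.concurrency_safe, [c.name for c in batch.calls])
```

Run on the five calls from the diagram, it prints three batches: `read_file`, `glob`, and `bash` together, then `write_file` alone, then `read_file` alone. Because order is never changed, the write still happens after the first reads and before the last one.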
s02_tool_use/images/concurrency-comparison.ja.svg
@@ -0,0 +1,108 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 500" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<linearGradient id="teach" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fef3c7"/><stop offset="100%" stop-color="#fde68a"/>
</linearGradient>
<linearGradient id="cc" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#dcfce7"/><stop offset="100%" stop-color="#bbf7d0"/>
</linearGradient>
<marker id="arrow-g" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="5" markerHeight="5" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>
<rect width="760" height="500" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="760" height="36" fill="url(#header)" rx="8"/>
<rect x="0" y="28" width="760" height="8" fill="url(#header)"/>
<text x="380" y="24" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">ツール並列実行 — 教育版 vs Claude Code</text>
<!-- Input tool blocks -->
<rect x="180" y="52" width="400" height="28" rx="14" fill="#f1f5f9" stroke="#94a3b8" stroke-width="1"/>
<text x="380" y="71" fill="#475569" font-size="11" font-weight="600" text-anchor="middle">モデルが一度に 5 つのツール呼び出しを返す</text>
<rect x="38" y="92" width="100" height="36" rx="4" fill="#dbeafe" stroke="#93c5fd" stroke-width="1"/>
<text x="88" y="114" fill="#1e40af" font-size="10" font-weight="600" text-anchor="middle">read A.py</text>
<rect x="148" y="92" width="100" height="36" rx="4" fill="#dbeafe" stroke="#93c5fd" stroke-width="1"/>
<text x="198" y="114" fill="#1e40af" font-size="10" font-weight="600" text-anchor="middle">glob *.py</text>
<rect x="258" y="92" width="110" height="36" rx="4" fill="#fef3c7" stroke="#fbbf24" stroke-width="1"/>
<text x="313" y="114" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">bash "ls -la"</text>
<rect x="378" y="92" width="100" height="36" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="1"/>
<text x="428" y="114" fill="#991b1b" font-size="10" font-weight="600" text-anchor="middle">write B.py</text>
<rect x="488" y="92" width="100" height="36" rx="4" fill="#dbeafe" stroke="#93c5fd" stroke-width="1"/>
<text x="538" y="114" fill="#1e40af" font-size="10" font-weight="600" text-anchor="middle">read C.py</text>
<!-- LEFT: Teaching Version -->
<rect x="20" y="156" width="350" height="230" rx="8" fill="url(#teach)" stroke="#d97706" stroke-width="1.5"/>
<text x="195" y="180" fill="#92400e" font-size="13" font-weight="700" text-anchor="middle">教育版:元の順序で一つずつ実行</text>
<rect x="35" y="192" width="320" height="46" rx="4" fill="#fff" stroke="#fbbf24" stroke-width="0.5"/>
<text x="46" y="209" fill="#92400e" font-size="9" font-family="monospace">for block in response.content:</text>
<text x="46" y="224" fill="#92400e" font-size="9" font-family="monospace"> TOOL_HANDLERS[name](**input)</text>
<text x="195" y="258" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">結果:5 回の直列呼び出し、batch なし</text>
<rect x="45" y="270" width="300" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="284" fill="#92400e" font-size="8" font-weight="600">1. read A.py</text>
<rect x="45" y="294" width="300" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="308" fill="#92400e" font-size="8" font-weight="600">2. glob *.py</text>
<rect x="45" y="318" width="300" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="332" fill="#92400e" font-size="8" font-weight="600">3. bash "ls -la"</text>
<rect x="45" y="342" width="145" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="356" fill="#92400e" font-size="8" font-weight="600">4. write B.py</text>
<rect x="200" y="342" width="145" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="215" y="356" fill="#92400e" font-size="8" font-weight="600">5. read C.py</text>
<text x="195" y="378" fill="#dc2626" font-size="8" font-weight="600" text-anchor="middle">教育の焦点:まず tool_use 分配を理解し、並列は省略</text>
<!-- RIGHT: Claude Code -->
<rect x="390" y="156" width="350" height="230" rx="8" fill="url(#cc)" stroke="#16a34a" stroke-width="1.5"/>
<text x="565" y="180" fill="#166534" font-size="13" font-weight="700" text-anchor="middle">Claude Code:isConcurrencySafe(input)</text>
<rect x="405" y="192" width="320" height="38" rx="4" fill="#fff" stroke="#86efac" stroke-width="0.5"/>
<text x="416" y="207" fill="#166534" font-size="9" font-family="monospace">各ツール呼び出しを個別に判定:</text>
<text x="416" y="222" fill="#166534" font-size="9" font-family="monospace">tool.isConcurrencySafe(parsedInput) → bool</text>
<text x="565" y="250" fill="#166534" font-size="10" font-weight="600" text-anchor="middle">結果:3 バッチ(連続ブロックごと)</text>
<rect x="400" y="258" width="155" height="50" rx="4" fill="#dcfce7" stroke="#86efac" stroke-width="1"/>
<text x="477" y="276" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">Batch 1</text>
<text x="477" y="289" fill="#166534" font-size="8" text-anchor="middle">並列</text>
<text x="477" y="302" fill="#166534" font-size="7" text-anchor="middle">read A · glob · bash "ls"</text>
<line x1="560" y1="283" x2="575" y2="283" stroke="#16a34a" stroke-width="1" marker-end="url(#arrow-g)"/>
<rect x="580" y="258" width="65" height="50" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="1"/>
<text x="612" y="276" fill="#991b1b" font-size="8" font-weight="600" text-anchor="middle">Batch 2</text>
<text x="612" y="289" fill="#991b1b" font-size="8" text-anchor="middle">直列</text>
<text x="612" y="302" fill="#991b1b" font-size="7" text-anchor="middle">write B</text>
<line x1="650" y1="283" x2="665" y2="283" stroke="#16a34a" stroke-width="1" marker-end="url(#arrow-g)"/>
<rect x="670" y="258" width="55" height="50" rx="4" fill="#dcfce7" stroke="#86efac" stroke-width="1"/>
<text x="697" y="276" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">Batch 3</text>
<text x="697" y="289" fill="#166534" font-size="8" text-anchor="middle">並列</text>
<text x="697" y="302" fill="#166534" font-size="7" text-anchor="middle">read C</text>
<text x="565" y="332" fill="#16a34a" font-size="8" font-weight="600" text-anchor="middle">bash "ls" は安全かつ連続しているため Batch 1 に入る</text>
<text x="565" y="366" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">✓ 入力に基づく安全判定、ツール名ハードコードではない</text>
<text x="565" y="380" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">✓ 元の順序を保ち、連続する安全呼び出しだけ並列化</text>
<!-- Bottom Summary -->
<rect x="20" y="402" width="720" height="82" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="40" y="424" fill="#1e3a5f" font-size="12" font-weight="600">核心的な違い</text>
<text x="40" y="444" fill="#475569" font-size="10">• 教育版:response.content の元の順序で一つずつ実行し、並列処理も batch 化もしない</text>
<text x="40" y="460" fill="#475569" font-size="10">• CC:isConcurrencySafe(input) で判定し、連続する安全呼び出しを同じ batch にまとめる</text>
<text x="40" y="476" fill="#475569" font-size="10">• 差分の要点:教育版は分配に集中し、CC は順序意味を保ったまま安全な並列を最適化する</text>
</svg>
s02_tool_use/images/concurrency-comparison.svg
@@ -0,0 +1,108 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 500" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<linearGradient id="teach" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fef3c7"/><stop offset="100%" stop-color="#fde68a"/>
</linearGradient>
<linearGradient id="cc" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#dcfce7"/><stop offset="100%" stop-color="#bbf7d0"/>
</linearGradient>
<marker id="arrow-g" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="5" markerHeight="5" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>
<rect width="760" height="500" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="760" height="36" fill="url(#header)" rx="8"/>
<rect x="0" y="28" width="760" height="8" fill="url(#header)"/>
<text x="380" y="24" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Tool Concurrency — 教学版 vs Claude Code</text>
<!-- Input tool blocks -->
<rect x="180" y="52" width="400" height="28" rx="14" fill="#f1f5f9" stroke="#94a3b8" stroke-width="1"/>
<text x="380" y="71" fill="#475569" font-size="11" font-weight="600" text-anchor="middle">模型一次返回 5 个工具调用</text>
<rect x="38" y="92" width="100" height="36" rx="4" fill="#dbeafe" stroke="#93c5fd" stroke-width="1"/>
<text x="88" y="114" fill="#1e40af" font-size="10" font-weight="600" text-anchor="middle">read A.py</text>
<rect x="148" y="92" width="100" height="36" rx="4" fill="#dbeafe" stroke="#93c5fd" stroke-width="1"/>
<text x="198" y="114" fill="#1e40af" font-size="10" font-weight="600" text-anchor="middle">glob *.py</text>
<rect x="258" y="92" width="110" height="36" rx="4" fill="#fef3c7" stroke="#fbbf24" stroke-width="1"/>
<text x="313" y="114" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">bash "ls -la"</text>
<rect x="378" y="92" width="100" height="36" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="1"/>
<text x="428" y="114" fill="#991b1b" font-size="10" font-weight="600" text-anchor="middle">write B.py</text>
<rect x="488" y="92" width="100" height="36" rx="4" fill="#dbeafe" stroke="#93c5fd" stroke-width="1"/>
<text x="538" y="114" fill="#1e40af" font-size="10" font-weight="600" text-anchor="middle">read C.py</text>
<!-- LEFT: Teaching Version -->
<rect x="20" y="156" width="350" height="230" rx="8" fill="url(#teach)" stroke="#d97706" stroke-width="1.5"/>
<text x="195" y="180" fill="#92400e" font-size="13" font-weight="700" text-anchor="middle">教学版:按原始顺序逐个执行</text>
<rect x="35" y="192" width="320" height="46" rx="4" fill="#fff" stroke="#fbbf24" stroke-width="0.5"/>
<text x="46" y="209" fill="#92400e" font-size="9" font-family="monospace">for block in response.content:</text>
<text x="46" y="224" fill="#92400e" font-size="9" font-family="monospace"> TOOL_HANDLERS[name](**input)</text>
<text x="195" y="258" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">结果:5 次串行调用,不做 batch</text>
<rect x="45" y="270" width="300" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="284" fill="#92400e" font-size="8" font-weight="600">1. read A.py</text>
<rect x="45" y="294" width="300" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="308" fill="#92400e" font-size="8" font-weight="600">2. glob *.py</text>
<rect x="45" y="318" width="300" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="332" fill="#92400e" font-size="8" font-weight="600">3. bash "ls -la"</text>
<rect x="45" y="342" width="145" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="60" y="356" fill="#92400e" font-size="8" font-weight="600">4. write B.py</text>
<rect x="200" y="342" width="145" height="20" rx="4" fill="#fff7ed" stroke="#fbbf24" stroke-width="0.7"/>
<text x="215" y="356" fill="#92400e" font-size="8" font-weight="600">5. read C.py</text>
<text x="195" y="378" fill="#dc2626" font-size="8" font-weight="600" text-anchor="middle">教学重点:先理解 tool_use 分发,暂不引入并发执行</text>
<!-- RIGHT: Claude Code -->
<rect x="390" y="156" width="350" height="230" rx="8" fill="url(#cc)" stroke="#16a34a" stroke-width="1.5"/>
<text x="565" y="180" fill="#166534" font-size="13" font-weight="700" text-anchor="middle">Claude Code:isConcurrencySafe(input)</text>
<rect x="405" y="192" width="320" height="38" rx="4" fill="#fff" stroke="#86efac" stroke-width="0.5"/>
<text x="416" y="207" fill="#166534" font-size="9" font-family="monospace">每个工具调用单独判断:</text>
<text x="416" y="222" fill="#166534" font-size="9" font-family="monospace">tool.isConcurrencySafe(parsedInput) → bool</text>
<text x="565" y="250" fill="#166534" font-size="10" font-weight="600" text-anchor="middle">结果:3 个 batch(按连续块分批)</text>
<rect x="400" y="258" width="155" height="50" rx="4" fill="#dcfce7" stroke="#86efac" stroke-width="1"/>
<text x="477" y="276" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">Batch 1</text>
<text x="477" y="289" fill="#166534" font-size="8" text-anchor="middle">并发</text>
<text x="477" y="302" fill="#166534" font-size="7" text-anchor="middle">read A · glob · bash "ls"</text>
<line x1="560" y1="283" x2="575" y2="283" stroke="#16a34a" stroke-width="1" marker-end="url(#arrow-g)"/>
<rect x="580" y="258" width="65" height="50" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="1"/>
<text x="612" y="276" fill="#991b1b" font-size="8" font-weight="600" text-anchor="middle">Batch 2</text>
<text x="612" y="289" fill="#991b1b" font-size="8" text-anchor="middle">串行</text>
<text x="612" y="302" fill="#991b1b" font-size="7" text-anchor="middle">write B</text>
<line x1="650" y1="283" x2="665" y2="283" stroke="#16a34a" stroke-width="1" marker-end="url(#arrow-g)"/>
<rect x="670" y="258" width="55" height="50" rx="4" fill="#dcfce7" stroke="#86efac" stroke-width="1"/>
<text x="697" y="276" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">Batch 3</text>
<text x="697" y="289" fill="#166534" font-size="8" text-anchor="middle">并发</text>
<text x="697" y="302" fill="#166534" font-size="7" text-anchor="middle">read C</text>
<text x="565" y="332" fill="#16a34a" font-size="8" font-weight="600" text-anchor="middle">bash "ls" 是并发安全调用,且和 read/glob 连续,所以留在 Batch 1</text>
<text x="565" y="366" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">✓ 按输入判断并发安全,不按工具名硬编码</text>
<text x="565" y="380" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">✓ 保留原始顺序,只在连续安全块内部并发</text>
<!-- Bottom Summary -->
<rect x="20" y="402" width="720" height="82" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="40" y="424" fill="#1e3a5f" font-size="12" font-weight="600">核心差异</text>
<text x="40" y="444" fill="#475569" font-size="10">• 教学版:按 response.content 原始顺序逐个执行,不做并发,也不分 batch</text>
<text x="40" y="460" fill="#475569" font-size="10">• CC:按 isConcurrencySafe(input) 判断,并把连续的并发安全调用合成同一个 batch</text>
<text x="40" y="476" fill="#475569" font-size="10">• 差异重点:教学版聚焦工具分发;CC 在保持顺序语义的同时优化安全并发</text>
</svg>
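Once the calls are grouped, the batches still have to run without reordering results. The sketch below assumes batches arrive as `(concurrency_safe, calls)` pairs already grouped as in the diagram; `run_batches` and `fake_execute` are hypothetical names, and a thread pool is only one possible way to run a safe batch concurrently. `ThreadPoolExecutor.map` returns results in input order regardless of which call finishes first, and each safe batch is fully consumed before the next batch starts, so the write never overlaps the reads around it.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_batches(
    batches: list[tuple[bool, list[str]]],
    execute: Callable[[str], str],
    max_workers: int = 4,
) -> list[str]:
    """Run safe batches on a thread pool and unsafe batches one call at a time."""
    results: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for concurrency_safe, calls in batches:
            if concurrency_safe:
                # map() yields results in input order even if calls finish out of
                # order; extend() consumes them all before the next batch starts.
                results.extend(pool.map(execute, calls))
            else:
                # Unsafe calls (e.g. writes) run serially on the caller's thread.
                for call in calls:
                    results.append(execute(call))
    return results


def fake_execute(call: str) -> str:
    # Simulated tool: "read A" is the slowest, so it finishes last in its batch.
    time.sleep(0.05 if call == "read A" else 0.01)
    return f"result of {call}"


batches = [
    (True, ["read A", "glob *.py", 'bash "ls -la"']),
    (False, ["write B"]),
    (True, ["read C"]),
]
print(run_batches(batches, fake_execute))
```

Even though `read A` sleeps longest, its result stays first in the returned list, which is what lets the caller pair each `tool_result` with its original `tool_use` by position.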
s02_tool_use/images/tool-dispatch.en.svg
@@ -0,0 +1,108 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 560" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<marker id="arrow-orange" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>
<!-- Background -->
<rect width="720" height="560" fill="#fafbfc" rx="8"/>
<!-- Title -->
<rect x="0" y="0" width="720" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="720" height="8" fill="url(#header)"/>
<text x="360" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Tool Use — Loop Unchanged, Just Add Dispatch Mapping</text>
<!-- ===== s01 (gray, preserved) ===== -->
<text x="50" y="76" fill="#94a3b8" font-size="11" font-weight="600">s01 Preserved</text>
<!-- User Input -->
<rect x="60" y="86" width="140" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="130" y="105" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">User Query</text>
<text x="130" y="121" fill="#64748b" font-size="10" text-anchor="middle">messages[]</text>
<!-- Arrow: User → LLM -->
<line x1="200" y1="108" x2="268" y2="108" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>
<!-- LLM -->
<rect x="270" y="82" width="150" height="52" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
<text x="345" y="104" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">LLM</text>
<text x="345" y="122" fill="#64748b" font-size="10" text-anchor="middle">stop_reason check</text>
<!-- Arrow: LLM → Decision -->
<line x1="345" y1="134" x2="345" y2="162" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- Decision Diamond -->
<polygon points="345,166 415,196 345,226 275,196" fill="#fff8f0" stroke="#d97706" stroke-width="1.5"/>
<text x="345" y="194" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">tool_use?</text>
<!-- No → Return -->
<line x1="415" y1="196" x2="475" y2="196" stroke="#16a34a" stroke-width="1.5" marker-end="url(#arrow-green)"/>
<text x="445" y="189" fill="#16a34a" font-size="9" font-weight="600">No</text>
<rect x="477" y="178" width="100" height="36" rx="18" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="527" y="200" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return Result</text>
<!-- Yes → Next Step -->
<line x1="345" y1="226" x2="345" y2="260" stroke="#d97706" stroke-width="1.5" marker-end="url(#arrow-orange)"/>
<text x="356" y="248" fill="#d97706" font-size="9" font-weight="600">Yes</text>
<!-- ===== s02 New: TOOL_HANDLERS Dispatch Mapping ===== -->
<text x="505" y="282" fill="#d97706" font-size="11" font-weight="600">s02 New</text>
<!-- Dispatch Mapping Outer Box -->
<rect x="195" y="268" width="300" height="200" rx="10" fill="#fff7ed" stroke="#d97706" stroke-width="2" stroke-dasharray="6,3"/>
<text x="345" y="290" fill="#92400e" font-size="12" font-weight="700" text-anchor="middle">TOOL_HANDLERS Dispatch Mapping</text>
<!-- Arrow into dispatch mapping -->
<line x1="345" y1="260" x2="345" y2="268" stroke="#d97706" stroke-width="1.5"/>
<!-- bash handler -->
<rect x="220" y="300" width="120" height="34" rx="6" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.2"/>
<text x="280" y="316" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">bash</text>
<text x="280" y="328" fill="#64748b" font-size="9" text-anchor="middle">→ run_bash()</text>
<!-- read_file handler -->
<rect x="360" y="300" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
<text x="420" y="316" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">read_file</text>
<text x="420" y="328" fill="#64748b" font-size="9" text-anchor="middle">→ run_read()</text>
<!-- write_file handler -->
<rect x="220" y="346" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
<text x="280" y="362" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">write_file</text>
<text x="280" y="374" fill="#64748b" font-size="9" text-anchor="middle">→ run_write()</text>
<!-- edit_file handler -->
<rect x="360" y="346" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
<text x="420" y="362" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">edit_file</text>
<text x="420" y="374" fill="#64748b" font-size="9" text-anchor="middle">→ run_edit()</text>
<!-- glob handler -->
<rect x="290" y="392" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
<text x="350" y="408" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">glob</text>
<text x="350" y="420" fill="#64748b" font-size="9" text-anchor="middle">→ run_glob()</text>
<!-- Arrow: Dispatch Mapping → Back to Messages -->
<path d="M 195 368 L 50 368 L 50 108 L 58 108" fill="none" stroke="#d97706" stroke-width="1.5" marker-end="url(#arrow-orange)" stroke-dasharray="6,3"/>
<text x="28" y="300" fill="#92400e" font-size="9" font-weight="500" transform="rotate(-90, 28, 300)">Append tool_result to messages</text>
<!-- ===== Legend ===== -->
<rect x="60" y="492" width="600" height="52" rx="6" fill="#f1f5f9"/>
<rect x="80" y="508" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="100" y="518" fill="#334155" font-size="10">s01 Preserved (loop, LLM, decision — completely unchanged)</text>
<rect x="380" y="508" width="12" height="12" rx="2" fill="#ecfdf5" stroke="#16a34a" stroke-width="1"/>
<text x="400" y="518" fill="#334155" font-size="10">s02 New (5 tools + dispatch mapping)</text>
<text x="80" y="536" fill="#64748b" font-size="10">Only 1 line changed in the loop: run_bash() → TOOL_HANDLERS[block.name]()</text>
</svg>
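The dispatch mapping in this diagram can be sketched as a plain dictionary from tool name to handler function. In the sketch below the handler names follow the diagram, but the input field names (`command`, `path`, `content`, `old_text`, `new_text`, `pattern`), the `WORKDIR` sandbox check in the hypothetical `safe_path` helper, the `dispatch` function, and the error strings are all assumptions; the `SimpleNamespace` objects stand in for the SDK's `tool_use` content blocks, and the result dicts follow the Messages API `tool_result` shape. The loop itself does not change when a tool is added: only `TOOL_HANDLERS` grows.

```python
import subprocess
import tempfile
from pathlib import Path
from types import SimpleNamespace

WORKDIR = Path.cwd()  # assumption: tools operate inside one workspace directory


def safe_path(p: str) -> Path:
    """Resolve p inside WORKDIR and refuse paths that escape it."""
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR.resolve()):
        raise ValueError(f"path escapes workspace: {p}")
    return path


def run_bash(command: str) -> str:
    r = subprocess.run(command, shell=True, cwd=WORKDIR,
                       capture_output=True, text=True, timeout=30)
    return (r.stdout + r.stderr).strip() or "(no output)"


def run_read(path: str) -> str:
    return safe_path(path).read_text()


def run_write(path: str, content: str) -> str:
    target = safe_path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return f"wrote {len(content)} chars to {path}"


def run_edit(path: str, old_text: str, new_text: str) -> str:
    target = safe_path(path)
    text = target.read_text()
    if old_text not in text:
        return f"error: text not found in {path}"
    target.write_text(text.replace(old_text, new_text, 1))  # first match only
    return f"edited {path}"


def run_glob(pattern: str) -> str:
    root = WORKDIR.resolve()
    matches = sorted(str(m.relative_to(root)) for m in root.glob(pattern)
                     if m.resolve().is_relative_to(root))
    return "\n".join(matches) or "(no matches)"


# The dispatch mapping: adding a tool means adding one entry here.
TOOL_HANDLERS = {
    "bash": run_bash,
    "read_file": run_read,
    "write_file": run_write,
    "edit_file": run_edit,
    "glob": run_glob,
}


def dispatch(content_blocks) -> list[dict]:
    """Run each tool_use block in order and collect tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue  # skip text blocks
        handler = TOOL_HANDLERS.get(block.name)
        try:
            output = handler(**block.input) if handler else f"error: unknown tool {block.name}"
        except Exception as exc:  # report failures back to the model as text
            output = f"error: {exc}"
        results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
    return results


WORKDIR = Path(tempfile.mkdtemp())  # demo: run inside a throwaway directory
blocks = [
    SimpleNamespace(type="text", text="Creating the file."),
    SimpleNamespace(type="tool_use", id="t1", name="write_file",
                    input={"path": "A.py", "content": "print('hi')\n"}),
    SimpleNamespace(type="tool_use", id="t2", name="edit_file",
                    input={"path": "A.py", "old_text": "hi", "new_text": "hello"}),
    SimpleNamespace(type="tool_use", id="t3", name="read_file", input={"path": "A.py"}),
    SimpleNamespace(type="tool_use", id="t4", name="glob", input={"pattern": "*.py"}),
]
for result in dispatch(blocks):
    print(result["tool_use_id"], "->", result["content"])
```

Returning errors as `tool_result` text instead of raising keeps the agent loop running and lets the model see what went wrong.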
s02_tool_use/images/tool-dispatch.ja.svg
@@ -0,0 +1,108 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 560" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<marker id="arrow-orange" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>
<!-- Background -->
<rect width="720" height="560" fill="#fafbfc" rx="8"/>
<!-- Title -->
<rect x="0" y="0" width="720" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="720" height="8" fill="url(#header)"/>
<text x="360" y="31" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Tool Use — ループ不変、ディスパッチマッピングを追加</text>
<!-- ===== s01 (gray, preserved) ===== -->
<text x="50" y="76" fill="#94a3b8" font-size="11" font-weight="600">s01 保持</text>
<!-- User Input -->
<rect x="60" y="86" width="140" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="130" y="105" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">ユーザーの質問</text>
<text x="130" y="121" fill="#64748b" font-size="10" text-anchor="middle">messages[]</text>
<!-- Arrow: User → LLM -->
<line x1="200" y1="108" x2="268" y2="108" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>
<!-- LLM -->
<rect x="270" y="82" width="150" height="52" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
<text x="345" y="104" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">LLM</text>
<text x="345" y="122" fill="#64748b" font-size="10" text-anchor="middle">stop_reason 判定</text>
<!-- Arrow: LLM → Decision -->
<line x1="345" y1="134" x2="345" y2="162" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- Decision Diamond -->
<polygon points="345,166 415,196 345,226 275,196" fill="#fff8f0" stroke="#d97706" stroke-width="1.5"/>
<text x="345" y="194" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">tool_use?</text>
<!-- No → Return -->
<line x1="415" y1="196" x2="475" y2="196" stroke="#16a34a" stroke-width="1.5" marker-end="url(#arrow-green)"/>
<text x="445" y="189" fill="#16a34a" font-size="9" font-weight="600">No</text>
<rect x="477" y="178" width="100" height="36" rx="18" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
|
||||
<text x="527" y="200" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">結果を返す</text>
|
||||
|
||||
<!-- はい → 次のステップ -->
|
||||
<line x1="345" y1="226" x2="345" y2="260" stroke="#d97706" stroke-width="1.5" marker-end="url(#arrow-orange)"/>
|
||||
<text x="356" y="248" fill="#d97706" font-size="9" font-weight="600">Yes</text>
|
||||
|
||||
<!-- ===== s02 新規:TOOL_HANDLERS ディスパッチマッピング ===== -->
|
||||
<text x="505" y="282" fill="#d97706" font-size="11" font-weight="600">s02 新規</text>
|
||||
|
||||
<!-- ディスパッチマッピング外枠 -->
|
||||
<rect x="195" y="268" width="300" height="200" rx="10" fill="#fff7ed" stroke="#d97706" stroke-width="2" stroke-dasharray="6,3"/>
|
||||
<text x="345" y="290" fill="#92400e" font-size="12" font-weight="700" text-anchor="middle">TOOL_HANDLERS ディスパッチマッピング</text>
|
||||
|
||||
<!-- ディスパッチマッピングへの矢印 -->
|
||||
<line x1="345" y1="260" x2="345" y2="268" stroke="#d97706" stroke-width="1.5"/>
|
||||
|
||||
<!-- bash handler -->
|
||||
<rect x="220" y="300" width="120" height="34" rx="6" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.2"/>
|
||||
<text x="280" y="316" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">bash</text>
|
||||
<text x="280" y="328" fill="#64748b" font-size="9" text-anchor="middle">→ run_bash()</text>
|
||||
|
||||
<!-- read_file handler -->
|
||||
<rect x="360" y="300" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
|
||||
<text x="420" y="316" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">read_file</text>
|
||||
<text x="420" y="328" fill="#64748b" font-size="9" text-anchor="middle">→ run_read()</text>
|
||||
|
||||
<!-- write_file handler -->
|
||||
<rect x="220" y="346" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
|
||||
<text x="280" y="362" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">write_file</text>
|
||||
<text x="280" y="374" fill="#64748b" font-size="9" text-anchor="middle">→ run_write()</text>
|
||||
|
||||
<!-- edit_file handler -->
|
||||
<rect x="360" y="346" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
|
||||
<text x="420" y="362" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">edit_file</text>
|
||||
<text x="420" y="374" fill="#64748b" font-size="9" text-anchor="middle">→ run_edit()</text>
|
||||
|
||||
<!-- glob handler -->
|
||||
<rect x="290" y="392" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
|
||||
<text x="350" y="408" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">glob</text>
|
||||
<text x="350" y="420" fill="#64748b" font-size="9" text-anchor="middle">→ run_glob()</text>
|
||||
|
||||
<!-- 矢印:ディスパッチマッピング → メッセージリストに戻る -->
|
||||
<path d="M 195 368 L 50 368 L 50 108 L 58 108" fill="none" stroke="#d97706" stroke-width="1.5" marker-end="url(#arrow-orange)" stroke-dasharray="6,3"/>
|
||||
<text x="28" y="300" fill="#92400e" font-size="9" font-weight="500" transform="rotate(-90, 28, 300)">tool_result を messages に追加</text>
|
||||
|
||||
<!-- ===== 凡例 ===== -->
|
||||
<rect x="60" y="492" width="600" height="52" rx="6" fill="#f1f5f9"/>
|
||||
<rect x="80" y="508" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="100" y="518" fill="#334155" font-size="10">s01 保持(ループ、LLM、判定 — 完全に不変)</text>
|
||||
<rect x="380" y="508" width="12" height="12" rx="2" fill="#ecfdf5" stroke="#16a34a" stroke-width="1"/>
|
||||
<text x="400" y="518" fill="#334155" font-size="10">s02 新規(5 つのツール + ディスパッチマッピング)</text>
|
||||
<text x="80" y="536" fill="#64748b" font-size="10">ループ内で変更されたのは 1 行だけ:run_bash() → TOOL_HANDLERS[block.name]()</text>
|
||||
</svg>
|
||||
|
s02_tool_use/images/tool-dispatch.svg
@@ -0,0 +1,108 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 560" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
|
||||
</marker>
|
||||
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
|
||||
</marker>
|
||||
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
|
||||
</marker>
|
||||
<marker id="arrow-orange" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
|
||||
</marker>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/>
|
||||
<stop offset="100%" stop-color="#2563eb"/>
|
||||
</linearGradient>
|
||||
</defs>
|
||||
|
||||
<!-- 背景 -->
|
||||
<rect width="720" height="560" fill="#fafbfc" rx="8"/>
|
||||
|
||||
<!-- 标题 -->
|
||||
<rect x="0" y="0" width="720" height="48" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="40" width="720" height="8" fill="url(#header)"/>
|
||||
<text x="360" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Tool Use — 循环不变,只加分发映射</text>
|
||||
|
||||
<!-- ===== s01 (灰色,保留部分) ===== -->
|
||||
<text x="50" y="76" fill="#94a3b8" font-size="11" font-weight="600">s01 保留</text>
|
||||
|
||||
<!-- 用户输入 -->
|
||||
<rect x="60" y="86" width="140" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="130" y="105" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">用户提问</text>
|
||||
<text x="130" y="121" fill="#64748b" font-size="10" text-anchor="middle">messages[]</text>
|
||||
|
||||
<!-- 箭头:用户 → LLM -->
|
||||
<line x1="200" y1="108" x2="268" y2="108" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>
|
||||
|
||||
<!-- LLM -->
|
||||
<rect x="270" y="82" width="150" height="52" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="345" y="104" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">大模型 (LLM)</text>
|
||||
<text x="345" y="122" fill="#64748b" font-size="10" text-anchor="middle">stop_reason 判断</text>
|
||||
|
||||
<!-- 箭头:LLM → 判断 -->
|
||||
<line x1="345" y1="134" x2="345" y2="162" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
|
||||
|
||||
<!-- 判断菱形 -->
|
||||
<polygon points="345,166 415,196 345,226 275,196" fill="#fff8f0" stroke="#d97706" stroke-width="1.5"/>
|
||||
<text x="345" y="194" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">tool_use?</text>
|
||||
|
||||
<!-- 否 → 返回 -->
|
||||
<line x1="415" y1="196" x2="475" y2="196" stroke="#16a34a" stroke-width="1.5" marker-end="url(#arrow-green)"/>
|
||||
<text x="445" y="189" fill="#16a34a" font-size="9" font-weight="600">否</text>
|
||||
<rect x="477" y="178" width="100" height="36" rx="18" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
|
||||
<text x="527" y="200" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">返回结果</text>
|
||||
|
||||
<!-- 是 → 下一步 -->
|
||||
<line x1="345" y1="226" x2="345" y2="260" stroke="#d97706" stroke-width="1.5" marker-end="url(#arrow-orange)"/>
|
||||
<text x="356" y="248" fill="#d97706" font-size="9" font-weight="600">是</text>
|
||||
|
||||
<!-- ===== s02 新增:TOOL_HANDLERS 分发映射 ===== -->
|
||||
<text x="505" y="282" fill="#d97706" font-size="11" font-weight="600">s02 新增</text>
|
||||
|
||||
<!-- 分发映射外框 -->
|
||||
<rect x="195" y="268" width="300" height="200" rx="10" fill="#fff7ed" stroke="#d97706" stroke-width="2" stroke-dasharray="6,3"/>
|
||||
<text x="345" y="290" fill="#92400e" font-size="12" font-weight="700" text-anchor="middle">TOOL_HANDLERS 分发映射</text>
|
||||
|
||||
<!-- 箭头进入分发映射 -->
|
||||
<line x1="345" y1="260" x2="345" y2="268" stroke="#d97706" stroke-width="1.5"/>
|
||||
|
||||
<!-- bash handler -->
|
||||
<rect x="220" y="300" width="120" height="34" rx="6" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.2"/>
|
||||
<text x="280" y="316" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">bash</text>
|
||||
<text x="280" y="328" fill="#64748b" font-size="9" text-anchor="middle">→ run_bash()</text>
|
||||
|
||||
<!-- read_file handler -->
|
||||
<rect x="360" y="300" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
|
||||
<text x="420" y="316" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">read_file</text>
|
||||
<text x="420" y="328" fill="#64748b" font-size="9" text-anchor="middle">→ run_read()</text>
|
||||
|
||||
<!-- write_file handler -->
|
||||
<rect x="220" y="346" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
|
||||
<text x="280" y="362" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">write_file</text>
|
||||
<text x="280" y="374" fill="#64748b" font-size="9" text-anchor="middle">→ run_write()</text>
|
||||
|
||||
<!-- edit_file handler -->
|
||||
<rect x="360" y="346" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
|
||||
<text x="420" y="362" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">edit_file</text>
|
||||
<text x="420" y="374" fill="#64748b" font-size="9" text-anchor="middle">→ run_edit()</text>
|
||||
|
||||
<!-- glob handler -->
|
||||
<rect x="290" y="392" width="120" height="34" rx="6" fill="#ecfdf5" stroke="#16a34a" stroke-width="1.2"/>
|
||||
<text x="350" y="408" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">glob</text>
|
||||
<text x="350" y="420" fill="#64748b" font-size="9" text-anchor="middle">→ run_glob()</text>
|
||||
|
||||
<!-- 箭头:分发映射 → 回到消息列表 -->
|
||||
<path d="M 195 368 L 50 368 L 50 108 L 58 108" fill="none" stroke="#d97706" stroke-width="1.5" marker-end="url(#arrow-orange)" stroke-dasharray="6,3"/>
|
||||
<text x="28" y="300" fill="#92400e" font-size="9" font-weight="500" transform="rotate(-90, 28, 300)">tool_result 追加到 messages</text>
|
||||
|
||||
<!-- ===== 图例 ===== -->
|
||||
<rect x="60" y="492" width="600" height="52" rx="6" fill="#f1f5f9"/>
|
||||
<rect x="80" y="508" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="100" y="518" fill="#334155" font-size="10">s01 保留(循环、LLM、判断——完全不变)</text>
|
||||
<rect x="380" y="508" width="12" height="12" rx="2" fill="#ecfdf5" stroke="#16a34a" stroke-width="1"/>
|
||||
<text x="400" y="518" fill="#334155" font-size="10">s02 新增(5 个工具 + 分发映射)</text>
|
||||
<text x="80" y="536" fill="#64748b" font-size="10">循环里只改了 1 行:run_bash() → TOOL_HANDLERS[block.name]()</text>
|
||||
</svg>
|
||||
|
s03_permission/README.en.md
@@ -0,0 +1,232 @@
|
||||
# s03: Permission — Check Permissions Before Execution
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → s02 → `s03` → [s04](../s04_hooks/) → s05 → ... → s20
|
||||
> *"Check permissions before executing"* — The permission pipeline decides which operations need approval.
|
||||
>
|
||||
> **Harness Layer**: Permission — a gate before tool execution.
|
||||
|
||||
---
|
||||
|
||||
## The Problem
|
||||
|
||||
s02's Agent has 5 tools. File tools are protected by `safe_path`, but bash is unrestricted. Ask it to "clean up the project," and it might run `rm -rf /`.
|
||||
|
||||
Safety can't rely on trusting the model — it needs code: a check before every tool execution.
|
||||
|
||||
---
|
||||
|
||||
## The Solution
|
||||
|
||||

|
||||
|
||||
s02's loop is fully preserved. The only change is inserting `check_permission()` before tool execution — each tool call passes through three gates in a fixed order: hard deny first, then soft ask, and if neither matches, allow.
|
||||
|
||||
The three gates correspond to three decisions:
|
||||
|
||||
| Gate | Purpose | On Match |
|
||||
|------|---------|----------|
|
||||
| 1. Deny List | Permanently forbidden operations (`rm -rf /`, `sudo`) | Denied immediately, not executed |
|
||||
| 2. Rule Matching | Context-dependent operations (writing outside the workspace, deleting files with `rm`) | Passed to Gate 3 |
|
||||
| 3. User Approval | After Gate 2 matches, pauses for user confirmation | User decides allow or deny |
|
||||
|
||||
If neither Gate 1 nor Gate 2 matches, the call executes directly. Most routine operations take this path.
|
||||
|
||||
---
|
||||
|
||||
## How It Works
|
||||
|
||||

|
||||
|
||||
**Gate 1**: A hard deny list. Check first; if matched, return a block message. (Teaching demo: simple string matching is not a reliable security mechanism — command variants and shell expansion can bypass it. CC's approach is in the appendix.)
|
||||
|
||||
```python
DENY_LIST = [
    "rm -rf /", "sudo", "shutdown", "reboot",
    "mkfs", "dd if=", "> /dev/sda",
]

def check_deny_list(command: str) -> str | None:
    for pattern in DENY_LIST:
        if pattern in command:
            return f"Blocked: '{pattern}' is on the deny list"
    return None
```
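The caveat about string matching is easy to see by running the deny list against a few commands. This is a self-contained copy of `DENY_LIST` and `check_deny_list` from the block above; the sample commands are illustrative and not taken from the chapter's code.

```python
from __future__ import annotations

DENY_LIST = [
    "rm -rf /", "sudo", "shutdown", "reboot",
    "mkfs", "dd if=", "> /dev/sda",
]


def check_deny_list(command: str) -> str | None:
    for pattern in DENY_LIST:
        if pattern in command:
            return f"Blocked: '{pattern}' is on the deny list"
    return None


# Bypass: a doubled space means no pattern occurs as a substring.
bypass = check_deny_list("rm  -rf /")
# False positive: "sudoku" contains "sudo".
false_positive = check_deny_list("echo sudoku")
# Over-broad: "rm -rf /" is a prefix of "rm -rf /tmp/build".
over_broad = check_deny_list("rm -rf /tmp/build")

print(bypass)          # None
print(false_positive)  # Blocked: 'sudo' is on the deny list
print(over_broad)      # Blocked: 'rm -rf /' is on the deny list
```

All three failures share one cause: the check compares characters, not parsed commands, which is why this list is a teaching device rather than a security boundary.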
|
||||
|
||||
**Gate 2**: Rule matching — describes "when to ask the user." Each rule specifies a tool and a check condition.
|
||||
|
||||
```python
PERMISSION_RULES = [
    {
        "tools": ["write_file", "edit_file"],
        "check": lambda args: not (WORKDIR / args.get("path", "")).resolve().is_relative_to(WORKDIR),
        "message": "Writing outside workspace",
    },
    {
        "tools": ["bash"],
        "check": lambda args: any(kw in args.get("command", "") for kw in ["rm ", "> /etc/", "chmod 777"]),
        "message": "Potentially destructive command",
    },
]

def check_rules(tool_name: str, args: dict) -> str | None:
    for rule in PERMISSION_RULES:
        if tool_name in rule["tools"] and rule["check"](args):
            return rule["message"]
    return None
```
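The workspace rule relies on `resolve()` collapsing `..` segments and following symlinks before `is_relative_to()` compares the result, so the comparison only works if `WORKDIR` is itself a resolved path. The sketch below exercises `check_rules` against a throwaway directory; using `tempfile.mkdtemp()` as `WORKDIR` is an assumption for the demo (the chapter's `WORKDIR` comes from s02).

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Demo assumption: a fresh temporary directory stands in for WORKDIR.
# resolve() matters: on macOS the temp directory sits under a symlinked /var.
WORKDIR = Path(tempfile.mkdtemp()).resolve()

PERMISSION_RULES = [
    {
        "tools": ["write_file", "edit_file"],
        "check": lambda args: not (WORKDIR / args.get("path", "")).resolve().is_relative_to(WORKDIR),
        "message": "Writing outside workspace",
    },
    {
        "tools": ["bash"],
        "check": lambda args: any(kw in args.get("command", "") for kw in ["rm ", "> /etc/", "chmod 777"]),
        "message": "Potentially destructive command",
    },
]


def check_rules(tool_name: str, args: dict) -> str | None:
    for rule in PERMISSION_RULES:
        if tool_name in rule["tools"] and rule["check"](args):
            return rule["message"]
    return None


inside = check_rules("write_file", {"path": "notes/todo.txt"})   # stays in WORKDIR
escape = check_rules("edit_file", {"path": "../elsewhere.txt"})  # ".." climbs out
absolute = check_rules("write_file", {"path": "/etc/hosts"})     # absolute path replaces WORKDIR
delete = check_rules("bash", {"command": "rm build.log"})        # "rm " keyword
read = check_rules("read_file", {"path": "/etc/hosts"})          # no rule covers reads
```

The last case is deliberate in this chapter: no rule gates reads, and s02's `safe_path` is what keeps the file tools inside the workspace.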
|
||||
|
||||
**Gate 3**: After a rule matches, pause for user input.
|
||||
|
||||
```python
def ask_user(tool_name: str, args: dict, reason: str) -> str:
    print(f"\n⚠ {reason}")
    print(f" Tool: {tool_name}({args})")
    choice = input(" Allow? [y/N] ").strip().lower()
    return "allow" if choice in ("y", "yes") else "deny"
```
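`ask_user` calls `input()` directly, which makes it awkward to test and raises `EOFError` when stdin is closed (for example when the agent runs non-interactively). A minimal variant, assuming a hypothetical `prompt` parameter that defaults to the built-in `input`, treats a closed stdin as a denial so the gate fails safe:

```python
from typing import Callable


def ask_user(tool_name: str, args: dict, reason: str,
             prompt: Callable[[str], str] = input) -> str:
    # `prompt` is a hypothetical injection point; it defaults to input().
    print(f"\n⚠ {reason}")
    print(f" Tool: {tool_name}({args})")
    try:
        choice = prompt(" Allow? [y/N] ").strip().lower()
    except EOFError:
        # Nobody is there to answer: fail safe and deny.
        return "deny"
    return "allow" if choice in ("y", "yes") else "deny"


def closed_stdin(_message: str) -> str:
    raise EOFError


reason = "Potentially destructive command"
approved = ask_user("bash", {"command": "rm a.txt"}, reason, prompt=lambda _m: "Y ")
default = ask_user("bash", {"command": "rm a.txt"}, reason, prompt=lambda _m: "")
no_tty = ask_user("bash", {"command": "rm a.txt"}, reason, prompt=closed_stdin)
```

An empty answer and a missing terminal both land on `deny`, matching the `[y/N]` convention that the capital letter is the default.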
|
||||
|
||||
**All three gates chained together**, inserted before tool execution:
|
||||
|
||||
```python
def check_permission(block) -> bool:
    # Gate 1: Hard deny
    if block.name == "bash":
        reason = check_deny_list(block.input.get("command", ""))
        if reason:
            print(f"\n⛔ {reason}")
            return False

    # Gate 2 + 3: Rule matching → User approval
    reason = check_rules(block.name, block.input)
    if reason:
        decision = ask_user(block.name, block.input, reason)
        if decision == "deny":
            return False

    return True

# In agent_loop: s02's loop, with only a permission check inserted before dispatch
for block in response.content:
    if block.type == "tool_use":
        if not check_permission(block):  # ← NEW
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": "Permission denied."})
            continue
        output = TOOL_HANDLERS[block.name](**block.input)  # s02 original
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": output})
```
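To watch the gates interact without calling the model, the pipeline can be driven with fake tool-use blocks. This sketch restates the gates compactly (only the bash rule of Gate 2 is reproduced), uses `types.SimpleNamespace` in place of the SDK's block objects, and replaces the interactive prompt with a hypothetical `ANSWERS` table; all of these are demo assumptions, not part of code.py.

```python
from types import SimpleNamespace

DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if=", "> /dev/sda"]
RISKY_BASH = ["rm ", "> /etc/", "chmod 777"]

# Hypothetical scripted answers standing in for the y/N prompt of Gate 3.
ANSWERS = {"rm old.log": "y", "rm keep.txt": "n"}


def check_permission(block) -> bool:
    command = block.input.get("command", "")
    # Gate 1: hard deny (bash only)
    if block.name == "bash" and any(p in command for p in DENY_LIST):
        return False
    # Gate 2: rule match (bash rule only in this sketch)
    if block.name == "bash" and any(kw in command for kw in RISKY_BASH):
        # Gate 3: scripted user approval; unknown commands default to "n"
        return ANSWERS.get(command, "n") == "y"
    return True


blocks = [
    SimpleNamespace(type="tool_use", id="t1", name="bash", input={"command": "sudo reboot"}),
    SimpleNamespace(type="tool_use", id="t2", name="bash", input={"command": "rm old.log"}),
    SimpleNamespace(type="tool_use", id="t3", name="bash", input={"command": "rm keep.txt"}),
    SimpleNamespace(type="tool_use", id="t4", name="bash", input={"command": "ls -la"}),
]

results = []
for block in blocks:
    if block.type == "tool_use":
        if not check_permission(block):
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": "Permission denied."})
            continue
        # A real loop would call TOOL_HANDLERS[block.name](**block.input) here.
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": f"ran: {block.input['command']}"})

for r in results:
    print(r["tool_use_id"], r["content"])
```

Denied calls still produce a `tool_result`: the Messages API expects every `tool_use` block to be answered in the next user message, and the denial text tells the model why the call did not run.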
|
||||
|
||||
---
|
||||
|
||||
## Changes from s02
|
||||
|
||||
| Component | Before (s02) | After (s03) |
|
||||
|-----------|-------------|-------------|
|
||||
| Security model | None (trust the model) | Three-gate permission pipeline |
|
||||
| New functions | — | check_deny_list, check_rules, ask_user, check_permission |
|
||||
| Loop | Executes all tools directly | Inserts check_permission() before execution |
|
||||
|
||||
---
|
||||
|
||||
## Try It
|
||||
|
||||
```sh
|
||||
cd learn-claude-code
|
||||
python s03_permission/code.py
|
||||
```
|
||||
|
||||
Try these prompts:
|
||||
|
||||
1. `Create a file called test.txt in the current directory` (should pass through)
|
||||
2. `Delete all temporary files in /tmp` (bash + rm triggers Gate 2)
|
||||
3. `What files are in the current directory?` (read-only, all pass)
|
||||
4. `Try to write a file to /etc/something` (writing outside workspace triggers Gate 2)
|
||||
|
||||
What to watch for: Which operations pass through? Which need your confirmation? Which are denied outright?
|
||||
|
||||
---
|
||||
|
||||
## What's Next
|
||||
|
||||
Permission checks are in place, but the check is hardcoded as a `check_permission()` call inside the loop. What if you want to log before and after every tool execution, or automatically trigger a git commit after certain operations? Scattering that extension logic through the loop quickly bloats it.
|
||||
|
||||
→ s04 Hooks: Add hooks to the loop. Extension logic hangs off the hooks, and the loop stays clean.
|
||||
|
||||
<details>
|
||||
<summary>Dive into CC Source Code</summary>
|
||||
|
||||
> The following is based on a review of CC source code `types/permissions.ts`, `utils/permissions/permissions.ts`, `toolExecution.ts`, `utils/permissions/yoloClassifier.ts`, `tools/AgentTool/forkSubagent.ts`.
|
||||
|
||||
### 1. PermissionResult: Not 3, but 4
|
||||
|
||||
The teaching version's three gates (deny → ask → allow) don't fully correspond to CC. CC's `PermissionResult` has 4 behaviors (`types/permissions.ts:241-266`):
|
||||
|
||||
| behavior | Meaning | Teaching Version Equivalent |
|
||||
|----------|---------|---------------------------|
|
||||
| `allow` | Allow directly | No gate matches, or the user approves at Gate 3 |
|
||||
| `deny` | Deny directly | Gate 1 matches |
|
||||
| `ask` | Show dialog to user | Gate 2 matches |
|
||||
| `passthrough` | Tool doesn't express opinion, passes to generic pipeline | Not in teaching version |
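As a rough Python analogue (not CC's TypeScript types), the four behaviors can be modeled as an enum. `resolve` is a hypothetical helper applying the rule stated at the end of the stage list in this appendix: a `passthrough` that nobody decides is converted to `ask`.

```python
from enum import Enum


class Behavior(Enum):
    ALLOW = "allow"              # allow directly
    DENY = "deny"                # deny directly
    ASK = "ask"                  # show a dialog to the user
    PASSTHROUGH = "passthrough"  # tool has no opinion; defer to the generic pipeline


def resolve(behavior: Behavior) -> Behavior:
    # "No opinion" must never silently become "allow":
    # if nothing else decided, fall back to asking the user.
    return Behavior.ASK if behavior is Behavior.PASSTHROUGH else behavior


final = [resolve(b).value for b in Behavior]
print(final)  # ['allow', 'deny', 'ask', 'ask']
```

The teaching version's `check_permission` returns a plain `bool`, so it cannot express the fourth state at all; the enum makes it explicit.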
|
||||
|
||||
### 2. Production Verification Stages
|
||||
|
||||
CC's tool calls don't go through three gates — they go through multiple stages distributed across `checkPermissionsAndCallTool()` (`toolExecution.ts:599-1745`), hooks, `hasPermissionsToUseToolInner()` (`utils/permissions/permissions.ts:1158-1310`), and classifier logic:
|
||||
|
||||
1. **Zod schema validation** (`toolExecution.ts:614-680`) — parameter type checking
|
||||
2. **validateInput()** (`toolExecution.ts:682-733`) — tool-level semantic validation
|
||||
3. **backfillObservableInput()** (`toolExecution.ts:784`) — backfill legacy fields
|
||||
4. **PreToolUse hooks** (`toolExecution.ts:800-862`) — hooks can return allow/deny/ask
|
||||
5. **resolveHookPermissionDecision()** (`toolExecution.ts:921-931`) — coordinate hook + pipeline decisions
|
||||
6. **hasPermissionsToUseToolInner()** (`permissions.ts:1158-1310`) — multi-layer rule check:
|
||||
- Entire tool disabled by deny rule → `deny`
|
||||
- Entire tool flagged by ask rule → `ask`
|
||||
- `tool.checkPermissions()` tool's own judgment
|
||||
- Tool itself returns deny → `deny`
|
||||
- `requiresUserInteraction()` → `ask`
|
||||
- Content-related ask rules → `ask` (not bypassable)
|
||||
- Security check violation → `ask` (not bypassable)
|
||||
- bypassPermissions mode → `allow`
|
||||
- Entire tool allowed by allow rule → `allow`
|
||||
- passthrough → converted to `ask`
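The interesting part of step 6 is its order: whole-tool deny and ask rules are checked before the tool's own opinion, and the two non-bypassable `ask` cases come before the bypassPermissions check. The following is a much-simplified rendering of that order. Every field on the hypothetical `Call` dataclass is invented for illustration, and letting a tool-level `allow` stand after the allow-rule check is an assumption, since the list does not say what happens to it.

```python
from dataclasses import dataclass


@dataclass
class Call:
    # Hypothetical flags; CC derives these from rules, the tool and the session.
    tool_denied_by_rule: bool = False
    tool_asked_by_rule: bool = False
    tool_result: str = "passthrough"  # what tool.checkPermissions() returned
    requires_user_interaction: bool = False
    content_ask_rule: bool = False
    safety_check_failed: bool = False
    bypass_mode: bool = False
    tool_allowed_by_rule: bool = False


def decide(c: Call) -> str:
    if c.tool_denied_by_rule:
        return "deny"
    if c.tool_asked_by_rule:
        return "ask"
    if c.tool_result == "deny":
        return "deny"
    if c.requires_user_interaction:
        return "ask"
    # Not bypassable: these win even when bypass mode is on.
    if c.content_ask_rule or c.safety_check_failed:
        return "ask"
    if c.bypass_mode:
        return "allow"
    if c.tool_allowed_by_rule:
        return "allow"
    if c.tool_result == "allow":  # assumption: a tool-level allow stands here
        return "allow"
    return "ask"  # passthrough (or anything undecided) becomes ask
```

Usage: `decide(Call(bypass_mode=True, safety_check_failed=True))` returns `"ask"`, which is exactly what "not bypassable" means.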
|
||||
|
||||
### 3. Deny List: Not One File, but 8 Sources
|
||||
|
||||
CC doesn't have a single deny list. Permission rules come from 8 sources (`types/permissions.ts:54-62`):
|
||||
|
||||
| Source | Configuration Location |
|
||||
|--------|----------------------|
|
||||
| `userSettings` | `~/.claude/settings.json` |
|
||||
| `projectSettings` | `.claude/settings.json` |
|
||||
| `localSettings` | `settings.local.json` |
|
||||
| `flagSettings` | Feature flags |
|
||||
| `policySettings` | Enterprise management policy |
|
||||
| `cliArg` | `--allowedTools` / `--deniedTools` |
|
||||
| `command` | Inline command |
|
||||
| `session` | In-session temporary authorization |
|
||||
|
||||
Each rule format: `{ toolName: "Bash", ruleBehavior: "deny", ruleContent: "npm publish:*" }`. Rules from multiple sources are merged, with higher-priority sources overriding lower ones (low to high: user < project < local < flag < policy, plus cliArg, command, session).
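A minimal sketch of "higher-priority sources override lower ones", using the rule shape shown above and only the five sources whose order the text states (user < project < local < flag < policy). Keying rules by `(toolName, ruleContent)` and letting the later source win is an assumption for illustration; CC's actual merge logic may differ.

```python
# Sources from lowest to highest priority, as listed in the text.
PRIORITY = ["userSettings", "projectSettings", "localSettings", "flagSettings", "policySettings"]


def merge_rules(rules_by_source: dict[str, list[dict]]) -> dict[tuple[str, str], str]:
    merged: dict[tuple[str, str], str] = {}
    for source in PRIORITY:  # walk low -> high so later sources overwrite earlier ones
        for rule in rules_by_source.get(source, []):
            key = (rule["toolName"], rule["ruleContent"])
            merged[key] = rule["ruleBehavior"]
    return merged


merged = merge_rules({
    "userSettings": [{"toolName": "Bash", "ruleBehavior": "allow", "ruleContent": "npm publish:*"}],
    "policySettings": [{"toolName": "Bash", "ruleBehavior": "deny", "ruleContent": "npm publish:*"}],
    "projectSettings": [{"toolName": "Bash", "ruleBehavior": "allow", "ruleContent": "npm test"}],
})
```

Here the enterprise policy's `deny` beats the user's `allow` for `npm publish:*`, while the unrelated project rule survives untouched.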
|
||||
|
||||
### 4. What is isDestructive()
|
||||
|
||||
In CC, `isDestructive` (`Tool.ts:405-406`) is **purely for UI display** — showing a `[destructive]` label in the tool list. It doesn't participate in permission decisions. All tools return `false` by default. Only ExitWorktree (on remove) and MCP tools (depending on `annotations.destructiveHint`) override it.
|
||||
|
||||
### 5. YoloClassifier (Auto-Approval)
|
||||
|
||||
CC's auto mode does not show a dialog for every call. `classifyYoloAction` (`utils/permissions/yoloClassifier.ts:1012`) sends the tool call plus conversation context to a classifier LLM to judge whether it is safe. Before reaching the classifier, CC first simulates acceptEdits mode (`permissions.ts:620-656`; if acceptEdits would allow the call, it is auto-approved), then checks the safe-tool whitelist (`permissions.ts:658-686`), and only then calls the classifier. If the classifier rejects too many times in a row, CC falls back to manual approval.
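The auto-mode order can be sketched as a small class. `SAFE_TOOLS`, `MAX_CONSECUTIVE_DENIALS`, the `classifier` callback and the `accept_edits_allows` callback are hypothetical stand-ins; the real whitelist, classifier prompt and denial limit live in CC's source and are not reproduced. Treating a single classifier rejection as `deny` is also an assumption.

```python
from typing import Callable

SAFE_TOOLS = {"read_file", "glob"}  # hypothetical whitelist
MAX_CONSECUTIVE_DENIALS = 3         # hypothetical limit


class AutoMode:
    def __init__(self, classifier: Callable[[str, dict], bool],
                 accept_edits_allows: Callable[[str, dict], bool]):
        self.classifier = classifier
        self.accept_edits_allows = accept_edits_allows
        self.consecutive_denials = 0

    def decide(self, tool: str, args: dict) -> str:
        if self.accept_edits_allows(tool, args):  # 1. would acceptEdits allow it?
            return "allow"
        if tool in SAFE_TOOLS:                    # 2. safe-tool whitelist
            return "allow"
        if self.consecutive_denials >= MAX_CONSECUTIVE_DENIALS:
            return "ask"                          # too many rejections: manual approval
        if self.classifier(tool, args):           # 3. classifier LLM judges safety
            self.consecutive_denials = 0
            return "allow"
        self.consecutive_denials += 1
        return "deny"


mode = AutoMode(classifier=lambda tool, args: False,
                accept_edits_allows=lambda tool, args: tool == "edit_file")
decisions = [mode.decide("bash", {"command": "curl example.com"}) for _ in range(4)]
```

With a classifier that always rejects, the first three bash calls are denied and the fourth falls back to `ask`, while edits and whitelisted reads never reach the classifier.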
|
||||
|
||||
### 6. Permission Bubbling
|
||||
|
||||
For a sub-Agent forked via AgentTool, `permissionMode` is set to `'bubble'` (`forkSubagent.ts:50`). Permission dialogs therefore **bubble up to the parent Agent's terminal** instead of being silently denied inside the sub-Agent. Meanwhile the Bash classifier keeps running: the dialog is shown while the classifier decides in the background whether the call can be auto-approved.
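A toy model of `'bubble'` mode: the sub-agent has no dialog of its own, so an `ask` is forwarded to a callback owned by the parent. `SubAgent`, `parent_prompt` and the `"no-bubble"` mode string are hypothetical names for illustration, and the background classifier that CC runs in parallel is not modeled.

```python
from typing import Callable


class SubAgent:
    def __init__(self, permission_mode: str, parent_prompt: Callable[[str], bool]):
        self.permission_mode = permission_mode
        self.parent_prompt = parent_prompt  # hypothetical: the parent's terminal dialog

    def request(self, description: str) -> str:
        if self.permission_mode == "bubble":
            # The dialog surfaces in the parent's terminal; that user decides.
            return "allow" if self.parent_prompt(description) else "deny"
        # Without bubbling, a sub-agent that cannot show a dialog must deny.
        return "deny"


seen = []


def parent_prompt(description: str) -> bool:
    seen.append(description)
    return True  # the user at the parent terminal approves


child = SubAgent("bubble", parent_prompt)
isolated = SubAgent("no-bubble", parent_prompt)
bubbled_decision = child.request("bash: rm build.log")
isolated_decision = isolated.request("bash: rm build.log")
```

Only the bubbling child ever reaches the parent's prompt; the isolated one is denied without the user seeing anything.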
|
||||
|
||||
### The Teaching Version's Simplification Is Intentional
|
||||
|
||||
- Multi-stage pipeline → 3 gates: dramatically lower barrier to understanding
|
||||
- 8 rule sources → 1 local DENY_LIST: manageable concept count
|
||||
- isDestructive → omitted (teaching version has no UI layer, and it doesn't participate in permission decisions in CC either)
|
||||
- YoloClassifier → omitted (depends on additional LLM calls and telemetry)
|
||||
- Permission bubbling → omitted (s15 covers multi-Agent)
|
||||
|
||||
</details>
|
||||
|
||||
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
|
||||
s03_permission/README.ja.md
@@ -0,0 +1,232 @@
|
||||
# s03: Permission — 実行前に権限を判断する
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → s02 → `s03` → [s04](../s04_hooks/) → s05 → ... → s20
|
||||
> *"ツール実行前に権限を判断"* — 権限パイプラインは、どの操作に承認が必要かを決める。
|
||||
>
|
||||
> **Harness レイヤー**: 権限 — ツール実行前に一つのゲートを追加。
|
||||
|
||||
---
|
||||
|
||||
## 課題
|
||||
|
||||
s02 の Agent は 5 つのツールを持つ。file tools は `safe_path` で保護されるが、bash は制限なし。「プロジェクトを掃除して」と頼むと、`rm -rf /` を実行しかねない。
|
||||
|
||||
安全性はモデルを信頼することではなく、コードに頼る — ツール実行前に判断を挟む。
|
||||
|
||||
---
|
||||
|
||||
## ソリューション
|
||||
|
||||

|
||||
|
||||
s02 のループは完全に維持される。唯一の変更は、ツール実行前に `check_permission()` を挿入すること — 各ツール呼び出しは 3 つのゲートを固定順序で通過する:ハード拒否が最優先、次にソフト確認、どちらも一致しなければ許可。
|
||||
|
||||
3 つのゲートは 3 つの決定に対応する:
|
||||
|
||||
| ゲート | 役割 | 一致時 |
|
||||
|--------|------|--------|
|
||||
| 1. 拒否リスト | 常に禁止される操作(`rm -rf /`、`sudo`) | 即座に拒否、実行しない |
|
||||
| 2. ルールマッチング | コンテキスト依存の操作(作業ディレクトリ外への書き込み、`rm` によるファイル削除) | ゲート 3 へ |
|
||||
| 3. ユーザー承認 | ゲート 2 が一致した場合、ユーザー確認を待機 | ユーザーが許可または拒否を決定 |
|
||||
|
||||
ゲート 1 にもゲート 2 にも一致しない → 直接実行。日常の操作の大部分はこの経路を通る。
|
||||
|
||||
---
|
||||
|
||||
## 仕組み
|
||||
|
||||

|
||||
|
||||
**ゲート 1**:ハード拒否リスト。最初に確認し、一致すればブロックメッセージを返す。(教育デモ:単純な文字列マッチングは信頼できるセキュリティ機構ではない — コマンドの変種やシェル展開で回避される可能性がある。CC のアプローチは付録を参照。)
|
||||
|
||||
```python
DENY_LIST = [
    "rm -rf /", "sudo", "shutdown", "reboot",
    "mkfs", "dd if=", "> /dev/sda",
]

def check_deny_list(command: str) -> str | None:
    for pattern in DENY_LIST:
        if pattern in command:
            return f"Blocked: '{pattern}' is on the deny list"
    return None
```
|
||||
|
||||
**ゲート 2**:ルールマッチング — 「いつユーザーに聞くべきか」を記述する。各ルールはツールとチェック条件を指定する。
|
||||
|
||||
```python
PERMISSION_RULES = [
    {
        "tools": ["write_file", "edit_file"],
        "check": lambda args: not (WORKDIR / args.get("path", "")).resolve().is_relative_to(WORKDIR),
        "message": "Writing outside workspace",
    },
    {
        "tools": ["bash"],
        "check": lambda args: any(kw in args.get("command", "") for kw in ["rm ", "> /etc/", "chmod 777"]),
        "message": "Potentially destructive command",
    },
]

def check_rules(tool_name: str, args: dict) -> str | None:
    for rule in PERMISSION_RULES:
        if tool_name in rule["tools"] and rule["check"](args):
            return rule["message"]
    return None
```
|
||||
|
||||
**ゲート 3**:ルールが一致した後、ユーザー入力を待機。
|
||||
|
||||
```python
def ask_user(tool_name: str, args: dict, reason: str) -> str:
    print(f"\n⚠ {reason}")
    print(f" Tool: {tool_name}({args})")
    choice = input(" Allow? [y/N] ").strip().lower()
    return "allow" if choice in ("y", "yes") else "deny"
```
|
||||
|
||||
**3 つのゲートを直列に接続**、ツール実行前に挿入する:
|
||||
|
||||
```python
def check_permission(block) -> bool:
    # ゲート 1: ハード拒否
    if block.name == "bash":
        reason = check_deny_list(block.input.get("command", ""))
        if reason:
            print(f"\n⛔ {reason}")
            return False

    # ゲート 2 + 3: ルールマッチング → ユーザー承認
    reason = check_rules(block.name, block.input)
    if reason:
        decision = ask_user(block.name, block.input, reason)
        if decision == "deny":
            return False

    return True

# agent_loop 内:s02 のループに、ディスパッチ前の権限チェックを挿入するだけ
for block in response.content:
    if block.type == "tool_use":
        if not check_permission(block):  # ← 新規
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": "Permission denied."})
            continue
        output = TOOL_HANDLERS[block.name](**block.input)  # s02 既存
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": output})
```
|
||||
|
||||
---
|
||||
|
||||
## s02 からの変更点
|
||||
|
||||
| コンポーネント | 変更前 (s02) | 変更後 (s03) |
|
||||
|---------------|-------------|-------------|
|
||||
| セキュリティモデル | なし(モデルを信頼) | 3 ゲート権限パイプライン |
|
||||
| 新規関数 | — | check_deny_list, check_rules, ask_user, check_permission |
|
||||
| ループ | すべてのツールを直接実行 | 実行前に check_permission() を挿入 |
|
||||
|
||||
---
|
||||
|
||||
## 試してみよう
|
||||
|
||||
```sh
|
||||
cd learn-claude-code
|
||||
python s03_permission/code.py
|
||||
```
|
||||
|
||||
以下のプロンプトを試してみよう:
|
||||
|
||||
1. `Create a file called test.txt in the current directory`(そのまま通過するはず)
|
||||
2. `Delete all temporary files in /tmp`(bash + rm でゲート 2 が発動)
|
||||
3. `What files are in the current directory?`(読み取り専用、すべて通過)
|
||||
4. `Try to write a file to /etc/something`(作業ディレクトリ外への書き込みでゲート 2 が発動)
|
||||
|
||||
観察のポイント:どの操作がそのまま通過するか? どれに確認が必要か? どれが即座に拒否されるか?
|
||||
|
||||
---
|
||||
|
||||
## 次へ
|
||||
|
||||
権限チェックは実装された — しかし、毎回ループ内に `check_permission()` をハードコードしている。ツール実行の前後にログを追加したい場合は? 特定の操作後に自動的に git commit をトリガーしたい場合は? このような拡張ロジックがループ内に散らばると、ループはすぐに膨張する。
|
||||
|
||||
→ s04 Hooks:ループにフックを追加する。拡張ロジックはフックにぶら下げ、ループはクリーンに保つ。
|
||||
|
||||
<details>
|
||||
<summary>CC ソースコードを深掘り</summary>
|
||||
|
||||
> 以下は CC ソースコード `types/permissions.ts`、`utils/permissions/permissions.ts`、`toolExecution.ts`、`utils/permissions/yoloClassifier.ts`、`tools/AgentTool/forkSubagent.ts` の検証に基づく。
|
||||
|
||||
### 一、PermissionResult:3 種ではなく、4 種
|
||||
|
||||
教育版の 3 つのゲート(deny → ask → allow)は CC と完全には対応しない。CC の `PermissionResult` には 4 つの behavior がある(`types/permissions.ts:241-266`):
|
||||
|
||||
| behavior | 意味 | 教育版の対応 |
|
||||
|----------|------|-------------|
|
||||
| `allow` | 直接許可 | どのゲートにも一致しない、またはゲート 3 でユーザーが許可 |
|
||||
| `deny` | 直接拒否 | ゲート 1 一致 |
|
||||
| `ask` | ユーザーにダイアログを表示 | ゲート 2 一致 |
|
||||
| `passthrough` | ツールが意見を表明せず、汎用パイプラインに委ねる | 教育版にはなし |
|
||||
|
||||
### 二、本番環境の検証段階
|
||||
|
||||
CC のツール呼び出しは 3 つのゲートを通るのではなく、`checkPermissionsAndCallTool()`(`toolExecution.ts:599-1745`)、hooks、`hasPermissionsToUseToolInner()`(`utils/permissions/permissions.ts:1158-1310`)、classifier ロジックに分散する複数の段階を経る:
|
||||
|
||||
1. **Zod schema 検証**(`toolExecution.ts:614-680`)— パラメータの型チェック
|
||||
2. **validateInput()**(`toolExecution.ts:682-733`)— ツールレベルの意味的検証
|
||||
3. **backfillObservableInput()**(`toolExecution.ts:784`)— レガシーフィールドの補完
|
||||
4. **PreToolUse hooks**(`toolExecution.ts:800-862`)— フックが allow/deny/ask を返す
|
||||
5. **resolveHookPermissionDecision()**(`toolExecution.ts:921-931`)— フック + パイプラインの決定を調整
|
||||
6. **hasPermissionsToUseToolInner()**(`permissions.ts:1158-1310`)— 多層ルールチェック:
|
||||
- ツール全体が deny rule で無効 → `deny`
|
||||
- ツール全体が ask rule でマーク → `ask`
|
||||
- `tool.checkPermissions()` ツール自身の判断
|
||||
- ツール自身が deny を返す → `deny`
|
||||
- `requiresUserInteraction()` → `ask`
|
||||
- コンテンツ関連の ask ルール → `ask`(バイパス不可)
|
||||
- セキュリティチェック違反 → `ask`(バイパス不可)
|
||||
- bypassPermissions モード → `allow`
|
||||
- ツール全体が allow rule で許可 → `allow`
|
||||
- passthrough → `ask` に変換
|
||||
|
||||
### 三、拒否リスト:1 つのファイルではなく、8 つのソース
|
||||
|
||||
CC には単一の deny list はない。権限ルールは 8 つのソースから来る(`types/permissions.ts:54-62`):
|
||||
|
||||
| ソース | 設定場所 |
|
||||
|--------|---------|
|
||||
| `userSettings` | `~/.claude/settings.json` |
|
||||
| `projectSettings` | `.claude/settings.json` |
|
||||
| `localSettings` | `settings.local.json` |
|
||||
| `flagSettings` | フィーチャーフラグ |
|
||||
| `policySettings` | 企業管理ポリシー |
|
||||
| `cliArg` | `--allowedTools` / `--deniedTools` |
|
||||
| `command` | インラインコマンド |
|
||||
| `session` | セッション内一時承認 |
|
||||
|
||||
各ルールの形式:`{ toolName: "Bash", ruleBehavior: "deny", ruleContent: "npm publish:*" }`。複数ソースのルールは統合され、高優先度ソースが低優先度を上書きする(低→高:user < project < local < flag < policy、さらに cliArg、command、session)。
|
||||
|
||||
### 四、isDestructive() とは
|
||||
|
||||
CC では `isDestructive`(`Tool.ts:405-406`)は**純粋に UI 表示用** — ツール一覧に `[destructive]` ラベルを表示するだけ。権限決定には参加しない。デフォルトではすべてのツールが `false` を返す。ExitWorktree(remove 時)と MCP ツール(`annotations.destructiveHint` に依存)のみがオーバーライドする。
|
||||
|
||||
### 五、YoloClassifier(自動承認)
|
||||
|
||||
CC の auto モードでは、毎回ダイアログを表示するわけではない。`classifyYoloAction`(`utils/permissions/yoloClassifier.ts:1012`)はツール呼び出し + 会話コンテキストを分類器 LLM に送って安全性を判断する。まず acceptEdits モードのシミュレーションを試み(`permissions.ts:620-656`、acceptEdits が許可すれば → 自動承認)、次にセーフツールホワイトリストを確認し(`permissions.ts:658-686`)、最後に分類器を呼び出す。分類器が連続して拒否しすぎた場合 → 手動承認にフォールバック。
|
||||
|
||||
### 六、権限バブリング
|
||||
|
||||
サブ Agent(AgentTool 経由でフォークされたもの)の `permissionMode` は `'bubble'` に設定される(`forkSubagent.ts:50`)。これは権限ダイアログが**親 Agent のターミナルにバブルアップ**することを意味する。サブ Agent で黙って拒否されるのではない。Bash 分類器はこの過程で引き続き実行され — 権限ダイアログを表示しつつ、バックグラウンドで自動承認可能か判断する。
|
||||
|
||||
### 教育版の単純化は意図的
|
||||
|
||||
- 多段階パイプライン → 3 ゲート:理解のハードルが大幅に下がる
|
||||
- 8 ルールソース → 1 つのローカル DENY_LIST:概念量を制御可能
|
||||
- isDestructive → 省略(教育版には UI レイヤーがなく、CC でも権限決定には参加しない)
|
||||
- YoloClassifier → 省略(追加の LLM 呼び出しとテレメトリに依存)
|
||||
- 権限バブリング → 省略(s15 でマルチ Agent を扱う)
|
||||
|
||||
</details>
|
||||
|
||||
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
|
||||
s03_permission/README.md
@@ -0,0 +1,232 @@
|
||||
# s03: Permission — 执行前做权限判断
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → s02 → `s03` → [s04](../s04_hooks/) → s05 → ... → s20
|
||||
> *"工具执行前先做权限判断"* — 权限管线决定哪些操作需要审批。
|
||||
>
|
||||
> **Harness 层**: 权限 — 在工具执行前加一道门。
|
||||
|
||||
---
|
||||
|
||||
## 问题
|
||||
|
||||
s02 的 Agent 有 5 个工具。file tools 受 `safe_path` 保护,但 bash 不受限制。让它"清理一下项目",可能执行 `rm -rf /`。
|
||||
|
||||
安全不能靠信任模型,要靠代码——在工具执行之前做判断。
|
||||
|
||||
---
|
||||
|
||||
## 解决方案
|
||||
|
||||

|
||||
|
||||
s02 的循环完全保留。唯一的变动在工具执行前插入 `check_permission()`——每个工具调用经过三道闸门,顺序固定:硬拒绝优先,软询问次之,都没命中就放行。
|
||||
|
||||
三道闸门对应三种决策:
|
||||
|
||||
| 闸门 | 作用 | 命中后 |
|
||||
|------|------|--------|
|
||||
| 1. 拒绝列表 | 永远禁止的操作(`rm -rf /`、`sudo`) | 直接拒绝,不执行 |
|
||||
| 2. 规则匹配 | 取决于上下文的操作(写工作区外、用 `rm` 删除文件) | 交给闸门 3 |
|
||||
| 3. 用户审批 | 闸门 2 命中后,暂停等用户确认 | 用户决定允许或拒绝 |
|
||||
|
||||
闸门 1 和闸门 2 都没命中 → 直接执行。大部分日常操作走这条路。
|
||||
|
||||
---

## How It Works


**Gate 1**: a hard deny table that is checked first. A match returns a block message. (This is a teaching illustration only: plain substring matching is not a reliable security mechanism, because command variants and shell expansion can get around it. The appendix describes how CC handles this.)

```python
DENY_LIST = [
    "rm -rf /", "sudo", "shutdown", "reboot",
    "mkfs", "dd if=", "> /dev/sda",
]

def check_deny_list(command: str) -> str | None:
    for pattern in DENY_LIST:
        if pattern in command:
            return f"Blocked: '{pattern}' is on the deny list"
    return None
```

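The caveat in the parenthesis above is easy to reproduce. The sketch below keeps the teaching `DENY_LIST` and `check_deny_list` unchanged and adds a hypothetical `check_deny_tokens` for comparison. That function is not part of code.py or CC: it compares whole shell words from `shlex.split` instead of raw substrings. The sketch illustrates the trade-off and is not a recommended filter. Each approach has cases the other handles differently, and neither catches `rm -fr /`.

```python
from __future__ import annotations

import shlex

# Same deny list and substring check as the teaching code
DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if=", "> /dev/sda"]


def check_deny_list(command: str) -> str | None:
    for pattern in DENY_LIST:
        if pattern in command:
            return f"Blocked: '{pattern}' is on the deny list"
    return None


# Hypothetical variant: compare whole shell words instead of substrings
DENY_PROGRAMS = {"sudo", "shutdown", "reboot", "mkfs"}


def check_deny_tokens(command: str) -> str | None:
    try:
        words = shlex.split(command)
    except ValueError:  # e.g. an unbalanced quote: refuse rather than guess
        return "Blocked: command could not be parsed"
    for word in words:
        if word in DENY_PROGRAMS or word.startswith("mkfs."):
            return f"Blocked: '{word}' is on the deny list"
    return None


cases = [
    "echo sudoku",          # harmless, but contains the substring "sudo"
    "rm -fr /",             # same effect as "rm -rf /", different flag order
    "sudo ls",
    "bash -c 'sudo ls'",    # the quoted payload becomes a single word
]
for cmd in cases:
    print(f"{cmd!r:24} substring={check_deny_list(cmd)!r} tokens={check_deny_tokens(cmd)!r}")
```

In the output, only the substring check blocks `echo sudoku`, which is a false positive. Only the substring check catches `bash -c 'sudo ls'`, because the token check sees a single word, `sudo ls`. `rm -fr /` passes both checks.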
**Gate 2**: rule matching, which describes *when to ask the user*. Each rule names the tools it applies to and a check condition.

```python
PERMISSION_RULES = [
    {
        "tools": ["write_file", "edit_file"],
        "check": lambda args: not (WORKDIR / args.get("path", "")).resolve().is_relative_to(WORKDIR),
        "message": "Writing outside workspace",
    },
    {
        "tools": ["bash"],
        "check": lambda args: any(kw in args.get("command", "") for kw in ["rm ", "> /etc/", "chmod 777"]),
        "message": "Potentially destructive command",
    },
]

def check_rules(tool_name: str, args: dict) -> str | None:
    for rule in PERMISSION_RULES:
        if tool_name in rule["tools"] and rule["check"](args):
            return rule["message"]
    return None
```

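The sketch below runs the teaching `PERMISSION_RULES` against a few sample calls to show which ones reach gate 3. It assumes a throwaway workspace: `WORKDIR` is a temporary directory here, whereas code.py uses `Path.cwd()`. The directory is passed through `.resolve()` so that symlinked temp paths such as macOS's `/var` compare correctly under `is_relative_to()`.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Demo workspace (assumption): resolved so is_relative_to() compares real paths
WORKDIR = Path(tempfile.mkdtemp()).resolve()

PERMISSION_RULES = [
    {
        "tools": ["write_file", "edit_file"],
        "check": lambda args: not (WORKDIR / args.get("path", "")).resolve().is_relative_to(WORKDIR),
        "message": "Writing outside workspace",
    },
    {
        "tools": ["bash"],
        "check": lambda args: any(kw in args.get("command", "") for kw in ["rm ", "> /etc/", "chmod 777"]),
        "message": "Potentially destructive command",
    },
]


def check_rules(tool_name: str, args: dict) -> str | None:
    for rule in PERMISSION_RULES:
        if tool_name in rule["tools"] and rule["check"](args):
            return rule["message"]
    return None


calls = [
    ("write_file", {"path": "src/app.py"}),          # inside the workspace
    ("write_file", {"path": "../escape.txt"}),       # ".." climbs out
    ("edit_file", {"path": "/etc/hosts"}),           # an absolute path replaces WORKDIR
    ("bash", {"command": "rm build.log"}),
    ("bash", {"command": "echo confirm this"}),      # "confirm " contains "rm "
    ("bash", {"command": "ls -la"}),
    ("read_file", {"path": "/etc/passwd"}),          # no rule covers reads
]
for tool, args in calls:
    print(tool, args, "->", check_rules(tool, args))
```

Two things stand out. First, no rule covers reads; in code.py, `run_read` already refuses paths outside the workspace through `safe_path`. Second, `kw in command` is a substring test, so the `rm ` keyword also fires inside words such as `confirm `. At this gate a false positive only costs an extra prompt, not a hard block.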
**Gate 3**: once a rule matches, pause and wait for user input.

```python
def ask_user(tool_name: str, args: dict, reason: str) -> str:
    print(f"\n⚠ {reason}")
    print(f" Tool: {tool_name}({args})")
    choice = input(" Allow? [y/N] ").strip().lower()
    return "allow" if choice in ("y", "yes") else "deny"
```

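`ask_user` treats every answer except `y` or `yes` as a denial. The comparison is case-insensitive and ignores surrounding spaces. Pressing Enter alone also denies, which is what the capital `N` in `[y/N]` signals. The sketch below is the same function with one hypothetical addition: a `read` parameter that defaults to `input`. This lets a test drive the prompt with scripted answers instead of a keyboard.

```python
from typing import Callable


def ask_user(tool_name: str, args: dict, reason: str,
             read: Callable[[str], str] = input) -> str:
    """Same logic as the teaching ask_user; `read` is a hypothetical test hook."""
    print(f"\n⚠ {reason}")
    print(f" Tool: {tool_name}({args})")
    choice = read(" Allow? [y/N] ").strip().lower()
    return "allow" if choice in ("y", "yes") else "deny"


for answer in ["y", " YES ", "", "n", "sure"]:
    decision = ask_user("bash", {"command": "rm build.log"}, "Potentially destructive command",
                        read=lambda prompt, a=answer: a)
    print(repr(answer), "->", decision)
```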
**The three gates chained together**, placed right before tool execution:

```python
def check_permission(block) -> bool:
    # Gate 1: hard deny
    if block.name == "bash":
        reason = check_deny_list(block.input.get("command", ""))
        if reason:
            print(f"\n⛔ {reason}")
            return False

    # Gates 2 + 3: rule matching → user approval
    reason = check_rules(block.name, block.input)
    if reason:
        decision = ask_user(block.name, block.input, reason)
        if decision == "deny":
            return False

    return True


# Inside agent_loop: the s02 loop only gains this permission check
for block in response.content:
    if block.type == "tool_use":
        if not check_permission(block):  # ← new
            # A denied call still needs a tool_result answering its tool_use
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": "Permission denied."})
            continue
        output = TOOL_HANDLERS[block.name](**block.input)  # unchanged from s02
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": output})
```

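The three gates can be exercised without calling the model. The sketch below is self-contained. It copies the gate functions in condensed form, and it fakes `tool_use` blocks with `types.SimpleNamespace`, standing in for the SDK's block objects, of which `check_permission` reads only `.name` and `.input`. It also replaces the keyboard prompt with a scripted `ask_user` that records every question. `WORKDIR` is a temporary directory here, which is an assumption for the demo. The run shows the fixed order: `sudo rm ...` is blocked at gate 1 without a prompt, even though it would also match the `rm ` rule.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from types import SimpleNamespace

# Demo workspace (assumption): a resolved temp dir instead of code.py's Path.cwd()
WORKDIR = Path(tempfile.mkdtemp()).resolve()

DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if=", "> /dev/sda"]
PERMISSION_RULES = [
    {"tools": ["write_file", "edit_file"],
     "check": lambda a: not (WORKDIR / a.get("path", "")).resolve().is_relative_to(WORKDIR),
     "message": "Writing outside workspace"},
    {"tools": ["bash"],
     "check": lambda a: any(kw in a.get("command", "") for kw in ["rm ", "> /etc/", "chmod 777"]),
     "message": "Potentially destructive command"},
]

asked: list[str] = []               # every reason gate 3 would have shown
ANSWERS = {"rm old.log": "allow"}   # scripted user: approves only this command


def check_deny_list(command: str) -> str | None:
    return next((f"Blocked: '{p}' is on the deny list" for p in DENY_LIST if p in command), None)


def check_rules(tool_name: str, args: dict) -> str | None:
    return next((r["message"] for r in PERMISSION_RULES
                 if tool_name in r["tools"] and r["check"](args)), None)


def ask_user(tool_name: str, args: dict, reason: str) -> str:
    # Scripted stand-in for the interactive [y/N] prompt
    asked.append(reason)
    return ANSWERS.get(args.get("command", ""), "deny")


def check_permission(block) -> bool:
    if block.name == "bash" and check_deny_list(block.input.get("command", "")):
        return False                                   # gate 1: blocked, no prompt
    reason = check_rules(block.name, block.input)      # gate 2
    if reason and ask_user(block.name, block.input, reason) == "deny":
        return False                                   # gate 3: the user said no
    return True


def tool_use(name: str, **inp) -> SimpleNamespace:
    # Fake tool_use block exposing the attributes check_permission reads
    return SimpleNamespace(type="tool_use", name=name, input=inp)


for b in [
    tool_use("bash", command="ls"),                          # no gate matches
    tool_use("bash", command="sudo rm old.log"),             # gate 1 wins over the rm rule
    tool_use("bash", command="rm old.log"),                  # gate 2, user approves
    tool_use("bash", command="rm -r build"),                 # gate 2, user denies
    tool_use("write_file", path="../x.txt", content="hi"),   # outside workspace, user denies
]:
    print(b.name, b.input, "->", "allowed" if check_permission(b) else "denied")
print("prompts shown:", len(asked))
```

Gate 1 runs first and returns immediately. As a result, a command that matches both the deny list and a rule never reaches the user, and only three of the five calls produce a prompt.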
---

## Changes from s02

| Component | Before (s02) | After (s03) |
|-----------|--------------|-------------|
| Safety model | None (trusts the model) | Three-gate permission pipeline |
| New functions | (none) | check_deny_list, check_rules, ask_user, check_permission |
| Loop | Executes every tool directly | Calls check_permission() before execution |

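---

## Try It

```sh
cd learn-claude-code
python s03_permission/code.py
```

Try these prompts:

1. `Create a file called test.txt in the current directory` (should go straight through)
2. `Delete all temporary files in /tmp` (bash + `rm` triggers gate 2. If the model writes the command as `rm -rf /tmp/...`, the command contains the substring `rm -rf /`, and gate 1 blocks it outright)
3. `What files are in the current directory?` (read-only, passes every gate)
4. `Try to write a file to /etc/something` (writing outside the workspace triggers gate 2. Even if you approve, `run_write` still rejects the path through `safe_path`)

What to watch for: which operations go straight through? Which ones need your confirmation? Which ones are rejected outright?
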
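---

## What's Next

Permission checks are in place, but `check_permission()` is hard-coded into the loop. What if you want to log before and after every tool call, or trigger a git commit automatically after certain operations? If all that extension logic is scattered through the loop, the loop quickly bloats.

s04 Hooks → add hooks to the loop, attach the extension logic to them, and keep the loop itself clean.
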
<details>
<summary>Deep dive into the CC source</summary>

> The following is based on a review of the CC source files `types/permissions.ts`, `utils/permissions/permissions.ts`, `toolExecution.ts`, `utils/permissions/yoloClassifier.ts`, and `tools/AgentTool/forkSubagent.ts`.

### 1. PermissionResult: 4 outcomes, not 3

The teaching version's three gates (deny → ask → allow) do not map exactly onto CC. CC's `PermissionResult` has 4 behaviors (`types/permissions.ts:241-266`):

| behavior | Meaning | Teaching-version equivalent |
|----------|---------|-----------------------------|
| `allow` | Allowed directly | Passing all gates (including approval at gate 3) |
| `deny` | Denied directly | Gate 1 match |
| `ask` | Shows a dialog asking the user | Gate 2 match |
| `passthrough` | The tool takes no position and leaves the decision to the generic pipeline | None |

### 2. Validation stages in the production version

A CC tool call does not pass through three gates. It passes through several stages spread across `checkPermissionsAndCallTool()` (`toolExecution.ts:599-1745`), hooks, `hasPermissionsToUseToolInner()` (`utils/permissions/permissions.ts:1158-1310`), and the classifier logic:

1. **Zod schema validation** (`toolExecution.ts:614-680`): parameter type checks
2. **validateInput()** (`toolExecution.ts:682-733`): tool-level semantic validation
3. **backfillObservableInput()** (`toolExecution.ts:784`): fills in legacy fields
4. **PreToolUse hooks** (`toolExecution.ts:800-862`): hooks can return allow/deny/ask
5. **resolveHookPermissionDecision()** (`toolExecution.ts:921-931`): reconciles the hook decision with the pipeline decision
6. **hasPermissionsToUseToolInner()** (`permissions.ts:1158-1310`): multi-layer rule checks:
   - The whole tool is disabled by a deny rule → `deny`
   - The whole tool is flagged by an ask rule → `ask`
   - `tool.checkPermissions()`: the tool's own judgment
   - The tool itself returns deny → `deny`
   - `requiresUserInteraction()` → `ask`
   - A content-specific ask rule → `ask` (cannot be bypassed)
   - A safety-check violation → `ask` (cannot be bypassed)
   - bypassPermissions mode → `allow`
   - The whole tool is allowed by an allow rule → `allow`
   - passthrough → converted to `ask`

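Read top to bottom, the checks in item 6 form a first-match-wins chain. The Python sketch below mirrors only the order listed above and is not CC's TypeScript. The `ToolCallContext` dataclass and its boolean fields are a hypothetical flattening of what CC actually computes. The last line shows how a tool's `passthrough` ends up as `ask`.

```python
from dataclasses import dataclass
from typing import Literal

Behavior = Literal["allow", "deny", "ask", "passthrough"]


@dataclass
class ToolCallContext:
    """Hypothetical flattened inputs; field names are illustrative, not CC's."""
    tool_denied_by_rule: bool = False        # a deny rule covers the whole tool
    tool_ask_by_rule: bool = False           # an ask rule covers the whole tool
    tool_decision: Behavior = "passthrough"  # what tool.checkPermissions() returned
    requires_user_interaction: bool = False
    content_ask_rule: bool = False           # an ask rule matching this specific input
    safety_check_violation: bool = False
    bypass_permissions_mode: bool = False
    tool_allowed_by_rule: bool = False       # an allow rule covers the whole tool


def decide(ctx: ToolCallContext) -> Behavior:
    # First match wins, in the order listed under item 6
    if ctx.tool_denied_by_rule:
        return "deny"
    if ctx.tool_ask_by_rule:
        return "ask"
    if ctx.tool_decision == "deny":
        return "deny"
    if ctx.requires_user_interaction:
        return "ask"
    # Checked before bypassPermissions, which is why these "cannot be bypassed"
    if ctx.content_ask_rule or ctx.safety_check_violation:
        return "ask"
    if ctx.bypass_permissions_mode:
        return "allow"
    if ctx.tool_allowed_by_rule:
        return "allow"
    # passthrough → ask. Assumption: a tool's own "allow"/"ask" is returned as-is,
    # which the list above does not spell out.
    return "ask" if ctx.tool_decision == "passthrough" else ctx.tool_decision


print(decide(ToolCallContext()))                                     # ask
print(decide(ToolCallContext(bypass_permissions_mode=True)))         # allow
print(decide(ToolCallContext(safety_check_violation=True,
                             bypass_permissions_mode=True)))         # ask
```

This layout makes the "(cannot be bypassed)" notes concrete: those two checks return `ask` before the bypassPermissions branch is ever reached.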
### 3. Deny lists: 8 sources, not one file

CC has no single deny list. Permission rules come from 8 sources (`types/permissions.ts:54-62`):

| Source | Configured in |
|--------|---------------|
| `userSettings` | `~/.claude/settings.json` |
| `projectSettings` | `.claude/settings.json` |
| `localSettings` | `.claude/settings.local.json` |
| `flagSettings` | Feature flags |
| `policySettings` | Enterprise managed policy |
| `cliArg` | `--allowedTools` / `--disallowedTools` |
| `command` | Inline commands |
| `session` | Temporary grants within the session |

Each rule has the form `{ toolName: "Bash", ruleBehavior: "deny", ruleContent: "npm publish:*" }`. Rules from all sources are merged, and higher-priority sources override lower ones. From low to high, the order is user < project < local < flag < policy, with cliArg, command, and session also in play.

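The sketch below is a rough model of the rule format and merge order above. It rests on two assumptions. First, a `ruleContent` ending in `:*` means "the command starts with this prefix". Second, when several sources have a matching rule, the highest-priority source decides, as the paragraph states. Only the five sources with an explicit order are ranked; `cliArg`, `command`, and `session` are left out. `Rule`, `SOURCE_RANK`, `matches`, and `resolve` are hypothetical names, not CC's.

```python
from __future__ import annotations

from dataclasses import dataclass

# Low → high, following the order quoted above (other sources omitted)
SOURCE_RANK = {"userSettings": 0, "projectSettings": 1, "localSettings": 2,
               "flagSettings": 3, "policySettings": 4}


@dataclass(frozen=True)
class Rule:
    source: str
    toolName: str
    ruleBehavior: str              # "allow" | "deny" | "ask"
    ruleContent: str | None = None


def matches(rule: Rule, tool: str, command: str) -> bool:
    if rule.toolName != tool:
        return False
    if rule.ruleContent is None:            # no content: the rule covers the whole tool
        return True
    if rule.ruleContent.endswith(":*"):     # assumed prefix syntax
        return command.startswith(rule.ruleContent[:-2])
    return command == rule.ruleContent


def resolve(rules: list[Rule], tool: str, command: str) -> str | None:
    hits = [r for r in rules if matches(r, tool, command)]
    if not hits:
        return None
    return max(hits, key=lambda r: SOURCE_RANK[r.source]).ruleBehavior


rules = [
    Rule("userSettings", "Bash", "allow", "npm publish:*"),
    Rule("policySettings", "Bash", "deny", "npm publish:*"),
    Rule("projectSettings", "Bash", "allow", "npm test"),
]
print(resolve(rules, "Bash", "npm publish --tag next"))  # deny: policy outranks user
print(resolve(rules, "Bash", "npm test"))                # allow
print(resolve(rules, "Bash", "ls"))                      # None: no rule applies
```

This models only the override order the paragraph describes. It does not claim how CC breaks ties between behaviors within the same source.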
### 4. What isDestructive() does

In CC, `isDestructive` (`Tool.ts:405-406`) is **purely for UI display**: it shows a `[destructive]` label in the tool list and does not take part in permission decisions. By default every tool returns `false`; only ExitWorktree (when removing) and MCP tools (which rely on `annotations.destructiveHint`) override it.

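### 5. YoloClassifier (auto-approval)

In CC's auto mode, a dialog is not shown for every call. `classifyYoloAction` (`utils/permissions/yoloClassifier.ts:1012`) sends the tool call plus the conversation context to a classifier LLM, which judges whether the call is safe. The flow first simulates acceptEdits mode (`permissions.ts:620-656`); if acceptEdits would allow the call, it is approved directly. Next it checks the safe-tool allowlist (`permissions.ts:658-686`), and only then does it call the classifier. If the classifier denies too many times in a row, the flow falls back to manual approval.
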
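The order described above can be sketched as a small decision function: acceptEdits simulation, then the safe-tool allowlist, then the classifier, with a fallback to manual approval after repeated denials. Everything here is hypothetical. The classifier is a plain callable standing in for the LLM call, and `SAFE_TOOLS` and the threshold of 3 consecutive denials are made-up values. Mapping a classifier "no" to `deny` is also a simplification. The real logic lives in `permissions.ts` and `yoloClassifier.ts`.

```python
from dataclasses import dataclass
from typing import Callable

SAFE_TOOLS = {"read_file", "glob"}   # hypothetical allowlist
MAX_CONSECUTIVE_DENIALS = 3          # hypothetical threshold


@dataclass
class AutoMode:
    accept_edits_allows: Callable[[str, dict], bool]   # stub for the acceptEdits simulation
    classifier: Callable[[str, dict], bool]            # stub for the LLM; True = judged safe
    consecutive_denials: int = 0

    def decide(self, tool: str, args: dict) -> str:
        if self.accept_edits_allows(tool, args):
            return "allow"                              # 1. acceptEdits would allow it
        if tool in SAFE_TOOLS:
            return "allow"                              # 2. safe-tool allowlist
        if self.consecutive_denials >= MAX_CONSECUTIVE_DENIALS:
            return "ask"                                # fallback: a human decides
        if self.classifier(tool, args):                 # 3. classifier
            self.consecutive_denials = 0
            return "allow"
        self.consecutive_denials += 1
        return "deny"


mode = AutoMode(
    accept_edits_allows=lambda tool, args: tool in {"write_file", "edit_file"},
    classifier=lambda tool, args: "rm " not in args.get("command", ""),
)
print(mode.decide("edit_file", {"path": "a.py"}))        # allow (acceptEdits)
print(mode.decide("glob", {"pattern": "*.py"}))          # allow (allowlist)
print(mode.decide("bash", {"command": "pytest"}))        # allow (classifier)
for _ in range(3):
    print(mode.decide("bash", {"command": "rm -r build"}))  # deny
print(mode.decide("bash", {"command": "rm -r build"}))      # ask (fallback)
```

In this sketch the streak resets only when the classifier approves a call. After the fallback kicks in, every remaining call that is not allowlisted goes to a human. How CC resets its own counter is not covered here.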
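### 6. Permission bubbling

A sub-Agent (forked via AgentTool) has its `permissionMode` set to `'bubble'` (`forkSubagent.ts:50`). This means permission dialogs **bubble up to the parent Agent's terminal** instead of being silently denied inside the sub-Agent. The Bash classifier keeps running throughout. While the permission dialog is on screen, the classifier decides in the background whether the call can be auto-approved.
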
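The difference between bubbling and silent denial comes down to who answers the `ask`. In the minimal sketch below, only the `'bubble'` value comes from the source above; `Agent`, `permission_mode`, `prompt`, and `fork` are hypothetical names. A forked child in `'bubble'` mode forwards its question up the parent chain to whichever agent owns a terminal. A child with no one to ask simply refuses; this sketch labels that mode `'deny'`, which is an assumption. The background classifier that runs alongside the dialog is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Agent:
    name: str
    permission_mode: str = "default"                 # "bubble" for forked sub-agents
    parent: Optional[Agent] = None
    prompt: Optional[Callable[[str], bool]] = None   # only the top-level agent has a terminal
    log: list[str] = field(default_factory=list)     # questions shown at this terminal

    def ask(self, question: str) -> bool:
        if self.prompt is not None:
            self.log.append(question)
            return self.prompt(question)
        if self.permission_mode == "bubble" and self.parent is not None:
            return self.parent.ask(f"[{self.name}] {question}")
        return False                                 # nobody to ask: refuse silently

    def fork(self, name: str) -> Agent:
        return Agent(name=name, permission_mode="bubble", parent=self)


lead = Agent("lead", prompt=lambda q: "curl" not in q)   # scripted user: refuses curl
worker = lead.fork("worker")
print(worker.ask("bash: rm build.log"))       # True: answered at the lead's terminal
print(worker.ask("bash: curl example.com"))   # False, but the user saw the request
print(lead.log)
print(Agent("orphan", permission_mode="deny").ask("bash: rm build.log"))  # False, silently
```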
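### The teaching version's simplifications are deliberate

- Multi-stage pipeline → 3 gates: a much lower barrier to understanding
- 8 rule sources → 1 local DENY_LIST: a manageable number of concepts
- isDestructive → ignored (the teaching version has no UI layer, and in CC it does not take part in permission decisions either)
- YoloClassifier → omitted (depends on extra LLM calls and the telemetry system)
- Permission bubbling → omitted (multi-Agent is not covered until s15)

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->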
s03_permission/code.py
@@ -0,0 +1,251 @@
#!/usr/bin/env python3
"""
s03_permission.py - Permission System

Three gates inserted before tool execution:

    Gate 1: Hard deny list (rm -rf /, sudo, ...)
    Gate 2: Rule matching (write outside workspace? destructive cmd?)
    Gate 3: User approval (pause and wait for confirmation)

    +-------+    +--------+    +--------+    +--------+    +------+
    | Tool  | -> | Gate 1 | -> | Gate 2 | -> | Gate 3 | -> | Exec |
    | call  |    | deny?  |    | match? |    | allow? |    |      |
    +-------+    +--------+    +--------+    +--------+    +------+
                     |             |             |            |
                     v             v             v            v
                 (blocked)    (ask user)  (user says no?)  (normal)

The only change to the agent loop is one check before each tool runs:

    if not check_permission(block):
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": "Permission denied."})
        continue

Builds on s02 (multi-tool). Usage:

    python s03_permission/code.py

Needs: pip install anthropic python-dotenv, plus ANTHROPIC_API_KEY and MODEL_ID in .env
"""

import os, subprocess
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
    readline.parse_and_bind('set input-meta on')
    readline.parse_and_bind('set output-meta on')
    readline.parse_and_bind('set convert-meta off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"):
    os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]

SYSTEM = f"You are a coding agent at {WORKDIR}. All destructive operations require user approval."


# ═══════════════════════════════════════════════════════════
# FROM s02 (unchanged): Tool Implementations
# ═══════════════════════════════════════════════════════════

def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path


def run_bash(command: str) -> str:
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR,
                           capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"


def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines):
            lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e:
        return f"Error: {e}"


def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content)
        return f"Wrote {len(content)} bytes to {path}"
    except Exception as e:
        return f"Error: {e}"


def run_edit(path: str, old_text: str, new_text: str) -> str:
    try:
        file_path = safe_path(path)
        text = file_path.read_text()
        if old_text not in text:
            return f"Error: text not found in {path}"
        file_path.write_text(text.replace(old_text, new_text, 1))
        return f"Edited {path}"
    except Exception as e:
        return f"Error: {e}"


def run_glob(pattern: str) -> str:
    import glob as g
    try:
        results = []
        for match in g.glob(pattern, root_dir=WORKDIR):
            if (WORKDIR / match).resolve().is_relative_to(WORKDIR):
                results.append(match)
        return "\n".join(results) if results else "(no matches)"
    except Exception as e:
        return f"Error: {e}"


# ═══════════════════════════════════════════════════════════
# FROM s02 (unchanged): Tool Definitions & Dispatch
# ═══════════════════════════════════════════════════════════

TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
]

TOOL_HANDLERS = {
    "bash": run_bash, "read_file": run_read, "write_file": run_write,
    "edit_file": run_edit, "glob": run_glob,
}


# ═══════════════════════════════════════════════════════════
# NEW in s03: Three-Gate Permission Pipeline
# ═══════════════════════════════════════════════════════════

# Gate 1: Hard deny list (always forbidden)
DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if=", "> /dev/sda"]

def check_deny_list(command: str) -> str | None:
    # Plain substring matching: a teaching device, not a reliable security boundary
    for pattern in DENY_LIST:
        if pattern in command:
            return f"Blocked: '{pattern}' is on the deny list"
    return None


# Gate 2: Rule matching (context-dependent checks)
# Note: run_write/run_edit still go through safe_path(), so an out-of-workspace
# write that the user approves here is still rejected when the tool runs.
PERMISSION_RULES = [
    {"tools": ["write_file", "edit_file"],
     "check": lambda args: not (WORKDIR / args.get("path", "")).resolve().is_relative_to(WORKDIR),
     "message": "Writing outside workspace"},
    {"tools": ["bash"],
     "check": lambda args: any(kw in args.get("command", "") for kw in ["rm ", "> /etc/", "chmod 777"]),
     "message": "Potentially destructive command"},
]

def check_rules(tool_name: str, args: dict) -> str | None:
    # Return the message of the first matching rule, or None if no rule applies
    for rule in PERMISSION_RULES:
        if tool_name in rule["tools"] and rule["check"](args):
            return rule["message"]
    return None


# Gate 3: User approval (wait for confirmation after a rule match)
def ask_user(tool_name: str, args: dict, reason: str) -> str:
    print(f"\n\033[33m⚠ {reason}\033[0m")
    print(f" Tool: {tool_name}({args})")
    choice = input(" Allow? [y/N] ").strip().lower()
    return "allow" if choice in ("y", "yes") else "deny"  # anything else, including Enter, denies


# Pipeline: all three gates chained
def check_permission(block) -> bool:
    # Gate 1 applies to bash only; a match blocks without asking the user
    if block.name == "bash":
        reason = check_deny_list(block.input.get("command", ""))
        if reason:
            print(f"\n\033[31m⛔ {reason}\033[0m")
            return False
    # Gates 2 + 3: a rule match hands the decision to the user
    reason = check_rules(block.name, block.input)
    if reason:
        decision = ask_user(block.name, block.input, reason)
        if decision == "deny":
            return False
    return True


# ═══════════════════════════════════════════════════════════
# agent_loop: same as s02, with check_permission() inserted
# ═══════════════════════════════════════════════════════════

def agent_loop(messages: list):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return

        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue

            print(f"\033[36m> {block.name}\033[0m")

            # s03 change: run through permission pipeline before executing.
            # A denied call still gets a tool_result, since every tool_use needs one.
            if not check_permission(block):
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": "Permission denied."})
                continue

            handler = TOOL_HANDLERS.get(block.name)
            output = handler(**block.input) if handler else f"Unknown: {block.name}"
            print(str(output)[:200])
            results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})

        messages.append({"role": "user", "content": results})


if __name__ == "__main__":
    print("s03: Permission")
    print("Type a question and press Enter to send. Type q to quit.\n")

    history = []
    while True:
        try:
            query = input("\033[36ms03 >> \033[0m")
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        history.append({"role": "user", "content": query})
        agent_loop(history)
        for block in history[-1]["content"]:
            if getattr(block, "type", None) == "text":
                print(block.text)
        print()
s03_permission/images/permission-overview.en.svg
@@ -0,0 +1,97 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 320" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
    </marker>
    <marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
    </marker>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/>
      <stop offset="100%" stop-color="#2563eb"/>
    </linearGradient>
  </defs>

  <!-- Background -->
  <rect width="720" height="320" fill="#fafbfc" rx="8"/>

  <!-- Title -->
  <rect x="0" y="0" width="720" height="48" fill="url(#header)" rx="8"/>
  <rect x="0" y="40" width="720" height="8" fill="url(#header)"/>
  <text x="360" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Permission — Loop unchanged, a gate before tool execution</text>

  <!-- ===== s02 preserved (gray) ===== -->
  <text x="50" y="76" fill="#94a3b8" font-size="11" font-weight="600">s02 preserved</text>

  <!-- User input -->
  <rect x="60" y="88" width="120" height="40" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="120" y="113" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- Arrow → LLM -->
  <line x1="180" y1="108" x2="228" y2="108" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>

  <!-- LLM -->
  <rect x="230" y="84" width="130" height="48" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="295" y="104" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">LLM</text>
  <text x="295" y="122" fill="#64748b" font-size="10" text-anchor="middle">stop_reason?</text>

  <!-- No → return -->
  <line x1="295" y1="132" x2="295" y2="156" stroke="#16a34a" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="308" y="150" fill="#16a34a" font-size="9" font-weight="600">No</text>

  <rect x="240" y="158" width="110" height="32" rx="16" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="295" y="178" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return result</text>

  <!-- Yes → next step -->
  <line x1="360" y1="108" x2="400" y2="108" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="380" y="100" fill="#d97706" font-size="9" font-weight="600">Yes</text>

  <!-- ===== s03 new: Permission check ===== -->
  <text x="482" y="72" fill="#dc2626" font-size="11" font-weight="600" text-anchor="middle">s03 new</text>

  <!-- Permission check box -->
  <rect x="402" y="78" width="160" height="120" rx="10" fill="#fef2f2" stroke="#dc2626" stroke-width="2" stroke-dasharray="6,3"/>
  <text x="482" y="100" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">check_permission()</text>

  <!-- Gate 1 -->
  <rect x="416" y="110" width="132" height="24" rx="4" fill="#fee2e2" stroke="#dc2626" stroke-width="1"/>
  <text x="482" y="126" fill="#991b1b" font-size="9" font-weight="600" text-anchor="middle">Gate 1: Deny List</text>

  <!-- Gate 2 -->
  <rect x="416" y="140" width="132" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
  <text x="482" y="156" fill="#92400e" font-size="9" font-weight="600" text-anchor="middle">Gate 2: Rule Matching</text>

  <!-- Gate 3 -->
  <rect x="416" y="170" width="132" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
  <text x="482" y="186" fill="#92400e" font-size="9" font-weight="600" text-anchor="middle">Gate 3: User Approval</text>

  <!-- Deny → return deny message -->
  <path d="M 402 188 L 376 188 L 376 174 L 350 174" fill="none" stroke="#dc2626" stroke-width="1.5" marker-end="url(#arrow-red)"/>
  <text x="378" y="184" fill="#dc2626" font-size="8" font-weight="600">Deny</text>

  <!-- Pass → tool execution -->
  <line x1="562" y1="138" x2="598" y2="138" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="575" y="132" fill="#16a34a" font-size="8" font-weight="600">Pass</text>

  <!-- ===== s02 preserved: Tool execution ===== -->
  <text x="608" y="124" fill="#94a3b8" font-size="9">s02</text>

  <!-- TOOL_HANDLERS -->
  <rect x="600" y="130" width="100" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="650" y="152" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_</text>
  <text x="650" y="166" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">HANDLERS</text>
  <text x="650" y="184" fill="#64748b" font-size="8" text-anchor="middle">bash/read/write/...</text>

  <!-- Arrow: tool results → back to messages -->
  <path d="M 700 162 L 710 162 L 710 230 L 120 230 L 120 128" fill="none" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>

  <!-- ===== Legend ===== -->
  <rect x="60" y="260" width="600" height="44" rx="6" fill="#f1f5f9"/>
  <rect x="80" y="276" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="100" y="286" fill="#334155" font-size="10">s02 preserved (loop, LLM, dispatch — unchanged)</text>
  <rect x="400" y="276" width="12" height="12" rx="2" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
  <text x="420" y="286" fill="#334155" font-size="10">s03 new (three-gate permission pipeline)</text>
</svg>
s03_permission/images/permission-overview.ja.svg
@@ -0,0 +1,97 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 320" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
    </marker>
    <marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
    </marker>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/>
      <stop offset="100%" stop-color="#2563eb"/>
    </linearGradient>
  </defs>

  <!-- Background -->
  <rect width="720" height="320" fill="#fafbfc" rx="8"/>

  <!-- Title -->
  <rect x="0" y="0" width="720" height="48" fill="url(#header)" rx="8"/>
  <rect x="0" y="40" width="720" height="8" fill="url(#header)"/>
  <text x="360" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Permission: loop unchanged, a gate added before tool execution</text>

  <!-- ===== s02 preserved (gray) ===== -->
  <text x="50" y="76" fill="#94a3b8" font-size="11" font-weight="600">s02 preserved</text>

  <!-- User input -->
  <rect x="60" y="88" width="120" height="40" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="120" y="113" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- Arrow → LLM -->
  <line x1="180" y1="108" x2="228" y2="108" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>

  <!-- LLM -->
  <rect x="230" y="84" width="130" height="48" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="295" y="104" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">LLM</text>
  <text x="295" y="122" fill="#64748b" font-size="10" text-anchor="middle">stop_reason?</text>

  <!-- No → return -->
  <line x1="295" y1="132" x2="295" y2="156" stroke="#16a34a" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="308" y="150" fill="#16a34a" font-size="9" font-weight="600">No</text>

  <rect x="240" y="158" width="110" height="32" rx="16" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="295" y="178" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return result</text>

  <!-- Yes → next step -->
  <line x1="360" y1="108" x2="400" y2="108" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="380" y="100" fill="#d97706" font-size="9" font-weight="600">Yes</text>

  <!-- ===== s03 new: Permission check ===== -->
  <text x="482" y="72" fill="#dc2626" font-size="11" font-weight="600" text-anchor="middle">s03 new</text>

  <!-- Permission check box -->
  <rect x="402" y="78" width="160" height="120" rx="10" fill="#fef2f2" stroke="#dc2626" stroke-width="2" stroke-dasharray="6,3"/>
  <text x="482" y="100" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">check_permission()</text>

  <!-- Gate 1 -->
  <rect x="416" y="110" width="132" height="24" rx="4" fill="#fee2e2" stroke="#dc2626" stroke-width="1"/>
  <text x="482" y="126" fill="#991b1b" font-size="9" font-weight="600" text-anchor="middle">Gate 1: Deny List</text>

  <!-- Gate 2 -->
  <rect x="416" y="140" width="132" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
  <text x="482" y="156" fill="#92400e" font-size="9" font-weight="600" text-anchor="middle">Gate 2: Rule Matching</text>

  <!-- Gate 3 -->
  <rect x="416" y="170" width="132" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
  <text x="482" y="186" fill="#92400e" font-size="9" font-weight="600" text-anchor="middle">Gate 3: User Approval</text>

  <!-- Deny → return deny message -->
  <path d="M 402 188 L 376 188 L 376 174 L 350 174" fill="none" stroke="#dc2626" stroke-width="1.5" marker-end="url(#arrow-red)"/>
  <text x="378" y="184" fill="#dc2626" font-size="8" font-weight="600">Deny</text>

  <!-- Pass → tool execution -->
  <line x1="562" y1="138" x2="598" y2="138" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="575" y="132" fill="#16a34a" font-size="8" font-weight="600">Pass</text>

  <!-- ===== s02 preserved: Tool execution ===== -->
  <text x="608" y="124" fill="#94a3b8" font-size="9">s02</text>

  <!-- TOOL_HANDLERS -->
  <rect x="600" y="130" width="100" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="650" y="152" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_</text>
  <text x="650" y="166" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">HANDLERS</text>
  <text x="650" y="184" fill="#64748b" font-size="8" text-anchor="middle">bash/read/write/...</text>

  <!-- Arrow: tool results → back to the message list -->
  <path d="M 700 162 L 710 162 L 710 230 L 120 230 L 120 128" fill="none" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>

  <!-- ===== Legend ===== -->
  <rect x="60" y="260" width="600" height="44" rx="6" fill="#f1f5f9"/>
  <rect x="80" y="276" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="100" y="286" fill="#334155" font-size="10">s02 preserved (loop, LLM, dispatch: unchanged)</text>
  <rect x="400" y="276" width="12" height="12" rx="2" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
  <text x="420" y="286" fill="#334155" font-size="10">s03 new (three-gate permission pipeline)</text>
</svg>
s03_permission/images/permission-overview.svg
@@ -0,0 +1,97 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 320" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
    </marker>
    <marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
    </marker>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/>
      <stop offset="100%" stop-color="#2563eb"/>
    </linearGradient>
  </defs>

  <!-- Background -->
  <rect width="720" height="320" fill="#fafbfc" rx="8"/>

  <!-- Title -->
  <rect x="0" y="0" width="720" height="48" fill="url(#header)" rx="8"/>
  <rect x="0" y="40" width="720" height="8" fill="url(#header)"/>
  <text x="360" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Permission: loop unchanged, a gate added before tool execution</text>

  <!-- ===== s02 preserved (gray) ===== -->
  <text x="50" y="76" fill="#94a3b8" font-size="11" font-weight="600">s02 preserved</text>

  <!-- User input -->
  <rect x="60" y="88" width="120" height="40" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="120" y="113" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- Arrow → LLM -->
  <line x1="180" y1="108" x2="228" y2="108" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>

  <!-- LLM -->
  <rect x="230" y="84" width="130" height="48" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="295" y="104" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">LLM</text>
  <text x="295" y="122" fill="#64748b" font-size="10" text-anchor="middle">stop_reason?</text>

  <!-- No → return -->
  <line x1="295" y1="132" x2="295" y2="156" stroke="#16a34a" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="308" y="150" fill="#16a34a" font-size="9" font-weight="600">No</text>

  <rect x="240" y="158" width="110" height="32" rx="16" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="295" y="178" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return result</text>

  <!-- Yes → next step -->
  <line x1="360" y1="108" x2="400" y2="108" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="380" y="100" fill="#d97706" font-size="9" font-weight="600">Yes</text>

  <!-- ===== s03 new: Permission check ===== -->
  <text x="482" y="72" fill="#dc2626" font-size="11" font-weight="600" text-anchor="middle">s03 new</text>

  <!-- Permission check box -->
  <rect x="402" y="78" width="160" height="120" rx="10" fill="#fef2f2" stroke="#dc2626" stroke-width="2" stroke-dasharray="6,3"/>
  <text x="482" y="100" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">check_permission()</text>

  <!-- Gate 1 -->
  <rect x="416" y="110" width="132" height="24" rx="4" fill="#fee2e2" stroke="#dc2626" stroke-width="1"/>
  <text x="482" y="126" fill="#991b1b" font-size="9" font-weight="600" text-anchor="middle">Gate 1: Deny List</text>

  <!-- Gate 2 -->
  <rect x="416" y="140" width="132" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
  <text x="482" y="156" fill="#92400e" font-size="9" font-weight="600" text-anchor="middle">Gate 2: Rule Matching</text>

  <!-- Gate 3 -->
  <rect x="416" y="170" width="132" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
  <text x="482" y="186" fill="#92400e" font-size="9" font-weight="600" text-anchor="middle">Gate 3: User Approval</text>

  <!-- Deny → return deny message -->
  <path d="M 402 188 L 376 188 L 376 174 L 350 174" fill="none" stroke="#dc2626" stroke-width="1.5" marker-end="url(#arrow-red)"/>
  <text x="378" y="184" fill="#dc2626" font-size="8" font-weight="600">Deny</text>

  <!-- Pass → tool execution -->
  <line x1="562" y1="138" x2="598" y2="138" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="575" y="132" fill="#16a34a" font-size="8" font-weight="600">Pass</text>

  <!-- ===== s02 preserved: Tool execution ===== -->
  <text x="608" y="124" fill="#94a3b8" font-size="9">s02</text>

  <!-- TOOL_HANDLERS -->
  <rect x="600" y="130" width="100" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="650" y="152" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_</text>
|
||||
<text x="650" y="166" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">HANDLERS</text>
|
||||
<text x="650" y="184" fill="#64748b" font-size="8" text-anchor="middle">bash/read/write/...</text>
|
||||
|
||||
<!-- 箭头:工具结果 → 回到消息列表 -->
|
||||
<path d="M 700 162 L 710 162 L 710 230 L 120 230 L 120 128" fill="none" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
|
||||
|
||||
<!-- ===== 图例 ===== -->
|
||||
<rect x="60" y="260" width="600" height="44" rx="6" fill="#f1f5f9"/>
|
||||
<rect x="80" y="276" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="100" y="286" fill="#334155" font-size="10">s02 保留(循环、LLM、分发——完全不变)</text>
|
||||
<rect x="400" y="276" width="12" height="12" rx="2" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
|
||||
<text x="420" y="286" fill="#334155" font-size="10">s03 新增(三道闸门权限管线)</text>
|
||||
</svg>
s03_permission/images/permission-pipeline.en.svg
@@ -0,0 +1,61 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 280" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
</defs>

<rect width="720" height="280" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Permission Pipeline — Three Gates</text>

<!-- Tool call enters -->
<rect x="40" y="62" width="120" height="36" rx="6" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="84" fill="#1e40af" font-size="12" font-weight="600" text-anchor="middle">Tool call enters</text>

<line x1="160" y1="80" x2="210" y2="80" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- Gate 1: Deny list -->
<rect x="214" y="56" width="145" height="48" rx="6" fill="#fee2e2" stroke="#dc2626" stroke-width="2"/>
<text x="286" y="76" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">Gate 1: Deny List</text>
<text x="286" y="94" fill="#991b1b" font-size="9" text-anchor="middle">rm -rf /, sudo, shutdown</text>

<line x1="359" y1="80" x2="409" y2="80" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- Gate 2: Rule check -->
<rect x="413" y="56" width="145" height="48" rx="6" fill="#fef3c7" stroke="#d97706" stroke-width="2"/>
<text x="485" y="76" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">Gate 2: Rule Matching</text>
<text x="485" y="94" fill="#92400e" font-size="9" text-anchor="middle">Write outside ws? Destructive?</text>
<text x="485" y="116" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">no match → allow</text>

<line x1="558" y1="80" x2="608" y2="80" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="583" y="72" fill="#92400e" font-size="8" font-weight="600" text-anchor="middle">match</text>

<!-- Gate 3: User approval -->
<rect x="612" y="56" width="90" height="48" rx="6" fill="#fef3c7" stroke="#d97706" stroke-width="2"/>
<text x="657" y="76" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">Gate 3</text>
<text x="657" y="94" fill="#92400e" font-size="9" text-anchor="middle">User approval</text>
<text x="657" y="116" fill="#64748b" font-size="8" font-weight="600" text-anchor="middle">allow / deny</text>

<!-- Results area -->
<rect x="40" y="130" width="662" height="130" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="60" y="152" fill="#1e3a5f" font-size="12" font-weight="600">Three Decisions</text>

<rect x="60" y="166" width="200" height="42" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
<text x="160" y="184" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">Deny</text>
<text x="160" y="200" fill="#991b1b" font-size="9" text-anchor="middle">Gate 1 hit, or user denied</text>

<rect x="280" y="166" width="200" height="42" rx="4" fill="#fef3c7" stroke="#fbbf24" stroke-width="0.5"/>
<text x="380" y="184" fill="#92400e" font-size="11" font-weight="600" text-anchor="middle">Ask</text>
<text x="380" y="200" fill="#92400e" font-size="9" text-anchor="middle">Gate 2 matched, enter Gate 3</text>

<rect x="500" y="166" width="182" height="42" rx="4" fill="#dcfce7" stroke="#86efac" stroke-width="0.5"/>
<text x="591" y="184" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Allow</text>
<text x="591" y="200" fill="#166534" font-size="9" text-anchor="middle">No rule hit, or user approved</text>

<text x="371" y="248" fill="#64748b" font-size="10" text-anchor="middle">Priority: hard deny → rule matching → if matched ask user; if unmatched allow by default</text>
</svg>
s03_permission/images/permission-pipeline.ja.svg
@@ -0,0 +1,61 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 280" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
</defs>

<rect width="720" height="280" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Permission Pipeline — 3 つのゲート</text>

<!-- Tool call enters -->
<rect x="40" y="62" width="120" height="36" rx="6" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="84" fill="#1e40af" font-size="12" font-weight="600" text-anchor="middle">ツール呼び出し</text>

<line x1="160" y1="80" x2="210" y2="80" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- Gate 1: Deny list -->
<rect x="214" y="56" width="145" height="48" rx="6" fill="#fee2e2" stroke="#dc2626" stroke-width="2"/>
<text x="286" y="76" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">ゲート 1: 拒否リスト</text>
<text x="286" y="94" fill="#991b1b" font-size="9" text-anchor="middle">rm -rf /, sudo, shutdown</text>

<line x1="359" y1="80" x2="409" y2="80" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- Gate 2: Rule check -->
<rect x="413" y="56" width="145" height="48" rx="6" fill="#fef3c7" stroke="#d97706" stroke-width="2"/>
<text x="485" y="76" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">ゲート 2: ルール照合</text>
<text x="485" y="94" fill="#92400e" font-size="9" text-anchor="middle">ws 外への書き込み?破壊的?</text>
<text x="485" y="116" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">不一致 → allow</text>

<line x1="558" y1="80" x2="608" y2="80" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="583" y="72" fill="#92400e" font-size="8" font-weight="600" text-anchor="middle">一致</text>

<!-- Gate 3: User approval -->
<rect x="612" y="56" width="90" height="48" rx="6" fill="#fef3c7" stroke="#d97706" stroke-width="2"/>
<text x="657" y="76" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">ゲート 3</text>
<text x="657" y="94" fill="#92400e" font-size="9" text-anchor="middle">ユーザー承認</text>
<text x="657" y="116" fill="#64748b" font-size="8" font-weight="600" text-anchor="middle">allow / deny</text>

<!-- Results area -->
<rect x="40" y="130" width="662" height="130" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="60" y="152" fill="#1e3a5f" font-size="12" font-weight="600">3 つの決定</text>

<rect x="60" y="166" width="200" height="42" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
<text x="160" y="184" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">拒否 (deny)</text>
<text x="160" y="200" fill="#991b1b" font-size="9" text-anchor="middle">ゲート 1 一致、またはユーザー拒否</text>

<rect x="280" y="166" width="200" height="42" rx="4" fill="#fef3c7" stroke="#fbbf24" stroke-width="0.5"/>
<text x="380" y="184" fill="#92400e" font-size="11" font-weight="600" text-anchor="middle">確認 (ask)</text>
<text x="380" y="200" fill="#92400e" font-size="9" text-anchor="middle">ゲート 2 一致、ゲート 3 へ</text>

<rect x="500" y="166" width="182" height="42" rx="4" fill="#dcfce7" stroke="#86efac" stroke-width="0.5"/>
<text x="591" y="184" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">許可 (allow)</text>
<text x="591" y="200" fill="#166534" font-size="9" text-anchor="middle">ルール不一致、またはユーザー許可</text>

<text x="371" y="248" fill="#64748b" font-size="10" text-anchor="middle">優先順位:ハード拒否 → ルール照合 → 一致ならユーザー承認、不一致ならデフォルト許可</text>
</svg>
s03_permission/images/permission-pipeline.svg
@@ -0,0 +1,61 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 280" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
</defs>

<rect width="720" height="280" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Permission Pipeline — 三道闸门</text>

<!-- Tool call enters -->
<rect x="40" y="62" width="120" height="36" rx="6" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="84" fill="#1e40af" font-size="12" font-weight="600" text-anchor="middle">工具调用进入</text>

<line x1="160" y1="80" x2="210" y2="80" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- Gate 1: Deny list -->
<rect x="214" y="56" width="145" height="48" rx="6" fill="#fee2e2" stroke="#dc2626" stroke-width="2"/>
<text x="286" y="76" fill="#991b1b" font-size="11" font-weight="700" text-anchor="middle">闸门 1: 拒绝列表</text>
<text x="286" y="94" fill="#991b1b" font-size="9" text-anchor="middle">rm -rf /, sudo, shutdown</text>

<line x1="359" y1="80" x2="409" y2="80" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- Gate 2: Rule check -->
<rect x="413" y="56" width="145" height="48" rx="6" fill="#fef3c7" stroke="#d97706" stroke-width="2"/>
<text x="485" y="76" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">闸门 2: 规则匹配</text>
<text x="485" y="94" fill="#92400e" font-size="9" text-anchor="middle">写工作区外?读敏感路径?</text>
<text x="485" y="116" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">未命中 → allow</text>

<line x1="558" y1="80" x2="608" y2="80" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="583" y="72" fill="#92400e" font-size="8" font-weight="600" text-anchor="middle">命中</text>

<!-- Gate 3: User approval -->
<rect x="612" y="56" width="90" height="48" rx="6" fill="#fef3c7" stroke="#d97706" stroke-width="2"/>
<text x="657" y="76" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">闸门 3</text>
<text x="657" y="94" fill="#92400e" font-size="9" text-anchor="middle">用户审批</text>
<text x="657" y="116" fill="#64748b" font-size="8" font-weight="600" text-anchor="middle">允许 / 拒绝</text>

<!-- Results area -->
<rect x="40" y="130" width="662" height="130" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="60" y="152" fill="#1e3a5f" font-size="12" font-weight="600">三种决策</text>

<rect x="60" y="166" width="200" height="42" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
<text x="160" y="184" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">阻止 (deny)</text>
<text x="160" y="200" fill="#991b1b" font-size="9" text-anchor="middle">闸门 1 命中,或用户拒绝</text>

<rect x="280" y="166" width="200" height="42" rx="4" fill="#fef3c7" stroke="#fbbf24" stroke-width="0.5"/>
<text x="380" y="184" fill="#92400e" font-size="11" font-weight="600" text-anchor="middle">询问 (ask)</text>
<text x="380" y="200" fill="#92400e" font-size="9" text-anchor="middle">闸门 2 命中,进入闸门 3</text>

<rect x="500" y="166" width="182" height="42" rx="4" fill="#dcfce7" stroke="#86efac" stroke-width="0.5"/>
<text x="591" y="184" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">允许 (allow)</text>
<text x="591" y="200" fill="#166534" font-size="9" text-anchor="middle">规则未命中,或用户允许</text>

<text x="371" y="248" fill="#64748b" font-size="10" text-anchor="middle">规则优先:闸门 1 硬拒绝 → 闸门 2 规则匹配 → 命中则用户审批,未命中默认允许</text>
</svg>
s04_hooks/README.en.md
@@ -0,0 +1,282 @@
# s04: Hooks — Hang on the Loop, Don't Write into It

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → s02 → s03 → `s04` → [s05](../s05_todo_write/) → s06 → ... → s20

> *"Hang on the loop, don't write into it"* — Hooks inject extension logic before and after tool execution.
>
> **Harness Layer**: Hooks — Extension points that don't invade the loop.

---

## The Problem

The s03 Agent has permission checks. But every new check, such as "log every bash call" or "auto git add after writes", requires modifying the `agent_loop` function.

The loop quickly becomes this:

```python
def agent_loop(messages):
    while True:
        # ... LLM call ...
        for block in response.content:
            if block.type == "tool_use":
                log_to_file(block)        # added a line
                check_permission(block)   # added a line
                notify_slack(block)       # added another line
                output = execute(block)
                auto_git_add(block)       # yet another line
                # ... the loop is unrecognizable
```

What you want to extend is the Agent's behavior, but what you're modifying is the loop itself. The loop should be a stable core; extensions should hang on the outside.

---

## The Solution

The s03 loop and permission logic are fully preserved. The main change is moving `check_permission()` out of the loop body and onto a hook: the loop no longer calls any check function directly. Instead it calls `trigger_hooks("PreToolUse", block)`, and the registry decides what runs.

Four events cover one complete agent cycle:

| Event | Trigger Timing | Typical Use |
|-------|---------------|-------------|
| UserPromptSubmit | After user input, before the LLM call | Input validation, context injection |
| PreToolUse | Before tool execution | Permission checks, logging |
| PostToolUse | After tool execution | Side effects (e.g. auto `git add`), output checking |
| Stop | When the loop is about to exit | Cleanup; a non-None return forces the loop to continue |

Extensions are added via `register_hook()`. The loop only calls `trigger_hooks()`.

---

## How It Works

**Hook registry**: a dict mapping event names to callback lists.

```python
HOOKS = {
    "UserPromptSubmit": [],
    "PreToolUse": [],
    "PostToolUse": [],
    "Stop": [],
}

def register_hook(event: str, callback):
    HOOKS[event].append(callback)

def trigger_hooks(event: str, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:  # first non-None result wins; remaining hooks are skipped
            return result
    return None
```

In the teaching version, a non-None return from a PreToolUse hook blocks the tool call, while a non-None return from a Stop hook forces the loop to continue. Return values from UserPromptSubmit and PostToolUse hooks are ignored.
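
Because `trigger_hooks()` returns the first non-None result, registration order matters: once an earlier hook has answered, later hooks never run. The self-contained sketch below demonstrates this. The hook names (`allow_all`, `block_sudo`, `never_reached`) and the `calls` list are hypothetical, and a bare tool-name string stands in for the real `tool_use` block.

```python
from typing import Any, Callable, Optional

# Same shape as the registry above: one list of callbacks per event.
HOOKS: dict = {"UserPromptSubmit": [], "PreToolUse": [], "PostToolUse": [], "Stop": []}
calls: list = []  # records which hooks actually ran


def register_hook(event: str, callback: Callable[..., Any]) -> None:
    HOOKS[event].append(callback)


def trigger_hooks(event: str, *args: Any) -> Any:
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:  # short-circuit: later hooks are skipped
            return result
    return None


def allow_all(tool_name: str) -> None:
    calls.append("allow_all")
    return None  # no opinion, let the next hook decide


def block_sudo(tool_name: str) -> Optional[str]:
    calls.append("block_sudo")
    return "blocked: sudo" if tool_name == "sudo" else None


def never_reached(tool_name: str) -> None:
    calls.append("never_reached")
    return None


register_hook("PreToolUse", allow_all)
register_hook("PreToolUse", block_sudo)
register_hook("PreToolUse", never_reached)

print(trigger_hooks("PreToolUse", "sudo"))  # blocked: sudo
print(calls)                                # ['allow_all', 'block_sudo']
```

Calling it with any other name (say `"ls"`) runs all three hooks and returns `None`.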

**UserPromptSubmit** fires after user input and before the LLM call. CC can intercept or modify the input; the teaching version only logs:

```python
def context_inject_hook(query: str) -> str | None:
    """Log the working directory (teaching version: the prompt itself is not modified)."""
    print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m")
    return None  # None = no modification, let the prompt through

register_hook("UserPromptSubmit", context_inject_hook)
```

In the main input loop, it fires right after the user types a prompt:

```python
query = input("s04 >> ")
trigger_hooks("UserPromptSubmit", query)  # ← before entering the LLM
history.append({"role": "user", "content": query})
agent_loop(history)
```
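
The teaching hook only logs, but the event table lists context injection as a typical use. Below is a hypothetical variant in which a UserPromptSubmit hook's return value replaces the prompt before it reaches `history`. `submit_prompt`, `inject_cwd`, `PROMPT_HOOKS`, and the `WORKDIR` value are illustrative assumptions and are not part of s04's `code.py`.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical working directory, used only to build the injected context line.
WORKDIR = Path("/tmp/demo-workdir")

PromptHook = Callable[[str], Optional[str]]
PROMPT_HOOKS: list = []


def submit_prompt(query: str) -> str:
    """Run UserPromptSubmit hooks in order; a non-None return replaces the prompt."""
    for hook in PROMPT_HOOKS:
        updated = hook(query)
        if updated is not None:
            query = updated  # later hooks see the rewritten prompt
    return query


def inject_cwd(query: str) -> Optional[str]:
    # Prepend the working directory as context instead of only printing it.
    return f"[cwd: {WORKDIR}]\n{query}"


PROMPT_HOOKS.append(inject_cwd)

history: list = []
history.append({"role": "user", "content": submit_prompt("list the files")})
print(history[0]["content"])
```

`trigger_hooks()` stops at the first non-None result. This variant chains rewrites instead, so several hooks can each add context. The right policy depends on whether a hook's return value means "stop here" or "use this new value".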

**PreToolUse / PostToolUse** hooks run before and after tool execution. s03's permission check logic is now wrapped as a PreToolUse hook, alongside a logging hook and a large-output reminder:

```python
# PreToolUse: permission check (s03 logic, moved from the loop into a hook)
def permission_hook(block):
    if block.name == "bash":
        for pattern in DENY_LIST:
            if pattern in block.input.get("command", ""):
                return "Permission denied by deny list"
    if block.name in ("write_file", "edit_file"):
        path = block.input.get("path", "")
        # Writes that resolve outside WORKDIR need explicit user approval
        if not (WORKDIR / path).resolve().is_relative_to(WORKDIR):
            choice = input(" Allow? [y/N] ").strip().lower()
            if choice not in ("y", "yes"):
                return "Permission denied by user"
    return None

# PreToolUse: logging
def log_hook(block):
    print(f"[HOOK] {block.name}(...)")

# PostToolUse: large output reminder
def large_output_hook(block, output):
    if len(str(output)) > 100000:
        print(f"[HOOK] ⚠ Large output from {block.name}")

register_hook("PreToolUse", permission_hook)
register_hook("PreToolUse", log_hook)
register_hook("PostToolUse", large_output_hook)
```
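
Hooks are plain functions, so you can exercise them without the loop or a model. The sketch below sends fake `tool_use` blocks through the same registry and `permission_hook` logic. The `DENY_LIST` values come from the s03 diagram. `WORKDIR`, the `logged` list, and the use of `SimpleNamespace` in place of SDK blocks are assumptions made for the example. The sketch also shows a side effect of short-circuiting: `log_hook` is registered after `permission_hook`, so a denied call is never logged.

```python
from pathlib import Path
from types import SimpleNamespace

# Assumed values: deny patterns from the s03 diagram and a scratch working directory.
DENY_LIST = ["rm -rf /", "sudo", "shutdown"]
WORKDIR = Path("/tmp/s04-demo").resolve()

HOOKS = {"PreToolUse": []}
logged: list = []


def register_hook(event, callback):
    HOOKS[event].append(callback)


def trigger_hooks(event, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:
            return result
    return None


def permission_hook(block):
    if block.name == "bash":
        for pattern in DENY_LIST:
            if pattern in block.input.get("command", ""):
                return "Permission denied by deny list"
    if block.name in ("write_file", "edit_file"):
        path = block.input.get("path", "")
        if not (WORKDIR / path).resolve().is_relative_to(WORKDIR):
            choice = input(" Allow? [y/N] ").strip().lower()
            if choice not in ("y", "yes"):
                return "Permission denied by user"
    return None


def log_hook(block):
    logged.append(block.name)  # record instead of print so the effect is checkable


register_hook("PreToolUse", permission_hook)
register_hook("PreToolUse", log_hook)

# SimpleNamespace stands in for a tool_use block; only .name and .input are read.
denied = SimpleNamespace(name="bash", input={"command": "sudo reboot"})
allowed = SimpleNamespace(name="write_file", input={"path": "notes.txt"})

print(trigger_hooks("PreToolUse", denied))   # Permission denied by deny list
print(trigger_hooks("PreToolUse", allowed))  # None
print(logged)                                # ['write_file']
```

The in-workspace write never reaches `input()`, so the sketch runs unattended.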

**Stop** fires when the loop is about to exit (`stop_reason != "tool_use"`). The teaching version prints a cleanup summary:

```python
def summary_hook(messages: list) -> str | None:
    """Print a summary when the loop is about to stop."""
    tool_count = sum(1 for m in messages
                     for b in (m.get("content") if isinstance(m.get("content"), list) else [])
                     if isinstance(b, dict) and b.get("type") == "tool_result")
    print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m")
    return None  # None = allow the stop; a string = force continuation

register_hook("Stop", summary_hook)
```
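
The comprehension in `summary_hook` counts only dict blocks whose `type` is `"tool_result"`. Plain-string contents are skipped, and so are the SDK objects stored in assistant turns. Here the same logic is pulled out into a standalone function so it can be checked against a hand-built history. The name `count_tool_results` and the sample messages are hypothetical, and the sample uses dicts for assistant blocks where the real loop stores SDK objects.

```python
def count_tool_results(messages: list) -> int:
    """Count tool_result blocks the same way summary_hook does."""
    return sum(
        1
        for m in messages
        for b in (m.get("content") if isinstance(m.get("content"), list) else [])
        if isinstance(b, dict) and b.get("type") == "tool_result"
    )


# Hypothetical history: one user prompt, two tool calls and their results, a final reply.
history = [
    {"role": "user", "content": "Read README.md and list the directory"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t1", "name": "read_file", "input": {"path": "README.md"}},
        {"type": "tool_use", "id": "t2", "name": "bash", "input": {"command": "ls"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "# learn-claude-code ..."},
        {"type": "tool_result", "tool_use_id": "t2", "content": "README.md\ns04_hooks"},
    ]},
    {"role": "assistant", "content": "Done."},
]
print(count_tool_results(history))  # 2
```

Denied calls also produce a `tool_result`, so the reported number is "tool calls attempted" rather than "tool calls executed".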

In `agent_loop`, the Stop event fires just before the loop exits:

```python
if response.stop_reason != "tool_use":
    force = trigger_hooks("Stop", messages)  # ← before exiting
    if force:
        # A hook returned a message → inject it as a user turn and keep looping
        messages.append({"role": "user", "content": force})
        continue
    return
```
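
To see forced continuation end to end, here is a self-contained sketch in which the model is replaced by a scripted list of responses. `scripted`, `STOP_HOOKS`, `trigger_stop`, and `require_tests_mentioned` are hypothetical names. The tool-dispatch branch is left out so that only the Stop path runs.

```python
from types import SimpleNamespace

# Scripted stand-in for the LLM: each call pops the next canned response.
# Only .stop_reason and .content are used, mirroring the fields the loop reads.
scripted = [
    SimpleNamespace(stop_reason="end_turn", content="draft answer"),
    SimpleNamespace(stop_reason="end_turn", content="final answer, tests mentioned"),
]

STOP_HOOKS = []


def trigger_stop(messages):
    for hook in STOP_HOOKS:
        result = hook(messages)
        if result is not None:
            return result
    return None


def require_tests_mentioned(messages):
    # Hypothetical Stop hook: force another turn until the reply mentions tests.
    if "tests" not in messages[-1]["content"]:
        return "Before stopping, say whether tests were run."
    return None


STOP_HOOKS.append(require_tests_mentioned)


def agent_loop(messages):
    while True:
        response = scripted.pop(0)  # stands in for the real API call
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            force = trigger_stop(messages)
            if force:
                messages.append({"role": "user", "content": force})
                continue
            return


history = [{"role": "user", "content": "fix the bug"}]
agent_loop(history)
print([m["role"] for m in history])  # ['user', 'assistant', 'user', 'assistant']
```

If a hook's condition can never be met, this loop never terminates. CC's `stopHookActive` flag, covered in the source-code notes, exists to guard against that failure mode.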

**The main change in tool dispatch**: where s03 called `check_permission(block)` directly, s04 calls `trigger_hooks("PreToolUse", block)`, and it adds a PostToolUse trigger after execution:

```python
for block in response.content:
    if block.type != "tool_use":
        continue

    # s03: if not check_permission(block): ...
    # s04: hooks replace the hard-coded check
    blocked = trigger_hooks("PreToolUse", block)
    if blocked:
        # The hook's message becomes the tool_result the model sees
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": str(blocked)})
        continue

    handler = TOOL_HANDLERS.get(block.name)
    output = handler(**block.input) if handler else f"Unknown: {block.name}"

    trigger_hooks("PostToolUse", block, output)

    results.append({"type": "tool_result", "tool_use_id": block.id,
                    "content": output})
```

Four hook events cover the critical points of the agent cycle: input → before execution → after execution → exit. The loop only calls `trigger_hooks()`; all extension logic lives in hook callbacks.
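
The dispatch loop is easy to check in isolation. The version below is self-contained and driven by fake content blocks (`SimpleNamespace`) with hypothetical `echo`/`bash` hooks and handlers. A blocked call still produces a `tool_result`, so every `tool_use` block gets an answer, which the Messages API expects. PostToolUse runs only for calls that actually executed.

```python
from types import SimpleNamespace

HOOKS = {"PreToolUse": [], "PostToolUse": []}
post_log: list = []


def trigger_hooks(event, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:
            return result
    return None


# Hypothetical handler and hooks, just enough to drive the dispatch loop.
TOOL_HANDLERS = {"echo": lambda text: text.upper()}
HOOKS["PreToolUse"].append(
    lambda block: "Permission denied by deny list" if block.name == "bash" else None)
HOOKS["PostToolUse"].append(lambda block, output: post_log.append((block.name, output)))


def dispatch(content):
    """Same shape as the s04 dispatch loop: one tool_result per tool_use block."""
    results = []
    for block in content:
        if block.type != "tool_use":
            continue
        blocked = trigger_hooks("PreToolUse", block)
        if blocked:
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": str(blocked)})
            continue
        handler = TOOL_HANDLERS.get(block.name)
        output = handler(**block.input) if handler else f"Unknown: {block.name}"
        trigger_hooks("PostToolUse", block, output)
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": output})
    return results


content = [
    SimpleNamespace(type="text", text="Let me run two tools."),
    SimpleNamespace(type="tool_use", id="t1", name="bash", input={"command": "sudo ls"}),
    SimpleNamespace(type="tool_use", id="t2", name="echo", input={"text": "hi"}),
]
for r in dispatch(content):
    print(r["tool_use_id"], r["content"])
# t1 Permission denied by deny list
# t2 HI
print(post_log)  # [('echo', 'HI')]
```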

---

## Changes from s03

| Component | Before (s03) | After (s04) |
|-----------|-------------|-------------|
| Extension method | `check_permission()` hard-coded in the loop | `HOOKS` registry + `trigger_hooks()` |
| New functions | — | `register_hook`, `trigger_hooks` |
| Hook callbacks | — | `context_inject_hook`, `permission_hook`, `log_hook`, `large_output_hook`, `summary_hook` |
| Loop | Calls `check_permission()` directly | Calls `trigger_hooks("PreToolUse", ...)` |
| Exit control | None | `trigger_hooks("Stop", ...)` can prevent exit |
| Input interception | None | `trigger_hooks("UserPromptSubmit", ...)` runs before the LLM call (logging only in the teaching version) |

---

## Try It

```sh
cd learn-claude-code
python s04_hooks/code.py
```

Try these prompts:

1. `Read the file README.md` (should pass directly; watch the hook logs)
2. `Create a file called test.txt` (PostToolUse fires after the write, but `large_output_hook` only prints for outputs longer than 100,000 characters)
3. `Delete all temporary files in /tmp` (a bash `rm` command goes through the permission hook's deny list)

What to watch for: does a `[HOOK]` log line appear before each tool execution? When permission is denied, was the call intercepted by a hook or by code hard-coded in the loop?

---

## What's Next

The Agent can now execute operations safely. But does it ever stop to think "what should I do first, and what next?" Given a complex task, does it jump straight in, or plan first?

→ s05 TodoWrite: Give the Agent a planning tool. Make a list first, then execute.

<details>
<summary>Dive into CC Source Code</summary>

> The following is based on a complete analysis of CC source code `toolHooks.ts` (650 lines), `hooks.ts`, `stopHooks.ts`, and `coreTypes.ts`.

### 1. Hook Events: Not Just 4, but 27

The teaching version covers only four events. CC actually has 27 hook events (`coreTypes.ts:25-53`):

| Category | Events |
|----------|--------|
| Tool-related | `PreToolUse`, `PostToolUse`, `PostToolUseFailure` |
| Session-related | `SessionStart`, `SessionEnd`, `Stop`, `StopFailure`, `Setup` |
| User interaction | `UserPromptSubmit`, `Notification`, `PermissionRequest`, `PermissionDenied` |
| Sub-agents | `SubagentStart`, `SubagentStop` |
| Compaction-related | `PreCompact`, `PostCompact` |
| Team-related | `TeammateIdle`, `TaskCreated`, `TaskCompleted` |
| Other | `Elicitation`, `ElicitationResult`, `ConfigChange`, `WorktreeCreate`, `WorktreeRemove`, `InstructionsLoaded`, `CwdChanged`, `FileChanged` |

The teaching version keeps only the 4 core events (UserPromptSubmit, PreToolUse, PostToolUse, Stop) because together they cover every critical point of a complete agent cycle. The other 23 follow the same pattern.

### 2. HookResult Common Fields

CC's `HookResult` (`types/hooks.ts:260-275`) has 14 fields. The common ones:

| Field | Type | Purpose |
|-------|------|---------|
| `message` | Message | Optional UI message |
| `blockingError` | HookBlockingError | Blocking error, injected into the conversation so the model can self-correct |
| `outcome` | success/blocking/non_blocking_error/cancelled | Execution result |
| `preventContinuation` | boolean | Prevent subsequent execution |
| `stopReason` | string | Stop reason description |
| `permissionBehavior` | allow/deny/ask/passthrough | Permission decision returned by the hook |
| `updatedInput` | Record | Modified tool input |
| `additionalContext` | string | Additional context |
| `updatedMCPToolOutput` | unknown | Modified MCP tool output |
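
A structured result lets one hook say several things at once (block, rewrite input, add context), which the teaching version's single return value cannot. The Python sketch below mirrors a subset of the fields above. The snake_case field names are renderings of the table, `blocking_error` is simplified to a string, and `apply_pre_tool_result` is a hypothetical consumer, not CC's code.

```python
from dataclasses import dataclass
from typing import Any, Literal, Optional

# Illustrative mirror of some HookResult fields; CC's real type is TypeScript with 14 fields.
Outcome = Literal["success", "blocking", "non_blocking_error", "cancelled"]
PermissionBehavior = Literal["allow", "deny", "ask", "passthrough"]


@dataclass
class HookResult:
    outcome: Outcome = "success"
    blocking_error: Optional[str] = None  # simplified: CC uses a HookBlockingError object
    prevent_continuation: bool = False
    stop_reason: Optional[str] = None
    permission_behavior: Optional[PermissionBehavior] = None
    updated_input: Optional[dict] = None
    additional_context: Optional[str] = None


def apply_pre_tool_result(tool_input: dict, result: HookResult) -> tuple:
    """Hypothetical consumer: return (input to run with, error for the model or None)."""
    if result.blocking_error is not None:
        return tool_input, result.blocking_error  # the error would be shown to the model
    if result.updated_input is not None:
        return result.updated_input, None  # this sketch replaces the input wholesale
    return tool_input, None


print(apply_pre_tool_result({"command": "ls"}, HookResult(updated_input={"command": "ls -la"})))
# ({'command': 'ls -la'}, None)
```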

### 3. Key Invariant: Hook 'allow' Cannot Bypass deny/ask Rules

This is the most important security design in CC's permission system (`toolHooks.ts:325-331`): **when a hook returns allow, the settings.json deny/ask rules are still checked.** Even if a user's hook script says "allow", an operation that settings.json disables is still blocked.

The teaching version doesn't have this layer: its hooks can only block (non-None) or pass (None), never allow. That is enough for teaching, but a production system whose hooks can return allow needs this extra check, otherwise a permissive hook could override configured deny rules.
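
Roughly speaking, a hook can make a decision stricter but cannot push it past a configured rule. The minimal sketch below works under stated assumptions: rules are keyed by tool name only, `SETTINGS_DENY`, `SETTINGS_ASK`, `settings_decision`, and `final_decision` are hypothetical, and the `"ask"` fallback when neither side has an opinion is an arbitrary choice for the example, not CC's behavior.

```python
from typing import Literal, Optional

Decision = Literal["allow", "deny", "ask"]

# Hypothetical settings rules keyed by tool name; CC's real rule matching is richer.
SETTINGS_DENY = {"bash"}
SETTINGS_ASK = {"write_file"}


def settings_decision(tool_name: str) -> Optional[Decision]:
    if tool_name in SETTINGS_DENY:
        return "deny"
    if tool_name in SETTINGS_ASK:
        return "ask"
    return None  # no rule matched


def final_decision(tool_name: str, hook_decision: Optional[Decision]) -> Decision:
    """A hook 'allow' is honoured only when no deny/ask rule matches."""
    if hook_decision == "deny":
        return "deny"  # a hook may always make things stricter
    rule = settings_decision(tool_name)
    if rule is not None:
        return rule  # deny/ask rules win over a hook's allow
    return hook_decision or "ask"  # assumed default for this sketch


print(final_decision("bash", "allow"))        # deny
print(final_decision("write_file", "allow"))  # ask
print(final_decision("read_file", "allow"))   # allow
```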

### 4. stopHookActive Mechanism

CC's Stop hooks have an infinite-loop guard (`query.ts:212,1300`): the `stopHookActive` state field. When stop hooks produce a blockingError, the loop re-enters with `stopHookActive: true`. Subsequent iterations see this flag and don't trigger stop hooks again. This prevents a never-ending cycle: the model self-corrects → the stop hook errors again → the model self-corrects again → the stop hook errors again...
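
Below is a minimal sketch of that guard as described: once a stop hook has forced a turn, the loop skips stop hooks and exits. The stop hook in it always objects, so without the guard it would loop forever. The names `run` and `stubborn_stop_hook` are hypothetical, tool dispatch is omitted, and the flag never resets here, because when CC resets it is not covered above.

```python
from types import SimpleNamespace


def stubborn_stop_hook(messages):
    # Hypothetical hook that always objects; without a guard this never ends.
    return "Stop hook: please double-check your answer."


def run(model_replies, stop_hooks):
    """Minimal loop with a stopHookActive-style guard (teaching sketch, not CC's code)."""
    messages = [{"role": "user", "content": "task"}]
    stop_hook_active = False
    replies = iter(model_replies)
    while True:
        response = next(replies)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason == "tool_use":
            continue  # tool dispatch omitted in this sketch
        if stop_hook_active:
            return messages  # guard set: skip stop hooks and exit
        for hook in stop_hooks:
            error = hook(messages)
            if error is not None:
                messages.append({"role": "user", "content": error})
                stop_hook_active = True  # a stop hook already forced one extra turn
                break
        else:
            return messages  # no hook objected
        continue


replies = [SimpleNamespace(stop_reason="end_turn", content=f"answer {i}") for i in range(5)]
result = run(replies, [stubborn_stop_hook])
print(len(result))  # 4: user, assistant, injected user, assistant
```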

### 5. hook_stopped_continuation

When PostToolUse hooks return `preventContinuation: true`, a `hook_stopped_continuation` attachment is produced (`toolHooks.ts:117-130`). `query.ts` (L1388-1393) detects it and sets `shouldPreventContinuation = true`, causing the loop to exit. This is how a hook shuts the Agent down gracefully: not a crash, but a normal completion.
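
In teaching-version terms, this is a PostToolUse hook whose result carries a "please stop" flag, and the loop honours that flag by returning normally instead of raising. The sketch below is hypothetical throughout (`PostResult`, `budget_hook`, `run_tools`, and the three-call budget). It models the idea, not CC's attachment mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostResult:
    prevent_continuation: bool = False
    stop_reason: Optional[str] = None


def budget_hook(tool_name: str, calls_so_far: int) -> PostResult:
    # Hypothetical PostToolUse hook: stop the agent after three tool calls.
    if calls_so_far >= 3:
        return PostResult(prevent_continuation=True, stop_reason="tool budget reached")
    return PostResult()


def run_tools(tool_names: list) -> tuple:
    executed = []
    for name in tool_names:
        executed.append(name)  # stands in for actually running the tool
        result = budget_hook(name, len(executed))
        if result.prevent_continuation:
            # analogous to shouldPreventContinuation: finish normally, don't raise
            return executed, result.stop_reason
    return executed, None


print(run_tools(["read", "grep", "edit", "bash", "write"]))
# (['read', 'grep', 'edit'], 'tool budget reached')
```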

### Teaching Version Simplifications Are Intentional

- 27 events → 4 (UserPromptSubmit/PreToolUse/PostToolUse/Stop): covers the critical points of the agent cycle
- 14 fields → simple return values (None = continue; non-None = block for PreToolUse, force continuation for Stop): minimal cognitive load
- Hook allow vs deny/ask invariant → omitted: the teaching version has no settings.json layer
- stopHookActive → omitted: the bundled Stop hook (`summary_hook`) always returns None, so it never forces a continuation and no infinite-loop guard is needed

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
s04_hooks/README.ja.md
@@ -0,0 +1,282 @@
# s04: Hooks — ループに掛ける、ループには書き込まない

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → s02 → s03 → `s04` → [s05](../s05_todo_write/) → s06 → ... → s20

> *"ループに掛ける、ループには書き込まない"* — フックがツール実行の前後に拡張ロジックを注入する。
>
> **Harness レイヤー**: フック — ループを侵襲しない拡張ポイント。

---

## 課題

s03 の Agent には権限チェックがある。しかし「bash 呼び出しを毎回ログに記録」「操作後に自動 git add」といった新しいチェックを追加するたびに、`agent_loop` 関数を修正する必要がある。

ループはすぐにこうなる:

```python
def agent_loop(messages):
    while True:
        # ... LLM call ...
        for block in response.content:
            if block.type == "tool_use":
                log_to_file(block)        # 一行追加
                check_permission(block)   # 一行追加
                notify_slack(block)       # さらに一行追加
                output = execute(block)
                auto_git_add(block)       # さらに一行追加
                # ... もうループが見えない
```

拡張したいのは Agent の振る舞いなのに、変更しているのはループそのもの。ループは安定した核であるべきで、拡張は外側に掛けるべきだ。
|
||||
---
|
||||
|
||||
## ソリューション
|
||||
|
||||

|
||||
|
||||
s03 のループと権限ロジックは完全に保持される。唯一の変更点は `check_permission()` をループ本体内からフックに移動したこと。ループはもうチェック関数を直接呼び出さず、代わりに `trigger_hooks("PreToolUse", block)` を呼び、登録済みのフックが何を実行するかを決める。
|
||||
|
||||
4 つのイベントで、完全な agent cycle をカバー:
|
||||
|
||||
| イベント | 発火タイミング | 典型的な用途 |
|
||||
|----------|--------------|-------------|
|
||||
| UserPromptSubmit | ユーザー入力後、LLM に入る前 | 入力バリデーション、コンテキスト注入 |
|
||||
| PreToolUse | ツール実行前 | 権限チェック、ログ記録 |
|
||||
| PostToolUse | ツール実行後 | 副作用(自動 git add など)、出力チェック |
|
||||
| Stop | ループが終了する直前 | クリーンアップ(CC は強制続行もサポート) |
|
||||
|
||||
拡張は `register_hook()` で追加する。ループは `trigger_hooks()` を呼ぶだけ。
|
||||
|
||||
---
|
||||
|
||||
## How It Works

**Hook registry**: a dict that maps each event name to a list of callbacks.

```python
HOOKS = {
    "UserPromptSubmit": [],
    "PreToolUse": [],
    "PostToolUse": [],
    "Stop": [],
}


def register_hook(event: str, callback):
    HOOKS[event].append(callback)


def trigger_hooks(event: str, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:  # non-None return → the hook says "stop here"
            return result       # remaining callbacks for this event are skipped
    return None
```

In the teaching version, a non-None return value from a PreToolUse hook blocks that tool call, and a non-None return value from a Stop hook forces the loop to continue. Return values from UserPromptSubmit and PostToolUse hooks are ignored. Because `trigger_hooks()` returns the first non-None result, any callbacks registered after that one are skipped for the event.
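
To see the first-non-None rule in isolation, here is a small self-contained copy of the registry reduced to one event. For brevity the hooks receive a plain command string instead of a `tool_use` block, and the hook names `deny_sudo` and `log_call` are hypothetical, not part of code.py:

```python
# Stand-alone copy of the teaching registry, reduced to one event.
HOOKS = {"PreToolUse": []}
calls = []  # records which callbacks actually ran


def register_hook(event, callback):
    HOOKS[event].append(callback)


def trigger_hooks(event, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:  # first non-None result wins
            return result
    return None


def deny_sudo(command):
    calls.append("deny_sudo")
    return "Permission denied by deny list" if "sudo" in command else None


def log_call(command):
    calls.append("log_call")
    return None


register_hook("PreToolUse", deny_sudo)
register_hook("PreToolUse", log_call)

print(trigger_hooks("PreToolUse", "ls -la"))       # None: both hooks ran
print(trigger_hooks("PreToolUse", "sudo reboot"))  # blocked before log_call runs
print(calls)  # ['deny_sudo', 'log_call', 'deny_sudo']
```

Registration order therefore matters: code.py registers `permission_hook` before `log_hook`, so a call that gets blocked never shows up in the `[HOOK]` log.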

**UserPromptSubmit** fires after the user submits input and before it reaches the LLM. In CC it can intercept or modify the input; the teaching version only logs:

```python
def context_inject_hook(query: str) -> str | None:
    """Log the working directory for every prompt (a fuller hook could inject it)."""
    print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m")
    return None  # None = no modification, let the prompt through


register_hook("UserPromptSubmit", context_inject_hook)
```

In the main loop it fires right after the user's input:

```python
query = input("s04 >> ")
trigger_hooks("UserPromptSubmit", query)  # ← before the LLM sees it
history.append({"role": "user", "content": query})
agent_loop(history)
```

**PreToolUse / PostToolUse** are the hooks around tool execution. The s03 permission-check logic is now wrapped as a PreToolUse hook, joined by a logging hook and a large-output reminder:

```python
# PreToolUse: permission check (s03 logic, moved from the loop into a hook)
# (code.py additionally asks for confirmation on DESTRUCTIVE commands such as "rm ")
def permission_hook(block):
    if block.name == "bash":
        for pattern in DENY_LIST:
            if pattern in block.input.get("command", ""):
                return "Permission denied by deny list"
    if block.name in ("write_file", "edit_file"):
        path = block.input.get("path", "")
        if not (WORKDIR / path).resolve().is_relative_to(WORKDIR):
            choice = input("  Allow? [y/N] ").strip().lower()
            if choice not in ("y", "yes"):
                return "Permission denied by user"
    return None


# PreToolUse: logging
def log_hook(block):
    print(f"[HOOK] {block.name}(...)")


# PostToolUse: large-output reminder
def large_output_hook(block, output):
    if len(str(output)) > 100000:
        print(f"[HOOK] ⚠ Large output from {block.name}")


register_hook("PreToolUse", permission_hook)
register_hook("PreToolUse", log_hook)
register_hook("PostToolUse", large_output_hook)
```
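
The event table lists auto `git add` as a typical PostToolUse side effect, but the teaching code does not ship such a hook. A minimal sketch, assuming a hypothetical `auto_stage_hook` that only records paths instead of running git (so it is safe to run anywhere), with `SimpleNamespace` standing in for the SDK's `tool_use` blocks:

```python
from types import SimpleNamespace

staged = []  # hypothetical: paths a real hook would hand to `git add`


def auto_stage_hook(block, output):
    """PostToolUse: remember files that a successful write/edit just touched."""
    if block.name in ("write_file", "edit_file") and not str(output).startswith("Error"):
        staged.append(block.input["path"])
        # A real hook might run:
        # subprocess.run(["git", "add", "--", block.input["path"]], cwd=WORKDIR)
    return None  # PostToolUse return values are ignored in the teaching version


write = SimpleNamespace(name="write_file", input={"path": "test.txt", "content": "hi"})
read = SimpleNamespace(name="read_file", input={"path": "README.md"})

auto_stage_hook(write, "Wrote 2 bytes to test.txt")
auto_stage_hook(read, "# README ...")
auto_stage_hook(write, "Error: Path escapes workspace: test.txt")
print(staged)  # ['test.txt']
```

In code.py it would be wired up with `register_hook("PostToolUse", auto_stage_hook)`. The `Error` prefix check relies on the file tools' convention of returning `"Error: ..."` strings instead of raising.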

**Stop** fires just before the loop exits (`stop_reason != "tool_use"`). The teaching version uses it to print session statistics:

```python
def summary_hook(messages: list) -> str | None:
    """Print a summary when the loop is about to stop."""
    tool_count = sum(1 for m in messages
                     for b in (m.get("content") if isinstance(m.get("content"), list) else [])
                     if isinstance(b, dict) and b.get("type") == "tool_result")
    print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m")
    return None  # None = allow exit; a string = force continuation


register_hook("Stop", summary_hook)
```

Inside `agent_loop`, it fires before returning:

```python
if response.stop_reason != "tool_use":
    force = trigger_hooks("Stop", messages)  # ← before exiting
    if force:
        # A hook returned a message → inject it and keep going
        messages.append({"role": "user", "content": force})
        continue
    return
```
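
Because the loop re-enters whenever a Stop hook returns a string, a hook that always returns one would keep the agent running forever. One simple guard, sketched here as an assumption rather than code.py behavior, is a hook that forces continuation only once. The factory name `make_once_stop_hook` is hypothetical, and the `while` loop only simulates the exit branch of `agent_loop` without calling a model:

```python
def make_once_stop_hook(reminder: str):
    """Build a Stop hook that forces one extra round, then lets the loop exit."""
    state = {"fired": False}

    def hook(messages: list):
        if state["fired"]:
            return None  # allow the loop to exit
        state["fired"] = True
        return reminder  # non-None: the loop appends it as a user message

    return hook


hook = make_once_stop_hook("Before stopping, run the tests and report the result.")
messages = []
rounds = 0
while True:
    rounds += 1  # stands in for one model turn ending with stop_reason != "tool_use"
    force = hook(messages)
    if force:
        messages.append({"role": "user", "content": force})
        continue
    break

print(rounds, len(messages))  # 2 1
```

CC handles the same problem on the loop side, with the `stopHookActive` flag, rather than inside each hook.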

**The core change inside the tool loop**: s03 called `check_permission(block)` directly; s04 calls `trigger_hooks("PreToolUse", block)` instead, and adds a `trigger_hooks("PostToolUse", ...)` call after the handler runs:

```python
for block in response.content:
    if block.type != "tool_use":
        continue

    # s03: if not check_permission(block): ...
    # s04: hooks replace the hard-coded check
    blocked = trigger_hooks("PreToolUse", block)
    if blocked:
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": str(blocked)})
        continue

    handler = TOOL_HANDLERS.get(block.name)
    output = handler(**block.input) if handler else f"Unknown: {block.name}"

    trigger_hooks("PostToolUse", block, output)

    results.append({"type": "tool_result", "tool_use_id": block.id,
                    "content": output})
```

The four hooks cover the key points of the agent cycle: input → before execution → after execution → exit. The loop only calls `trigger_hooks()`; all concrete logic lives in the hook callbacks.

---

## Changes from s03

| Component | Before (s03) | After (s04) |
|-----------|--------------|-------------|
| Extension mechanism | check_permission() hard-coded in the loop | HOOKS registry + trigger_hooks() |
| New functions | — | register_hook, trigger_hooks |
| Hook callbacks | — | context_inject_hook, permission_hook, log_hook, large_output_hook, summary_hook |
| Loop | Calls check_permission() directly | Calls trigger_hooks("PreToolUse", ...) |
| Exit control | None | trigger_hooks("Stop", ...) can prevent exit |
| Input interception | None | trigger_hooks("UserPromptSubmit", ...) provides a hook point for context injection (the teaching hook only logs) |

---

## Try It

```sh
cd learn-claude-code
python s04_hooks/code.py
```

Try these prompts:

1. `Read the file README.md` (should pass straight through; watch the hook log)
2. `Create a file called test.txt` (PostToolUse fires after the write, but `large_output_hook` only prints for outputs over 100,000 characters, so expect no extra `[HOOK]` line from it)
3. `Delete all temporary files in /tmp` (bash + `rm` trips the permission hook: either the deny list or a confirmation prompt, depending on the exact command)

What to look for: does a `[HOOK]` log line appear before each tool call? When permission is denied, was the call stopped by a hook or by code hard-coded in the loop?

---

## What's Next

The agent can now carry out operations safely. But does it ever stop to think "what should I do first, and what next?" Give it a complex task: does it dive straight in, or make a plan first?

→ s05 TodoWrite: give the agent a planning tool. Make the list first, then execute.

<details>

<summary>Deep dive into the CC source</summary>

> The following is based on a full analysis of the CC source files `toolHooks.ts` (650 lines), `hooks.ts`, `stopHooks.ts`, and `coreTypes.ts`.

### 1. Hook events: not 4 but 27

The teaching version covers only four events. CC actually defines 27 hook events (`coreTypes.ts:25-53`):

| Category | Events |
|----------|--------|
| Tools | `PreToolUse`, `PostToolUse`, `PostToolUseFailure` |
| Session | `SessionStart`, `SessionEnd`, `Stop`, `StopFailure`, `Setup` |
| User interaction | `UserPromptSubmit`, `Notification`, `PermissionRequest`, `PermissionDenied` |
| Subagents | `SubagentStart`, `SubagentStop` |
| Compaction | `PreCompact`, `PostCompact` |
| Teams | `TeammateIdle`, `TaskCreated`, `TaskCompleted` |
| Other | `Elicitation`, `ElicitationResult`, `ConfigChange`, `WorktreeCreate`, `WorktreeRemove`, `InstructionsLoaded`, `CwdChanged`, `FileChanged` |

The teaching version keeps only the four core events (UserPromptSubmit, PreToolUse, PostToolUse, Stop) because together they cover every key point of the agent cycle. The other 23 follow the same pattern.

### 2. HookResult: commonly used fields

CC's `HookResult` (`types/hooks.ts:260-275`) has 14 fields. The commonly used ones:

| Field | Type | Purpose |
|-------|------|---------|
| `message` | Message | Optional UI message |
| `blockingError` | HookBlockingError | Blocking error → injected into the conversation so the model can self-correct |
| `outcome` | success/blocking/non_blocking_error/cancelled | Execution result |
| `preventContinuation` | boolean | Prevent further execution |
| `stopReason` | string | Description of why execution stopped |
| `permissionBehavior` | allow/deny/ask/passthrough | Permission decision returned by the hook |
| `updatedInput` | Record | Modified tool input |
| `additionalContext` | string | Extra context |
| `updatedMCPToolOutput` | unknown | Modified MCP tool output |
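
For readers following along in Python, here is a rough dataclass mirroring a subset of these fields. The snake_case names, the simplified types (plain strings instead of `Message` and `HookBlockingError`), and the `from_teaching_return` mapping are all assumptions made for illustration, not CC's actual code:

```python
from dataclasses import dataclass
from typing import Any, Literal, Optional


@dataclass
class HookResult:
    """Simplified, hypothetical Python mirror of part of CC's HookResult."""
    outcome: Literal["success", "blocking", "non_blocking_error", "cancelled"] = "success"
    message: Optional[str] = None          # CC: a Message object
    blocking_error: Optional[str] = None   # CC: HookBlockingError
    prevent_continuation: bool = False
    stop_reason: Optional[str] = None
    permission_behavior: Optional[Literal["allow", "deny", "ask", "passthrough"]] = None
    updated_input: Optional[dict[str, Any]] = None
    additional_context: Optional[str] = None


def from_teaching_return(value: Optional[str]) -> HookResult:
    """Map a teaching-version PreToolUse return value onto the richer shape."""
    if value is None:
        return HookResult()  # proceed
    return HookResult(outcome="blocking", blocking_error=value, permission_behavior="deny")


print(from_teaching_return(None).outcome)                         # success
print(from_teaching_return("Permission denied by user").outcome)  # blocking
```

The teaching version's single return value collapses all of this into "None or a message", which is why it can express blocking but not, for example, rewriting the tool input.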

### 3. Key invariant: a hook 'allow' cannot bypass deny/ask rules

This is the most important security design in CC's permission system (`toolHooks.ts:325-331`): **even when a hook returns allow, the deny/ask rules in settings.json are still checked.** If a user's hook script says "allow" but a settings.json rule denies that tool, the operation is still blocked.
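
The invariant can be sketched as a small decision function. This is a simplification: only the rule "a hook's allow still yields to deny/ask rules" comes from the description above; how the remaining cases are ordered, and the name `resolve_permission`, are hypothetical:

```python
from typing import Optional


def resolve_permission(hook_decision: Optional[str], rule_decision: Optional[str]) -> str:
    """Combine a hook's decision with the settings.json rule for the same tool.

    hook_decision: "allow" | "deny" | "ask" | "passthrough" | None
    rule_decision: "allow" | "deny" | "ask" | None (no matching rule)
    """
    if hook_decision == "deny":
        return "deny"
    if hook_decision == "allow":
        # The invariant: a hook's allow never overrides a deny/ask rule.
        return rule_decision if rule_decision in ("deny", "ask") else "allow"
    if hook_decision == "ask":
        return "deny" if rule_decision == "deny" else "ask"
    # passthrough / no hook: fall back to the rules, asking when nothing matches
    return rule_decision or "ask"


print(resolve_permission("allow", "deny"))  # deny
print(resolve_permission("allow", None))    # allow
```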
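
The teaching version has no such layer: a non-None return from a PreToolUse hook simply blocks the call, and there is no "allow" result at all. That is enough for teaching, but a production hook system without this layering would let a hook's allow override user-configured denials, which is a security hole.

### 4. The stopHookActive mechanism

CC's Stop hooks have a guard against infinite loops (`query.ts:212,1300`): the `stopHookActive` state field. When Stop hooks produce a blockingError, the loop re-enters the next round with `stopHookActive: true`, and later iterations see the flag and do not trigger the Stop hooks again. This prevents a never-ending cycle: the model self-corrects → the Stop hook errors again → the model self-corrects again → the Stop hook errors again...

### 5. hook_stopped_continuation

When a PostToolUse hook returns `preventContinuation: true`, a `hook_stopped_continuation` attachment is produced (`toolHooks.ts:117-130`). `query.ts` (L1388-1393) detects it, sets `shouldPreventContinuation = true`, and the loop exits. This is how a hook stops the agent gracefully: not a crash, but a completion.

### The teaching version's simplifications are deliberate

- 27 events → 4 (UserPromptSubmit/PreToolUse/PostToolUse/Stop): enough to cover the key points of the agent cycle
- 14 fields → a plain return value (None = proceed; non-None = block for PreToolUse, force continuation for Stop): keeps cognitive load to a minimum
- Hook allow vs. deny/ask invariant → omitted: the teaching version has no settings.json layer
- stopHookActive → omitted: the teaching Stop hook only does simple continuation and has no infinite-loop guard, so a Stop hook that always returned a string would keep the loop running indefinitely

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
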
s04_hooks/README.md
@@ -0,0 +1,282 @@
# s04: Hooks — Hang It on the Loop, Don't Write It Into the Loop

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → s02 → s03 → `s04` → [s05](../s05_todo_write/) → s06 → ... → s20

> *"Hang it on the loop, don't write it into the loop"*: hooks inject extension logic before and after tool execution.
>
> **Harness layer**: hooks, extension points that don't intrude on the loop.

---

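## Problem

The s03 agent already has permission checks. But every new check you want, such as "log every bash call" or "run `git add` automatically after each operation", means modifying the `agent_loop` function again.

The loop soon ends up looking like this:
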
```python
def agent_loop(messages):
    while True:
        # ... LLM call ...
        for block in response.content:
            if block.type == "tool_use":
                log_to_file(block)       # one more line
                check_permission(block)  # one more line
                notify_slack(block)      # yet another line
                output = execute(block)
                auto_git_add(block)      # and another
                # ... before long the loop is unrecognizable
```

What you want to extend is the agent's behavior, yet what you keep editing is the loop itself. The loop should be a stable core; extensions should hang on the outside.

---

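## Solution

The s03 loop and permission logic are kept intact. The main change is that `check_permission()` moves out of the loop body and into a hook: the loop no longer calls any check function directly, but calls `trigger_hooks("PreToolUse", block)` instead, and the registry decides what runs.

Four events cover a complete agent cycle:

| Event | When it fires | Typical use |
|-------|---------------|-------------|
| UserPromptSubmit | After the user submits input, before it reaches the LLM | Input validation, context injection |
| PreToolUse | Before a tool runs | Permission checks, logging |
| PostToolUse | After a tool runs | Side effects (e.g. auto `git add`), output checks |
| Stop | Just before the loop exits | Cleanup (a Stop hook can also force the loop to continue) |

Extensions are added with `register_hook()`; the loop only ever calls `trigger_hooks()`.

---
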
## How It Works

**Hook registry**: a dict that maps each event name to a list of callbacks.

```python
HOOKS = {
    "UserPromptSubmit": [],
    "PreToolUse": [],
    "PostToolUse": [],
    "Stop": [],
}


def register_hook(event: str, callback):
    HOOKS[event].append(callback)


def trigger_hooks(event: str, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:  # non-None return → the hook says "stop here"
            return result       # remaining callbacks for this event are skipped
    return None
```

In the teaching version, a non-None return value from a PreToolUse hook blocks that tool call, and a non-None return value from a Stop hook forces the loop to continue. Return values from UserPromptSubmit and PostToolUse hooks are ignored.

**UserPromptSubmit** fires after the user submits input and before it reaches the LLM. In CC it can intercept or modify the input; the teaching version only logs, as a demonstration:

```python
def context_inject_hook(query: str) -> str | None:
    """Log the working directory for every prompt (a fuller hook could inject it)."""
    print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m")
    return None  # None = no modification, let the prompt through


register_hook("UserPromptSubmit", context_inject_hook)
```

In the main loop it fires right after the user's input:

```python
query = input("s04 >> ")
trigger_hooks("UserPromptSubmit", query)  # ← before the LLM sees it
history.append({"role": "user", "content": query})
agent_loop(history)
```
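
In the teaching code the hook's return value is ignored, so it can only log. A sketch of what "modifying the input" could look like, assuming a hypothetical `cwd_context_hook` that returns a rewritten prompt and a main loop that uses it when present (the `/repo` path is a placeholder):

```python
HOOKS = {"UserPromptSubmit": []}


def register_hook(event, callback):
    HOOKS[event].append(callback)


def trigger_hooks(event, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:
            return result
    return None


def cwd_context_hook(query: str):
    """Hypothetical: prepend the working directory instead of only logging it."""
    return f"[cwd: /repo]\n{query}"


register_hook("UserPromptSubmit", cwd_context_hook)

query = "list the python files"
rewritten = trigger_hooks("UserPromptSubmit", query)
content = rewritten if rewritten is not None else query  # no hook output: keep raw input
history = [{"role": "user", "content": content}]
print(history[0]["content"])
```

Because `trigger_hooks()` stops at the first non-None result, only the first rewriting hook would take effect; chaining several rewrites would need a trigger function that feeds each hook's output into the next.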

**PreToolUse / PostToolUse** are the hooks around tool execution. The s03 permission-check logic is now wrapped as a PreToolUse hook, joined by a logging hook and a large-output reminder:

```python
# PreToolUse: permission check (s03 logic, moved from the loop into a hook)
# (code.py additionally asks for confirmation on DESTRUCTIVE commands such as "rm ")
def permission_hook(block):
    if block.name == "bash":
        for pattern in DENY_LIST:
            if pattern in block.input.get("command", ""):
                return "Permission denied by deny list"
    if block.name in ("write_file", "edit_file"):
        path = block.input.get("path", "")
        if not (WORKDIR / path).resolve().is_relative_to(WORKDIR):
            choice = input("  Allow? [y/N] ").strip().lower()
            if choice not in ("y", "yes"):
                return "Permission denied by user"
    return None


# PreToolUse: logging
def log_hook(block):
    print(f"[HOOK] {block.name}(...)")


# PostToolUse: large-output reminder
def large_output_hook(block, output):
    if len(str(output)) > 100000:
        print(f"[HOOK] ⚠ Large output from {block.name}")


register_hook("PreToolUse", permission_hook)
register_hook("PreToolUse", log_hook)
register_hook("PostToolUse", large_output_hook)
```

**Stop** fires just before the loop exits (`stop_reason != "tool_use"`). The teaching version uses it to print session statistics:

```python
def summary_hook(messages: list) -> str | None:
    """Print a summary when the loop is about to stop."""
    tool_count = sum(1 for m in messages
                     for b in (m.get("content") if isinstance(m.get("content"), list) else [])
                     if isinstance(b, dict) and b.get("type") == "tool_result")
    print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m")
    return None  # None = allow exit; a string = force continuation


register_hook("Stop", summary_hook)
```

Inside `agent_loop`, it fires before returning:

```python
if response.stop_reason != "tool_use":
    force = trigger_hooks("Stop", messages)  # ← before exiting
    if force:
        # A hook returned a message → inject it and keep going
        messages.append({"role": "user", "content": force})
        continue
    return
```

**The core change inside the tool loop**: s03 called `check_permission(block)` directly; s04 calls `trigger_hooks("PreToolUse", block)` instead, and adds a `trigger_hooks("PostToolUse", ...)` call after the handler runs:

```python
for block in response.content:
    if block.type != "tool_use":
        continue

    # s03: if not check_permission(block): ...
    # s04: hooks replace the hard-coded check
    blocked = trigger_hooks("PreToolUse", block)
    if blocked:
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": str(blocked)})
        continue

    handler = TOOL_HANDLERS.get(block.name)
    output = handler(**block.input) if handler else f"Unknown: {block.name}"

    trigger_hooks("PostToolUse", block, output)

    results.append({"type": "tool_result", "tool_use_id": block.id,
                    "content": output})
```
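
To exercise this loop body without an API key, the sketch below feeds it hand-built blocks. `SimpleNamespace` stands in for the SDK's content blocks, and the `echo` tool, the `deny_rm` and `record_post` hooks, and the `run_tool_blocks` wrapper are hypothetical:

```python
from types import SimpleNamespace

HOOKS = {"PreToolUse": [], "PostToolUse": []}


def trigger_hooks(event, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:
            return result
    return None


TOOL_HANDLERS = {"echo": lambda text: text.upper()}
post_log = []  # names of tools whose PostToolUse hooks fired


def deny_rm(block):
    """Hypothetical PreToolUse hook: refuse any echo text containing 'rm '."""
    return "Permission denied by deny list" if "rm " in block.input.get("text", "") else None


def record_post(block, output):
    post_log.append(block.name)
    return None


HOOKS["PreToolUse"].append(deny_rm)
HOOKS["PostToolUse"].append(record_post)


def run_tool_blocks(content):
    results = []
    for block in content:
        if block.type != "tool_use":
            continue
        blocked = trigger_hooks("PreToolUse", block)
        if blocked:
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": str(blocked)})
            continue
        handler = TOOL_HANDLERS.get(block.name)
        output = handler(**block.input) if handler else f"Unknown: {block.name}"
        trigger_hooks("PostToolUse", block, output)
        results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
    return results


content = [
    SimpleNamespace(type="text", text="Let me run two tools."),
    SimpleNamespace(type="tool_use", id="t1", name="echo", input={"text": "hi"}),
    SimpleNamespace(type="tool_use", id="t2", name="echo", input={"text": "rm -rf build"}),
]
for r in run_tool_blocks(content):
    print(r["tool_use_id"], r["content"])
print(post_log)
```

Note that a blocked call still produces a `tool_result`: the Messages API expects a result for every `tool_use` id from the previous assistant turn, so blocking changes only the result's content, not its presence.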

The four hooks cover the key points of the agent cycle: input → before execution → after execution → exit. The loop only calls `trigger_hooks()`; all concrete logic lives in the hook callbacks.

---

## Changes from s03

| Component | Before (s03) | After (s04) |
|-----------|--------------|-------------|
| Extension mechanism | check_permission() hard-coded in the loop | HOOKS registry + trigger_hooks() |
| New functions | — | register_hook, trigger_hooks |
| Hook callbacks | — | context_inject_hook, permission_hook, log_hook, large_output_hook, summary_hook |
| Loop | Calls check_permission() directly | Calls trigger_hooks("PreToolUse", ...) |
| Exit control | None | trigger_hooks("Stop", ...) can prevent exit |
| Input interception | None | trigger_hooks("UserPromptSubmit", ...) provides a hook point for context injection (the teaching hook only logs) |

---

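## Try It

```sh
cd learn-claude-code
python s04_hooks/code.py
```

Try these prompts:

1. `Read the file README.md` (should pass straight through; watch the hook log)
2. `Create a file called test.txt` (PostToolUse fires after the write, but `large_output_hook` only prints for outputs over 100,000 characters, so expect no extra `[HOOK]` line from it)
3. `Delete all temporary files in /tmp` (bash + `rm` trips the permission hook: either the deny list or a confirmation prompt, depending on the exact command)

What to look for: does a `[HOOK]` log line appear before each tool call? When permission is denied, was the call stopped by a hook or by code hard-coded in the loop?

---
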
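## What's Next

The agent can now carry out operations safely. But does it ever stop to think "what should I do first, and what next?" Give it a complex task: does it dive straight in, or make a plan first?

s05 TodoWrite → give the agent a planning tool. List first, then act.

<details>

<summary>Deep dive into the CC source</summary>
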
> The following is based on a full analysis of the CC source files `toolHooks.ts` (650 lines), `hooks.ts`, `stopHooks.ts`, and `coreTypes.ts`.

### 1. Hook events: not just these 4, but 27

The teaching version covers only four events. CC actually defines 27 hook events (`coreTypes.ts:25-53`):

| Category | Events |
|----------|--------|
| Tools | `PreToolUse`, `PostToolUse`, `PostToolUseFailure` |
| Session | `SessionStart`, `SessionEnd`, `Stop`, `StopFailure`, `Setup` |
| User interaction | `UserPromptSubmit`, `Notification`, `PermissionRequest`, `PermissionDenied` |
| Subagents | `SubagentStart`, `SubagentStop` |
| Compaction | `PreCompact`, `PostCompact` |
| Teams | `TeammateIdle`, `TaskCreated`, `TaskCompleted` |
| Other | `Elicitation`, `ElicitationResult`, `ConfigChange`, `WorktreeCreate`, `WorktreeRemove`, `InstructionsLoaded`, `CwdChanged`, `FileChanged` |

The teaching version keeps only the four core events (UserPromptSubmit, PreToolUse, PostToolUse, Stop) because they cover the key points of a complete agent cycle. The other 23 follow the same pattern.

### 2. HookResult: commonly used fields

CC's `HookResult` (`types/hooks.ts:260-275`) has 14 fields. The commonly used ones:

| Field | Type | Purpose |
|-------|------|---------|
| `message` | Message | Optional UI message |
| `blockingError` | HookBlockingError | Blocking error → injected into the conversation so the model can self-correct |
| `outcome` | success/blocking/non_blocking_error/cancelled | Execution result |
| `preventContinuation` | boolean | Prevent further execution |
| `stopReason` | string | Description of why execution stopped |
| `permissionBehavior` | allow/deny/ask/passthrough | Permission decision returned by the hook |
| `updatedInput` | Record | Modified tool input |
| `additionalContext` | string | Extra context |
| `updatedMCPToolOutput` | unknown | Modified MCP tool output |
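
### 3. Key invariant: a hook 'allow' cannot bypass deny/ask rules

This is the most important security design in CC's permission system (`toolHooks.ts:325-331`): **even when a hook returns allow, the deny/ask rules in settings.json are still checked.** If a user's hook script says "allow" but a settings.json rule denies that tool, the operation is still blocked.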
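
The teaching version has no such layer; it only interprets a non-None PreToolUse return value as "block this tool call". That is enough for teaching, but a production hook system without this layering would let a hook's allow override user-configured denials, which is a security hole.

### 4. The stopHookActive mechanism

CC's Stop hooks have a guard against infinite loops (`query.ts:212,1300`): the `stopHookActive` state field. When Stop hooks produce a blockingError, the loop re-enters the next round with `stopHookActive: true`, and later iterations see the flag and do not trigger the Stop hooks again. This prevents a never-ending cycle: the model self-corrects → the Stop hook errors again → the model self-corrects again → the Stop hook errors again...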
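
A loop-side version of this guard, sketched under assumptions: the Stop hook is modeled as a function returning an error string or None, and `run_until_stop` is a hypothetical stand-in for the query loop with the model call elided. It only illustrates the flag's effect as described above, not CC's actual control flow:

```python
def run_until_stop(stop_hook, max_rounds: int = 10) -> list:
    messages = []
    stop_hook_active = False
    for _ in range(max_rounds):
        # (a model turn that ends without tool_use would happen here)
        if stop_hook_active:
            break  # hooks already had their say: do not trigger them again
        error = stop_hook(messages)
        if error:
            messages.append({"role": "user", "content": error})  # let the model self-correct
            stop_hook_active = True  # re-enter the next round with the flag set
            continue
        break
    return messages


def always_failing(messages):
    return "Tests still failing, please fix them."


def satisfied(messages):
    return None


print(len(run_until_stop(always_failing)))  # 1: the hook fires once, not max_rounds times
print(len(run_until_stop(satisfied)))       # 0
```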

### 5. hook_stopped_continuation

When PostToolUse hooks return `preventContinuation: true`, a `hook_stopped_continuation` attachment is produced (`toolHooks.ts:117-130`). `query.ts` (L1388-1393) detects it, sets `shouldPreventContinuation = true`, and the loop exits. This is how a hook stops the agent gracefully: not a crash, but a completion.
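
A rough sketch of that flow, assuming hooks return plain dicts with snake_case keys. The attachment shape and the names `collect_post_tool_attachments` and `stop_after_deploy` are hypothetical, chosen for illustration:

```python
def collect_post_tool_attachments(hooks, tool_name: str, output: str) -> list:
    """Run every PostToolUse hook and turn preventContinuation into an attachment."""
    attachments = []
    for hook in hooks:
        result = hook(tool_name, output) or {}
        if result.get("prevent_continuation"):
            attachments.append({"type": "hook_stopped_continuation",
                                "reason": result.get("stop_reason", "")})
    return attachments


def stop_after_deploy(tool_name: str, output: str):
    if tool_name == "deploy":
        return {"prevent_continuation": True, "stop_reason": "deploy finished"}
    return None


attachments = collect_post_tool_attachments([stop_after_deploy], "deploy", "ok")
should_prevent_continuation = any(a["type"] == "hook_stopped_continuation" for a in attachments)
print(should_prevent_continuation)  # True: the loop would exit instead of calling the model again
```

Unlike the teaching `trigger_hooks()`, this sketch runs every hook rather than stopping at the first result, so several hooks can each contribute an attachment.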
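
### The teaching version's simplifications are deliberate

- 27 events → 4 (UserPromptSubmit/PreToolUse/PostToolUse/Stop): enough to cover the key points of the agent cycle
- 14 fields → a plain return value (None = proceed; non-None = block for PreToolUse, force continuation for Stop): keeps cognitive load to a minimum
- Hook allow vs. deny/ask invariant → omitted: the teaching version has no settings.json layer
- stopHookActive → omitted: the teaching Stop hook only does simple continuation and has no infinite-loop guard, so a Stop hook that always returned a string would keep the loop running indefinitely

</details>

<!-- translation-sync: zh@v1, en@v0, ja@v0 -->
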
s04_hooks/code.py
@@ -0,0 +1,293 @@
#!/usr/bin/env python3
"""
s04: Hooks — move extension logic out of the loop, onto hooks.

  User types query
          │
          ▼
 ┌──────────────────┐
 │ UserPromptSubmit │ ── trigger_hooks() before LLM
 └────────┬─────────┘
          ▼
 ┌────────────┐     ┌──────────────────────────────┐
 │  messages  │────▶│ LLM (stop_reason=tool_use?)  │
 └────────────┘     │  No  ──▶ Stop hooks ──▶ exit │
                    │  Yes ──▶ tool_use block ──┐  │
                    └───────────────────────────┼──┘
                                                ▼
                                       ┌──────────────────┐
                                       │ trigger_hooks()  │
                                       │ PreToolUse:      │
                                       │  permission_hook │
                                       │  log_hook        │
                                       └────────┬─────────┘
                                                │ (not blocked)
                                       ┌────────▼─────────┐
                                       │ TOOL_HANDLERS[x] │
                                       └────────┬─────────┘
                                                │
                                       ┌────────▼─────────┐
                                       │ trigger_hooks()  │
                                       │ PostToolUse:     │
                                       │  large_output    │
                                       └────────┬─────────┘
                                                │
                                             results ──▶ back to messages

Changes from s03:
  + HOOKS registry (event -> list of callbacks)
  + register_hook() / trigger_hooks()
  + context_inject_hook (UserPromptSubmit)
  + permission_hook, log_hook (PreToolUse)
  + large_output_hook (PostToolUse)
  + summary_hook (Stop)
  - check_permission() removed from loop body
    (logic moved into permission_hook, triggered via PreToolUse)

Run:   python s04_hooks/code.py
Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY and MODEL_ID in .env
"""

import os, subprocess
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
    readline.parse_and_bind('set input-meta on')
    readline.parse_and_bind('set output-meta on')
    readline.parse_and_bind('set convert-meta off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"):
    os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]

SYSTEM = f"You are a coding agent at {WORKDIR}. Use tools to solve tasks. Act, don't explain."


# ═══════════════════════════════════════════════════════════
# FROM s02-s03 (unchanged): Tool Implementations
# ═══════════════════════════════════════════════════════════

def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path


def run_bash(command: str) -> str:
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR,
                           capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"


def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines):
            lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e:
        return f"Error: {e}"


def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content)
        return f"Wrote {len(content)} bytes to {path}"
    except Exception as e:
        return f"Error: {e}"


def run_edit(path: str, old_text: str, new_text: str) -> str:
    try:
        file_path = safe_path(path)
        text = file_path.read_text()
        if old_text not in text:
            return f"Error: text not found in {path}"
        file_path.write_text(text.replace(old_text, new_text, 1))
        return f"Edited {path}"
    except Exception as e:
        return f"Error: {e}"


def run_glob(pattern: str) -> str:
    import glob as g
    try:
        results = []
        for match in g.glob(pattern, root_dir=WORKDIR):
            if (WORKDIR / match).resolve().is_relative_to(WORKDIR):
                results.append(match)
        return "\n".join(results) if results else "(no matches)"
    except Exception as e:
        return f"Error: {e}"


TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
]

TOOL_HANDLERS = {
    "bash": run_bash, "read_file": run_read, "write_file": run_write,
    "edit_file": run_edit, "glob": run_glob,
}


# ═══════════════════════════════════════════════════════════
# NEW in s04: Hook System (s03 permission logic now via hooks)
# ═══════════════════════════════════════════════════════════

HOOKS = {"UserPromptSubmit": [], "PreToolUse": [], "PostToolUse": [], "Stop": []}


def register_hook(event: str, callback):
    HOOKS[event].append(callback)


def trigger_hooks(event: str, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:  # teaching shortcut: first non-None result wins
            return result       # (PreToolUse: block the call; Stop: force continuation)
    return None


# s03 permission check logic, now wrapped as a hook
DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if="]
DESTRUCTIVE = ["rm ", "> /etc/", "chmod 777"]


def permission_hook(block):
    """PreToolUse: s03 check_permission() logic moved here."""
    if block.name == "bash":
        command = block.input.get("command", "")
        for pattern in DENY_LIST:
            if pattern in command:
                print(f"\n\033[31m⛔ Blocked: '{pattern}'\033[0m")
                return "Permission denied by deny list"
        # Ask once, even if several destructive keywords match the same command
        if any(kw in command for kw in DESTRUCTIVE):
            print("\n\033[33m⚠ Potentially destructive command\033[0m")
            print(f"  Tool: {block.name}({block.input})")
            choice = input("  Allow? [y/N] ").strip().lower()
            if choice not in ("y", "yes"):
                return "Permission denied by user"
    if block.name in ("write_file", "edit_file"):
        path = block.input.get("path", "")
        if not (WORKDIR / path).resolve().is_relative_to(WORKDIR):
            print("\n\033[33m⚠ Writing outside workspace\033[0m")
            print(f"  Tool: {block.name}({block.input})")
            choice = input("  Allow? [y/N] ").strip().lower()
            if choice not in ("y", "yes"):
                return "Permission denied by user"
    return None


def log_hook(block):
    """PreToolUse: log each tool call (registered after permission_hook, so blocked calls are not logged)."""
    args_preview = str(list(block.input.values())[:2])[:60]
    print(f"\033[90m[HOOK] {block.name}({args_preview})\033[0m")
    return None


def large_output_hook(block, output):
    """PostToolUse: warn on large output."""
    if len(str(output)) > 100000:
        print(f"\033[33m[HOOK] ⚠ Large output from {block.name}: {len(str(output))} chars\033[0m")
    return None


# UserPromptSubmit hook: log the working directory before the prompt reaches the LLM
def context_inject_hook(query: str):
    print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m")
    return None


# Stop hook: print summary when loop is about to exit
def summary_hook(messages: list):
    tool_count = sum(1 for m in messages
                     for b in (m.get("content") if isinstance(m.get("content"), list) else [])
                     if isinstance(b, dict) and b.get("type") == "tool_result")
    print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m")
    return None


register_hook("UserPromptSubmit", context_inject_hook)
register_hook("PreToolUse", permission_hook)
register_hook("PreToolUse", log_hook)
register_hook("PostToolUse", large_output_hook)
register_hook("Stop", summary_hook)


# ═══════════════════════════════════════════════════════════
# agent_loop — same structure as s03, but no hard-coded check
#   s03: if not check_permission(block): ...
#   s04: if trigger_hooks("PreToolUse", block): ...
# ═══════════════════════════════════════════════════════════

def agent_loop(messages: list):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            force = trigger_hooks("Stop", messages)
            if force:  # a Stop hook returned text: feed it back and keep going
                messages.append({"role": "user", "content": force})
                continue
            return

        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue

            # s04 change: hook replaces hard-coded check_permission()
            blocked = trigger_hooks("PreToolUse", block)
            if blocked:
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": str(blocked)})
                continue

            handler = TOOL_HANDLERS.get(block.name)
            output = handler(**block.input) if handler else f"Unknown: {block.name}"

            trigger_hooks("PostToolUse", block, output)  # s04: post hook

            results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})

        messages.append({"role": "user", "content": results})


if __name__ == "__main__":
    print("s04: Hooks — extension logic on hooks, loop stays clean")
    print("Type a question, press Enter. Type q to quit.\n")

    history = []
    while True:
        try:
            query = input("\033[36ms04 >> \033[0m")
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        trigger_hooks("UserPromptSubmit", query)
        history.append({"role": "user", "content": query})
        agent_loop(history)
        for block in history[-1]["content"]:
            if getattr(block, "type", None) == "text":
                print(block.text)
        print()
s04_hooks/images/hooks-overview.en.svg
@@ -0,0 +1,100 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 460" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>

<!-- Background -->
<rect width="800" height="460" fill="#fafbfc" rx="8"/>

<!-- Title -->
<rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
<text x="400" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Hooks — Extension Logic Hangs Outside, Loop Unchanged</text>

<!-- ===== Main Flow Line (y=140 horizontal) ===== -->

<!-- ① messages[] -->
<rect x="40" y="112" width="110" height="56" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="95" y="138" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
<text x="95" y="156" fill="#64748b" font-size="9" text-anchor="middle">(s01 preserved)</text>

<!-- → LLM -->
<line x1="150" y1="140" x2="198" y2="140" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>

<!-- ② LLM -->
<rect x="200" y="108" width="120" height="64" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
<text x="260" y="134" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
<text x="260" y="154" fill="#64748b" font-size="10" text-anchor="middle">stop_reason=tool_use?</text>

<!-- LLM No → Return -->
<line x1="260" y1="172" x2="260" y2="200" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>
<text x="275" y="192" fill="#16a34a" font-size="10" font-weight="600">No</text>
<rect x="205" y="202" width="110" height="28" rx="14" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="260" y="220" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return Result</text>

<!-- LLM Yes → PreToolUse -->
<line x1="320" y1="140" x2="378" y2="140" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
<text x="345" y="132" fill="#d97706" font-size="10" font-weight="600">Yes</text>

<!-- ③ PreToolUse Hook (s04 new) -->
<rect x="380" y="96" width="160" height="88" rx="10" fill="#f0fdf4" stroke="#16a34a" stroke-width="2" stroke-dasharray="6,3"/>
<text x="460" y="116" fill="#166534" font-size="11" font-weight="700" text-anchor="middle">trigger_hooks()</text>
<text x="460" y="132" fill="#166534" font-size="9" font-weight="600" text-anchor="middle">PreToolUse</text>
<rect x="396" y="140" width="128" height="18" rx="3" fill="#dcfce7" stroke="#16a34a" stroke-width="0.8"/>
<text x="460" y="153" fill="#166534" font-size="8" text-anchor="middle">permission_hook · log_hook</text>
<text x="460" y="176" fill="#64748b" font-size="8" text-anchor="middle">Teaching: non-None → block</text>

<!-- PreToolUse Block → branch down -->
<line x1="460" y1="184" x2="460" y2="218" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow-red)"/>
<rect x="405" y="220" width="110" height="24" rx="12" fill="#fef2f2" stroke="#dc2626" stroke-width="1.5"/>
|
||||
<text x="460" y="236" fill="#991b1b" font-size="10" font-weight="600" text-anchor="middle">Write tool_result</text>
|
||||
|
||||
<!-- PreToolUse Pass → TOOL_HANDLERS -->
|
||||
<line x1="540" y1="140" x2="588" y2="140" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
|
||||
<text x="558" y="132" fill="#16a34a" font-size="9" font-weight="600">Pass</text>
|
||||
|
||||
<!-- ④ TOOL_HANDLERS (s02 preserved) -->
|
||||
<rect x="590" y="108" width="100" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="640" y="134" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_</text>
|
||||
<text x="640" y="148" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">HANDLERS</text>
|
||||
<text x="640" y="164" fill="#64748b" font-size="8" text-anchor="middle">bash/read/...</text>
|
||||
|
||||
<!-- TOOL_HANDLERS → PostToolUse (down) -->
|
||||
<line x1="640" y1="172" x2="640" y2="268" stroke="#16a34a" stroke-width="2"/>
|
||||
<text x="648" y="224" fill="#16a34a" font-size="9" font-weight="600">After exec</text>
|
||||
|
||||
<!-- ⑤ PostToolUse Hook (s04 new) -->
|
||||
<rect x="560" y="270" width="160" height="56" rx="10" fill="#f0fdf4" stroke="#16a34a" stroke-width="2" stroke-dasharray="6,3"/>
|
||||
<text x="640" y="290" fill="#166534" font-size="11" font-weight="700" text-anchor="middle">trigger_hooks()</text>
|
||||
<text x="640" y="306" fill="#166534" font-size="9" font-weight="600" text-anchor="middle">PostToolUse</text>
|
||||
<rect x="576" y="310" width="128" height="12" rx="3" fill="#dcfce7" stroke="#16a34a" stroke-width="0.8"/>
|
||||
<text x="640" y="320" fill="#166534" font-size="7" text-anchor="middle">large_output_hook</text>
|
||||
|
||||
<!-- ===== Loop: results back to messages ===== -->
|
||||
<path d="M 720 298 L 760 298 L 760 350 L 95 350 L 95 168" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
|
||||
<text x="400" y="370" fill="#64748b" font-size="10" text-anchor="middle">Results appended to messages[], loop continues</text>
|
||||
|
||||
<!-- ===== Bottom Comparison ===== -->
|
||||
<rect x="60" y="396" width="680" height="48" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
|
||||
<text x="100" y="416" fill="#94a3b8" font-size="10" font-weight="600">s03:</text>
|
||||
<text x="130" y="416" fill="#64748b" font-size="10" font-family="monospace">if not check_permission(block): ...</text>
|
||||
<text x="400" y="416" fill="#94a3b8" font-size="10">← every new check requires modifying the loop</text>
|
||||
<text x="100" y="436" fill="#16a34a" font-size="10" font-weight="600">s04:</text>
|
||||
<text x="130" y="436" fill="#16a34a" font-size="10" font-family="monospace">blocked = trigger_hooks("PreToolUse", block)</text>
|
||||
<text x="520" y="436" fill="#16a34a" font-size="10">← add check = register_hook(), loop unchanged</text>
|
||||
</svg>
s04_hooks/images/hooks-overview.ja.svg
@@ -0,0 +1,100 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 460" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
    </marker>
    <marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
    </marker>
    <marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
    </marker>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/>
      <stop offset="100%" stop-color="#2563eb"/>
    </linearGradient>
  </defs>

  <!-- 背景 -->
  <rect width="800" height="460" fill="#fafbfc" rx="8"/>

  <!-- タイトル -->
  <rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
  <rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
  <text x="400" y="31" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Hooks — 拡張ロジックは外側に、ループは一文字も変更しない</text>

  <!-- ===== メインフロー(y=140 水平) ===== -->

  <!-- ① messages[] -->
  <rect x="40" y="112" width="110" height="56" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="95" y="138" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
  <text x="95" y="156" fill="#64748b" font-size="9" text-anchor="middle">(s01 保持)</text>

  <!-- → LLM -->
  <line x1="150" y1="140" x2="198" y2="140" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>

  <!-- ② LLM -->
  <rect x="200" y="108" width="120" height="64" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="260" y="134" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
  <text x="260" y="154" fill="#64748b" font-size="10" text-anchor="middle">stop_reason=tool_use?</text>

  <!-- LLM No → 返却 -->
  <line x1="260" y1="172" x2="260" y2="200" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>
  <text x="275" y="192" fill="#16a34a" font-size="10" font-weight="600">No</text>
  <rect x="205" y="202" width="110" height="28" rx="14" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="260" y="220" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">結果を返す</text>

  <!-- LLM Yes → PreToolUse -->
  <line x1="320" y1="140" x2="378" y2="140" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
  <text x="345" y="132" fill="#d97706" font-size="10" font-weight="600">Yes</text>

  <!-- ③ PreToolUse フック(s04 新規) -->
  <rect x="380" y="96" width="160" height="88" rx="10" fill="#f0fdf4" stroke="#16a34a" stroke-width="2" stroke-dasharray="6,3"/>
  <text x="460" y="116" fill="#166534" font-size="11" font-weight="700" text-anchor="middle">trigger_hooks()</text>
  <text x="460" y="132" fill="#166534" font-size="9" font-weight="600" text-anchor="middle">PreToolUse</text>
  <rect x="396" y="140" width="128" height="18" rx="3" fill="#dcfce7" stroke="#16a34a" stroke-width="0.8"/>
  <text x="460" y="153" fill="#166534" font-size="8" text-anchor="middle">permission_hook · log_hook</text>
  <text x="460" y="176" fill="#64748b" font-size="8" text-anchor="middle">教育版: 非 None → ブロック</text>

  <!-- PreToolUse 中断 → 下に分岐 -->
  <line x1="460" y1="184" x2="460" y2="218" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow-red)"/>
  <rect x="405" y="220" width="110" height="24" rx="12" fill="#fef2f2" stroke="#dc2626" stroke-width="1.5"/>
  <text x="460" y="236" fill="#991b1b" font-size="10" font-weight="600" text-anchor="middle">tool_result に返す</text>

  <!-- PreToolUse 通過 → TOOL_HANDLERS -->
  <line x1="540" y1="140" x2="588" y2="140" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
  <text x="558" y="132" fill="#16a34a" font-size="9" font-weight="600">通過</text>

  <!-- ④ TOOL_HANDLERS (s02 保持) -->
  <rect x="590" y="108" width="100" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="640" y="134" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_</text>
  <text x="640" y="148" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">HANDLERS</text>
  <text x="640" y="164" fill="#64748b" font-size="8" text-anchor="middle">bash/read/...</text>

  <!-- TOOL_HANDLERS → PostToolUse (下) -->
  <line x1="640" y1="172" x2="640" y2="268" stroke="#16a34a" stroke-width="2"/>
  <text x="648" y="224" fill="#16a34a" font-size="9" font-weight="600">実行後</text>

  <!-- ⑤ PostToolUse フック(s04 新規) -->
  <rect x="560" y="270" width="160" height="56" rx="10" fill="#f0fdf4" stroke="#16a34a" stroke-width="2" stroke-dasharray="6,3"/>
  <text x="640" y="290" fill="#166534" font-size="11" font-weight="700" text-anchor="middle">trigger_hooks()</text>
  <text x="640" y="306" fill="#166534" font-size="9" font-weight="600" text-anchor="middle">PostToolUse</text>
  <rect x="576" y="310" width="128" height="12" rx="3" fill="#dcfce7" stroke="#16a34a" stroke-width="0.8"/>
  <text x="640" y="320" fill="#166534" font-size="7" text-anchor="middle">large_output_hook</text>

  <!-- ===== ループ:結果を messages に戻す ===== -->
  <path d="M 720 298 L 760 298 L 760 350 L 95 350 L 95 168" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
  <text x="400" y="370" fill="#64748b" font-size="10" text-anchor="middle">結果を messages[] に追加、ループ継続</text>

  <!-- ===== 下部比較 ===== -->
  <rect x="60" y="396" width="680" height="48" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
  <text x="100" y="416" fill="#94a3b8" font-size="10" font-weight="600">s03:</text>
  <text x="130" y="416" fill="#64748b" font-size="10" font-family="monospace">if not check_permission(block): ...</text>
  <text x="400" y="416" fill="#94a3b8" font-size="10">← チェックを追加するたびにループを修正</text>
  <text x="100" y="436" fill="#16a34a" font-size="10" font-weight="600">s04:</text>
  <text x="130" y="436" fill="#16a34a" font-size="10" font-family="monospace">blocked = trigger_hooks("PreToolUse", block)</text>
  <text x="520" y="436" fill="#16a34a" font-size="10">← チェック追加 = register_hook()、ループ不変</text>
</svg>
s04_hooks/images/hooks-overview.svg
@@ -0,0 +1,100 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 460" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
    </marker>
    <marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
    </marker>
    <marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
    </marker>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/>
      <stop offset="100%" stop-color="#2563eb"/>
    </linearGradient>
  </defs>

  <!-- 背景 -->
  <rect width="800" height="460" fill="#fafbfc" rx="8"/>

  <!-- 标题 -->
  <rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
  <rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
  <text x="400" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Hooks — 扩展逻辑挂在外面,循环本身一字不改</text>

  <!-- ===== 主流程线(y=140 水平) ===== -->

  <!-- ① messages[] -->
  <rect x="40" y="112" width="110" height="56" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="95" y="138" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
  <text x="95" y="156" fill="#64748b" font-size="9" text-anchor="middle">(s01 保留)</text>

  <!-- → LLM -->
  <line x1="150" y1="140" x2="198" y2="140" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>

  <!-- ② LLM -->
  <rect x="200" y="108" width="120" height="64" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="260" y="134" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
  <text x="260" y="154" fill="#64748b" font-size="10" text-anchor="middle">stop_reason=tool_use?</text>

  <!-- LLM 否 → 返回 -->
  <line x1="260" y1="172" x2="260" y2="200" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>
  <text x="275" y="192" fill="#16a34a" font-size="10" font-weight="600">否</text>
  <rect x="205" y="202" width="110" height="28" rx="14" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="260" y="220" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">返回结果</text>

  <!-- LLM 是 → PreToolUse -->
  <line x1="320" y1="140" x2="378" y2="140" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
  <text x="345" y="132" fill="#d97706" font-size="10" font-weight="600">是</text>

  <!-- ③ PreToolUse hook(s04 新增) -->
  <rect x="380" y="96" width="160" height="88" rx="10" fill="#f0fdf4" stroke="#16a34a" stroke-width="2" stroke-dasharray="6,3"/>
  <text x="460" y="116" fill="#166534" font-size="11" font-weight="700" text-anchor="middle">trigger_hooks()</text>
  <text x="460" y="132" fill="#166534" font-size="9" font-weight="600" text-anchor="middle">PreToolUse</text>
  <rect x="396" y="140" width="128" height="18" rx="3" fill="#dcfce7" stroke="#16a34a" stroke-width="0.8"/>
  <text x="460" y="153" fill="#166534" font-size="8" text-anchor="middle">permission_hook · log_hook</text>
  <text x="460" y="176" fill="#64748b" font-size="8" text-anchor="middle">教学版:非 None → 阻止</text>

  <!-- PreToolUse 阻止 → 向下引出 -->
  <line x1="460" y1="184" x2="460" y2="218" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow-red)"/>
  <rect x="405" y="220" width="110" height="24" rx="12" fill="#fef2f2" stroke="#dc2626" stroke-width="1.5"/>
  <text x="460" y="236" fill="#991b1b" font-size="10" font-weight="600" text-anchor="middle">写入 tool_result</text>

  <!-- PreToolUse 通过 → TOOL_HANDLERS -->
  <line x1="540" y1="140" x2="588" y2="140" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
  <text x="558" y="132" fill="#16a34a" font-size="9" font-weight="600">通过</text>

  <!-- ④ TOOL_HANDLERS (s02 保留) -->
  <rect x="590" y="108" width="100" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="640" y="134" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_</text>
  <text x="640" y="148" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">HANDLERS</text>
  <text x="640" y="164" fill="#64748b" font-size="8" text-anchor="middle">bash/read/...</text>

  <!-- TOOL_HANDLERS → PostToolUse (向下) -->
  <line x1="640" y1="172" x2="640" y2="268" stroke="#16a34a" stroke-width="2"/>
  <text x="648" y="224" fill="#16a34a" font-size="9" font-weight="600">执行后</text>

  <!-- ⑤ PostToolUse hook(s04 新增) -->
  <rect x="560" y="270" width="160" height="56" rx="10" fill="#f0fdf4" stroke="#16a34a" stroke-width="2" stroke-dasharray="6,3"/>
  <text x="640" y="290" fill="#166534" font-size="11" font-weight="700" text-anchor="middle">trigger_hooks()</text>
  <text x="640" y="306" fill="#166534" font-size="9" font-weight="600" text-anchor="middle">PostToolUse</text>
  <rect x="576" y="310" width="128" height="12" rx="3" fill="#dcfce7" stroke="#16a34a" stroke-width="0.8"/>
  <text x="640" y="320" fill="#166534" font-size="7" text-anchor="middle">large_output_hook</text>

  <!-- ===== 回环:结果回到 messages ===== -->
  <path d="M 720 298 L 760 298 L 760 350 L 95 350 L 95 168" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
  <text x="400" y="370" fill="#64748b" font-size="10" text-anchor="middle">结果追加到 messages[],循环继续</text>

  <!-- ===== 底部对比 ===== -->
  <rect x="60" y="396" width="680" height="48" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
  <text x="100" y="416" fill="#94a3b8" font-size="10" font-weight="600">s03:</text>
  <text x="130" y="416" fill="#64748b" font-size="10" font-family="monospace">if not check_permission(block): ...</text>
  <text x="400" y="416" fill="#94a3b8" font-size="10">← 每加一个检查就要改循环</text>
  <text x="100" y="436" fill="#16a34a" font-size="10" font-weight="600">s04:</text>
  <text x="130" y="436" fill="#16a34a" font-size="10" font-family="monospace">blocked = trigger_hooks("PreToolUse", block)</text>
  <text x="520" y="436" fill="#16a34a" font-size="10">← 加检查 = register_hook(),循环不改</text>
</svg>
s05_todo_write/README.en.md
@@ -0,0 +1,156 @@
# s05: TodoWrite — An Agent Without a Plan Drifts Off Course

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → s02 → s03 → s04 → `s05` → [s06](../s06_subagent/) → s07 → ... → s20

> *"An agent without a plan goes wherever the wind blows"* — List the steps first, then execute. Complex tasks are less likely to miss steps.
>
> **Harness Layer**: Planning — Let the Agent think before it acts.

---

## The Problem

Give the Agent a complex task: "Rename all Python files to snake_case, run tests, and fix failures."

The Agent starts working: it renames 3 files, runs a test, finds 2 failures, and starts fixing them. Somewhere along the way it forgets that the original goal was "rename to snake_case"; the test failures have consumed all of its attention.

The longer the conversation, the worse this gets: tool results keep filling the context and dilute the system prompt's influence. In a 10-step refactoring, the Agent starts improvising after steps 1-3 because steps 4-10 have been pushed out of its attention.

---

## The Solution

The minimal hook structure from the previous chapter is preserved; this chapter focuses on the new `todo_write` tool and the reminder mechanism. `todo_write` does no actual work: it cannot read files or run commands. It simply lets the Agent organize its thoughts before diving in.

The dispatch mechanism is unchanged; the new tool is still routed through `TOOL_HANDLERS[block.name]`. To demonstrate the todo reminder, however, a counter was added to the loop: after 3 consecutive rounds without a `todo_write` call, a reminder is injected.

---

## How It Works

**The todo_write tool** accepts a list of items with statuses, persists it to `.tasks/current_todos.json` (the teaching version writes to disk for observability), and prints progress in the terminal:

```python
import json

# TASKS_DIR (a pathlib.Path pointing at .tasks/) is defined elsewhere in code.py
def run_todo_write(todos: list) -> str:
    TASKS_DIR.mkdir(parents=True, exist_ok=True)  # make sure .tasks/ exists
    tasks_file = TASKS_DIR / "current_todos.json"
    tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False), encoding="utf-8")

    # Render a checklist so progress is visible in the terminal
    lines = ["\n## Current Tasks"]
    for t in todos:
        icon = {"pending": " ", "in_progress": "▸", "completed": "✓"}[t["status"]]
        lines.append(f" [{icon}] {t['content']}")
    print("\n".join(lines))
    return f"Updated {len(todos)} tasks"
```
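
As written, an item with a missing or unknown `status` raises `KeyError` inside the rendering loop. A minimal sketch of up-front validation follows; `validate_todos` is a hypothetical helper, not necessarily what code.py does. It returns an error string the model can read instead of crashing the tool:

```python
from typing import Optional

# A tuple rather than a set, so `in` never fails on an unhashable status value
VALID_STATUSES = ("pending", "in_progress", "completed")


def validate_todos(todos: object) -> Optional[str]:
    """Return an error message for malformed input, or None if the list is valid."""
    if not isinstance(todos, list):
        return "Error: todos must be a list"
    for i, item in enumerate(todos):
        if not isinstance(item, dict):
            return f"Error: todos[{i}] must be an object"
        content = item.get("content")
        if not isinstance(content, str) or not content.strip():
            return f"Error: todos[{i}] is missing 'content'"
        if item.get("status") not in VALID_STATUSES:
            return f"Error: todos[{i}] has invalid status {item.get('status')!r}"
    return None


# Inside run_todo_write, the check would run before anything touches disk:
#     error = validate_todos(todos)
#     if error:
#         return error  # the model sees this text as the tool_result
```

Returning the message instead of raising keeps the loop alive and gives the model a chance to correct its next call.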

The tool definition joins the other 5 in `TOOLS`, and its handler is registered in the `TOOL_HANDLERS` dispatch map:

```python
TOOLS = [
    # schemas of the five existing tools elided
    {"name": "bash", ...},
    {"name": "read_file", ...},
    {"name": "write_file", ...},
    {"name": "edit_file", ...},
    {"name": "glob", ...},
    # s05: new entry
    {"name": "todo_write", "description": "Create and manage a task list ...",
     "input_schema": {
         "type": "object",
         "properties": {
             "todos": {
                 "type": "array",
                 "items": {
                     "type": "object",
                     "properties": {
                         "content": {"type": "string"},
                         "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]},
                     },
                     "required": ["content", "status"],  # every item needs both fields
                 },
             },
         },
         "required": ["todos"],
     },
    },
]

TOOL_HANDLERS["todo_write"] = run_todo_write
```
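
Because the handler is looked up by name, adding a tool never touches the loop itself. Below is a self-contained sketch of that lookup. The stubbed handler and the plain dict standing in for the SDK's `tool_use` block are assumptions for illustration:

```python
from typing import Callable


def run_todo_write(todos: list) -> str:
    # Stub: the real handler also persists and prints the list
    return f"Updated {len(todos)} tasks"


TOOL_HANDLERS: dict[str, Callable[..., str]] = {"todo_write": run_todo_write}


def dispatch(block: dict) -> dict:
    """Route one tool_use block to its handler and wrap the output as a tool_result."""
    handler = TOOL_HANDLERS.get(block["name"])
    if handler is None:
        output = f"Unknown tool: {block['name']}"
    else:
        # The model's JSON input becomes keyword arguments, e.g. todos=[...]
        output = handler(**block["input"])
    return {"type": "tool_result", "tool_use_id": block["id"], "content": output}


block = {"type": "tool_use", "id": "toolu_01", "name": "todo_write",
         "input": {"todos": [{"content": "Add type hints", "status": "pending"}]}}
print(dispatch(block)["content"])  # Updated 1 tasks
```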

**Nag reminder**: when the model has not called `todo_write` for 3 consecutive rounds, a reminder is injected automatically (a teaching mechanism; the CC source has no fixed round-count logic):

```python
# Checked in the loop before the next LLM call
if rounds_since_todo >= 3 and messages:
    messages.append({
        "role": "user",
        "content": "<reminder>Update your todos.</reminder>",
    })
    rounds_since_todo = 0
```
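
The snippet shows only the injection. The counter also has to be advanced and reset somewhere. Here is a self-contained sketch of that bookkeeping. It uses hypothetical `update_counter` and `maybe_remind` helpers and assumes that one "round" means one model turn that called some tools. code.py may organize this differently:

```python
REMINDER = "<reminder>Update your todos.</reminder>"


def update_counter(rounds_since_todo: int, tool_names: list[str]) -> int:
    """Reset after a todo_write call; otherwise count one more round without it."""
    return 0 if "todo_write" in tool_names else rounds_since_todo + 1


def maybe_remind(rounds_since_todo: int, messages: list) -> int:
    """Append a reminder once the threshold is hit and return the new counter value."""
    if rounds_since_todo >= 3 and messages:
        messages.append({"role": "user", "content": REMINDER})
        return 0
    return rounds_since_todo


# Simulate four rounds: todo_write once, then three rounds of other tools
messages = [{"role": "user", "content": "Refactor hello.py"}]
counter = 0
for names in (["todo_write"], ["read_file"], ["edit_file"], ["bash"]):
    counter = maybe_remind(counter, messages)
    counter = update_counter(counter, names)
counter = maybe_remind(counter, messages)  # the check before the next LLM call
print(messages[-1]["content"])  # <reminder>Update your todos.</reminder>
```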

Typical flow once the Agent receives a task: first call `todo_write` to list all steps (all `pending`) → pick one step and set it to `in_progress` → complete it and set it to `completed` → look at the next `pending` step → continue. After 3 rounds without `todo_write`, the loop appends a reminder before the next LLM call.
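
One step of that flow can be written as a pure function over the todo list. `advance` is a hypothetical helper for illustration only. In the teaching code, the model rewrites the whole list itself on every `todo_write` call:

```python
def advance(todos: list[dict]) -> list[dict]:
    """Complete the in_progress item and start the next pending one."""
    updated = [dict(t) for t in todos]  # copy so the caller's list is untouched
    for t in updated:
        if t["status"] == "in_progress":
            t["status"] = "completed"
    for t in updated:
        if t["status"] == "pending":
            t["status"] = "in_progress"
            break
    return updated


plan = [
    {"content": "Add type hints", "status": "pending"},
    {"content": "Add docstrings", "status": "pending"},
    {"content": "Add a main guard", "status": "pending"},
]
plan = advance(plan)  # step 1 becomes in_progress
plan = advance(plan)  # step 1 completed, step 2 in_progress
print([t["status"] for t in plan])  # ['completed', 'in_progress', 'pending']
```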

**Key insight**: `todo_write` doesn't give the Agent any additional **execution capability**. What it adds is **planning capability**.

---

## Changes from s04

| Component | Before (s04) | After (s05) |
|-----------|-------------|-------------|
| Tool count | 5 (bash, read, write, edit, glob) | 6 (+todo_write) |
| Planning | None | Stateful TODO list + nag reminder |
| SYSTEM prompt | Generic prompt | Added "plan before executing" guidance |
| Loop | Unchanged | Dispatch unchanged, added rounds_since_todo counter and reminder injection |

---

## Try It

```sh
cd learn-claude-code
python s05_todo_write/code.py
```

Try these prompts:

1. `Refactor s05_todo_write/example/hello.py: add type hints, docstrings, and a main guard` (it should list 3 steps first, then execute them)
2. `Create a Python package under s05_todo_write/example/demo_pkg with __init__.py, utils.py, and tests/test_utils.py`
3. `Review Python files under s05_todo_write/example and fix any style issues`

What to watch for: Was the first tool call `todo_write`? How many TODO steps were listed? Did statuses move from `pending` to `in_progress` / `completed` during execution?
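
Because the teaching version persists the list, you can also answer those questions from a second terminal. The small script below reads the file and counts items per status. It assumes it is run from the directory where the Agent writes `.tasks/current_todos.json`; adjust the path if code.py uses a different working directory:

```python
import json
from collections import Counter
from pathlib import Path


def summarize_todos(path: Path) -> dict[str, int]:
    """Count todo items per status in a current_todos.json file."""
    todos = json.loads(path.read_text(encoding="utf-8"))
    return dict(Counter(t.get("status", "unknown") for t in todos))


if __name__ == "__main__":
    todos_file = Path(".tasks/current_todos.json")  # assumed location
    if todos_file.exists():
        print(summarize_todos(todos_file))
    else:
        print("No todo list yet; has the Agent called todo_write?")
```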

---

## What's Next

The Agent can plan now. But if a task is too large (say, "refactor the entire auth module"), a TODO list alone is not enough: the task is itself a collection of dozens of subtasks that would drown in a single conversation's context.

→ s06 Subagent: Break large tasks into subtasks, each handled by an independent Agent with its own clean context, so the subtasks cannot contaminate each other.

<details>
<summary>Dive into CC Source Code</summary>

Two task systems coexist in CC (`tasks.ts:133-139`):

- **TodoWrite (V1)**: A simple list tool whose data lives in the in-memory AppState (`TodoWriteTool.ts:65-103`). The teaching version writes to `.tasks/current_todos.json` for observability; the real V1 does not write to disk.
- **Task System (V2 = s12)**: File persistence, a dependency graph, concurrency locks, and ownership.

The switch is controlled by `isTodoV2Enabled()`. In the current source, V2 is enabled by default in interactive sessions and V1 in non-interactive (SDK) sessions; setting `CLAUDE_CODE_ENABLE_TASKS` forces V2 regardless of session type. Note that the source comment "Force-enable tasks in non-interactive mode" describes the purpose of the env-var path, not the return value of the default branch.
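
Here is the selection rule above restated as a short sketch. It is a Python paraphrase of the behavior described in this section, not CC's TypeScript. The `is_interactive` parameter and the "any non-empty value counts as set" check are simplifying assumptions:

```python
import os
from typing import Mapping, Optional


def is_todo_v2_enabled(is_interactive: bool,
                       env: Optional[Mapping[str, str]] = None) -> bool:
    """Paraphrase of the V1/V2 switch described above (not CC's actual code)."""
    env = os.environ if env is None else env
    # Setting the variable forces V2 regardless of session type.
    # Assumption: any non-empty value counts as "set".
    if env.get("CLAUDE_CODE_ENABLE_TASKS"):
        return True
    # Default: interactive sessions get V2 (Task), SDK sessions get V1 (TodoWrite)
    return is_interactive
```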

The teaching version omits the `activeForm` field found in the real source (`utils/todo/types.ts:8-15`). CC uses it in the UI spinner to show what is currently being done; the teaching version only prints to the terminal and does not need the field.

The teaching version's nag reminder (injected after 3 rounds without an update) is an educational mechanism. The CC source has no fixed "3 rounds" logic. The closest equivalent is `TodoWriteTool.ts:72-107`, which appends a verification nudge when 3 or more todos are all completed and none of them is a verification item.
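
Below is a rough Python sketch of that nudge condition. Detecting a "verification item" by searching each item's content for the substring `verif` is an assumption made for illustration. The nudge wording is also hypothetical; neither reproduces the source's actual rule or text:

```python
from typing import Optional


def verification_nudge(todos: list) -> Optional[str]:
    """Return a nudge if 3+ todos are all completed and none looks like verification."""
    all_done = len(todos) >= 3 and all(t["status"] == "completed" for t in todos)
    # Assumption: a verification item mentions "verify"/"verification" in its content
    has_verification = any("verif" in t["content"].lower() for t in todos)
    if all_done and not has_verification:
        # Hypothetical wording, not the text CC uses
        return "All tasks are completed, but none verifies the result. Consider adding a verification step."
    return None
```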

Core increments of the Task System over TodoWrite:

- File persistence (`tasks/{taskListId}/{taskId}.json` under the Claude config directory) instead of an in-memory list
- A `blockedBy` dependency graph instead of a flat list
- `proper-lockfile` concurrency safety instead of no locking
- Four separate tools (Create/Get/Update/List) instead of one
- TaskCreated / TaskCompleted hooks (`TaskCreateTool.ts:80-129`, `TaskUpdateTool.ts:231-260`) for external system integration
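
To make the `blockedBy` point concrete, here is a minimal sketch of deciding which tasks are ready to start. The dict layout (`id`, `status`, `blockedBy`) is a simplified assumption, not CC's on-disk format. Unknown dependency ids are treated as blocking rather than crashing:

```python
def ready_tasks(tasks: list[dict]) -> list[str]:
    """Return ids of pending tasks whose blockedBy dependencies are all completed."""
    status = {t["id"]: t["status"] for t in tasks}
    ready = []
    for t in tasks:
        if t["status"] != "pending":
            continue
        # A missing dependency id counts as not completed, so the task stays blocked
        if all(status.get(dep) == "completed" for dep in t.get("blockedBy", [])):
            ready.append(t["id"])
    return ready


tasks = [
    {"id": "1", "status": "completed"},
    {"id": "2", "status": "pending", "blockedBy": ["1"]},
    {"id": "3", "status": "pending", "blockedBy": ["2"]},
    {"id": "4", "status": "pending", "blockedBy": ["99"]},
]
print(ready_tasks(tasks))  # ['2']
```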

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
s05_todo_write/README.ja.md
@@ -0,0 +1,156 @@
# s05: TodoWrite — 計画なき Agent は途中で道を外れる

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → s02 → s03 → s04 → `s05` → [s06](../s06_subagent/) → s07 → ... → s20

> *"計画なき agent は風の向くままに"* — まず手順を列挙してから実行。長いタスクで見落としが減る。
>
> **Harness レイヤー**: 計画 — Agent が行動する前に考えさせる。

---

## 課題

Agent に複雑なタスクを与える:「全 Python ファイルを snake_case にリネームし、テストを実行し、失敗を修正して。」

Agent は作業を開始する。3 つのファイルをリネーム、テストを実行、2 つの失敗を発見、修正を開始。修正しているうちに、本来の目的が「snake_case にリネーム」だったことを忘れる。テストの失敗に注意を全て持っていかれる。

会話が長くなるほど悪化する:ツールの結果がコンテキストを埋め続け、システムプロンプトの影響力が希釈される。10 ステップのリファクタリングなら、ステップ 1-3 を終えた時点で Agent は即興で動き始める。ステップ 4-10 が既に注意の外に追い出されているからだ。

---

## ソリューション

前章の最小フック構造を保持し、本章では新規の `todo_write` ツールとリマインダー機構に注目する。`todo_write` は実際の作業を何もしない。ファイルを読めない、コマンドを実行できない。Agent が手を動かす前に思考を整理できるようにするだけ。

ディスパッチ機構は変わらず、新ツールも `TOOL_HANDLERS[block.name]` を経由する。ただし、todo リマインダーのデモのため、ループにカウンターを追加した:連続 3 ラウンド `todo_write` を呼び出さないとリマインダーが注入される。

---

## 仕組み

**todo_write ツール**は、ステータス付きのリストを受け取り、`.tasks/current_todos.json` に永続化し(教育版は観察用にディスクに書き込む)、端末に進捗を表示する:

```python
import json

# TASKS_DIR (a pathlib.Path pointing at .tasks/) is defined elsewhere in code.py
def run_todo_write(todos: list) -> str:
    TASKS_DIR.mkdir(parents=True, exist_ok=True)  # make sure .tasks/ exists
    tasks_file = TASKS_DIR / "current_todos.json"
    tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False), encoding="utf-8")

    # Render a checklist so progress is visible in the terminal
    lines = ["\n## Current Tasks"]
    for t in todos:
        icon = {"pending": " ", "in_progress": "▸", "completed": "✓"}[t["status"]]
        lines.append(f" [{icon}] {t['content']}")
    print("\n".join(lines))
    return f"Updated {len(todos)} tasks"
```

ツール定義は他の 5 つと一緒に `TOOLS` に追加され、ハンドラは `TOOL_HANDLERS` ディスパッチマップに登録される:

```python
TOOLS = [
    # schemas of the five existing tools elided
    {"name": "bash", ...},
    {"name": "read_file", ...},
    {"name": "write_file", ...},
    {"name": "edit_file", ...},
    {"name": "glob", ...},
    # s05: 新規追加
    {"name": "todo_write", "description": "Create and manage a task list ...",
     "input_schema": {
         "type": "object",
         "properties": {
             "todos": {
                 "type": "array",
                 "items": {
                     "type": "object",
                     "properties": {
                         "content": {"type": "string"},
                         "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]},
                     },
                     "required": ["content", "status"],  # every item needs both fields
                 },
             },
         },
         "required": ["todos"],
     },
    },
]

TOOL_HANDLERS["todo_write"] = run_todo_write
```
|
||||
|
||||
**Nag リマインダー**、モデルが連続 3 ラウンド `todo_write` を呼び出さないとき、リマインダーが自動的に注入される(教育用機構、CC ソースコードに固定ラウンド数のロジックはない):
|
||||
|
||||
```python
|
||||
if rounds_since_todo >= 3 and messages:
|
||||
messages.append({
|
||||
"role": "user",
|
||||
"content": "<reminder>Update your todos.</reminder>",
|
||||
})
|
||||
rounds_since_todo = 0
|
||||
```
|
||||
|
||||
Agent がタスクを受け取った後の典型的な流れ:まず `todo_write` を呼び出して全手順を列挙(全て `pending`)→ 一つの手順に取り掛かり、`in_progress` に変更 → 完了したら `completed` に変更 → 次の `pending` を見る → 続行。3 ラウンド `todo_write` がない場合、次の LLM 呼び出し前にリマインダーが追加される。
|
||||
|
||||
**重要な洞察**:todo_write は Agent に**実行能力**を何も追加しない。追加するのは**計画能力**だ。
|
||||
|
||||
---
|
||||
|
||||
## s04 からの変更
|
||||
|
||||
| コンポーネント | 変更前 (s04) | 変更後 (s05) |
|
||||
|--------------|-------------|-------------|
|
||||
| ツール数 | 5 (bash, read, write, edit, glob) | 6 (+todo_write) |
|
||||
| 計画能力 | なし | ステータス付き TODO リスト + Nag リマインダー |
|
||||
| SYSTEM プロンプト | 汎用プロンプト | 「先に計画してから実行」のガイダンスを追加 |
|
||||
| ループ | 不変 | ディスパッチは不変、rounds_since_todo カウンターとリマインダー注入を追加 |
|
||||
|
||||
---
|
||||
|
||||
## 試してみよう
|
||||
|
||||
```sh
|
||||
cd learn-claude-code
|
||||
python s05_todo_write/code.py
|
||||
```
|
||||
|
||||
以下のプロンプトを試してみよう:
|
||||
|
||||
1. `Refactor s05_todo_write/example/hello.py: add type hints, docstrings, and a main guard`(まず 3 手順を列挙してから実行するはず)
|
||||
2. `Create a Python package under s05_todo_write/example/demo_pkg with __init__.py, utils.py, and tests/test_utils.py`
|
||||
3. `Review Python files under s05_todo_write/example and fix any style issues`
|
||||
|
||||
観察のポイント:最初のツール呼び出しは `todo_write` か? TODO は何手順列挙されたか? 実行中にステータスが `pending` から `in_progress` / `completed` に変わったか?
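
---

## What's Next

The Agent can plan now. But when a task is too big, say "refactor the entire auth module", a TODO list is not enough. That task is itself a collection of dozens of subtasks, and cramming them all into one conversation's context will overflow it.

→ s06 Subagent: split a large task into subtasks and hand each to an independent Agent. Each has its own clean context, so they cannot contaminate one another.

<details>
<summary>Deep dive into the CC source</summary>

CC has two task systems side by side (`tasks.ts:133-139`):

- **TodoWrite (V1)**: a simple list tool whose data lives in the in-memory AppState (`TodoWriteTool.ts:65-103`). The teaching version writes to `.tasks/current_todos.json` so you can observe it, but the real V1 does not write to disk
- **Task System (V2 = s12)**: file persistence, dependency graph, concurrency locks, ownership

Switching is controlled by `isTodoV2Enabled()`. In the current source, V2 is enabled by default in interactive sessions and V1 is enabled by default in non-interactive (SDK) sessions; setting the `CLAUDE_CODE_ENABLE_TASKS` environment variable forces V2 on regardless of session type. Note that the source comment "Force-enable tasks in non-interactive mode" describes what the environment-variable path is for; it does not describe the return value of the default branch, so keep the two apart when reading.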
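
The selection rule just described fits in a few lines. This is a Python sketch of the rule as stated above, not CC's TypeScript: the function name `is_todo_v2_enabled`, the `interactive` parameter, and treating any non-empty value of `CLAUDE_CODE_ENABLE_TASKS` as "set" are all assumptions; the real function may parse the variable differently.

```python
from __future__ import annotations

import os


def is_todo_v2_enabled(interactive: bool, env: dict | None = None) -> bool:
    """Hypothetical mirror of the V1/V2 selection rule described above."""
    env = os.environ if env is None else env
    # Assumption: any non-empty value counts as "set".
    if env.get("CLAUDE_CODE_ENABLE_TASKS"):
        return True       # env var forces V2 regardless of session type
    return interactive    # default: interactive -> V2, non-interactive (SDK) -> V1
```

Passing `env` explicitly keeps the sketch testable without touching the real process environment.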
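
The teaching version omits the `activeForm` field found in the real source (`utils/todo/types.ts:8-15`). CC uses it to show "what is happening right now" in the UI spinner; the teaching version only prints to the terminal, so it does not need the field.

The teaching version's nag reminder (injected after 3 rounds without an update) is a teaching device. The CC source has no fixed "3 rounds" logic; the closest equivalent is in `TodoWriteTool.ts:72-107`, which appends a verification nudge when 3 or more todos are all completed but none of them is a verification item.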
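
A sketch of the verification-nudge condition as described in the previous paragraph. The threshold (3 or more items, all completed) comes from that description; recognizing a "verification item" by the substring `verif` and the nudge wording are hypothetical stand-ins, since the real check in `TodoWriteTool.ts` is not reproduced here.

```python
from __future__ import annotations


def verification_nudge(todos: list[dict]) -> str | None:
    """Return a nudge if >= 3 todos are all completed and none looks like a verification step."""
    if len(todos) < 3:
        return None
    if not all(t["status"] == "completed" for t in todos):
        return None
    # Hypothetical heuristic: any item mentioning "verif" counts as a verification step.
    if any("verif" in t["content"].lower() for t in todos):
        return None
    # Hypothetical wording, not CC's actual message.
    return "All tasks are completed. Consider adding a step that verifies the changes (e.g. run the tests)."
```

Unlike the teaching version's round counter, this trigger depends on the *content* of the plan rather than on elapsed rounds.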

What the Task System adds over TodoWrite:
- File persistence (`tasks/{taskListId}/{taskId}.json` under the Claude config directory) instead of an in-memory list
- A `blockedBy` dependency graph instead of a flat list
- Concurrency safety via `proper-lockfile` instead of no locking
- Four separate tools (Create/Get/Update/List) instead of one
- TaskCreated / TaskCompleted hooks (`TaskCreateTool.ts:80-129`, `TaskUpdateTool.ts:231-260`) for integrating external systems
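
To make the `blockedBy` increment concrete, here is a sketch of task records with dependencies and a readiness check. Only `blockedBy` comes from the list above; the `subject` field, the in-memory dict standing in for per-task JSON files, the name `can_start`, and the rule "a pending task may start once every task it is blocked by is completed" are assumptions for illustration (s12 builds the real version).

```python
def can_start(task_id: str, tasks: dict[str, dict]) -> bool:
    """True if the task is pending and every task in its blockedBy list is completed.

    A dependency that does not exist is treated as not completed (conservative).
    """
    task = tasks[task_id]
    if task["status"] != "pending":
        return False
    return all(tasks.get(dep, {}).get("status") == "completed"
               for dep in task.get("blockedBy", []))


tasks = {
    "1": {"subject": "write utils.py", "status": "completed", "blockedBy": []},
    "2": {"subject": "write tests", "status": "pending", "blockedBy": ["1"]},
    "3": {"subject": "run tests", "status": "pending", "blockedBy": ["2"]},
}
print([tid for tid in tasks if can_start(tid, tasks)])  # ['2']
```

A flat TODO list cannot express "3 must wait for 2"; with `blockedBy`, the set of startable tasks is computed rather than remembered.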

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

s05_todo_write/README.md
@@ -0,0 +1,156 @@
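# s05: TodoWrite: An Agent Without a Plan Drifts Off Course

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → s02 → s03 → s04 → `s05` → [s06](../s06_subagent/) → s07 → ... → s20

> *"An agent without a plan goes wherever the moment takes it."* List the steps before acting, and long tasks are less likely to skip items.
>
> **Harness layer**: Planning, so the Agent thinks things through before it acts.

---

## Problem

Give the Agent a complex task: "Convert all Python files to snake_case naming, then run the tests and fix the failures."

The Agent gets to work. It changes 3 files, runs a test, finds 2 failures, and starts fixing them. Partway through the fixes it forgets that the original goal was "convert to snake_case": the failing tests have absorbed all of its attention.

The longer the conversation, the worse this gets: tool results keep filling the context and dilute the influence of the system prompt. In a 10-step refactor, the Agent starts improvising after steps 1-3, because steps 4-10 have already been crowded out of its attention.

---

## Solution

This chapter keeps the minimal hook structure from the previous chapter; the focus is the new `todo_write` tool and the reminder mechanism. `todo_write` does no real work by itself: it cannot read files or run commands. It only makes the Agent sort out its plan before it acts.

The dispatch mechanism is unchanged: the new tool is dispatched through `TOOL_HANDLERS` by `block.name`, like every other tool. To demonstrate the todo reminder, however, the loop gains a counter: after 3 consecutive rounds without a `todo_write` call, it injects a reminder.

---

## How It Works

**The todo_write tool** takes a list of items with statuses, persists it to `.tasks/current_todos.json` (the teaching version writes to disk so you can observe it), and prints progress to the terminal:

```python
import json
from pathlib import Path

TASKS_DIR = Path.cwd() / ".tasks"  # as in code.py: WORKDIR / ".tasks"
TASKS_DIR.mkdir(exist_ok=True)

def run_todo_write(todos: list) -> str:
    # Overwrite the whole list on every call: the file always holds the latest plan.
    tasks_file = TASKS_DIR / "current_todos.json"
    tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False))

    lines = ["\n## Current Tasks"]
    for t in todos:
        icon = {"pending": " ", "in_progress": "▸", "completed": "✓"}[t["status"]]
        lines.append(f"  [{icon}] {t['content']}")
    print("\n".join(lines))
    return f"Updated {len(todos)} tasks"
```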
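
The snippet above leaves out the input validation that `code.py` performs before writing. Here that check is pulled into a standalone function so it can be run and tested on its own; the name `validate_todos` is introduced for illustration, while in `code.py` the same checks sit at the top of `run_todo_write` and return the same error strings.

```python
from __future__ import annotations

VALID_STATUSES = ("pending", "in_progress", "completed")


def validate_todos(todos: list) -> str | None:
    """Return an error string (as code.py's run_todo_write does), or None if every item is valid."""
    for i, t in enumerate(todos):
        if "content" not in t or "status" not in t:
            return f"Error: todos[{i}] missing 'content' or 'status'"
        if t["status"] not in VALID_STATUSES:
            return f"Error: todos[{i}] has invalid status '{t['status']}'"
    return None
```

Returning the error as the tool result, instead of raising, means the model sees exactly which item was malformed and can resend a corrected list on the next round.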

The tool definition joins the other 5 tools, and its handler is registered in the dispatch map:

```python
TOOLS = [
    {"name": "bash", ...},
    {"name": "read_file", ...},
    {"name": "write_file", ...},
    {"name": "edit_file", ...},
    {"name": "glob", ...},
    # s05: one new entry
    {"name": "todo_write", "description": "Create and manage a task list ...",
     "input_schema": {
         "type": "object",
         "properties": {
             "todos": {
                 "type": "array",
                 "items": {
                     "type": "object",
                     "properties": {
                         "content": {"type": "string"},
                         "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]},
                     },
                     "required": ["content", "status"],
                 },
             },
         },
         "required": ["todos"],
     },
    },
]

TOOL_HANDLERS["todo_write"] = run_todo_write
```
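
Because dispatch is a plain dictionary lookup, registering the handler is the only change the loop needs. A self-contained sketch of the lookup `agent_loop` performs; `FakeBlock` is a hypothetical stand-in for the SDK's tool_use block (only the two fields the loop reads), and `stub_todo_write` replaces the real handler so nothing is written to disk.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBlock:
    """Stand-in for an SDK tool_use block: just the fields the loop reads."""
    name: str
    input: dict = field(default_factory=dict)


def stub_todo_write(todos: list) -> str:
    return f"Updated {len(todos)} tasks"


TOOL_HANDLERS = {"todo_write": stub_todo_write}


def dispatch(block: FakeBlock) -> str:
    # Same expression as agent_loop: look up by name, unpack the input as keyword arguments.
    handler = TOOL_HANDLERS.get(block.name)
    return handler(**block.input) if handler else f"Unknown: {block.name}"


print(dispatch(FakeBlock("todo_write", {"todos": [{"content": "a", "status": "pending"}]})))
# Updated 1 tasks
print(dispatch(FakeBlock("web_search", {"query": "x"})))
# Unknown: web_search
```

The input keys must match the handler's parameter names (`todos` here), which is why the schema's property name and the Python parameter name are kept identical.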

**Nag reminder**: when the model goes 3 consecutive rounds without calling `todo_write`, a reminder is injected automatically (this is a teaching mechanism; the CC source has no such fixed round count):

```python
if rounds_since_todo >= 3 and messages:
    messages.append({
        "role": "user",
        "content": "<reminder>Update your todos.</reminder>",
    })
    rounds_since_todo = 0
```
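
The counter's behavior over several rounds can be traced without calling a model. This sketch replays the bookkeeping `agent_loop` does (check before each LLM call, add 1 after each tool-use round, reset to 0 when `todo_write` was called) over a scripted list of rounds. The tool names in the script are made up, and every scripted round is assumed to end with `stop_reason == "tool_use"`, since only those rounds increment the counter.

```python
def simulate_nag(rounds: list[list[str]], threshold: int = 3) -> list[int]:
    """Return the indices of rounds before which a reminder would be injected.

    Each element of `rounds` lists the tool names the model called in that round.
    """
    since_todo = 0
    reminded_before = []
    for i, tools in enumerate(rounds):
        if since_todo >= threshold:   # checked before the LLM call
            reminded_before.append(i)
            since_todo = 0
        since_todo += 1               # one more tool-use round
        if "todo_write" in tools:     # updating the plan resets the counter
            since_todo = 0
    return reminded_before


script = [["todo_write"], ["read_file"], ["edit_file"], ["bash"], ["bash"], ["todo_write"]]
print(simulate_nag(script))  # [4]
```

Rounds 1 to 3 each add 1 without a `todo_write`, so the reminder lands just before round 4; the final `todo_write` resets the counter again.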

Once the Agent receives a task, the typical flow is: first call `todo_write` to list every step (all `pending`) → start one step and mark it `in_progress` → mark it `completed` when done → look at the next `pending` step → continue. After 3 consecutive rounds without a `todo_write` call, the loop appends a reminder before the next LLM call.

**Key insight**: todo_write adds no **execution capability** to the Agent. What it adds is **planning capability**.

---

## Changes from s04

| Component | Before (s04) | After (s05) |
|------|-----------|-----------|
| Tool count | 5 (bash, read, write, edit, glob) | 6 (+todo_write) |
| Planning | None | Status-tracked TODO list + nag reminder |
| SYSTEM prompt | Generic prompt | Adds "plan first, then execute" guidance |
| Loop | Dispatch via TOOL_HANDLERS + hooks | Dispatch unchanged; adds the rounds_since_todo counter and reminder injection |

---

## Try It

```sh
cd learn-claude-code
python s05_todo_write/code.py
```

Try these prompts:

1. `Refactor s05_todo_write/example/hello.py: add type hints, docstrings, and a main guard` (it should list 3 steps first, then execute them)
2. `Create a Python package under s05_todo_write/example/demo_pkg with __init__.py, utils.py, and tests/test_utils.py`
3. `Review Python files under s05_todo_write/example and fix any style issues`
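
For prompt 1, the refactored `hello.py` might end up looking roughly like this. It is one plausible result, not a fixed expected output; the model's wording and structure will vary. The three changes line up with the three TODO items the agent should list first: type hints, docstrings, main guard.

```python
def greet(name: str) -> None:
    """Print a greeting for the given name."""
    message = "Hello, " + name
    print(message)


def main() -> None:
    """Entry point: greet Claude."""
    greet("Claude")


if __name__ == "__main__":
    main()
```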
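
What to watch for: Is the first tool call `todo_write`? How many steps does the TODO list contain? Do the statuses move from `pending` to `in_progress` / `completed` as execution proceeds?

---

## What's Next

The Agent can plan now. But if a task is too big, say "refactor the entire authentication module", a TODO list alone is not enough. Such a task is itself a collection of dozens of smaller tasks, and in a single conversation they would be buried by the context.

s06 Subagent → split the large task into subtasks and dispatch an independent Agent for each. Each one has its own clean context, so they do not contaminate each other.

<details>
<summary>Deep dive into the CC source</summary>

CC has two task systems side by side (`tasks.ts:133-139`):

- **TodoWrite (V1)**: a simple list tool whose data is kept in the in-memory AppState (`TodoWriteTool.ts:65-103`). The teaching version writes to `.tasks/current_todos.json` for observability; the real V1 does not write to disk
- **Task System (V2 = s12)**: file persistence, dependency graph, concurrency locks, ownership

Switching is controlled by `isTodoV2Enabled()`. In the current source, V2 is enabled by default in interactive sessions and V1 is enabled by default in non-interactive (SDK) sessions; setting the `CLAUDE_CODE_ENABLE_TASKS` environment variable forces V2 on. Note that the source comment "Force-enable tasks in non-interactive mode" describes the purpose of the env-var path, which is a different thing from the return value of the default branch; keep them apart when reading.

The teaching version omits the `activeForm` field from the real source (`utils/todo/types.ts:8-15`). CC uses it to show "what is being done right now" in the UI spinner; the teaching version only prints to the terminal and does not need the field.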
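
A sketch of how an `activeForm` field could drive a spinner label. The example items pair an imperative `content` ("Run the tests") with a progressive `activeForm` ("Running the tests"); that pairing illustrates the field's role as described above, while the exact type is in `utils/todo/types.ts` and not reproduced here. The `spinner_label` helper and its fallback to `content` are hypothetical.

```python
from __future__ import annotations


def spinner_label(todos: list[dict]) -> str | None:
    """Return the text a spinner would show: the in_progress item's activeForm, if any.

    Hypothetical helper. Falls back to 'content' when activeForm is absent,
    as it is in the teaching version's items.
    """
    for t in todos:
        if t["status"] == "in_progress":
            return t.get("activeForm") or t["content"]
    return None


todos = [
    {"content": "Run the tests", "activeForm": "Running the tests", "status": "in_progress"},
    {"content": "Fix failures", "activeForm": "Fixing failures", "status": "pending"},
]
print(spinner_label(todos))  # Running the tests
```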
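
The teaching version's nag reminder (injected after 3 rounds without an update) is a teaching mechanism. The CC source has no fixed "3 rounds" logic; the closest equivalent is `TodoWriteTool.ts:72-107`, which appends a verification nudge when 3 or more todos are all completed but none of them is a verification item.

What the Task System adds over TodoWrite:
- File persistence (`tasks/{taskListId}/{taskId}.json` under the Claude config directory) instead of an in-memory list
- A `blockedBy` dependency graph instead of a flat list
- Concurrency safety via `proper-lockfile` instead of no locking
- Four separate tools (Create/Get/Update/List) instead of one
- TaskCreated / TaskCompleted hooks (`TaskCreateTool.ts:80-129`, `TaskUpdateTool.ts:231-260`) for integrating external systems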
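
`proper-lockfile` is a Node.js library, so the following is not its API. It is a rough Python analogue of the general idea: an atomic filesystem operation acts as the lock. Here that operation is `os.open` with `O_CREAT | O_EXCL` on a sibling `<file>.lock`, which fails if the lock file already exists. The `file_lock` helper, its retry count, and its delay are assumptions, and unlike a production lock library this sketch has no stale-lock detection.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(target: Path, retries: int = 50, delay: float = 0.05):
    """Hold an exclusive lock on `target` via an atomically created `<target>.lock` file."""
    lock = target.with_name(target.name + ".lock")
    for _ in range(retries):
        try:
            # O_EXCL makes creation fail if the file exists: only one holder wins.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(delay)  # someone else holds the lock; wait and retry
    else:
        raise TimeoutError(f"could not lock {target}")
    os.close(fd)
    try:
        yield
    finally:
        lock.unlink(missing_ok=True)  # release the lock even if the body raised
```

Usage would wrap each read-modify-write of a task file, e.g. `with file_lock(task_path): ...`, so two agents updating the same task cannot interleave their writes.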

</details>

<!-- translation-sync: zh@v1, en@v0, ja@v0 -->

s05_todo_write/code.py
@@ -0,0 +1,287 @@
#!/usr/bin/env python3
"""
s05: TodoWrite — add a planning tool on top of s04 hooks.

    +---------+      +-------+      +------------------+
    |  User   | ---> |  LLM  | ---> |  TOOL_HANDLERS   |
    | prompt  |      |       |      |    bash          |
    +---------+      +---+---+      |    read_file     |
                         ^          |    write_file    |
                         | result   |    edit_file     |
                         +----------+    glob          |
                                    |    todo_write    |  ← NEW
                                    +------------------+
                                             |
                                 .tasks/current_todos.json
                                             |
                                 if rounds_since_todo >= 3:
                                     inject <reminder>

Changes from s04:
  + todo_write tool + run_todo_write() implementation
  + Nag reminder (inject reminder after 3 rounds without todo update)
  + SYSTEM prompt includes "plan before execute" guidance
  + rounds_since_todo counter in agent_loop
  Loop unchanged: new tool auto-dispatches via TOOL_HANDLERS.

Run:   python s05_todo_write/code.py
Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY and MODEL_ID in .env
"""

import os, subprocess, json
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"):
    os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
TASKS_DIR = WORKDIR / ".tasks"; TASKS_DIR.mkdir(exist_ok=True)
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]

# s05 change: SYSTEM prompt adds planning guidance
SYSTEM = (
    f"You are a coding agent at {WORKDIR}. "
    "Before starting any multi-step task, use todo_write to plan your steps. "
    "Update status as you go."
)


# ═══════════════════════════════════════════════════════════
# FROM s02-s04 (unchanged): Tool Implementations
# ═══════════════════════════════════════════════════════════

def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path

def run_bash(command: str) -> str:
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR,
                           capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"

def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines):
            lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e:
        return f"Error: {e}"

def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content)
        return f"Wrote {len(content)} bytes to {path}"
    except Exception as e:
        return f"Error: {e}"

def run_edit(path: str, old_text: str, new_text: str) -> str:
    try:
        file_path = safe_path(path)
        text = file_path.read_text()
        if old_text not in text:
            return f"Error: text not found in {path}"
        file_path.write_text(text.replace(old_text, new_text, 1))
        return f"Edited {path}"
    except Exception as e:
        return f"Error: {e}"

def run_glob(pattern: str) -> str:
    import glob as g
    try:
        results = []
        for match in g.glob(pattern, root_dir=WORKDIR):
            if (WORKDIR / match).resolve().is_relative_to(WORKDIR):
                results.append(match)
        return "\n".join(results) if results else "(no matches)"
    except Exception as e:
        return f"Error: {e}"


# ═══════════════════════════════════════════════════════════
# NEW in s05: todo_write tool — plan only, no execution
# ═══════════════════════════════════════════════════════════

def run_todo_write(todos: list) -> str:
    # validate required fields
    for i, t in enumerate(todos):
        if "content" not in t or "status" not in t:
            return f"Error: todos[{i}] missing 'content' or 'status'"
        if t["status"] not in ("pending", "in_progress", "completed"):
            return f"Error: todos[{i}] has invalid status '{t['status']}'"
    tasks_file = TASKS_DIR / "current_todos.json"
    tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False))
    lines = ["\n\033[33m## Current Tasks\033[0m"]
    for t in todos:
        icon = {"pending": " ", "in_progress": "\033[36m▸\033[0m", "completed": "\033[32m✓\033[0m"}[t["status"]]
        lines.append(f"  [{icon}] {t['content']}")
    print("\n".join(lines))
    return f"Updated {len(todos)} tasks"

TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
    # s05: new tool
    {"name": "todo_write", "description": "Create and manage a task list for your current coding session.",
     "input_schema": {"type": "object", "properties": {"todos": {"type": "array", "items": {"type": "object", "properties": {"content": {"type": "string"}, "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}}, "required": ["content", "status"]}}}, "required": ["todos"]}},
]

TOOL_HANDLERS = {
    "bash": run_bash, "read_file": run_read, "write_file": run_write,
    "edit_file": run_edit, "glob": run_glob, "todo_write": run_todo_write,
}


# ═══════════════════════════════════════════════════════════
# FROM s04 (unchanged): Hook System
# ═══════════════════════════════════════════════════════════

HOOKS = {"UserPromptSubmit": [], "PreToolUse": [], "PostToolUse": [], "Stop": []}

def register_hook(event: str, callback):
    HOOKS[event].append(callback)

def trigger_hooks(event: str, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:
            return result
    return None

# s04 hooks preserved
DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if="]

def permission_hook(block):
    """PreToolUse: deny list check."""
    if block.name == "bash":
        for p in DENY_LIST:
            if p in block.input.get("command", ""):
                print(f"\n\033[31m⛔ Blocked: '{p}'\033[0m")
                return "Permission denied"
    return None

def log_hook(block):
    """PreToolUse: log tool calls."""
    print(f"\033[90m[HOOK] {block.name}\033[0m")
    return None

def context_inject_hook(query: str):
    """UserPromptSubmit: log working directory."""
    print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m")
    return None

def summary_hook(messages: list):
    """Stop: print tool call count."""
    tool_count = sum(1 for m in messages
                     for b in (m.get("content") if isinstance(m.get("content"), list) else [])
                     if isinstance(b, dict) and b.get("type") == "tool_result")
    print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m")
    return None

register_hook("UserPromptSubmit", context_inject_hook)
register_hook("PreToolUse", permission_hook)
register_hook("PreToolUse", log_hook)
register_hook("Stop", summary_hook)


# ═══════════════════════════════════════════════════════════
# agent_loop — same as s04 + nag reminder counter
# ═══════════════════════════════════════════════════════════

rounds_since_todo = 0

def agent_loop(messages: list):
    global rounds_since_todo
    while True:
        # s05: nag reminder — inject if model hasn't updated todos for 3 rounds
        if rounds_since_todo >= 3 and messages:
            messages.append({"role": "user",
                             "content": "<reminder>Update your todos.</reminder>"})
            rounds_since_todo = 0

        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            force = trigger_hooks("Stop", messages)
            if force:
                messages.append({"role": "user", "content": force})
                continue
            return

        rounds_since_todo += 1
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue

            blocked = trigger_hooks("PreToolUse", block)
            if blocked:
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": str(blocked)})
                continue

            handler = TOOL_HANDLERS.get(block.name)
            output = handler(**block.input) if handler else f"Unknown: {block.name}"

            trigger_hooks("PostToolUse", block, output)

            # s05: reset nag counter when todo_write is called
            if block.name == "todo_write":
                rounds_since_todo = 0

            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": output})

        messages.append({"role": "user", "content": results})


if __name__ == "__main__":
    print("s05: TodoWrite — plan before execute, nag if you forget")
    print("Type a question, press Enter. Type q to quit.\n")

    history = []
    while True:
        try:
            query = input("\033[36ms05 >> \033[0m")
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        trigger_hooks("UserPromptSubmit", query)
        history.append({"role": "user", "content": query})
        agent_loop(history)
        for block in history[-1]["content"]:
            if getattr(block, "type", None) == "text":
                print(block.text)
        print()

s05_todo_write/example/hello.py
@@ -0,0 +1,6 @@
def greet(name):
    message = "Hello, " + name
    print(message)


greet("Claude")

s05_todo_write/images/todo-overview.en.svg
@@ -0,0 +1,93 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 420" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
    </marker>
    <marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
    </marker>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/>
      <stop offset="100%" stop-color="#2563eb"/>
    </linearGradient>
  </defs>

  <!-- Background -->
  <rect width="800" height="420" fill="#fafbfc" rx="8"/>

  <!-- Title -->
  <rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
  <rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
  <text x="400" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">TodoWrite — Loop Unchanged, One More Tool Auto-Dispatched</text>

  <!-- ===== s04 Preserved (gray) ===== -->
  <text x="50" y="76" fill="#94a3b8" font-size="11" font-weight="600">s04 Preserved</text>

  <!-- messages -->
  <rect x="40" y="90" width="100" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="90" y="117" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- → LLM -->
  <line x1="140" y1="112" x2="188" y2="112" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>

  <!-- LLM -->
  <rect x="190" y="86" width="110" height="52" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="245" y="108" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">LLM</text>
  <text x="245" y="126" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>

  <!-- No → Return -->
  <line x1="245" y1="138" x2="245" y2="162" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>
  <text x="258" y="156" fill="#16a34a" font-size="9" font-weight="600">No</text>
  <rect x="195" y="164" width="100" height="28" rx="14" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="245" y="182" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return Result</text>

  <!-- Yes → PreToolUse -->
  <line x1="300" y1="112" x2="348" y2="112" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
  <text x="320" y="104" fill="#d97706" font-size="9" font-weight="600">Yes</text>

  <!-- PreToolUse (s04) -->
  <rect x="350" y="88" width="100" height="48" rx="8" fill="#f0fdf4" stroke="#16a34a" stroke-width="1.5"/>
  <text x="400" y="110" fill="#166534" font-size="9" font-weight="600" text-anchor="middle">trigger_hooks</text>
  <text x="400" y="124" fill="#166534" font-size="8" text-anchor="middle">PreToolUse</text>

  <!-- → TOOL_HANDLERS -->
  <line x1="450" y1="112" x2="498" y2="112" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>

  <!-- ===== s05 New: todo_write ===== -->
  <!-- TOOL_HANDLERS box (expanded, includes todo_write) -->
  <rect x="500" y="74" width="120" height="140" rx="10" fill="#f0fdf4" stroke="#16a34a" stroke-width="2" stroke-dasharray="6,3"/>
  <text x="560" y="94" fill="#166534" font-size="10" font-weight="700" text-anchor="middle">TOOL_HANDLERS</text>

  <!-- s04 preserved tools -->
  <rect x="512" y="102" width="96" height="22" rx="4" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="560" y="117" fill="#1e3a5f" font-size="9" text-anchor="middle">bash · read · write</text>

  <rect x="512" y="130" width="96" height="22" rx="4" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="560" y="145" fill="#1e3a5f" font-size="9" text-anchor="middle">edit · glob</text>

  <!-- s05 new: todo_write -->
  <rect x="512" y="158" width="96" height="22" rx="4" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="560" y="173" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">todo_write</text>
  <text x="560" y="232" fill="#16a34a" font-size="11" font-weight="600" text-anchor="middle">s05 New</text>

  <text x="560" y="196" fill="#64748b" font-size="8" text-anchor="middle">→ .tasks/current_todos.json</text>

  <!-- Loop back -->
  <path d="M 620 112 L 660 112 L 660 260 L 90 260 L 90 134" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
  <text x="370" y="280" fill="#64748b" font-size="10" text-anchor="middle">Results appended to messages[], loop continues</text>

  <!-- ===== Nag Reminder ===== -->
  <rect x="100" y="310" width="600" height="56" rx="8" fill="#fffbeb" stroke="#d97706" stroke-width="1"/>
  <text x="120" y="332" fill="#92400e" font-size="11" font-weight="700">Nag Reminder</text>
  <text x="120" y="352" fill="#92400e" font-size="10">Model hasn't called todo_write for 3 rounds → auto-inject &lt;reminder&gt;Update your todos.&lt;/reminder&gt;</text>

  <!-- Legend -->
  <rect x="100" y="384" width="600" height="28" rx="6" fill="#f1f5f9"/>
  <rect x="120" y="392" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="140" y="402" fill="#334155" font-size="10">s04 Preserved (loop, hooks, 5 base tools)</text>
  <rect x="400" y="392" width="12" height="12" rx="2" fill="#dcfce7" stroke="#16a34a" stroke-width="1"/>
  <text x="420" y="402" fill="#334155" font-size="10">s05 New (todo_write + nag reminder)</text>
</svg>

s05_todo_write/images/todo-overview.ja.svg
@@ -0,0 +1,93 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 420" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
    </marker>
    <marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
    </marker>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/>
      <stop offset="100%" stop-color="#2563eb"/>
    </linearGradient>
  </defs>

  <!-- Background -->
  <rect width="800" height="420" fill="#fafbfc" rx="8"/>

  <!-- Title -->
  <rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
  <rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
  <text x="400" y="31" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">TodoWrite — Loop Unchanged, One More Tool Auto-Dispatched</text>

  <!-- ===== s04 Preserved (gray) ===== -->
  <text x="50" y="76" fill="#94a3b8" font-size="11" font-weight="600">s04 Preserved</text>

  <!-- messages -->
  <rect x="40" y="90" width="100" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="90" y="117" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- → LLM -->
  <line x1="140" y1="112" x2="188" y2="112" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>

  <!-- LLM -->
  <rect x="190" y="86" width="110" height="52" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="245" y="108" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">LLM</text>
  <text x="245" y="126" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>

  <!-- No → Return -->
  <line x1="245" y1="138" x2="245" y2="162" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>
  <text x="258" y="156" fill="#16a34a" font-size="9" font-weight="600">No</text>
  <rect x="195" y="164" width="100" height="28" rx="14" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="245" y="182" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return Result</text>

  <!-- Yes → PreToolUse -->
  <line x1="300" y1="112" x2="348" y2="112" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
  <text x="320" y="104" fill="#d97706" font-size="9" font-weight="600">Yes</text>

  <!-- PreToolUse (s04) -->
  <rect x="350" y="88" width="100" height="48" rx="8" fill="#f0fdf4" stroke="#16a34a" stroke-width="1.5"/>
  <text x="400" y="110" fill="#166534" font-size="9" font-weight="600" text-anchor="middle">trigger_hooks</text>
  <text x="400" y="124" fill="#166534" font-size="8" text-anchor="middle">PreToolUse</text>

  <!-- → TOOL_HANDLERS -->
  <line x1="450" y1="112" x2="498" y2="112" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>

  <!-- ===== s05 New: todo_write ===== -->
  <!-- TOOL_HANDLERS box (expanded, includes todo_write) -->
  <rect x="500" y="74" width="120" height="140" rx="10" fill="#f0fdf4" stroke="#16a34a" stroke-width="2" stroke-dasharray="6,3"/>
  <text x="560" y="94" fill="#166534" font-size="10" font-weight="700" text-anchor="middle">TOOL_HANDLERS</text>

  <!-- s04 preserved tools -->
  <rect x="512" y="102" width="96" height="22" rx="4" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="560" y="117" fill="#1e3a5f" font-size="9" text-anchor="middle">bash · read · write</text>

  <rect x="512" y="130" width="96" height="22" rx="4" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="560" y="145" fill="#1e3a5f" font-size="9" text-anchor="middle">edit · glob</text>

  <!-- s05 new: todo_write -->
  <rect x="512" y="158" width="96" height="22" rx="4" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="560" y="173" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">todo_write</text>
  <text x="560" y="232" fill="#16a34a" font-size="11" font-weight="600" text-anchor="middle">s05 New</text>

  <text x="560" y="196" fill="#64748b" font-size="8" text-anchor="middle">→ .tasks/current_todos.json</text>

  <!-- Loop back -->
  <path d="M 620 112 L 660 112 L 660 260 L 90 260 L 90 134" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
  <text x="370" y="280" fill="#64748b" font-size="10" text-anchor="middle">Results appended to messages[], loop continues</text>

  <!-- ===== Nag Reminder ===== -->
  <rect x="100" y="310" width="600" height="56" rx="8" fill="#fffbeb" stroke="#d97706" stroke-width="1"/>
  <text x="120" y="332" fill="#92400e" font-size="11" font-weight="700">Nag Reminder (nudge mechanism)</text>
  <text x="120" y="352" fill="#92400e" font-size="10">Model hasn't called todo_write for 3 consecutive rounds → auto-inject &lt;reminder&gt;Update your todos.&lt;/reminder&gt;</text>

  <!-- Legend -->
  <rect x="100" y="384" width="600" height="28" rx="6" fill="#f1f5f9"/>
  <rect x="120" y="392" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="140" y="402" fill="#334155" font-size="10">s04 Preserved (loop, hooks, 5 base tools)</text>
  <rect x="400" y="392" width="12" height="12" rx="2" fill="#dcfce7" stroke="#16a34a" stroke-width="1"/>
  <text x="420" y="402" fill="#334155" font-size="10">s05 New (todo_write + nag reminder)</text>
</svg>

s05_todo_write/images/todo-overview.svg
@@ -0,0 +1,93 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 420" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
    </marker>
    <marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
    </marker>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/>
      <stop offset="100%" stop-color="#2563eb"/>
    </linearGradient>
  </defs>

  <!-- Background -->
  <rect width="800" height="420" fill="#fafbfc" rx="8"/>

  <!-- Title -->
  <rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
  <rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
  <text x="400" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">TodoWrite — Loop Unchanged, One More Tool Auto-Dispatched</text>

  <!-- ===== s04 Preserved (gray) ===== -->
  <text x="50" y="76" fill="#94a3b8" font-size="11" font-weight="600">s04 Preserved</text>

  <!-- messages -->
  <rect x="40" y="90" width="100" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="90" y="117" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- → LLM -->
  <line x1="140" y1="112" x2="188" y2="112" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>

  <!-- LLM -->
  <rect x="190" y="86" width="110" height="52" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="245" y="108" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">LLM</text>
  <text x="245" y="126" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>

  <!-- No → Return -->
  <line x1="245" y1="138" x2="245" y2="162" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>
  <text x="258" y="156" fill="#16a34a" font-size="9" font-weight="600">No</text>
  <rect x="195" y="164" width="100" height="28" rx="14" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="245" y="182" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return Result</text>

  <!-- Yes → PreToolUse -->
  <line x1="300" y1="112" x2="348" y2="112" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
  <text x="320" y="104" fill="#d97706" font-size="9" font-weight="600">Yes</text>

  <!-- PreToolUse (s04) -->
  <rect x="350" y="88" width="100" height="48" rx="8" fill="#f0fdf4" stroke="#16a34a" stroke-width="1.5"/>
  <text x="400" y="110" fill="#166534" font-size="9" font-weight="600" text-anchor="middle">trigger_hooks</text>
  <text x="400" y="124" fill="#166534" font-size="8" text-anchor="middle">PreToolUse</text>

  <!-- → TOOL_HANDLERS -->
  <line x1="450" y1="112" x2="498" y2="112" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<!-- ===== s05 新增:todo_write ===== -->
|
||||
<!-- TOOL_HANDLERS 框(扩大,包含 todo_write) -->
|
||||
<rect x="500" y="74" width="120" height="140" rx="10" fill="#f0fdf4" stroke="#16a34a" stroke-width="2" stroke-dasharray="6,3"/>
|
||||
<text x="560" y="94" fill="#166534" font-size="10" font-weight="700" text-anchor="middle">TOOL_HANDLERS</text>
|
||||
|
||||
<!-- s04 保留的工具 -->
|
||||
<rect x="512" y="102" width="96" height="22" rx="4" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="560" y="117" fill="#1e3a5f" font-size="9" text-anchor="middle">bash · read · write</text>
|
||||
|
||||
<rect x="512" y="130" width="96" height="22" rx="4" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="560" y="145" fill="#1e3a5f" font-size="9" text-anchor="middle">edit · glob</text>
|
||||
|
||||
<!-- s05 新增:todo_write -->
|
||||
<rect x="512" y="158" width="96" height="22" rx="4" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
|
||||
<text x="560" y="173" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">todo_write</text>
|
||||
<text x="560" y="232" fill="#16a34a" font-size="11" font-weight="600" text-anchor="middle">s05 新增</text>
|
||||
|
||||
<text x="560" y="196" fill="#64748b" font-size="8" text-anchor="middle">→ .tasks/current_todos.json</text>
|
||||
|
||||
<!-- 回环 -->
|
||||
<path d="M 620 112 L 660 112 L 660 260 L 90 260 L 90 134" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
|
||||
<text x="370" y="280" fill="#64748b" font-size="10" text-anchor="middle">结果追加到 messages[],循环继续</text>
|
||||
|
||||
<!-- ===== Nag Reminder ===== -->
|
||||
<rect x="100" y="310" width="600" height="56" rx="8" fill="#fffbeb" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="120" y="332" fill="#92400e" font-size="11" font-weight="700">Nag Reminder(催更机制)</text>
|
||||
<text x="120" y="352" fill="#92400e" font-size="10">模型连续 3 轮没调 todo_write → 自动注入 &lt;reminder&gt;Update your todos.&lt;/reminder&gt;</text>
|
||||
|
||||
<!-- 图例 -->
|
||||
<rect x="100" y="384" width="600" height="28" rx="6" fill="#f1f5f9"/>
|
||||
<rect x="120" y="392" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="140" y="402" fill="#334155" font-size="10">s04 保留(循环、钩子、5 个基础工具)</text>
|
||||
<rect x="400" y="392" width="12" height="12" rx="2" fill="#dcfce7" stroke="#16a34a" stroke-width="1"/>
|
||||
<text x="420" y="402" fill="#334155" font-size="10">s05 新增(todo_write + nag reminder)</text>
|
||||
</svg>
|
||||
|
s06_subagent/README.en.md
@@ -0,0 +1,189 @@
|
||||
# s06: Subagent — Break Large Tasks into Small Ones with Clean Context
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → s02 → s03 → s04 → s05 → `s06` → [s07](../s07_skill_loading/) → s08 → ... → s20
|
||||
|
||||
> *"Break large tasks small, each with clean context"* — Subagent uses an independent messages[], no pollution in the main conversation.
|
||||
>
|
||||
> **Harness Layer**: Sub-Agent — Context isolation, attention doesn't drift.
|
||||
|
||||
---
|
||||
|
||||
## The Problem
|
||||
|
||||
The Agent is fixing a bug. It reads 30 files to trace the call chain, exchanging 60 rounds with the model along the way. The messages list grows to 120 entries, most of which are intermediate steps from "tracing the call chain" that have nothing to do with the final goal of "fixing the bug."
|
||||
|
||||
These intermediate steps occupy context space, making the Agent increasingly "forgetful" — it can no longer remember what the original problem was.
|
||||
|
||||
Consider how you would handle this yourself: while fixing a bug, you might open a new terminal to trace the call chain. When you are done, you close that terminal, jot the result in your notes, and return to the original terminal to keep fixing. The Agent needs the same ability: **spin up an independent sub-Agent, give it its own message list, and let it focus on one thing.**
|
||||
|
||||
---
|
||||
|
||||
## The Solution
|
||||
|
||||

|
||||
|
||||
The minimal hook structure and `todo_write` tool from the previous chapter are preserved; this chapter focuses on the new `task` tool. When called, it spawns a sub-Agent that starts with a fresh `messages[]`, runs its own loop, and returns only summary text to the main Agent. The sub-Agent's conversation context is then discarded, but its file system side effects (writes, edits, commands) remain in the working directory.
|
||||
|
||||
The sub-Agent's tool set is restricted: it gets bash/read/write/edit/glob but not `task`, so it cannot spawn sub-Agents of its own. Its tool calls still go through the permission hooks; context isolation does not bypass security.
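One way to derive the restricted list is to filter the parent's tool schemas by name. A minimal sketch, assuming each tool is a dict with a `name` key as in this chapter's `TOOLS` list; `SUB_TOOL_NAMES` and `build_sub_tools` are hypothetical names, not taken from `code.py`:

```python
from typing import Any

# Hypothetical allowlist: only these tools ever reach a subagent.
SUB_TOOL_NAMES = {"bash", "read_file", "write_file", "edit_file", "glob"}


def build_sub_tools(parent_tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the parent's schemas whose names are on the allowlist, in order."""
    return [tool for tool in parent_tools if tool["name"] in SUB_TOOL_NAMES]


# Stand-in schemas: real entries also carry "description" and "input_schema".
PARENT_TOOLS = [
    {"name": "bash"}, {"name": "read_file"}, {"name": "write_file"},
    {"name": "edit_file"}, {"name": "glob"}, {"name": "todo_write"},
    {"name": "task"},
]

sub_tools = build_sub_tools(PARENT_TOOLS)
print([tool["name"] for tool in sub_tools])
```

An allowlist is a deliberate choice here: a tool added to the parent in a later chapter stays invisible to the subagent until someone opts it in, whereas a denylist such as `name != "task"` would hand it over automatically (note that `todo_write` is also left out).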
|
||||
|
||||
---
|
||||
|
||||
## How It Works
|
||||
|
||||
**spawn_subagent** gives the sub-Agent a fresh messages list, runs its own loop, and returns only the conclusion:
|
||||
|
||||
```python
|
||||
def spawn_subagent(description: str) -> str:
    # Sub-Agent tools: base tools, but no task (no recursion)
    sub_tools = [...]
    messages = [{"role": "user", "content": description}]  # fresh messages[]

    for _ in range(30):  # safety limit
        response = client.messages.create(
            model=MODEL, system=SUB_SYSTEM,
            messages=messages, tools=sub_tools, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            break
        results = []
        for block in response.content:
            if block.type == "tool_use":
                blocked = trigger_hooks("PreToolUse", block)
                if blocked:
                    results.append({"type": "tool_result", "tool_use_id": block.id,
                                    "content": str(blocked)})
                    continue
                handler = SUB_HANDLERS.get(block.name)
                output = handler(**block.input) if handler else f"Unknown tool: {block.name}"
                trigger_hooks("PostToolUse", block, output)
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": output})
        messages.append({"role": "user", "content": results})

    # Return only the final text conclusion, all intermediate steps discarded
    return extract_text(messages[-1]["content"])
|
||||
```
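To watch the isolation without an API key, the same loop can be driven by a scripted stand-in for the model. A minimal sketch: `ScriptedClient`, the `SimpleNamespace` blocks, and the single `read_file` handler are hypothetical, and they only mimic the shape of Anthropic `messages.create` responses (`stop_reason`, plus content blocks with `type`, `id`, `name`, `input`, `text`); hooks are omitted. It also shows one way to handle an exhausted 30-round limit, where the last message in the loop above would be a list of tool results rather than text.

```python
from types import SimpleNamespace as NS


class ScriptedClient:
    """Hypothetical stand-in for Anthropic(): replays canned responses in order."""

    def __init__(self, responses):
        self._responses = iter(responses)
        self.messages = self  # lets callers write client.messages.create(...)

    def create(self, **kwargs):
        return next(self._responses)


def extract_text(content):
    """Concatenate the text blocks of an assistant message."""
    if isinstance(content, str):
        return content
    return "".join(b.text for b in content if getattr(b, "type", None) == "text")


def spawn_subagent(client, description, handlers, max_turns=30):
    messages = [{"role": "user", "content": description}]  # fresh, local history
    for _ in range(max_turns):
        response = client.messages.create(messages=messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return extract_text(response.content)  # only this leaves the function
        results = []
        for block in response.content:
            if block.type == "tool_use":
                handler = handlers.get(block.name)
                output = handler(**block.input) if handler else f"Unknown tool: {block.name}"
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})
    # Limit reached while the model still wanted tools: say so explicitly
    # instead of passing the last tool_result list up as a conclusion.
    return "(subagent stopped at the turn limit without a final answer)"


handlers = {"read_file": lambda path: "[tool.pytest.ini_options]"}
client = ScriptedClient([
    NS(stop_reason="tool_use", content=[
        NS(type="tool_use", id="t1", name="read_file", input={"path": "pyproject.toml"}),
    ]),
    NS(stop_reason="end_turn", content=[NS(type="text", text="Tests use pytest.")]),
])
summary = spawn_subagent(client, "Find the test framework", handlers)
print(summary)

looping = ScriptedClient([
    NS(stop_reason="tool_use", content=[
        NS(type="tool_use", id=f"t{i}", name="read_file", input={"path": "x"}),
    ])
    for i in range(3)
])
fallback = spawn_subagent(looping, "Never finishes", handlers, max_turns=3)
print(fallback)
```

The four-message sub-conversation (prompt, tool call, tool result, answer) lives only inside `spawn_subagent`; the caller receives a single string.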
|
||||
|
||||
The main Agent calls it just like any other tool:
|
||||
|
||||
```python
|
||||
TOOLS = [
|
||||
{"name": "bash", ...},
|
||||
{"name": "read_file", ...},
|
||||
{"name": "write_file", ...},
|
||||
{"name": "edit_file", ...},
|
||||
{"name": "glob", ...},
|
||||
{"name": "todo_write", ...},
|
||||
# s06: new task tool
|
||||
{"name": "task",
|
||||
"description": "Launch a subagent to handle a complex subtask. Returns only the final conclusion.",
|
||||
"input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}},
|
||||
]
|
||||
|
||||
TOOL_HANDLERS["task"] = spawn_subagent
|
||||
```
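Because dispatch is a plain name lookup, registering `task` is the only change the parent loop needs. A minimal sketch in which `run_bash` and this `spawn_subagent` are hypothetical placeholders rather than the chapter's real implementations:

```python
def run_bash(command: str) -> str:
    return f"(pretend we ran: {command})"  # placeholder handler


def spawn_subagent(description: str) -> str:
    return f"(summary for: {description})"  # placeholder for the real subagent loop


TOOL_HANDLERS = {"bash": run_bash}
TOOL_HANDLERS["task"] = spawn_subagent  # s06: the only registration change


def dispatch(name: str, tool_input: dict) -> str:
    """Route a tool_use block by name; unknown names become an error string."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Unknown tool: {name}"
    return handler(**tool_input)


print(dispatch("task", {"description": "find the test framework"}))
print(dispatch("deploy", {}))
```

The `input_schema` field name `description` becomes the handler's keyword argument through `**tool_input`, so the schema and the function signature have to agree.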
|
||||
|
||||
Four key design decisions:
|
||||
|
||||
| Decision | Choice | Reason |
|
||||
|----------|--------|--------|
|
||||
| Context isolation | Fresh `messages[]` | Sub-Agent's intermediate steps don't pollute main Agent's context |
|
||||
| Return only conclusion | `extract_text(last_message)` | Not returning the entire messages list |
|
||||
| No recursion | Sub-Agent has no task tool | Prevents sub-Agent from spawning further sub-Agents |
|
||||
| Security not bypassed | Sub-Agent tool calls go through PreToolUse hook | Context isolation does not mean permission isolation |
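The last row is easy to lose in practice: if the subagent built its own hook list, a parent-level deny rule would silently stop applying. A minimal sketch of one hook registry shared by both loops; `HOOKS`, `deny_recursive_delete`, and the convention that a non-`None` return value means "blocked" are assumptions modelled on how `trigger_hooks` is used in the snippet above, not the chapter's actual code:

```python
from types import SimpleNamespace as NS
from typing import Callable, Optional

Hook = Callable[[object, Optional[str]], Optional[str]]

# One registry consulted by the parent loop and by every subagent loop.
HOOKS: dict[str, list[Hook]] = {"PreToolUse": [], "PostToolUse": []}


def trigger_hooks(event: str, block, output: Optional[str] = None) -> Optional[str]:
    """Run hooks in order; the first non-None return value is the block reason."""
    for hook in HOOKS[event]:
        verdict = hook(block, output)
        if verdict is not None:
            return verdict
    return None


def deny_recursive_delete(block, output):
    if block.name == "bash" and "rm -rf" in block.input.get("command", ""):
        return "Blocked by PreToolUse: recursive delete"
    return None


HOOKS["PreToolUse"].append(deny_recursive_delete)

# A call made inside a subagent hits the same rule as one made by the parent.
print(trigger_hooks("PreToolUse", NS(name="bash", input={"command": "rm -rf build/"})))
print(trigger_hooks("PreToolUse", NS(name="bash", input={"command": "pytest -q"})))
```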
|
||||
|
||||
The dispatch mechanism is unchanged; the task tool is routed through `TOOL_HANDLERS[block.name]`. The sub-Agent has its own `SUB_SYSTEM` prompt, explicitly instructing "complete the task, do not delegate further."
|
||||
|
||||
---
|
||||
|
||||
## Changes from s05
|
||||
|
||||
| Component | Before (s05) | After (s06) |
|
||||
|-----------|-------------|-------------|
|
||||
| Tool count | 6 (bash, read, write, edit, glob, todo_write) | 7 (+task) |
|
||||
| New function | — | spawn_subagent (independent messages[] + 30-round safety limit) |
|
||||
| Context isolation | Everything in the main conversation | Sub-Agent uses fresh messages[] |
|
||||
| Loop | Unchanged | Dispatch unchanged, sub-Agent has independent SUB_SYSTEM and hook-protected loop |
|
||||
|
||||
---
|
||||
|
||||
## Try It
|
||||
|
||||
```sh
|
||||
cd learn-claude-code
|
||||
python s06_subagent/code.py
|
||||
```
|
||||
|
||||
Try these prompts:
|
||||
|
||||
1. `Use a subtask to find what testing framework this project uses` (sub-Agent reads files, main Agent receives only the conclusion)
|
||||
2. `Delegate: read all .py files in agents/ and summarize what each one does`
|
||||
3. `Use a task to create s06_subagent/example/string_tools.py with a slugify(text: str) function, then verify it from the parent agent`
|
||||
|
||||
What to watch for: Do `[Subagent spawned]` / `[Subagent done]` appear? Do sub-Agent tool calls print as `[sub] ...`? Does the parent Agent continue with only the summary returned by the sub-Agent?
|
||||
|
||||
---
|
||||
|
||||
## What's Next
|
||||
|
||||
The Agent can now break tasks apart. But different tasks require different knowledge: editing frontend components needs React conventions, writing SQL needs table schemas. Stuffing all this knowledge into the system prompt would blow up the context.
|
||||
|
||||
→ s07 Skill Loading: Inject skills on demand instead of piling documents into the system prompt. Load only when needed, as natural as reading a file.
|
||||
|
||||
<details>
|
||||
<summary>Dive into CC Source Code</summary>
|
||||
|
||||
> The following is based on a complete analysis of CC source code `AgentTool.tsx`, `runAgent.ts`, `forkSubagent.ts`, and `forkedAgent.ts`.
|
||||
|
||||
### 1. Not One Pattern, but Three
|
||||
|
||||
The teaching version covers only "fresh messages[]". CC actually has three execution modes:
|
||||
|
||||
| Mode | Trigger | Context |
|
||||
|------|---------|---------|
|
||||
| **Normal Subagent** | `subagent_type` specified (normal path) | Truly fresh messages[], only the prompt |
|
||||
| **Fork Subagent** | No `subagent_type`, fork gate enabled | Constructs cache-friendly prefix via `buildForkedMessages()`, shares prompt cache |
|
||||
| **General-Purpose** | No `subagent_type`, fork gate disabled | Same as Normal |
|
||||
|
||||
### 2. Fork Mode: Sharing Prompt Cache
|
||||
|
||||
This is a core concept the teaching version omits. Fork mode (`forkSubagent.ts:60-71`) doesn't create a fresh context. Instead, it constructs a cache-friendly message prefix via `buildForkedMessages()` (`forkSubagent.ts:107-168`), preserving the parent assistant message and generating placeholder tool results. The goal isn't isolation, but making the Anthropic API's prompt cache hit: parent and child Agent's system prompt, tools, and message prefix are byte-identical, so the API doesn't need to recompute.
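The prefix construction can be pictured in a few lines. This is a hypothetical illustration of the shape described above (keep the parent's assistant message, answer each of its `tool_use` blocks with a placeholder `tool_result`, then append the child's directive); `build_forked_messages` and the placeholder text are inventions for illustration, not a port of `buildForkedMessages()`:

```python
import copy

PLACEHOLDER = "(fork: result not available in this branch)"  # hypothetical text


def build_forked_messages(parent_messages: list[dict], directive: str) -> list[dict]:
    """Reuse the parent's history unchanged as a prefix, then append the child's turn."""
    if not parent_messages or parent_messages[-1]["role"] != "assistant":
        raise ValueError("fork from a history ending in an assistant turn")
    prefix = copy.deepcopy(parent_messages)  # identical content, independent objects
    pending = [b for b in prefix[-1]["content"] if b.get("type") == "tool_use"]
    # Every tool_use needs a matching tool_result before any new user text.
    placeholders = [
        {"type": "tool_result", "tool_use_id": b["id"], "content": PLACEHOLDER}
        for b in pending
    ]
    child_turn = {"role": "user", "content": placeholders + [{"type": "text", "text": directive}]}
    return prefix + [child_turn]


parent = [
    {"role": "user", "content": "Refactor utils.py and update its tests"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "I'll split this into two forks."},
        {"type": "tool_use", "id": "toolu_1", "name": "task",
         "input": {"description": "refactor utils.py"}},
    ]},
]
child = build_forked_messages(parent, "You are the fork: refactor utils.py only.")
print(child[:2] == parent)
print(child[-1]["content"][0]["tool_use_id"])
```

In this sketch, two forks created from the same parent turn differ only in their final text block, so everything before it is a shared prefix.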
|
||||
|
||||
Cache hits depend on five components (`forkedAgent.ts:57-68`): system prompt, tools, model, message prefix, and thinking config. All five must be byte-identical between parent and child.
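"Byte-identical" can be made concrete by hashing the serialized components. A hypothetical sketch: the API exposes no such function, and `cache_fingerprint` only illustrates that any difference in one component, even a reordered tool list, produces different bytes. Dict key order is kept as inserted on purpose, since reordering changes the serialization; `"example-model"` is a placeholder model name.

```python
import hashlib
import json


def cache_fingerprint(system, tools, model, message_prefix, thinking) -> str:
    """Hash the five components in a fixed field order; any byte change alters the digest."""
    payload = {
        "system": system,
        "tools": tools,
        "model": model,
        "messages": message_prefix,
        "thinking": thinking,
    }
    blob = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


tools = [{"name": "bash"}, {"name": "read_file"}]
shared = {"system": "You are a coding agent.", "model": "example-model",
          "message_prefix": [], "thinking": None}

parent_fp = cache_fingerprint(tools=tools, **shared)
same_fp = cache_fingerprint(tools=[dict(t) for t in tools], **shared)
reordered_fp = cache_fingerprint(tools=list(reversed(tools)), **shared)
print(parent_fp == same_fp)       # equal content, equal bytes
print(parent_fp == reordered_fp)  # same tools, different order
```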
|
||||
|
||||
### 3. Context Isolation's Precise Granularity
|
||||
|
||||
`createSubagentContext()` (`forkedAgent.ts:345-462`) creates the sub-Agent's `ToolUseContext`:
|
||||
|
||||
| Field | Behavior |
|
||||
|-------|----------|
|
||||
| `abortController` | New child controller; parent abort propagates down |
|
||||
| `setAppState` | Default no-op; but sync agents share via `shareSetAppState` (`runAgent.ts:697-714`) |
|
||||
| `readFileState` | **Cloned from parent** (avoids re-reading same files) |
|
||||
| `queryTracking` | New chainId, `depth = parentDepth + 1` |
|
||||
|
||||
The sub-Agent is therefore not fully isolated: it starts from a clone of the parent's file read state, and the degree of UI and notification isolation depends on the execution path (sync, async, fork, and teammate each differ).
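The four behaviours in the table map onto a small data structure. A hypothetical Python sketch: field names mirror the table, but `ToolUseContext`, `AbortSignal`, and `create_subagent_context` here are toy classes and functions, not CC's TypeScript types. The child gets its own abort signal that a parent abort propagates into, a copy of the parent's read-file cache, a new chain id with `depth + 1`, and a no-op `set_app_state` unless the caller opts into sharing.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


class AbortSignal:
    """Toy abort controller: aborting a parent also aborts its children."""

    def __init__(self) -> None:
        self.aborted = False
        self._children: list["AbortSignal"] = []

    def child(self) -> "AbortSignal":
        c = AbortSignal()
        c.aborted = self.aborted
        self._children.append(c)
        return c

    def abort(self) -> None:
        self.aborted = True
        for c in self._children:
            c.abort()


@dataclass
class ToolUseContext:
    abort: AbortSignal
    read_file_state: dict[str, str]
    chain_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    depth: int = 0
    set_app_state: Callable[[dict], None] = lambda state: None  # no-op by default


def create_subagent_context(parent: ToolUseContext, share_app_state: bool = False) -> ToolUseContext:
    return ToolUseContext(
        abort=parent.abort.child(),
        read_file_state=dict(parent.read_file_state),  # cloned, not shared by reference
        depth=parent.depth + 1,
        set_app_state=parent.set_app_state if share_app_state else (lambda state: None),
    )


parent = ToolUseContext(abort=AbortSignal(), read_file_state={"src/app.py": "<contents>"})
child = create_subagent_context(parent)
child.read_file_state["tests/test_app.py"] = "<contents>"  # invisible to the parent
parent.abort.abort()                                       # propagates down
print(child.depth, child.abort.aborted, "tests/test_app.py" in parent.read_file_state)
```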
|
||||
|
||||
### 4. Recursive Fork Protection
|
||||
|
||||
The teaching version expresses recursion protection as "the sub-Agent has no task tool." The real implementation is more nuanced: `isInForkChild()` (`forkSubagent.ts:78-89`) checks the conversation history for `FORK_BOILERPLATE_TAG` and refuses to fork again if it finds it. Beyond that, `constants/tools.ts:36-46` places the `Agent` tool in every agent's disabled set by default (with an exception when `USER_TYPE === 'ant'`); `forkSubagent.ts:73-89` adds recursion protection specific to fork children; and `agentToolUtils.ts:100-110` makes special allowances in teammate scenarios. It is not simply "no further sub-Agents."
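The history check itself is simple to picture. A hypothetical sketch: the tag string `<fork-boilerplate>` and the message shapes are assumptions, since the real value of `FORK_BOILERPLATE_TAG` is not shown here; the point is that a fork child can recognise itself from its own history and refuse to fork again.

```python
FORK_BOILERPLATE_TAG = "<fork-boilerplate>"  # hypothetical value for illustration


def _user_texts(messages: list[dict]):
    """Yield every text string found in user turns (string or block content)."""
    for message in messages:
        if message.get("role") != "user":
            continue
        content = message["content"]
        if isinstance(content, str):
            yield content
        else:
            for block in content:
                if block.get("type") == "text":
                    yield block.get("text", "")


def is_in_fork_child(messages: list[dict]) -> bool:
    return any(FORK_BOILERPLATE_TAG in text for text in _user_texts(messages))


def try_fork(messages: list[dict], directive: str) -> str:
    if is_in_fork_child(messages):
        return "Refused: already inside a fork child"
    return f"Forking: {directive}"


root = [{"role": "user", "content": "Fix the login bug"}]
fork_child = [{"role": "user", "content": [
    {"type": "text", "text": f"{FORK_BOILERPLATE_TAG} Trace the session handling"},
]}]
print(try_fork(root, "trace session handling"))
print(try_fork(fork_child, "trace cookie parsing"))
```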
|
||||
|
||||
### 5. Permission Bubbling
|
||||
|
||||
Fork Agent's `permissionMode: 'bubble'` (`forkSubagent.ts:67`) means the sub-Agent's permission prompts bubble up to the parent terminal: the user approves sub-Agent operations in the main terminal.
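A hypothetical sketch of the idea: in `'bubble'` mode the child never shows a prompt of its own; its permission requests are forwarded up to the parent, and the user answers in the main terminal. `AgentNode`, the `AskUser` callback, and the label format are invented for illustration.

```python
from typing import Callable, Optional

AskUser = Callable[[str], bool]  # shows a prompt, returns True if approved


class AgentNode:
    def __init__(self, name: str, ask_user: Optional[AskUser] = None,
                 parent: Optional["AgentNode"] = None, permission_mode: str = "default"):
        self.name = name
        self.ask_user = ask_user
        self.parent = parent
        self.permission_mode = permission_mode

    def request_permission(self, action: str) -> bool:
        if self.permission_mode == "bubble" and self.parent is not None:
            # Forward upward, labelled so the user can tell which agent is asking.
            return self.parent.request_permission(f"[{self.name}] {action}")
        if self.ask_user is None:
            return False  # nobody to ask: deny rather than allow silently
        return self.ask_user(action)


prompts_shown: list[str] = []


def main_terminal(action: str) -> bool:
    prompts_shown.append(action)
    return "rm -rf" not in action  # stand-in for the user's y/n answer


main = AgentNode("main", ask_user=main_terminal)
fork = AgentNode("fork-1", parent=main, permission_mode="bubble")
print(fork.request_permission("bash: pytest -q"))
print(fork.request_permission("bash: rm -rf build/"))
print(prompts_shown)
```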
|
||||
|
||||
### 6. Async vs Sync
|
||||
|
||||
The teaching version only shows synchronous sub-Agents (the parent waits for the child to finish). CC also supports async paths (`AgentTool.tsx:686-764`): with `run_in_background: true`, the sub-Agent launches asynchronously, `{ status: 'async_launched' }` is returned to the parent immediately, and the parent is notified when the child completes. `run_in_background` is not the only trigger: auto-background, assistant force async, and coordinator/proactive paths can also take the async route.
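The async contract (return immediately, notify later) can be sketched with the standard library. A hypothetical sketch: `launch_subagent`, the `notifications` queue, `fake_subagent`, and the sync-path `{"status": "completed", ...}` shape are assumptions; only the `async_launched` status comes from the text above.

```python
import queue
import threading
from typing import Callable

notifications: "queue.Queue[dict]" = queue.Queue()


def launch_subagent(run: Callable[[str], str], prompt: str,
                    run_in_background: bool = False) -> dict:
    if not run_in_background:
        # Sync path: the parent blocks until the child returns its summary.
        return {"status": "completed", "result": run(prompt)}

    def worker() -> None:
        notifications.put({"prompt": prompt, "result": run(prompt)})

    threading.Thread(target=worker, daemon=True).start()
    return {"status": "async_launched"}  # the parent continues immediately


def fake_subagent(prompt: str) -> str:
    return f"summary of {prompt!r}"


print(launch_subagent(fake_subagent, "scan the logs"))
print(launch_subagent(fake_subagent, "scan the repo", run_in_background=True))
# Later, e.g. between parent turns, the parent drains finished work:
print(notifications.get(timeout=5))
```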
|
||||
|
||||
### Teaching Version Simplifications Are Intentional
|
||||
|
||||
- Three modes → one (fresh messages): conceptually clear
|
||||
- Prompt cache sharing → omitted: teaching version doesn't involve API-layer optimization
|
||||
- Recursive fork protection → simplified to "sub-Agent has no task tool"
|
||||
- Async → omitted (left for s13): s06 focuses on the synchronous model first
|
||||
|
||||
</details>
|
||||
|
||||
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
|
||||
s06_subagent/README.ja.md
@@ -0,0 +1,189 @@
|
||||
# s06: Subagent — 大きなタスクを分割、それぞれがクリーンなコンテキストを取得
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → s02 → s03 → s04 → s05 → `s06` → [s07](../s07_skill_loading/) → s08 → ... → s20
|
||||
|
||||
> *"大きなタスクは小さく、小さなタスクごとにクリーンなコンテキスト"* — Subagent は独立した messages[] を使い、メイン会話を汚染しない。
|
||||
>
|
||||
> **Harness レイヤー**: サブエージェント — コンテキストの隔離、注意の散漫を防ぐ。
|
||||
|
||||
---
|
||||
|
||||
## 課題
|
||||
|
||||
Agent がバグを修正している。呼び出しチェーンを追跡するために 30 のファイルを読み、途中で 60 ラウンドやり取りした。messages リストは 120 件に膨らみ、その大部分は「呼び出しチェーンの追跡」という中間過程 — 「バグ修正」という最終目標とは無関係。
|
||||
|
||||
この中間過程がコンテキストの席を占め、Agent はますます「健忘」になる — 最初の問題が何だったか覚えていられない。
|
||||
|
||||
別の見方をすると:バグを修正するとき、あなたは「新しいターミナルを開いて」呼び出しチェーンを追跡するだろう。追跡が終わったらターミナルを閉じ、結果をメモに書き、元のターミナルに戻ってバグ修正を続ける。Agent にもこの能力が必要 — **独立したサブプロセスを開き、独立したメッセージリストを与え、一つのことに集中させる。**
|
||||
|
||||
---
|
||||
|
||||
## ソリューション
|
||||
|
||||

|
||||
|
||||
前章の最小フック構造と `todo_write` ツールを保持し、本章は新規の `task` ツールに注目する。呼び出されると、サブエージェントを spawn する。新しい `messages[]` を持ち、自分自身のループを実行し、終了後に要約テキストのみをメイン Agent に返す。会話コンテキストは破棄されるが、ファイルシステムの副作用(書き込み、編集、コマンド実行)は作業ディレクトリに残る。
|
||||
|
||||
サブエージェントのツールは制限される:bash/read/write/edit/glob を持つが、task はない。再帰 spawn を防止する。サブエージェントのツール呼び出しも権限フックを経由する。コンテキスト分離は権限のバイパスではない。
|
||||
|
||||
---
|
||||
|
||||
## 仕組み
|
||||
|
||||
**spawn_subagent** はサブエージェントに新しいメッセージリストを与え、自分自身のループを実行させ、結論のみを返す:
|
||||
|
||||
```python
|
||||
def spawn_subagent(description: str) -> str:
    # サブエージェントのツール:基本ツールのみ、task なし(再帰禁止)
    sub_tools = [...]
    messages = [{"role": "user", "content": description}]  # 新規 messages[]

    for _ in range(30):  # safety limit
        response = client.messages.create(
            model=MODEL, system=SUB_SYSTEM,
            messages=messages, tools=sub_tools, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            break
        results = []
        for block in response.content:
            if block.type == "tool_use":
                blocked = trigger_hooks("PreToolUse", block)
                if blocked:
                    results.append({"type": "tool_result", "tool_use_id": block.id,
                                    "content": str(blocked)})
                    continue
                handler = SUB_HANDLERS.get(block.name)
                output = handler(**block.input) if handler else f"Unknown tool: {block.name}"
                trigger_hooks("PostToolUse", block, output)
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": output})
        messages.append({"role": "user", "content": results})

    # 最後のテキスト結論のみを返す、中間過程はすべて破棄
    return extract_text(messages[-1]["content"])
|
||||
```
|
||||
|
||||
メイン Agent の呼び出しは、他のツールと同じ:
|
||||
|
||||
```python
|
||||
TOOLS = [
|
||||
{"name": "bash", ...},
|
||||
{"name": "read_file", ...},
|
||||
{"name": "write_file", ...},
|
||||
{"name": "edit_file", ...},
|
||||
{"name": "glob", ...},
|
||||
{"name": "todo_write", ...},
|
||||
# s06: 新規 task ツール
|
||||
{"name": "task",
|
||||
"description": "Launch a subagent to handle a complex subtask. Returns only the final conclusion.",
|
||||
"input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}},
|
||||
]
|
||||
|
||||
TOOL_HANDLERS["task"] = spawn_subagent
|
||||
```
|
||||
|
||||
四つの重要な設計決定:
|
||||
|
||||
| 決定 | 選択 | 理由 |
|
||||
|------|------|------|
|
||||
| コンテキスト隔離 | 新規 `messages[]` | サブエージェントの中間過程がメイン Agent のコンテキストを汚染しない |
|
||||
| 結論のみ返却 | `extract_text(last_message)` | messages リスト全体を返すのではない |
|
||||
| 再帰禁止 | サブエージェントに task ツールなし | サブエージェントがさらにサブエージェントを spawn するのを防止 |
|
||||
| セキュリティのバイパスなし | サブエージェントのツール呼び出しも PreToolUse フックを経由 | コンテキスト分離は権限分離ではない |
|
||||
|
||||
ディスパッチ機構は変わらず、task ツールは `TOOL_HANDLERS[block.name]` を経由する。サブエージェントは独立した `SUB_SYSTEM` プロンプトを持ち、「タスクを完了し、さらに委託しない」と明示される。
|
||||
|
||||
---
|
||||
|
||||
## s05 からの変更
|
||||
|
||||
| コンポーネント | 変更前 (s05) | 変更後 (s06) |
|
||||
|--------------|-------------|-------------|
|
||||
| ツール数 | 6 (bash, read, write, edit, glob, todo_write) | 7 (+task) |
|
||||
| 新規関数 | — | spawn_subagent(独立 messages[] + 30 ラウンド安全制限) |
|
||||
| コンテキスト隔離 | すべてメイン会話内 | サブエージェントが新規 messages[] を使用 |
|
||||
| ループ | 不変 | ディスパッチは不変、サブエージェントに独立した SUB_SYSTEM とフック保護されたループ |
|
||||
|
||||
---
|
||||
|
||||
## 試してみよう
|
||||
|
||||
```sh
|
||||
cd learn-claude-code
|
||||
python s06_subagent/code.py
|
||||
```
|
||||
|
||||
以下のプロンプトを試してみよう:
|
||||
|
||||
1. `Use a subtask to find what testing framework this project uses`(サブエージェントがファイルを読み、メイン Agent は結論のみ受け取る)
|
||||
2. `Delegate: read all .py files in agents/ and summarize what each one does`
|
||||
3. `Use a task to create s06_subagent/example/string_tools.py with a slugify(text: str) function, then verify it from the parent agent`
|
||||
|
||||
観察のポイント:`[Subagent spawned]` / `[Subagent done]` が表示されるか? サブエージェントのツール呼び出しが `[sub] ...` として出力されるか? 親 Agent はサブエージェントが返した要約だけを受け取って続行するか?
|
||||
|
||||
---
|
||||
|
||||
## 次へ
|
||||
|
||||
Agent はタスクを分割できるようになった。しかし各タスクに必要な知識は異なる。フロントエンドコンポーネントの変更には React 規約が必要で、SQL を書くにはテーブル構造を知る必要がある。これらの知識をすべて system prompt に詰め込むと、コンテキストが溢れてしまう。
|
||||
|
||||
→ s07 Skill Loading:スキルをオンデマンドで注入する。system prompt にドキュメントを積み上げるのではなく、必要なときだけ読み込む。ファイルを読むのと同じくらい自然に。
|
||||
|
||||
<details>
|
||||
<summary>CC ソースコードを深掘り</summary>
|
||||
|
||||
> 以下は CC ソースコード `AgentTool.tsx`、`runAgent.ts`、`forkSubagent.ts`、`forkedAgent.ts` の完全分析に基づく。
|
||||
|
||||
### 一、一つのパターンではなく三つ
|
||||
|
||||
教育版は「新規 messages[]」のみを取り上げる。CC には実際に三つの実行モードがある:
|
||||
|
||||
| モード | トリガー | コンテキスト |
|
||||
|--------|---------|-------------|
|
||||
| **Normal Subagent** | `subagent_type` 指定時(normal path) | 新規 messages[]、プロンプトのみ |
|
||||
| **Fork Subagent** | `subagent_type` 未指定、fork gate 有効時 | `buildForkedMessages()` でキャッシュフレンドリーなプレフィックスを構築、プロンプトキャッシュを共有 |
|
||||
| **General-Purpose** | `subagent_type` 未指定、fork gate 無効時 | Normal と同じ |
|
||||
|
||||
### 二、Fork モード:プロンプトキャッシュの共有のため
|
||||
|
||||
これは教育版にはない核心概念。Fork モード(`forkSubagent.ts:60-71`)は新規コンテキストを作成せず、`buildForkedMessages()`(`forkSubagent.ts:107-168`)でキャッシュフレンドリーなメッセージプレフィックスを構築する。親の assistant message を保持し、placeholder tool results を生成する。目的は隔離ではなく、Anthropic API のプロンプトキャッシュをヒットさせること:親子 Agent の system prompt、tools、messages プレフィックスがバイトレベルで一致するため、API 側で再計算が不要になる。
|
||||
|
||||
キャッシュヒットの五つの重要コンポーネント(`forkedAgent.ts:57-68`):system prompt、tools、model、messages プレフィックス、thinking config、バイトレベルで一致する必要がある。
|
||||
|
||||
### 三、コンテキスト隔離の精密な粒度
|
||||
|
||||
`createSubagentContext()`(`forkedAgent.ts:345-462`)はサブエージェントの `ToolUseContext` を作成:
|
||||
|
||||
| フィールド | 挙動 |
|
||||
|-----------|------|
|
||||
| `abortController` | 新しい子コントローラ、親の abort は下に伝播 |
|
||||
| `setAppState` | デフォルトは no-op、ただし sync agent は `shareSetAppState` で共有(`runAgent.ts:697-714`) |
|
||||
| `readFileState` | **親からクローン**(同じファイルの再読み込みを回避) |
|
||||
| `queryTracking` | 新しい chainId、`depth = parentDepth + 1` |
|
||||
|
||||
サブエージェントは完全に隔離されているわけではない。親のファイル読み取り状態のクローンから開始する。UI と通知の隔離度は実行パスにより異なる(sync/async/fork/teammate でそれぞれ異なる)。
|
||||
|
||||
### 四、再帰 Fork 防護
|
||||
|
||||
教育版は「サブエージェントに task ツールなし」で再帰防止を表現する。実際の実装はより精密:`isInForkChild()`(`forkSubagent.ts:78-89`)が会話履歴内の `FORK_BOILERPLATE_TAG` をチェックする。しかし `constants/tools.ts:36-46` では `Agent` ツールが全エージェントの無効セットにデフォルト設定(`USER_TYPE === 'ant'` 時は例外)、`forkSubagent.ts:73-89` は fork child 向けの専用再帰保護があり、`agentToolUtils.ts:100-110` は teammate シナリオで特別な許可がある。単純な「サブエージェントの再 spawn 禁止」ではない。
|
||||
|
||||
### 五、Permission Bubbling
|
||||
|
||||
Fork Agent の `permissionMode: 'bubble'`(`forkSubagent.ts:67`)は、サブエージェントの権限プロンプトが親ターミナルにバブルアップすることを意味する。ユーザーはメインターミナルでサブエージェントの操作を承認する。
|
||||
|
||||
### 六、Async vs Sync
|
||||
|
||||
教育版は同期サブエージェントのみ(親が子の完了を待つ)を示す。CC は非同期パスもサポート(`AgentTool.tsx:686-764`):`run_in_background: true` の場合、サブエージェントは非同期で起動し、`{ status: 'async_launched' }` を直ちに親に返し、完了時に通知機構で親に知らせる。実際のトリガーは `run_in_background` だけでなく、auto-background、assistant force async、coordinator/proactive パスもある。
|
||||
|
||||
### 教育版の簡略化は意図的
|
||||
|
||||
- 三つのモード → 一つ(新規 messages):概念的に明確
|
||||
- プロンプトキャッシュ共有 → 省略:教育版は API 層の最適化を扱わない
|
||||
- 再帰 fork 防護 → 「サブエージェントに task ツールなし」に簡略化
|
||||
- Async → 省略(s13 に委ねる):s06 はまず同期モデルを理解する
|
||||
|
||||
</details>
|
||||
|
||||
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
|
||||
s06_subagent/README.md
@@ -0,0 +1,193 @@
|
||||
# s06: Subagent — 大任务拆小,每个拿到的都是干净上下文
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → s02 → s03 → s04 → s05 → `s06` → [s07](../s07_skill_loading/) → s08 → ... → s20
|
||||
|
||||
> *"大任务拆小, 每个小任务干净的上下文"* — Subagent 用独立 messages[], 不污染主对话。
|
||||
>
|
||||
> **Harness 层**: 子 Agent — 上下文隔离, 注意力不漂移。
|
||||
|
||||
---
|
||||
|
||||
## 问题
|
||||
|
||||
Agent 在修一个 bug。它读了 30 个文件来追踪调用链,中间聊了 60 轮。messages 列表涨到 120 条,其中大部分是"追踪调用链"的中间过程,和"修 bug"这个最终目标无关。
|
||||
|
||||
这些中间过程占着上下文位置,让 Agent 越来越"健忘",它记不住最初的问题是什么了。
|
||||
|
||||
换个角度:你修 bug 的时候,会"开一个新终端"来追踪调用链。追踪完了,终端关掉,结果写进笔记,回到原来的终端继续修 bug。Agent 也需要这个能力:开一个独立的子进程,给它一个独立的消息列表,让它专心做一件事。
|
||||
|
||||
---
|
||||
|
||||
## 解决方案
|
||||
|
||||

|
||||
|
||||
保留上一章的最小 hook 结构和 `todo_write` 工具,本章重点转向新增的 `task` 工具。调用它时,spawn 一个子 Agent,拥有全新的 `messages[]`,跑自己的循环,结束后只把摘要文本回传给主 Agent。对话上下文被丢弃,但文件系统的副作用(写文件、改文件、跑命令)保留在工作目录中。
|
||||
|
||||
子 Agent 的工具受限:有 bash/read/write/edit/glob,但没有 task,不能递归 spawn 新的子 Agent。子 Agent 的工具调用仍经过权限 hook,安全策略不因上下文隔离而跳过。
|
||||
|
||||
---
|
||||
|
||||
## 工作原理
|
||||
|
||||
**spawn_subagent** 给子 Agent 一个全新的 messages 列表,让它跑自己的循环,只回传结论:
|
||||
|
||||
```python
|
||||
def spawn_subagent(description: str) -> str:
    # 子 Agent 的工具:基础工具,但没有 task(禁止递归)
    sub_tools = [
        {"name": "bash", ...}, {"name": "read_file", ...},
        {"name": "write_file", ...}, {"name": "edit_file", ...},
        {"name": "glob", ...},
    ]
    messages = [{"role": "user", "content": description}]  # 全新 messages[]

    for _ in range(30):  # safety limit
        response = client.messages.create(
            model=MODEL, system=SUB_SYSTEM,
            messages=messages, tools=sub_tools, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            break
        results = []
        for block in response.content:
            if block.type == "tool_use":
                blocked = trigger_hooks("PreToolUse", block)
                if blocked:
                    results.append({"type": "tool_result", "tool_use_id": block.id,
                                    "content": str(blocked)})
                    continue
                handler = SUB_HANDLERS.get(block.name)
                output = handler(**block.input) if handler else f"Unknown tool: {block.name}"
                trigger_hooks("PostToolUse", block, output)
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": output})
        messages.append({"role": "user", "content": results})

    # 只返回最后的文本结论,中间过程全部丢弃
    return extract_text(messages[-1]["content"])
|
||||
```
|
||||
|
||||
主 Agent 调用时,跟调其他工具一样:
|
||||
|
||||
```python
|
||||
TOOLS = [
|
||||
{"name": "bash", ...},
|
||||
{"name": "read_file", ...},
|
||||
{"name": "write_file", ...},
|
||||
{"name": "edit_file", ...},
|
||||
{"name": "glob", ...},
|
||||
{"name": "todo_write", ...},
|
||||
# s06: 新增 task 工具
|
||||
{"name": "task",
|
||||
"description": "Launch a subagent to handle a complex subtask. Returns only the final conclusion.",
|
||||
"input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}},
|
||||
]
|
||||
|
||||
TOOL_HANDLERS["task"] = spawn_subagent
|
||||
```
|
||||
|
||||
四个关键设计决策:
|
||||
|
||||
| 决策 | 选择 | 原因 |
|
||||
|------|------|------|
|
||||
| 上下文隔离 | 全新 `messages[]` | 子 Agent 的中间过程不污染主 Agent 的上下文 |
|
||||
| 只回传结论 | `extract_text(last_message)` | 不是回传整个 messages 列表 |
|
||||
| 禁止递归 | 子 Agent 无 task 工具 | 防止子 Agent 再 spawn 新的子 Agent |
|
||||
| 安全策略不跳过 | 子 Agent 工具调用也走 PreToolUse hook | 上下文隔离不代表权限隔离 |
|
||||
|
||||
dispatch 机制不变,task 工具通过 `TOOL_HANDLERS[block.name]` 分发。子 Agent 有独立的 `SUB_SYSTEM` 提示,明确要求"直接完成任务,不要再委派"。
|
||||
|
||||
---
|
||||
|
||||
## 相对 s05 的变更
|
||||
|
||||
| 组件 | 之前 (s05) | 之后 (s06) |
|
||||
|------|-----------|-----------|
|
||||
| 工具数量 | 6 (bash, read, write, edit, glob, todo_write) | 7 (+task) |
|
||||
| 新函数 | — | spawn_subagent(独立 messages[] + 30 轮安全限制) |
|
||||
| 上下文隔离 | 全部在主对话中 | 子 Agent 用全新的 messages[] |
|
||||
| 循环 | 不变 | dispatch 不变,子 Agent 有独立 SUB_SYSTEM 和 hook 保护的循环 |
|
||||
|
||||
---
|
||||
|
||||
## 试一下
|
||||
|
||||
```sh
|
||||
cd learn-claude-code
|
||||
python s06_subagent/code.py
|
||||
```
|
||||
|
||||
试试这些 prompt:
|
||||
|
||||
1. `Use a subtask to find what testing framework this project uses`(子 Agent 去读文件,主 Agent 只收结论)
|
||||
2. `Delegate: read all .py files in agents/ and summarize what each one does`
|
||||
3. `Use a task to create s06_subagent/example/string_tools.py with a slugify(text: str) function, then verify it from the parent agent`
|
||||
|
||||
观察重点:是否出现 `[Subagent spawned]` / `[Subagent done]`?子 Agent 的工具调用是否以 `[sub] ...` 输出?主 Agent 最后是否只继续处理子 Agent 返回的摘要?
|
||||
|
||||
---
|
||||
|
||||
## 接下来
|
||||
|
||||
Agent 现在能拆任务了。但每个任务需要的知识不一样:改前端组件需要知道 React 规范,写 SQL 需要知道表结构。这些知识全塞进 system prompt,上下文直接爆了。
|
||||
|
||||
s07 Skill Loading → 技能按需注入,不在 system prompt 里堆文档。用到的时候才加载,和读文件一样自然。
|
||||
|
||||
<details>
|
||||
<summary>深入 CC 源码</summary>
|
||||
|
||||
> 以下基于 CC 源码 `AgentTool.tsx`、`runAgent.ts`、`forkSubagent.ts`、`forkedAgent.ts` 的完整分析。
|
||||
|
||||
### 一、不是一种模式,是三种
|
||||
|
||||
教学版只讲了"全新的 messages[]"。CC 实际有三种执行模式:
|
||||
|
||||
| 模式 | 触发条件 | 上下文 |
|
||||
|------|---------|--------|
|
||||
| **Normal Subagent** | 指定了 `subagent_type`(normal path) | 全新 messages[],只有 prompt |
|
||||
| **Fork Subagent** | 没指定 `subagent_type`,fork gate 开启 | 通过 `buildForkedMessages()` 构造 cache-friendly 前缀,共享 prompt cache |
|
||||
| **General-Purpose** | 没指定 `subagent_type`,fork gate 关闭 | 同 Normal |
|
||||
|
||||
### 二、Fork 模式:为了共享 Prompt Cache
|
||||
|
||||
这是教学版没有的核心概念。Fork 模式(`forkSubagent.ts:60-71`)不创建全新上下文,而是通过 `buildForkedMessages()`(`forkSubagent.ts:107-168`)构造 cache-friendly 消息前缀,保留父 assistant message 并生成 placeholder tool results。目的不是隔离,而是让 Anthropic API 的 prompt cache 命中:父子 Agent 的 system prompt、tools、messages 前缀完全一致,API 端不需要重算。
|
||||
|
||||
缓存命中的五个关键组件(`forkedAgent.ts:57-68`):system prompt、tools、model、messages 前缀、thinking config,必须字节级一致。
|
||||
|
||||
### 三、Context Isolation 的精确粒度
|
||||
|
||||
`createSubagentContext()`(`forkedAgent.ts:345-462`)创建子 Agent 的 `ToolUseContext`:
|
||||
|
||||
| 字段 | 行为 |
|
||||
|------|------|
|
||||
| `abortController` | 新的 child controller,父 abort 向下传播 |
|
||||
| `setAppState` | 默认 no-op;但 sync agent 通过 `shareSetAppState` 共享(`runAgent.ts:697-714`) |
|
||||
| `readFileState` | **从父克隆**(避免重复读相同文件) |
|
||||
| `queryTracking` | 新 chainId,`depth = parentDepth + 1` |
|
||||
|
||||
子 Agent 不是完全隔离的:它以父 Agent 文件读取状态的克隆为起点。UI 和通知的隔离程度取决于执行路径(sync/async/fork/teammate 各不同)。
|
||||
|
||||
### 四、递归 Fork 防护
|
||||
|
||||
教学版用"子 Agent 不给 task 工具"表达递归保护。真实实现更精细:`isInForkChild()`(`forkSubagent.ts:78-89`)检查对话历史中是否有 `FORK_BOILERPLATE_TAG`,有就拒绝。但 `constants/tools.ts:36-46` 中 `Agent` 工具默认在所有 agent 的禁用集合里,`USER_TYPE === 'ant'` 时例外;`forkSubagent.ts:73-89` 针对 fork child 有专门的递归保护;`agentToolUtils.ts:100-110` 在 teammate 场景下有特殊放行。不是简单的"禁止新的子 Agent"。
|
||||
|
||||
### 五、Permission Bubbling
|
||||
|
||||
Fork Agent 的 `permissionMode: 'bubble'`(`forkSubagent.ts:67`)意味着子 Agent 的权限弹窗冒泡到父终端,用户在主终端里审批子 Agent 的操作。
|
||||
|
||||
### 六、Async vs Sync
|
||||
|
||||
教学版只展示了同步子 Agent(父等着子跑完)。CC 还支持异步路径(`AgentTool.tsx:686-764`):`run_in_background: true` 时异步启动,返回 `{ status: 'async_launched' }` 立即给父 Agent,子 Agent 完成后通过通知机制告知父 Agent。实际触发条件不止 `run_in_background`,还有 auto-background、assistant force async、coordinator/proactive 等路径。
|
||||
|
||||
### 教学版的简化是刻意的
|
||||
|
||||
- 三种模式 → 一种(fresh messages):概念清晰
|
||||
- Prompt cache 共享 → 省略:教学版不涉及 API 层优化
|
||||
- 递归 fork 防护 → 简化为"子 Agent 无 task 工具"
|
||||
- Async → 省略(留给 s13):s06 先理解同步模型
|
||||
|
||||
</details>
|
||||
|
||||
<!-- translation-sync: zh@v1, en@v0, ja@v0 -->
|
||||
s06_subagent/code.py
@@ -0,0 +1,365 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
s06: Subagent — spawn sub-agents with fresh messages[] for context isolation.
|
||||
|
||||
    Parent Agent                          Subagent
    +------------------+                  +------------------+
    | messages=[...]   |                  | messages=[task]  | <-- fresh
    |                  |     dispatch     |                  |
    | tool: task       | ---------------> | own while loop   |
    | prompt="..."     |                  | bash/read/...    |
    |                  |   summary only   | (max 30 turns)   |
    | result = "..."   | <--------------- | return last text |
    +------------------+                  +------------------+
              ^                                     |
              |   intermediate results DISCARDED    |
              +-------------------------------------+
|
||||
|
||||
Subagent tools: bash, read, write, edit, glob (NO task — no recursion)
|
||||
|
||||
Changes from s05:
|
||||
+ task tool + spawn_subagent() with fresh messages[]
|
||||
+ Safety limit: max 30 turns per subagent
|
||||
+ extract_text() helper
|
||||
Subagent cannot spawn sub-subagents (no task tool in sub_tools).
|
||||
Main loop unchanged: task auto-dispatches via TOOL_HANDLERS.
|
||||
|
||||
Run: python s06_subagent/code.py
|
||||
Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY in .env
|
||||
"""

import os, subprocess, json
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"):
    os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
TASKS_DIR = WORKDIR / ".tasks"; TASKS_DIR.mkdir(exist_ok=True)
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]

SYSTEM = (
    f"You are a coding agent at {WORKDIR}. "
    "For complex sub-problems, use the task tool to spawn a subagent."
)

# s06: subagent gets its own system prompt — no task, no recursion
SUB_SYSTEM = (
    f"You are a coding agent at {WORKDIR}. "
    "Complete the task you were given, then return a concise summary. "
    "Do not delegate further."
)


# ═══════════════════════════════════════════════════════════
# FROM s02-s05 (unchanged): Tool Implementations
# ═══════════════════════════════════════════════════════════

def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path

def run_bash(command: str) -> str:
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR,
                           capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"

def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines):
            lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e:
        return f"Error: {e}"

def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content)
        return f"Wrote {len(content)} bytes to {path}"
    except Exception as e:
        return f"Error: {e}"

def run_edit(path: str, old_text: str, new_text: str) -> str:
    try:
        file_path = safe_path(path)
        text = file_path.read_text()
        if old_text not in text:
            return f"Error: text not found in {path}"
        file_path.write_text(text.replace(old_text, new_text, 1))
        return f"Edited {path}"
    except Exception as e:
        return f"Error: {e}"

def run_glob(pattern: str) -> str:
    import glob as g
    try:
        results = []
        for match in g.glob(pattern, root_dir=WORKDIR):
            if (WORKDIR / match).resolve().is_relative_to(WORKDIR):
                results.append(match)
        return "\n".join(results) if results else "(no matches)"
    except Exception as e:
        return f"Error: {e}"

def run_todo_write(todos: list) -> str:
    for i, t in enumerate(todos):
        if "content" not in t or "status" not in t:
            return f"Error: todos[{i}] missing 'content' or 'status'"
        if t["status"] not in ("pending", "in_progress", "completed"):
            return f"Error: todos[{i}] has invalid status '{t['status']}'"
    tasks_file = TASKS_DIR / "current_todos.json"
    tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False))
    lines = ["\n\033[33m## Current Tasks\033[0m"]
    for t in todos:
        icon = {"pending": " ", "in_progress": "\033[36m▸\033[0m", "completed": "\033[32m✓\033[0m"}[t["status"]]
        lines.append(f" [{icon}] {t['content']}")
    print("\n".join(lines))
    return f"Updated {len(todos)} tasks"

def extract_text(content) -> str:
    """Extract text from message content blocks."""
    if not isinstance(content, list):
        return str(content)
    return "\n".join(getattr(b, "text", "") for b in content if getattr(b, "type", None) == "text")

TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
    {"name": "todo_write", "description": "Create and manage a task list for your current coding session.",
     "input_schema": {"type": "object", "properties": {"todos": {"type": "array", "items": {"type": "object", "properties": {"content": {"type": "string"}, "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}}, "required": ["content", "status"]}}}, "required": ["todos"]}},
]

TOOL_HANDLERS = {
    "bash": run_bash, "read_file": run_read, "write_file": run_write,
    "edit_file": run_edit, "glob": run_glob, "todo_write": run_todo_write,
}


# ═══════════════════════════════════════════════════════════
# NEW in s06: Subagent — fresh messages[], summary only
# ═══════════════════════════════════════════════════════════

SUB_TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
]
# NO "task" tool — prevent recursive spawning

SUB_HANDLERS = {
    "bash": run_bash, "read_file": run_read, "write_file": run_write,
    "edit_file": run_edit, "glob": run_glob,
}

def spawn_subagent(description: str) -> str:
    """Spawn a subagent with fresh messages[], return summary only."""
    print(f"\n\033[35m[Subagent spawned]\033[0m")
    messages = [{"role": "user", "content": description}]  # fresh context

    for _ in range(30):  # safety limit
        response = client.messages.create(
            model=MODEL, system=SUB_SYSTEM,
            messages=messages, tools=SUB_TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            break
        results = []
        for block in response.content:
            if block.type == "tool_use":
                # Subagent tool calls pass through the same hooks, so permissions still apply
                blocked = trigger_hooks("PreToolUse", block)
                if blocked:
                    results.append({"type": "tool_result", "tool_use_id": block.id,
                                    "content": str(blocked)})
                    continue
                handler = SUB_HANDLERS.get(block.name)
                output = handler(**block.input) if handler else f"Unknown: {block.name}"
                trigger_hooks("PostToolUse", block, output)
                print(f" \033[90m[sub] {block.name}: {str(output)[:100]}\033[0m")
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": output})
        messages.append({"role": "user", "content": results})

    # Fallback for when the safety limit was hit in the middle of tool use
    result = extract_text(messages[-1]["content"])
    if not result:
        # last message is tool_result, look backwards for assistant text
        for msg in reversed(messages):
            if msg["role"] == "assistant":
                result = extract_text(msg["content"])
                if result:
                    break
    if not result:
        result = "Subagent stopped after 30 turns without final answer."
    print(f"\033[35m[Subagent done]\033[0m")
    return result  # only summary, entire message history discarded

# Add task tool to parent's tools
TOOLS.append({
    "name": "task",
    "description": "Launch a subagent to handle a complex subtask. Returns only the final conclusion.",
    "input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]},
})
TOOL_HANDLERS["task"] = spawn_subagent


# ═══════════════════════════════════════════════════════════
# FROM s04 (unchanged): Hook System
# ═══════════════════════════════════════════════════════════

HOOKS = {"UserPromptSubmit": [], "PreToolUse": [], "PostToolUse": [], "Stop": []}

def register_hook(event: str, callback):
    HOOKS[event].append(callback)

def trigger_hooks(event: str, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:
            return result
    return None

DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if="]

def permission_hook(block):
    """PreToolUse: deny list check."""
    if block.name == "bash":
        for p in DENY_LIST:
            if p in block.input.get("command", ""):
                print(f"\n\033[31m⛔ Blocked: '{p}'\033[0m")
                return "Permission denied"
    return None

def log_hook(block):
    """PreToolUse: log tool calls."""
    print(f"\033[90m[HOOK] {block.name}\033[0m")
    return None

def context_inject_hook(query: str):
    """UserPromptSubmit: log working directory."""
    print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m")
    return None

def summary_hook(messages: list):
    """Stop: print tool call count."""
    tool_count = sum(1 for m in messages
                     for b in (m.get("content") if isinstance(m.get("content"), list) else [])
                     if isinstance(b, dict) and b.get("type") == "tool_result")
    print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m")
    return None

register_hook("UserPromptSubmit", context_inject_hook)
register_hook("PreToolUse", permission_hook)
register_hook("PreToolUse", log_hook)
register_hook("Stop", summary_hook)


# ═══════════════════════════════════════════════════════════
# agent_loop — same as s05 + nag reminder, task auto-dispatches
# ═══════════════════════════════════════════════════════════

rounds_since_todo = 0

def agent_loop(messages: list):
    global rounds_since_todo
    while True:
        # s05: nag reminder
        if rounds_since_todo >= 3 and messages:
            messages.append({"role": "user",
                             "content": "<reminder>Update your todos.</reminder>"})
            rounds_since_todo = 0

        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            force = trigger_hooks("Stop", messages)
            if force:
                messages.append({"role": "user", "content": force})
                continue
            return

        rounds_since_todo += 1
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue

            blocked = trigger_hooks("PreToolUse", block)
            if blocked:
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": str(blocked)})
                continue

            handler = TOOL_HANDLERS.get(block.name)
            output = handler(**block.input) if handler else f"Unknown: {block.name}"

            trigger_hooks("PostToolUse", block, output)

            if block.name == "todo_write":
                rounds_since_todo = 0

            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": output})

        messages.append({"role": "user", "content": results})


if __name__ == "__main__":
    print("s06: Subagent — spawn sub-agents with fresh context, summary only")
    print("Type a question, press Enter. Type q to quit.\n")

    history = []
    while True:
        try:
            query = input("\033[36ms06 >> \033[0m")
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        trigger_hooks("UserPromptSubmit", query)
        history.append({"role": "user", "content": query})
        agent_loop(history)
        for block in history[-1]["content"]:
            if getattr(block, "type", None) == "text":
                print(block.text)
        print()
s06_subagent/images/subagent-overview.en.svg
@@ -0,0 +1,125 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 500" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
    </marker>
    <marker id="arrow-purple" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#7c3aed"/>
    </marker>
    <marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
    </marker>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/>
      <stop offset="100%" stop-color="#2563eb"/>
    </linearGradient>
  </defs>

  <!-- Background -->
  <rect width="800" height="500" fill="#fafbfc" rx="8"/>

  <!-- Title -->
  <rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
  <rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
  <text x="400" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Subagent — Independent messages[], All Intermediate Steps Discarded</text>

  <!-- ===== Parent Agent (left) ===== -->
  <rect x="30" y="68" width="310" height="268" rx="12" fill="#f0f4ff" stroke="#2563eb" stroke-width="2"/>
  <text x="185" y="92" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">Parent Agent</text>

  <!-- messages[] -->
  <rect x="50" y="100" width="110" height="36" rx="6" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="105" y="123" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- → LLM -->
  <line x1="160" y1="118" x2="198" y2="118" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>

  <!-- LLM -->
  <rect x="200" y="96" width="100" height="44" rx="6" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="250" y="123" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">LLM</text>

  <!-- LLM → dispatch -->
  <line x1="250" y1="140" x2="250" y2="156" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="264" y="152" fill="#64748b" font-size="8">tool_use</text>

  <!-- TOOL_HANDLERS -->
  <rect x="50" y="158" width="280" height="74" rx="6" fill="#e0e7ff" stroke="#2563eb" stroke-width="1"/>
  <text x="190" y="176" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>

  <!-- Other tools (gray) -->
  <rect x="65" y="190" width="115" height="32" rx="4" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="122" y="206" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">Base Tools</text>
  <text x="122" y="219" fill="#64748b" font-size="8" text-anchor="middle">bash / read / write / ...</text>

  <!-- task → spawn -->
  <rect x="200" y="193" width="110" height="26" rx="4" fill="#ede9fe" stroke="#7c3aed" stroke-width="1.5"/>
  <text x="255" y="210" fill="#5b21b6" font-size="10" font-weight="600" text-anchor="middle">task → spawn</text>

  <!-- Parent tool_result target -->
  <rect x="190" y="270" width="120" height="34" rx="5" fill="#dcfce7" stroke="#16a34a" stroke-width="1.2"/>
  <text x="250" y="291" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">tool_result</text>
  <path d="M 250 232 L 250 270" fill="none" stroke="#16a34a" stroke-width="1.4" marker-end="url(#arrow-green)"/>
  <path d="M 190 287 L 42 287 L 42 118 L 50 118" fill="none" stroke="#16a34a" stroke-width="1.2" marker-end="url(#arrow-green)" stroke-dasharray="4,3"/>
  <text x="110" y="304" fill="#94a3b8" font-size="8" text-anchor="middle">append messages[]</text>
  <text x="210" y="256" fill="#94a3b8" font-size="8" text-anchor="middle">Normal tool results also append to messages[]</text>

  <!-- ===== Subagent (right) ===== -->
  <rect x="430" y="68" width="340" height="268" rx="12" fill="#faf5ff" stroke="#7c3aed" stroke-width="2"/>
  <text x="600" y="92" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">Subagent (Fresh Context)</text>

  <!-- fresh messages -->
  <rect x="450" y="100" width="150" height="36" rx="6" fill="#fff" stroke="#7c3aed" stroke-width="1.5"/>
  <text x="525" y="116" fill="#5b21b6" font-size="10" font-weight="600" text-anchor="middle">messages = [task]</text>
  <text x="525" y="130" fill="#7c3aed" font-size="8" text-anchor="middle">fresh — no parent history</text>

  <!-- → LLM -->
  <line x1="600" y1="118" x2="648" y2="118" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>

  <!-- Subagent LLM -->
  <rect x="650" y="96" width="100" height="44" rx="6" fill="#fff" stroke="#7c3aed" stroke-width="1.5"/>
  <text x="700" y="123" fill="#5b21b6" font-size="12" font-weight="700" text-anchor="middle">LLM</text>

  <!-- own loop -->
  <rect x="455" y="150" width="300" height="56" rx="8" fill="#ede9fe" stroke="#7c3aed" stroke-width="1" stroke-dasharray="4,2"/>
  <text x="605" y="170" fill="#5b21b6" font-size="10" font-weight="600" text-anchor="middle">Own while loop (max 30 rounds)</text>
  <text x="605" y="186" fill="#5b21b6" font-size="9" text-anchor="middle">bash · read · write · edit · glob</text>
  <text x="605" y="198" fill="#94a3b8" font-size="8" text-anchor="middle">No task — recursive spawn forbidden</text>

  <!-- intermediate results → discard -->
  <rect x="460" y="218" width="290" height="44" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
  <text x="605" y="238" fill="#64748b" font-size="10" text-anchor="middle">Intermediate 30+ tool calls + results</text>
  <text x="605" y="254" fill="#dc2626" font-size="10" font-weight="600" text-anchor="middle">All discarded ✗</text>

  <!-- extract only last text -->
  <rect x="460" y="272" width="290" height="28" rx="6" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="605" y="290" fill="#166534" font-size="10" font-weight="600" text-anchor="middle">✓ Extract only final text → return to Parent</text>

  <!-- ===== dispatch line: Parent → Subagent (top) ===== -->
  <path d="M 310 206 L 362 206 Q 370 206 370 198 L 370 126 Q 370 118 378 118 L 450 118" fill="none" stroke="#7c3aed" stroke-width="2.5" marker-end="url(#arrow-purple)"/>
  <rect x="342" y="152" width="80" height="20" rx="4" fill="#faf5ff" stroke="#7c3aed" stroke-width="1"/>
  <text x="382" y="166" fill="#7c3aed" font-size="9" font-weight="700" text-anchor="middle">① task desc</text>

  <!-- ===== return line: Subagent → Parent tool_result ===== -->
  <path d="M 460 286 L 310 286" fill="none" stroke="#16a34a" stroke-width="2.5" marker-end="url(#arrow-green)"/>
  <rect x="350" y="268" width="70" height="20" rx="4" fill="#dcfce7" stroke="#16a34a" stroke-width="1"/>
  <text x="385" y="282" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">② summary</text>

  <!-- ===== Legend ===== -->
  <rect x="60" y="370" width="680" height="56" rx="8" fill="#f1f5f9"/>

  <rect x="80" y="384" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="104" y="394" fill="#334155" font-size="10">s05 Preserved: loop, hooks, todo_write, 6 base tools</text>

  <rect x="80" y="404" width="16" height="12" rx="3" fill="#ede9fe" stroke="#7c3aed" stroke-width="1"/>
  <text x="104" y="414" fill="#334155" font-size="10">s06 New: task tool + spawn_subagent() — independent messages[], returns only summary</text>

  <!-- Data flow labels -->
  <rect x="430" y="440" width="310" height="44" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
  <text x="445" y="458" fill="#7c3aed" font-size="10" font-weight="600">① Parent → Sub:</text>
  <text x="580" y="458" fill="#64748b" font-size="10">task description (a short string)</text>
  <text x="445" y="476" fill="#16a34a" font-size="10" font-weight="600">② Sub → Parent:</text>
  <text x="580" y="476" fill="#64748b" font-size="10">extract_text() (final conclusion only)</text>
</svg>
s06_subagent/images/subagent-overview.ja.svg
@@ -0,0 +1,125 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 500" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
    </marker>
    <marker id="arrow-purple" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#7c3aed"/>
    </marker>
    <marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
    </marker>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/>
      <stop offset="100%" stop-color="#2563eb"/>
    </linearGradient>
  </defs>

  <!-- Background -->
  <rect width="800" height="500" fill="#fafbfc" rx="8"/>

  <!-- Title -->
  <rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
  <rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
  <text x="400" y="31" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Subagent — 独立した messages[]、中間過程はすべて破棄</text>

  <!-- ===== Parent Agent (left) ===== -->
  <rect x="30" y="68" width="310" height="268" rx="12" fill="#f0f4ff" stroke="#2563eb" stroke-width="2"/>
  <text x="185" y="92" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">親 Agent</text>

  <!-- messages[] -->
  <rect x="50" y="100" width="110" height="36" rx="6" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="105" y="123" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- → LLM -->
  <line x1="160" y1="118" x2="198" y2="118" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>

  <!-- LLM -->
  <rect x="200" y="96" width="100" height="44" rx="6" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="250" y="123" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">LLM</text>

  <!-- LLM → dispatch -->
  <line x1="250" y1="140" x2="250" y2="156" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="264" y="152" fill="#64748b" font-size="8">tool_use</text>

  <!-- TOOL_HANDLERS -->
  <rect x="50" y="158" width="280" height="74" rx="6" fill="#e0e7ff" stroke="#2563eb" stroke-width="1"/>
  <text x="190" y="176" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>

  <!-- Other tools (gray) -->
  <rect x="65" y="190" width="115" height="32" rx="4" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="122" y="206" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">基本ツール</text>
  <text x="122" y="219" fill="#64748b" font-size="8" text-anchor="middle">bash / read / write / ...</text>

  <!-- task → spawn -->
  <rect x="200" y="193" width="110" height="26" rx="4" fill="#ede9fe" stroke="#7c3aed" stroke-width="1.5"/>
  <text x="255" y="210" fill="#5b21b6" font-size="10" font-weight="600" text-anchor="middle">task → spawn</text>

  <!-- Parent tool_result target -->
  <rect x="190" y="270" width="120" height="34" rx="5" fill="#dcfce7" stroke="#16a34a" stroke-width="1.2"/>
  <text x="250" y="291" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">tool_result</text>
  <path d="M 250 232 L 250 270" fill="none" stroke="#16a34a" stroke-width="1.4" marker-end="url(#arrow-green)"/>
  <path d="M 190 287 L 42 287 L 42 118 L 50 118" fill="none" stroke="#16a34a" stroke-width="1.2" marker-end="url(#arrow-green)" stroke-dasharray="4,3"/>
  <text x="110" y="304" fill="#94a3b8" font-size="8" text-anchor="middle">messages[] に追加</text>
  <text x="210" y="256" fill="#94a3b8" font-size="8" text-anchor="middle">通常ツール結果も messages[] に戻る</text>

  <!-- ===== Subagent (right) ===== -->
  <rect x="430" y="68" width="340" height="268" rx="12" fill="#faf5ff" stroke="#7c3aed" stroke-width="2"/>
  <text x="600" y="92" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">サブエージェント(新規コンテキスト)</text>

  <!-- fresh messages -->
  <rect x="450" y="100" width="150" height="36" rx="6" fill="#fff" stroke="#7c3aed" stroke-width="1.5"/>
  <text x="525" y="116" fill="#5b21b6" font-size="10" font-weight="600" text-anchor="middle">messages = [task]</text>
  <text x="525" y="130" fill="#7c3aed" font-size="8" text-anchor="middle">新規 — 親の会話を継承しない</text>

  <!-- → LLM -->
  <line x1="600" y1="118" x2="648" y2="118" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>

  <!-- Subagent LLM -->
  <rect x="650" y="96" width="100" height="44" rx="6" fill="#fff" stroke="#7c3aed" stroke-width="1.5"/>
  <text x="700" y="123" fill="#5b21b6" font-size="12" font-weight="700" text-anchor="middle">LLM</text>

  <!-- own loop -->
  <rect x="455" y="150" width="300" height="56" rx="8" fill="#ede9fe" stroke="#7c3aed" stroke-width="1" stroke-dasharray="4,2"/>
  <text x="605" y="170" fill="#5b21b6" font-size="10" font-weight="600" text-anchor="middle">独自の while ループ(最大 30 ラウンド)</text>
  <text x="605" y="186" fill="#5b21b6" font-size="9" text-anchor="middle">bash · read · write · edit · glob</text>
  <text x="605" y="198" fill="#94a3b8" font-size="8" text-anchor="middle">task なし — 再帰 spawn 禁止</text>

  <!-- intermediate results → discard -->
  <rect x="460" y="218" width="290" height="44" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
  <text x="605" y="238" fill="#64748b" font-size="10" text-anchor="middle">中間 30+ ラウンドのツール呼び出し + 結果</text>
  <text x="605" y="254" fill="#dc2626" font-size="10" font-weight="600" text-anchor="middle">すべて破棄 ✗</text>

  <!-- extract only last text -->
  <rect x="460" y="272" width="290" height="28" rx="6" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="605" y="290" fill="#166534" font-size="10" font-weight="600" text-anchor="middle">✓ 最後のテキストのみ抽出 → 親に返却</text>

  <!-- ===== dispatch line: Parent → Subagent (top) ===== -->
  <path d="M 310 206 L 362 206 Q 370 206 370 198 L 370 126 Q 370 118 378 118 L 450 118" fill="none" stroke="#7c3aed" stroke-width="2.5" marker-end="url(#arrow-purple)"/>
  <rect x="342" y="152" width="80" height="20" rx="4" fill="#faf5ff" stroke="#7c3aed" stroke-width="1"/>
  <text x="382" y="166" fill="#7c3aed" font-size="9" font-weight="700" text-anchor="middle">① task 説明</text>

  <!-- ===== return line: Subagent → Parent tool_result ===== -->
  <path d="M 460 286 L 310 286" fill="none" stroke="#16a34a" stroke-width="2.5" marker-end="url(#arrow-green)"/>
  <rect x="350" y="268" width="70" height="20" rx="4" fill="#dcfce7" stroke="#16a34a" stroke-width="1"/>
  <text x="385" y="282" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">② summary</text>

  <!-- ===== Legend ===== -->
  <rect x="60" y="370" width="680" height="56" rx="8" fill="#f1f5f9"/>

  <rect x="80" y="384" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="104" y="394" fill="#334155" font-size="10">s05 保持:ループ、フック、todo_write、6 つの基本ツール</text>

  <rect x="80" y="404" width="16" height="12" rx="3" fill="#ede9fe" stroke="#7c3aed" stroke-width="1"/>
  <text x="104" y="414" fill="#334155" font-size="10">s06 新規:task ツール + spawn_subagent() — 独立 messages[]、要約のみ返却</text>

  <!-- Data flow labels -->
  <rect x="430" y="440" width="310" height="44" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
  <text x="445" y="458" fill="#7c3aed" font-size="10" font-weight="600">① 親 → サブ:</text>
  <text x="580" y="458" fill="#64748b" font-size="10">task description(短い文字列)</text>
  <text x="445" y="476" fill="#16a34a" font-size="10" font-weight="600">② サブ → 親:</text>
  <text x="580" y="476" fill="#64748b" font-size="10">extract_text()(最終結論のみ)</text>
</svg>
|
||||
|
After Width: | Height: | Size: 8.4 KiB |
125
s06_subagent/images/subagent-overview.svg
Normal file
@@ -0,0 +1,125 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 500" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
    </marker>
    <marker id="arrow-purple" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#7c3aed"/>
    </marker>
    <marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
    </marker>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/>
      <stop offset="100%" stop-color="#2563eb"/>
    </linearGradient>
  </defs>

  <!-- 背景 -->
  <rect width="800" height="500" fill="#fafbfc" rx="8"/>

  <!-- 标题 -->
  <rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
  <rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
  <text x="400" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Subagent — 独立 messages[],中间过程全部丢弃</text>

  <!-- ===== Parent Agent(左侧) ===== -->
  <rect x="30" y="68" width="310" height="268" rx="12" fill="#f0f4ff" stroke="#2563eb" stroke-width="2"/>
  <text x="185" y="92" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">Parent Agent</text>

  <!-- messages[] -->
  <rect x="50" y="100" width="110" height="36" rx="6" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="105" y="123" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- → LLM -->
  <line x1="160" y1="118" x2="198" y2="118" stroke="#2563eb" stroke-width="1.5" marker-end="url(#arrow-blue)"/>

  <!-- LLM -->
  <rect x="200" y="96" width="100" height="44" rx="6" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="250" y="123" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">LLM</text>

  <!-- LLM → dispatch -->
  <line x1="250" y1="140" x2="250" y2="156" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="264" y="152" fill="#64748b" font-size="8">tool_use</text>

  <!-- TOOL_HANDLERS -->
  <rect x="50" y="158" width="280" height="74" rx="6" fill="#e0e7ff" stroke="#2563eb" stroke-width="1"/>
  <text x="190" y="176" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>

  <!-- 其他工具(灰色) -->
  <rect x="65" y="190" width="115" height="32" rx="4" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="122" y="206" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">基础工具</text>
  <text x="122" y="219" fill="#64748b" font-size="8" text-anchor="middle">bash / read / write / ...</text>

  <!-- task → spawn -->
  <rect x="200" y="193" width="110" height="26" rx="4" fill="#ede9fe" stroke="#7c3aed" stroke-width="1.5"/>
  <text x="255" y="210" fill="#5b21b6" font-size="10" font-weight="600" text-anchor="middle">task → spawn</text>

  <!-- Parent tool_result target -->
  <rect x="190" y="270" width="120" height="34" rx="5" fill="#dcfce7" stroke="#16a34a" stroke-width="1.2"/>
  <text x="250" y="291" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">tool_result</text>
  <path d="M 250 232 L 250 270" fill="none" stroke="#16a34a" stroke-width="1.4" marker-end="url(#arrow-green)"/>
  <path d="M 190 287 L 42 287 L 42 118 L 50 118" fill="none" stroke="#16a34a" stroke-width="1.2" marker-end="url(#arrow-green)" stroke-dasharray="4,3"/>
  <text x="110" y="304" fill="#94a3b8" font-size="8" text-anchor="middle">append messages[]</text>
  <text x="210" y="256" fill="#94a3b8" font-size="8" text-anchor="middle">普通工具结果也回填 messages[]</text>

  <!-- ===== Subagent(右侧) ===== -->
  <rect x="430" y="68" width="340" height="268" rx="12" fill="#faf5ff" stroke="#7c3aed" stroke-width="2"/>
  <text x="600" y="92" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">Subagent (全新上下文)</text>

  <!-- fresh messages -->
  <rect x="450" y="100" width="150" height="36" rx="6" fill="#fff" stroke="#7c3aed" stroke-width="1.5"/>
  <text x="525" y="116" fill="#5b21b6" font-size="10" font-weight="600" text-anchor="middle">messages = [task]</text>
  <text x="525" y="130" fill="#7c3aed" font-size="8" text-anchor="middle">fresh — 不继承父对话</text>

  <!-- → LLM -->
  <line x1="600" y1="118" x2="648" y2="118" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>

  <!-- Subagent LLM -->
  <rect x="650" y="96" width="100" height="44" rx="6" fill="#fff" stroke="#7c3aed" stroke-width="1.5"/>
  <text x="700" y="123" fill="#5b21b6" font-size="12" font-weight="700" text-anchor="middle">LLM</text>

  <!-- own loop -->
  <rect x="455" y="150" width="300" height="56" rx="8" fill="#ede9fe" stroke="#7c3aed" stroke-width="1" stroke-dasharray="4,2"/>
  <text x="605" y="170" fill="#5b21b6" font-size="10" font-weight="600" text-anchor="middle">自己的 while 循环(最多 30 轮)</text>
  <text x="605" y="186" fill="#5b21b6" font-size="9" text-anchor="middle">bash · read · write · edit · glob</text>
  <text x="605" y="198" fill="#94a3b8" font-size="8" text-anchor="middle">无 task — 禁止递归 spawn</text>

  <!-- intermediate results → discard -->
  <rect x="460" y="218" width="290" height="44" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
  <text x="605" y="238" fill="#64748b" font-size="10" text-anchor="middle">中间 30+ 轮工具调用 + 结果</text>
  <text x="605" y="254" fill="#dc2626" font-size="10" font-weight="600" text-anchor="middle">全部丢弃 ✗</text>

  <!-- extract only last text -->
  <rect x="460" y="272" width="290" height="28" rx="6" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
  <text x="605" y="290" fill="#166534" font-size="10" font-weight="600" text-anchor="middle">✓ 只提取最后一段文本 → 返回给 Parent</text>

  <!-- ===== dispatch 线:Parent → Subagent(走上面) ===== -->
  <path d="M 310 206 L 362 206 Q 370 206 370 198 L 370 126 Q 370 118 378 118 L 450 118" fill="none" stroke="#7c3aed" stroke-width="2.5" marker-end="url(#arrow-purple)"/>
  <rect x="342" y="152" width="80" height="20" rx="4" fill="#faf5ff" stroke="#7c3aed" stroke-width="1"/>
  <text x="382" y="166" fill="#7c3aed" font-size="9" font-weight="700" text-anchor="middle">① task 描述</text>

  <!-- ===== return 线:Subagent → Parent tool_result ===== -->
  <path d="M 460 286 L 310 286" fill="none" stroke="#16a34a" stroke-width="2.5" marker-end="url(#arrow-green)"/>
  <rect x="350" y="268" width="70" height="20" rx="4" fill="#dcfce7" stroke="#16a34a" stroke-width="1"/>
  <text x="385" y="282" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">② summary</text>

  <!-- ===== 图例 ===== -->
  <rect x="60" y="370" width="680" height="56" rx="8" fill="#f1f5f9"/>

  <rect x="80" y="384" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="104" y="394" fill="#334155" font-size="10">s05 保留:循环、hook、todo_write、6 个基础工具</text>

  <rect x="80" y="404" width="16" height="12" rx="3" fill="#ede9fe" stroke="#7c3aed" stroke-width="1"/>
  <text x="104" y="414" fill="#334155" font-size="10">s06 新增:task 工具 + spawn_subagent() — 独立 messages[],只回传摘要</text>

  <!-- 数据流标注 -->
  <rect x="430" y="440" width="310" height="44" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
  <text x="445" y="458" fill="#7c3aed" font-size="10" font-weight="600">① Parent → Sub:</text>
  <text x="580" y="458" fill="#64748b" font-size="10">task description(一小段文字)</text>
  <text x="445" y="476" fill="#16a34a" font-size="10" font-weight="600">② Sub → Parent:</text>
  <text x="580" y="476" fill="#64748b" font-size="10">extract_text()(只有最终结论)</text>
</svg>

s07_skill_loading/README.en.md
@@ -0,0 +1,182 @@
# s07: Skill Loading — Load Only When Needed

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → s02 → s03 → s04 → s05 → s06 → `s07` → [s08](../s08_context_compact/) → s09 → ... → s20

> *"Load when needed, don't stuff the prompt"* — Inject via tool_result, not system prompt.
>
> **Harness Layer**: Knowledge — load on demand, don't fill the context.

---

## The Problem

Your project has a React component spec, a SQL style guide, and an API design doc. You want the Agent to follow these specs automatically. The most straightforward idea — stuff them all into the system prompt:

```python
SYSTEM = (
    f"You are a coding agent. "
    + open("docs/react-style.md").read()  # 2000 lines
    + open("docs/sql-style.md").read()  # 1500 lines
    + open("docs/api-design.md").read()  # 3000 lines
)
```

That is 6,500 lines of system prompt. The Agent sends these docs with every LLM call, whether it is changing a CSS color or fixing a SQL query. Roughly 99% of that content is irrelevant to the task at hand, so most of those tokens are wasted.

---

## The Solution

The minimal hook structure, `todo_write`, and the sub-Agent from the previous chapter are preserved; this chapter focuses on the new `load_skill` tool. At startup, the harness injects a skill catalog into the SYSTEM prompt; at runtime, one extra tool loads a skill's full content, so those tokens are spent only when the skill is actually used.

Two-level design:

| Level | Location | Timing | Cost |
|-------|----------|--------|------|
| 1. Catalog | system prompt | Injected at startup (harness scans skills/) | ~100 tokens/skill, carried every turn |
| 2. Content | tool_result | When Agent calls load_skill | ~2000 tokens/skill, on demand |
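
A back-of-the-envelope comparison makes the table's trade-off concrete. This sketch uses only the table's rough per-skill figures; the session shape (4 skills, 20 LLM calls, one skill loaded at call 5) is a made-up assumption, and prompt caching and compaction are ignored. Note that a loaded skill is not free after its first use: it stays in the history and is re-sent on every later call.

```python
# Rough per-skill costs taken from the table above (approximate figures).
CATALOG_TOKENS_PER_SKILL = 100
CONTENT_TOKENS_PER_SKILL = 2000


def all_in_prompt(num_skills: int, turns: int) -> int:
    """Every skill's full content is re-sent on every LLM call."""
    return num_skills * CONTENT_TOKENS_PER_SKILL * turns


def two_level(num_skills: int, turns: int, load_turns: list[int]) -> int:
    """Catalog on every call; a skill loaded at call t (1-based) is re-sent
    with the history from call t through the last call (no compaction)."""
    catalog = num_skills * CATALOG_TOKENS_PER_SKILL * turns
    loaded = sum(CONTENT_TOKENS_PER_SKILL * (turns - t + 1) for t in load_turns)
    return catalog + loaded


# Hypothetical session: 4 skills, 20 LLM calls, one skill loaded at call 5.
print(all_in_prompt(4, 20))   # 160000
print(two_level(4, 20, [5]))  # 40000 (8000 catalog + 32000 loaded skill)
```

Even with the loaded skill carried to the end of the session, the two-level version sends a quarter of the input tokens in this scenario; the gap grows with the number of skills that are never loaded.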

The dispatch mechanism is unchanged: `load_skill` is dispatched automatically through `TOOL_HANDLERS[block.name]`.
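
To make "dispatched automatically" concrete, here is a minimal sketch assuming the loop looks up each `tool_use` block's name in a `TOOL_HANDLERS` dict and calls the handler with the block's input as keyword arguments. `LOAD_SKILL_TOOL`, `FakeBlock`, `dispatch()` and the one-entry registry are hypothetical stand-ins; only the tool definition's `name` / `description` / `input_schema` shape follows the Anthropic Messages API.

```python
from dataclasses import dataclass, field

# Hypothetical one-entry registry (normally filled by _scan_skills()).
SKILL_REGISTRY = {
    "code-review": {"name": "code-review", "description": "Review checklist",
                    "content": "# Code Review\n1. Read the whole diff first.\n"},
}


def load_skill(name: str) -> str:
    skill = SKILL_REGISTRY.get(name)
    if not skill:
        return f"Skill not found: {name}"
    return skill["content"]


# Tool definition sent to the model (Anthropic Messages API tool shape).
LOAD_SKILL_TOOL = {
    "name": "load_skill",
    "description": "Load the full instructions of a skill listed in the system prompt.",
    "input_schema": {
        "type": "object",
        "properties": {"name": {"type": "string", "description": "Skill name"}},
        "required": ["name"],
    },
}

# One new entry; the generic dispatch below is untouched.
TOOL_HANDLERS = {
    "load_skill": lambda **kw: load_skill(kw["name"]),
}


@dataclass
class FakeBlock:
    """Stand-in for a tool_use content block (id, name, input)."""
    id: str
    name: str
    input: dict = field(default_factory=dict)


def dispatch(block: FakeBlock) -> str:
    handler = TOOL_HANDLERS.get(block.name)
    if handler is None:
        return f"Unknown tool: {block.name}"
    return handler(**block.input)


print(dispatch(FakeBlock("toolu_01", "load_skill", {"name": "code-review"})))
```

Registering the handler is the only change: nothing in `dispatch()` mentions `load_skill`.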

---

## How It Works

**The skills/ directory** holds one subdirectory per skill, each containing a `SKILL.md` file:

```
skills/
  agent-builder/SKILL.md
  code-review/SKILL.md
  mcp-builder/SKILL.md
  pdf/SKILL.md
```

**Level 1: Inject catalog at startup**: at startup the harness calls `_scan_skills()` to scan the skills/ directory, parsing each SKILL.md's YAML frontmatter (`name`, `description`) and storing it, together with the raw file text, in the `SKILL_REGISTRY` dictionary. `list_skills()` builds the catalog from the registry, and `build_system()` injects it into the SYSTEM prompt. The Agent sees which skills are available on every turn, without any extra API call:

```python
# SKILLS_DIR, WORKDIR and _parse_frontmatter() are defined elsewhere in code.py (not shown).
SKILL_REGISTRY: dict[str, dict] = {}

def _scan_skills():
    if not SKILLS_DIR.exists():
        return
    for d in sorted(SKILLS_DIR.iterdir()):
        if not d.is_dir():
            continue
        manifest = d / "SKILL.md"
        if manifest.exists():
            raw = manifest.read_text()
            meta, body = _parse_frontmatter(raw)
            name = meta.get("name", d.name)
            desc = meta.get("description", raw.split("\n")[0].lstrip("#").strip())
            SKILL_REGISTRY[name] = {"name": name, "description": desc, "content": raw}

_scan_skills()  # runs once at startup

def list_skills() -> str:
    return "\n".join(f"- **{s['name']}**: {s['description']}" for s in SKILL_REGISTRY.values())

def build_system() -> str:
    catalog = list_skills()
    return (
        f"You are a coding agent at {WORKDIR}. "
        f"Skills available:\n{catalog}\n"
        "Use load_skill to get full details when needed."
    )

SYSTEM = build_system()
```
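
`_parse_frontmatter()` is not shown above. As an illustration only, here is a hypothetical minimal parser (not necessarily the one in code.py) for the common case of a `---`-delimited block of flat `key: value` lines; anything beyond that (lists, nested keys, multi-line strings) would call for a real YAML library.

```python
def parse_frontmatter(raw: str) -> tuple[dict[str, str], str]:
    """Split a '---'-delimited frontmatter block from the Markdown body.

    Handles flat `key: value` lines only; lists, nested keys and
    multi-line values would need a real YAML parser.
    """
    lines = raw.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, raw  # no frontmatter: everything is body
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, raw  # opening '---' never closed: treat as plain body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip().strip("\"'")
    return meta, "\n".join(lines[end + 1:])


example = """---
name: code-review
description: Checklist for reviewing pull requests
---
# Code Review
1. Read the diff top to bottom.
"""
meta, body = parse_frontmatter(example)
print(meta)                  # {'name': 'code-review', 'description': 'Checklist for reviewing pull requests'}
print(body.splitlines()[0])  # prints: # Code Review
```

A file without frontmatter yields an empty `meta`, so the `meta.get(...)` defaults in `_scan_skills()` fall back to the directory name and the first line of the file.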

**Level 2: load_skill**: when the Agent decides "I need the SQL style guide", it calls `load_skill("sql-style")`. The name is looked up in the registry and never used as a file path, so there is no path traversal risk. The content is returned as a `tool_result`:

```python
def load_skill(name: str) -> str:
    skill = SKILL_REGISTRY.get(name)
    if not skill:
        return f"Skill not found: {name}"
    return skill["content"]
```
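
To see why registry lookup rules out path traversal, compare it with a hypothetical naive loader that treats the model-supplied string as a path under `skills/`. The temporary directory, `secrets.txt` and `naive_load()` exist only for this demo.

```python
import tempfile
from pathlib import Path

# Throwaway layout: skills/code-review/SKILL.md plus a secret file
# that sits outside skills/.
root = Path(tempfile.mkdtemp())
skills_dir = root / "skills"
(skills_dir / "code-review").mkdir(parents=True)
(skills_dir / "code-review" / "SKILL.md").write_text("# Code Review\n")
(root / "secrets.txt").write_text("API_KEY=do-not-leak\n")

# Registry built once from what actually exists under skills/.
SKILL_REGISTRY = {
    d.name: {"content": (d / "SKILL.md").read_text()}
    for d in skills_dir.iterdir()
    if (d / "SKILL.md").exists()
}


def naive_load(name: str) -> str:
    # Unsafe: the model-controlled string becomes part of a file path.
    return (skills_dir / name).read_text()


def load_skill(name: str) -> str:
    # Safe: the string is only ever used as a dictionary key.
    skill = SKILL_REGISTRY.get(name)
    if not skill:
        return f"Skill not found: {name}"
    return skill["content"]


attack = "../secrets.txt"
print(repr(naive_load(attack)))  # 'API_KEY=do-not-leak\n'
print(load_skill(attack))        # Skill not found: ../secrets.txt
```

Resolving the joined path and checking that it stays inside `skills/` would also work, but the registry sidesteps the question: only names discovered at startup can ever be returned.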

The key distinction: skill content is not part of the system prompt. It enters the current `messages` as a tool result, and every subsequent call carries it along with the rest of the history until context compaction, truncation, or the end of the session. This connects naturally to s08's compaction: on-demand loading solves "don't carry what you shouldn't"; compaction solves "how to drop what you should."
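
A sketch of that lifetime, assuming Anthropic Messages API-style message dicts: the skill text arrives once as a `tool_result` block inside a user message, and because the loop only ever appends, it remains in `messages` (and is re-sent with every later call) until something like s08's compaction removes it. The tool_use id, skill text and follow-up turns are made up for illustration.

```python
SKILL_TEXT = "# Code Review\n- Check error handling\n- Check tests\n"  # stand-in for SKILL.md

messages: list[dict] = [
    {"role": "user", "content": "Review my change; load the relevant skill first."},
]

# The model asks for the skill...
messages.append({"role": "assistant", "content": [
    {"type": "tool_use", "id": "toolu_01", "name": "load_skill",
     "input": {"name": "code-review"}},
]})
# ...and the harness answers with a tool_result carrying the full text.
messages.append({"role": "user", "content": [
    {"type": "tool_result", "tool_use_id": "toolu_01", "content": SKILL_TEXT},
]})

# Later turns only append, so nothing removes the skill text from history.
messages.append({"role": "assistant", "content": [{"type": "text", "text": "Reviewing now."}]})
messages.append({"role": "user", "content": "Also check the docs."})


def carries(msgs: list[dict], text: str) -> bool:
    """True if some tool_result block in the history contains `text`."""
    return any(
        block.get("type") == "tool_result" and block.get("content") == text
        for m in msgs
        if isinstance(m["content"], list)
        for block in m["content"]
    )


# Every subsequent API call would send this whole list, skill text included.
print(carries(messages, SKILL_TEXT))  # True
```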

---

## Changes from s06

| Component | Before (s06) | After (s07) |
|-----------|-------------|-------------|
| Tool count | 7 (bash, read, write, edit, glob, todo_write, task) | 8 (+load_skill) |
| Knowledge loading | None | Two-level: startup catalog in SYSTEM + runtime load_skill |
| SYSTEM prompt | Static string | Startup scan of skills/ injects catalog |
| Skill registry | None | SKILL_REGISTRY (populated at startup, prevents path traversal) |
| Loop | Unchanged | Unchanged (skill tool auto-dispatches) |

---

## Try It

```sh
cd learn-claude-code
python s07_skill_loading/code.py
```

Try these prompts:

1. `What skills are available?`
2. `Load the code-review skill and follow its instructions`
3. `I need to do a code review -- load the relevant skill first`

What to watch for: Does the Agent know which skills are available straight from the SYSTEM catalog? Does `[HOOK] load_skill` appear when it needs the full instructions? Does the answer actually follow the loaded skill's instructions?

---

## What's Next

On-demand loading solved "don't carry what you shouldn't." But another problem looms. After the Agent has worked for 30 minutes, the messages list fills up with intermediate steps: old tool_results and stale file contents that occupy context without adding any value.

→ s08 Context Compact: a four-layer compaction strategy in which cheap layers run first and expensive layers run last.

<details>
<summary>Dive into CC Source Code</summary>

> The following is based on an analysis of the CC source files `loadSkillsDir.ts`, `SkillTool.ts`, `bundledSkills.ts`, and `commands.ts`.

### 1. Skill Sources: Not Just One skills/ Directory

The teaching version assumes all skills live in a `skills/` directory. CC loads from multiple sources spread across multiple files: `loadSkillsDir.ts` handles user/project/`--add-dir` directories and legacy commands (`.claude/commands/`); `bundledSkills.ts` handles built-in skills; `SkillTool.ts` handles MCP remote skills; `commands.ts` handles command aggregation. Types include managed/policy skills, user skills (`~/.claude/skills/`), project skills (`.claude/skills/`), `--add-dir` skills, legacy commands, dynamic skills, conditional skills (with `paths` frontmatter, activated by file path), bundled skills, plugin skills, MCP skills.
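
The conditional-skill idea can be illustrated with a small hypothetical sketch; it is not CC's matching logic, which this section does not describe. A skill carrying `paths` globs becomes active once a file the Agent touches matches one of them. The skill names and patterns are invented, and `fnmatch.fnmatchcase` is used for simplicity even though its `*` also matches `/`, so real glob semantics may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical conditional skills: skill name -> globs from a `paths` field.
CONDITIONAL_SKILLS = {
    "sql-style": ["*.sql", "migrations/*"],
    "react-style": ["*.tsx", "*.jsx"],
}


def active_skills(touched_files: list[str]) -> set[str]:
    """Skills whose `paths` globs match at least one touched file.

    fnmatchcase's `*` also matches '/', so "*.sql" matches .sql files in
    any directory.
    """
    return {
        skill
        for skill, patterns in CONDITIONAL_SKILLS.items()
        if any(fnmatchcase(f, p) for f in touched_files for p in patterns)
    }


print(sorted(active_skills(["db/queries/report.sql"])))  # ['sql-style']
print(sorted(active_skills(["README.md"])))              # []
```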

### 2. SKILL.md Frontmatter — Common Fields

CC's SKILL.md YAML frontmatter is parsed by `parseSkillFrontmatterFields()` in `loadSkillsDir.ts`. Common fields include:

| Field | Purpose |
|-------|---------|
| `name` / `description` | Display name and description |
| `when_to_use` | Guides the model on when to invoke |
| `allowed-tools` | Auto-allow list of tools available to the skill |
| `context` | `inline` (default) or `fork` (run as sub-Agent) |
| `model` | Model override (haiku/sonnet/opus/inherit) |
| `hooks` | Skill-level hook configuration |
| `paths` | Glob patterns for conditional activation |
| `user-invocable` | Users can invoke via `/name` |

The complete field list changes across versions; above are the core fields relevant to the teaching version.

### 3. Precise Implementation of Two-Level Loading

1. **Catalog (at startup)**: `getSkillDirCommands()` scans directory → registers as `Command` objects containing only metadata. `getSkillListingAttachments()` formats the skill list as attachments, budgeted at ~1% of the context window (cap 8000 characters).
2. **Load (on invocation)**: Model calls `Skill` tool (input fields are `skill` + optional `args`; teaching version uses `name`) → `getPromptForCommand()` expands full SKILL.md content → `SkillTool` returns a tool_result with display text `"Launching skill: {name}"`, while the actual skill content is injected via `newMessages`. The teaching version merges both into "injected via tool_result" as a simplification.
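
The budgeting in step 1 can be sketched as follows. This is a hypothetical illustration, not CC's code: the function name, the assumed 4 characters per token used to turn "~1% of the context window" into characters, and the overflow marker are all assumptions; only the ~1% budget and the 8000-character cap come from the description above.

```python
def skill_listing(skills: list[tuple[str, str]], context_window_tokens: int,
                  chars_per_token: int = 4, hard_cap: int = 8000) -> str:
    """One `- name: description` line per skill, kept within a character budget.

    Budget = ~1% of the context window, converted to characters with an
    assumed chars-per-token ratio, and never more than `hard_cap`.
    """
    budget = min(context_window_tokens * chars_per_token // 100, hard_cap)
    lines: list[str] = []
    used = 0
    for name, desc in skills:
        line = f"- {name}: {desc}"
        cost = len(line) + 1  # +1 for the joining newline
        if used + cost > budget:
            lines.append(f"... and {len(skills) - len(lines)} more")
            break
        lines.append(line)
        used += cost
    return "\n".join(lines)


skills = [
    ("code-review", "Checklist for reviewing PRs"),
    ("sql-style", "SQL conventions"),
    ("pdf", "Work with PDF files"),
]
# A tiny 2,000-token window gives an 80-character budget, so the third entry is cut.
print(skill_listing(skills, 2000).splitlines()[-1])  # ... and 1 more
```

With a 200,000-token window the same formula reaches the 8000-character cap exactly, so all three entries fit.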

### The Teaching Version's Simplification Is Intentional

- Multiple files and sources → 1 `skills/` directory: sufficient to demonstrate the core concept of two-level loading
- Multiple frontmatter fields → only parse name/description: reduces parsing complexity
- Forked skills (`context: 'fork'`) → omitted: the teaching version only implements inline skill loading
- `Skill` tool input `skill`+`args` → teaching version uses `name`: avoids extra argument parsing complexity

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

s07_skill_loading/README.ja.md
@@ -0,0 +1,182 @@
# s07: Skill Loading — 必要なときにだけ読み込む

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → s02 → s03 → s04 → s05 → s06 → `s07` → [s08](../s08_context_compact/) → s09 → ... → s20

> *"Load when needed, don't stuff the prompt"* — tool_result で注入、system prompt には詰め込まない。
>
> **Harness レイヤー**: 知識 — 必要に応じて読み込み、コンテキストに詰め込まない。

---

## 課題

プロジェクトには React コンポーネント仕様、SQL スタイルガイド、API 設計ドキュメントがある。Agent にこれらの仕様を自動的に守らせたい。最も直接的な方法 — すべて system prompt に詰め込む:

```python
SYSTEM = (
    f"You are a coding agent. "
    + open("docs/react-style.md").read()  # 2000 行
    + open("docs/sql-style.md").read()  # 1500 行
    + open("docs/api-design.md").read()  # 3000 行
)
```

6500 行の system prompt。Agent は LLM を呼び出すたびにこれらのドキュメントを運ぶ — CSS の色を変えるときも SQL クエリを修正するときも。99% の内容が現在のタスクと無関係で、トークンを無駄に消費する。

---

## ソリューション

前章の最小フック構造、`todo_write`、サブ Agent を維持し、本章は新規の `load_skill` ツールに注目する。起動時にスキルカタログを SYSTEM prompt に注入し、実行時に完全な内容を読み込むツールを登録する。使ったときだけトークンを消費。

2 層設計:

| 層 | 場所 | タイミング | コスト |
|---|------|-----------|--------|
| 1. カタログ | system prompt | 起動時に注入(harness が skills/ をスキャン) | ~100 トークン/スキル、毎ターン携帯 |
| 2. 内容 | tool_result | Agent が load_skill を呼び出したとき | ~2000 トークン/スキル、オンデマンド |

ディスパッチ機構は変わらず、`load_skill` は `TOOL_HANDLERS[block.name]` を通じて自動的にディスパッチされる。

---

## 仕組み

**skills/ ディレクトリ**、スキルごとに 1 つのサブディレクトリ、それぞれに `SKILL.md` ファイルを含む:

```
skills/
  agent-builder/SKILL.md
  code-review/SKILL.md
  mcp-builder/SKILL.md
  pdf/SKILL.md
```

**第 1 層:起動時にカタログを注入**:harness は起動時に `_scan_skills()` を呼び出して skills/ ディレクトリをスキャンし、各 SKILL.md の YAML frontmatter(`name`、`description`)を解析して `SKILL_REGISTRY` 辞書に格納する。`list_skills()` はレジストリからカタログを生成し、SYSTEM prompt に注入する。Agent は毎ターン「どのスキルが利用可能か」を確認できる。追加の API 呼び出しは不要:

```python
SKILL_REGISTRY: dict[str, dict] = {}

def _scan_skills():
    if not SKILLS_DIR.exists():
        return
    for d in sorted(SKILLS_DIR.iterdir()):
        if not d.is_dir():
            continue
        manifest = d / "SKILL.md"
        if manifest.exists():
            raw = manifest.read_text()
            meta, body = _parse_frontmatter(raw)
            name = meta.get("name", d.name)
            desc = meta.get("description", raw.split("\n")[0].lstrip("#").strip())
            SKILL_REGISTRY[name] = {"name": name, "description": desc, "content": raw}

_scan_skills()  # runs once at startup

def list_skills() -> str:
    return "\n".join(f"- **{s['name']}**: {s['description']}" for s in SKILL_REGISTRY.values())

def build_system() -> str:
    catalog = list_skills()
    return (
        f"You are a coding agent at {WORKDIR}. "
        f"Skills available:\n{catalog}\n"
        "Use load_skill to get full details when needed."
    )

SYSTEM = build_system()
```

**第 2 層:load_skill**:Agent が「SQL スタイルガイドが必要」と判断し、`load_skill("sql-style")` を呼び出す。レジストリを通じて検索し、ファイルパスを経由しないため、パストラバーサルのリスクがない。内容は `tool_result` を通じて注入される:

```python
def load_skill(name: str) -> str:
    skill = SKILL_REGISTRY.get(name)
    if not skill:
        return f"Skill not found: {name}"
    return skill["content"]
```

重要な違い:スキル内容は system prompt の一部ではなく、ツール結果として現在の messages に入る。後続の呼び出しでは履歴とともに携帯され、コンテキスト圧縮、切り捨て、またはセッション終了まで保持される。これは s08 の compact と自然に接続する:オンデマンド読み込みで「運ぶべきでないものは運ばない」を解決し、compact が「捨てるべきものをどう捨てるか」を解決する。

---

## s06 からの変更点

| コンポーネント | 変更前 (s06) | 変更後 (s07) |
|---------------|-------------|-------------|
| ツール数 | 7 (bash, read, write, edit, glob, todo_write, task) | 8 (+load_skill) |
| 知識読み込み | なし | 2 層:起動時カタログ注入 SYSTEM + 実行時 load_skill |
| SYSTEM プロンプト | 静的文字列 | 起動時に skills/ をスキャンしてカタログ注入 |
| スキルレジストリ | なし | SKILL_REGISTRY(起動時に充填、パストラバーサル防止) |
| ループ | 変更なし | 変更なし(スキルツールは自動ディスパッチ) |

---

## 試してみよう

```sh
cd learn-claude-code
python s07_skill_loading/code.py
```

以下のプロンプトを試してみよう:

1. `What skills are available?`
2. `Load the code-review skill and follow its instructions`
3. `I need to do a code review -- load the relevant skill first`

観察のポイント:Agent は SYSTEM 内のカタログから利用可能なスキルを知っているか? 完全な手順が必要なときに `[HOOK] load_skill` が表示されるか? 読み込んだスキルの説明を使って回答しているか?

---

## 次へ

オンデマンド読み込みで「運ぶべきでないものは運ばない」問題は解決した。しかし別の問題が待っている:Agent が 30 分連続で作業すると、messages リストが中間プロセスで埋め尽くされる。古い tool_result、期限切れのファイル内容、コンテキストを占領しているが価値を生まない。

→ s08 Context Compact:4 層圧縮戦略。安価な層を先に実行、高価な層を後に実行。

<details>
<summary>CC ソースコードを深掘り</summary>

> 以下は CC ソースコード `loadSkillsDir.ts`、`SkillTool.ts`、`bundledSkills.ts`、`commands.ts` の分析に基づく。

### 一、スキルソース:skills/ ディレクトリだけではない

教育版はすべてのスキルが `skills/` ディレクトリにあると想定している。CC は実際に複数のファイルに分散したソースから読み込む:`loadSkillsDir.ts` は user/project/`--add-dir` ディレクトリと legacy commands(`.claude/commands/`)を担当、`bundledSkills.ts` は組み込みスキル、`SkillTool.ts` は MCP リモートスキル、`commands.ts` はコマンド集約を担当。タイプには managed/policy skills、user skills(`~/.claude/skills/`)、project skills(`.claude/skills/`)、`--add-dir` skills、legacy commands、dynamic skills、conditional skills(`paths` frontmatter を持ち、ファイルパスでアクティベート)、bundled skills、plugin skills、MCP skills が含まれる。

### 二、SKILL.md Frontmatter の一般的なフィールド

CC の SKILL.md YAML frontmatter は `parseSkillFrontmatterFields()`(`loadSkillsDir.ts`)で解析される。一般的なフィールド:

| フィールド | 用途 |
|-----------|------|
| `name` / `description` | 表示名と説明 |
| `when_to_use` | モデルにいつ呼び出すかを指導 |
| `allowed-tools` | スキルが使用可能なツールの自動許可リスト |
| `context` | `inline`(デフォルト)または `fork`(サブ Agent として実行) |
| `model` | モデルオーバーライド(haiku/sonnet/opus/inherit) |
| `hooks` | スキルレベルのフック設定 |
| `paths` | 条件付きアクティベーションの glob パターン |
| `user-invocable` | ユーザーが `/name` で呼び出し可能 |

完全なフィールドリストはバージョンによって変動する。上記は教育版に関連するコアフィールドのみ。

### 三、2 層読み込みの正確な実装

1. **カタログ(起動時)**:`getSkillDirCommands()` がディレクトリをスキャン → メタデータのみを含む `Command` オブジェクトとして登録。`getSkillListingAttachments()` がスキルリストを添付ファイルとしてフォーマット、コンテキストウィンドウの ~1% を予算とする(上限 8000 文字)。
2. **読み込み(呼び出し時)**:モデルが `Skill` ツールを呼び出す(入力フィールドは `skill` + オプションの `args`、教育版は `name` を使用)→ `getPromptForCommand()` が完全な SKILL.md 内容を展開 → `SkillTool` が返す tool_result の表示テキストは `"Launching skill: {name}"` のみ、実際のスキル内容は `newMessages` を通じて注入される。教育版では両者を「tool_result を通じて注入」として簡略化している。

### 教育版の単純化は意図的

- 複数ファイル・複数ソース → 1 つの `skills/` ディレクトリ:2 層読み込みの核心概念を示すのに十分
- 複数の frontmatter フィールド → name/description のみ解析:解析の複雑さを削減
- forked skills(`context: 'fork'`)→ 省略:教育版では inline スキル読み込みのみを扱う
- `Skill` ツールの入力 `skill`+`args` → 教育版は `name` を使用:追加の引数解析の複雑さを回避

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

s07_skill_loading/README.md
@@ -0,0 +1,182 @@
# s07: Skill Loading — 用到的时候才加载

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → s02 → s03 → s04 → s05 → s06 → `s07` → [s08](../s08_context_compact/) → s09 → ... → s20

> *"用到时再加载, 别全塞 prompt 里"* — 通过 tool_result 注入, 不塞 system prompt。
>
> **Harness 层**: 知识 — 按需加载, 不堆满上下文。

---

## 问题

你的项目有一套 React 组件规范、一份 SQL 风格指南、一份 API 设计文档。你希望 Agent 自动遵守这些规范。最直接的想法,全塞进 system prompt:

```python
SYSTEM = (
    f"You are a coding agent. "
    + open("docs/react-style.md").read()  # 2000 行
    + open("docs/sql-style.md").read()  # 1500 行
    + open("docs/api-design.md").read()  # 3000 行
)
```

6500 行 system prompt。Agent 每次调用 LLM 都带着这些文档——不管是在改 CSS 颜色还是修 SQL 查询。99% 的内容和当前任务无关,白白消耗 token。

---

## 解决方案

保留上一章的最小 hook 结构、`todo_write` 和子 Agent,本章重点转向新增的 `load_skill` 工具。启动时把技能目录注入 SYSTEM prompt,运行时多注册一个工具加载完整内容,用到才花 token。

两层设计:

| 层 | 位置 | 时机 | 代价 |
|---|------|------|------|
| 1. 目录 | system prompt | 启动时注入(harness 扫描 skills/) | ~100 tokens/skill,每轮都带 |
| 2. 内容 | tool_result | Agent 调用 load_skill 时 | ~2000 tokens/skill,按需 |

dispatch 机制不变,load_skill 通过 `TOOL_HANDLERS[block.name]` 分发。

---

## 工作原理

**skills/ 目录**,每个技能一个子目录,包含 `SKILL.md` 文件:

```
skills/
  agent-builder/SKILL.md
  code-review/SKILL.md
  mcp-builder/SKILL.md
  pdf/SKILL.md
```

**第一级:启动时注入目录**:harness 启动时调用 `_scan_skills()` 扫描 skills/ 目录,解析每个 SKILL.md 的 YAML frontmatter(`name`、`description`),存入 `SKILL_REGISTRY` 字典。`list_skills()` 从注册表生成目录,注入 SYSTEM prompt。Agent 每轮都能看到"我有哪些技能可用",不花额外 API 调用:

```python
SKILL_REGISTRY: dict[str, dict] = {}

def _scan_skills():
    if not SKILLS_DIR.exists():
        return
    for d in sorted(SKILLS_DIR.iterdir()):
        if not d.is_dir():
            continue
        manifest = d / "SKILL.md"
        if manifest.exists():
            raw = manifest.read_text()
            meta, body = _parse_frontmatter(raw)
            name = meta.get("name", d.name)
            desc = meta.get("description", raw.split("\n")[0].lstrip("#").strip())
            SKILL_REGISTRY[name] = {"name": name, "description": desc, "content": raw}

_scan_skills()  # runs once at startup

def list_skills() -> str:
    return "\n".join(f"- **{s['name']}**: {s['description']}" for s in SKILL_REGISTRY.values())

def build_system() -> str:
    catalog = list_skills()
    return (
        f"You are a coding agent at {WORKDIR}. "
        f"Skills available:\n{catalog}\n"
        "Use load_skill to get full details when needed."
    )

SYSTEM = build_system()
```
|
||||
|
||||
**第二级:load_skill**:Agent 决定"我需要 SQL 风格指南",调用 `load_skill("sql-style")`。通过注册表查找,不走文件路径,没有路径遍历风险。内容通过 `tool_result` 注入:
|
||||
|
||||
```python
|
||||
def load_skill(name: str) -> str:
|
||||
skill = SKILL_REGISTRY.get(name)
|
||||
if not skill:
|
||||
return f"Skill not found: {name}"
|
||||
return skill["content"]
|
||||
```
|
||||
|
||||
关键区别:技能内容不是 system prompt 的一部分,它作为一次工具结果进入当前 messages。后续调用会随历史一起携带,直到上下文压缩、截断或会话结束。这和 s08 的 compact 自然衔接:按需加载解决了"不该提前带的不要带",compact 解决"该丢的怎么丢"。
|
||||
|
||||
---
|
||||
|
||||
## 相对 s06 的变更
|
||||
|
||||
| 组件 | 之前 (s06) | 之后 (s07) |
|
||||
|------|-----------|-----------|
|
||||
| 工具数量 | 7 (bash, read, write, edit, glob, todo_write, task) | 8 (+load_skill) |
|
||||
| 知识加载 | 无 | 两级:启动时目录注入 SYSTEM + 运行时 load_skill |
|
||||
| SYSTEM 提示 | 静态字符串 | 启动时扫描 skills/ 注入目录 |
|
||||
| 技能注册表 | 无 | SKILL_REGISTRY(启动时填充,防路径遍历) |
|
||||
| 循环 | 不变 | 不变(skill 工具自动分发) |
|
||||
|
||||
---
|
||||
|
||||
## 试一下
|
||||
|
||||
```sh
|
||||
cd learn-claude-code
|
||||
python s07_skill_loading/code.py
|
||||
```
|
||||
|
||||
试试这些 prompt:
|
||||
|
||||
1. `What skills are available?`
|
||||
2. `Load the code-review skill and follow its instructions`
|
||||
3. `I need to do a code review -- load the relevant skill first`
|
||||
|
||||
观察重点:Agent 是否直接从 SYSTEM 里的目录知道有哪些技能?需要完整规范时是否出现 `[HOOK] load_skill`?加载后回答是否使用了对应 skill 的说明?
|
||||
|
||||
---
|
||||
|
||||
## 接下来
|
||||
|
||||
按需加载解决了"不该带的不要带"。但另一个问题来了:Agent 连续工作 30 分钟后,messages 列表塞满了中间过程。旧的 tool_result、过时的文件内容,占着上下文但不产生价值。
|
||||
|
||||
s08 Context Compact → 四层压缩策略。便宜的先跑,贵的后跑。
|
||||
|
||||
<details>
|
||||
<summary>深入 CC 源码</summary>
|
||||
|
||||
> 以下基于 CC 源码 `loadSkillsDir.ts`、`SkillTool.ts`、`bundledSkills.ts`、`commands.ts` 的分析。
|
||||
|
||||
### 一、技能来源:不是只有一个 skills/ 目录
|
||||
|
||||
教学版假设所有技能在 `skills/` 目录下。CC 实际从多个来源加载,分布在多个文件中:`loadSkillsDir.ts` 负责从 user/project/`--add-dir` 目录和 legacy commands(`.claude/commands/`)加载;`bundledSkills.ts` 负责内置技能;`SkillTool.ts` 处理 MCP 远程技能;`commands.ts` 负责命令聚合。类型包括 managed/policy skills、user skills(`~/.claude/skills/`)、project skills(`.claude/skills/`)、`--add-dir` skills、legacy commands、dynamic skills、conditional skills(带 `paths` frontmatter,按文件路径激活)、bundled skills、plugin skills、MCP skills。
|
||||
|
||||
### 二、SKILL.md Frontmatter 常见字段
|
||||
|
||||
CC 的 SKILL.md YAML frontmatter 由 `parseSkillFrontmatterFields()` 解析(`loadSkillsDir.ts`),常见字段包括:
|
||||
|
||||
| 字段 | 用途 |
|
||||
|------|------|
|
||||
| `name` / `description` | 显示名称和描述 |
|
||||
| `when_to_use` | 指导模型何时调用 |
|
||||
| `allowed-tools` | 技能可用工具的自动允许列表 |
|
||||
| `context` | `inline`(默认)或 `fork`(作为子 Agent 运行) |
|
||||
| `model` | 模型覆盖(haiku/sonnet/opus/inherit) |
|
||||
| `hooks` | 技能级别的 hook 配置 |
|
||||
| `paths` | 条件激活的 glob 模式 |
|
||||
| `user-invocable` | 用户可以通过 `/name` 调用 |
|
||||
|
||||
完整字段列表随版本迭代会变化,以上仅列出教学版涉及的核心字段。
|
||||
|
||||
### 三、两级加载的精确实现
|
||||
|
||||
1. **Catalog(启动时)**:`getSkillDirCommands()` 扫描目录 → 注册为 `Command` 对象,只包含元数据。`getSkillListingAttachments()` 把技能列表格式化为附件,预算为上下文窗口的 ~1%(上限 8000 字符)。
|
||||
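2. **Load (on invocation)**: the model calls the `Skill` tool. Its inputs are `skill` plus an optional `args`; the teaching version uses `name` instead. `getPromptForCommand()` then expands the full SKILL.md content. The tool_result text that `SkillTool` returns is only `"Launching skill: {name}"`; the actual skill content is injected into the conversation via `newMessages`. The teaching version merges these two steps into "inject via tool_result", which is a simplification.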
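
The catalog budget from step 1 can be illustrated with a small formatter. The ~1% and 8000-character figures come from the text above. The following are assumptions of this sketch, not CC behavior:

- the 4 characters-per-token conversion
- the function name
- the policy of dropping trailing entries and adding an "omitted" note

```python
def format_skill_listing(skills: list[tuple[str, str]], context_window_tokens: int,
                         chars_per_token: int = 4, cap_chars: int = 8000) -> str:
    """Format a skill catalog under a character budget.

    Budget = ~1% of the context window (converted to chars with an assumed
    chars_per_token factor), capped at cap_chars. Entries that do not fit are
    dropped and summarized by a trailing note, which itself is not counted.
    """
    budget = min(context_window_tokens * chars_per_token // 100, cap_chars)
    lines: list[str] = []
    used = 0
    for i, (name, desc) in enumerate(skills):
        line = f"- {name}: {desc}"
        if used + len(line) + 1 > budget:  # +1 for the joining newline
            lines.append(f"... ({len(skills) - i} more skills omitted)")
            break
        lines.append(line)
        used += len(line) + 1
    return "\n".join(lines)


demo = [(f"skill-{i:02d}", "x" * 40) for i in range(20)]
small = format_skill_listing(demo, context_window_tokens=10_000)   # budget: 400 chars
large = format_skill_listing(demo, context_window_tokens=200_000)  # budget: min(8000, 8000)
print(small.splitlines()[-1])  # ... (13 more skills omitted)
```

A budget keeps the always-present layer cheap even when many skills are installed. The trade-off is that skills cut from the listing become invisible to the model unless it is told about them another way.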
|
||||
|
||||
### The teaching version's simplifications are deliberate
|
||||
|
||||
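- Multiple files and sources → one `skills/` directory: enough to show the core idea of two-level loading
- Many frontmatter fields → only `name`/`description` are parsed: keeps parsing simple
- Forked skills (`context: 'fork'`) → omitted: the teaching version implements only inline skill loading
- `Skill` tool input `skill`+`args` → the teaching version uses `name`: avoids the extra complexity of argument parsing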
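
If you want to move the teaching version closer to CC's `skill` + `args` interface, one hypothetical shape is shown below. It keeps the split described in section 3:

- The tool_result says only `Launching skill: {name}`.
- The full text travels separately, as a text block placed after the tool_result in the same user turn. This stands in for CC's `newMessages` and is not how CC does it.

The block goes after the tool_result because the Messages API expects tool_result blocks to come first. The function name and the `$ARGUMENTS` placeholder substitution are assumptions.

```python
def run_skill_tool(tool_use_id: str, tool_input: dict, registry: dict[str, str]) -> list[dict]:
    """Build the user-turn content blocks that answer a hypothetical Skill call.

    Assumptions: input is {"skill": ..., "args": ...}; `registry` maps skill
    names to full SKILL.md text; `$ARGUMENTS` in that text is replaced by args.
    """
    name = tool_input.get("skill", "")
    content = registry.get(name)
    if content is None:
        return [{"type": "tool_result", "tool_use_id": tool_use_id,
                 "content": f"Skill not found: {name}", "is_error": True}]
    body = content.replace("$ARGUMENTS", tool_input.get("args", ""))
    return [
        # Short display text, mirroring CC's "Launching skill: {name}".
        {"type": "tool_result", "tool_use_id": tool_use_id,
         "content": f"Launching skill: {name}"},
        # Full skill text, placed after the tool_result block.
        {"type": "text", "text": body},
    ]


skills = {"code-review": "Review the diff. Focus: $ARGUMENTS"}
blocks = run_skill_tool("toolu_01", {"skill": "code-review", "args": "error handling"}, skills)
print(blocks[0]["content"])  # Launching skill: code-review
print(blocks[1]["text"])     # Review the diff. Focus: error handling
```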
|
||||
|
||||
</details>
|
||||
|
||||
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
|
||||
s07_skill_loading/code.py
@@ -0,0 +1,411 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
s07: Skill Loading — two-level on-demand knowledge injection.
|
||||
|
||||
Layer 1 (cheap, always present):
|
||||
SYSTEM prompt includes skill names + one-line descriptions (~100 tokens/skill)
|
||||
"Skills available: agent-builder, code-review, mcp-builder, pdf"
|
||||
|
||||
Layer 2 (expensive, on demand):
|
||||
Agent calls load_skill("code-review") → full SKILL.md content
|
||||
injected via tool_result (~2000 tokens/skill)
|
||||
|
||||
skills/
|
||||
agent-builder/SKILL.md
|
||||
code-review/SKILL.md
|
||||
mcp-builder/SKILL.md
|
||||
pdf/SKILL.md
|
||||
|
||||
Changes from s06:
|
||||
+ build_system() — scan skills/ dir at startup, inject catalog into SYSTEM
|
||||
+ load_skill(name) — return full SKILL.md content via tool_result
|
||||
+ SKILLS_DIR config
|
||||
Loop unchanged: load_skill auto-dispatches via TOOL_HANDLERS.
|
||||
|
||||
Run: python s07_skill_loading/code.py
|
||||
Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY in .env
|
||||
"""
|
||||
|
||||
import os, subprocess, json
|
||||
from pathlib import Path
|
||||
|
||||
try:
|
||||
import readline
|
||||
readline.parse_and_bind('set bind-tty-special-chars off')
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
from anthropic import Anthropic
|
||||
from dotenv import load_dotenv
|
||||
|
||||
load_dotenv(override=True)
|
||||
if os.getenv("ANTHROPIC_BASE_URL"):
|
||||
os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)
|
||||
|
||||
WORKDIR = Path.cwd()
|
||||
SKILLS_DIR = WORKDIR / "skills"
|
||||
TASKS_DIR = WORKDIR / ".tasks"; TASKS_DIR.mkdir(exist_ok=True)
|
||||
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
|
||||
MODEL = os.environ["MODEL_ID"]
|
||||
|
||||
# s07: Skill catalog scan (used by build_system below)
|
||||
def _parse_frontmatter(text: str) -> tuple[dict, str]:
|
||||
"""Parse YAML frontmatter from SKILL.md. Returns (meta, body)."""
|
||||
if not text.startswith("---"):
|
||||
return {}, text
|
||||
parts = text.split("---", 2)
|
||||
if len(parts) < 3:
|
||||
return {}, text
|
||||
meta = {}
|
||||
for line in parts[1].strip().splitlines():
|
||||
if ":" in line:
|
||||
k, v = line.split(":", 1)
|
||||
meta[k.strip()] = v.strip().strip('"').strip("'")
|
||||
return meta, parts[2].strip()
|
||||
|
||||
# Build skill registry at startup (used for safe lookup in load_skill)
|
||||
SKILL_REGISTRY: dict[str, dict] = {}
|
||||
|
||||
def _scan_skills():
|
||||
"""Scan skills/ dir, populate SKILL_REGISTRY with name/description/content."""
|
||||
if not SKILLS_DIR.exists():
|
||||
return
|
||||
for d in sorted(SKILLS_DIR.iterdir()):
|
||||
if not d.is_dir():
|
||||
continue
|
||||
manifest = d / "SKILL.md"
|
||||
if manifest.exists():
|
||||
raw = manifest.read_text()
|
||||
meta, body = _parse_frontmatter(raw)
|
||||
name = meta.get("name", d.name)
|
||||
desc = meta.get("description", body.split("\n")[0].lstrip("#").strip())  # fall back to first body line, not the "---" fence
|
||||
SKILL_REGISTRY[name] = {"name": name, "description": desc, "content": raw}
|
||||
|
||||
_scan_skills()
|
||||
|
||||
def list_skills() -> str:
|
||||
"""List all skills (name + one-line description)."""
|
||||
if not SKILL_REGISTRY:
|
||||
return "(no skills found)"
|
||||
return "\n".join(f"- **{s['name']}**: {s['description']}" for s in SKILL_REGISTRY.values())
|
||||
|
||||
# s07: SYSTEM includes skill catalog (cheap — just names + descriptions)
|
||||
def build_system() -> str:
|
||||
"""Build SYSTEM prompt with skill catalog injected at startup."""
|
||||
catalog = list_skills()
|
||||
return (
|
||||
f"You are a coding agent at {WORKDIR}. "
|
||||
f"Skills available:\n{catalog}\n"
|
||||
"Use load_skill to get full details when needed."
|
||||
)
|
||||
|
||||
SYSTEM = build_system()
|
||||
|
||||
# s07: subagent gets its own system prompt — no skill loading, no task
|
||||
SUB_SYSTEM = (
|
||||
f"You are a coding agent at {WORKDIR}. "
|
||||
"Complete the task you were given, then return a concise summary. "
|
||||
"Do not delegate further."
|
||||
)
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
# FROM s02-s06 (unchanged): Tool Implementations
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
|
||||
def safe_path(p: str) -> Path:
|
||||
path = (WORKDIR / p).resolve()
|
||||
if not path.is_relative_to(WORKDIR):
|
||||
raise ValueError(f"Path escapes workspace: {p}")
|
||||
return path
|
||||
|
||||
def run_bash(command: str) -> str:
|
||||
try:
|
||||
r = subprocess.run(command, shell=True, cwd=WORKDIR,
|
||||
capture_output=True, text=True, timeout=120)
|
||||
out = (r.stdout + r.stderr).strip()
|
||||
return out[:50000] if out else "(no output)"
|
||||
except subprocess.TimeoutExpired:
|
||||
return "Error: Timeout (120s)"
|
||||
|
||||
def run_read(path: str, limit: int | None = None) -> str:
|
||||
try:
|
||||
lines = safe_path(path).read_text().splitlines()
|
||||
if limit and limit < len(lines):
|
||||
lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
|
||||
return "\n".join(lines)
|
||||
except Exception as e:
|
||||
return f"Error: {e}"
|
||||
|
||||
def run_write(path: str, content: str) -> str:
|
||||
try:
|
||||
file_path = safe_path(path)
|
||||
file_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
file_path.write_text(content)
|
||||
return f"Wrote {len(content)} bytes to {path}"
|
||||
except Exception as e:
|
||||
return f"Error: {e}"
|
||||
|
||||
def run_edit(path: str, old_text: str, new_text: str) -> str:
|
||||
try:
|
||||
file_path = safe_path(path)
|
||||
text = file_path.read_text()
|
||||
if old_text not in text:
|
||||
return f"Error: text not found in {path}"
|
||||
file_path.write_text(text.replace(old_text, new_text, 1))
|
||||
return f"Edited {path}"
|
||||
except Exception as e:
|
||||
return f"Error: {e}"
|
||||
|
||||
def run_glob(pattern: str) -> str:
|
||||
import glob as g
|
||||
try:
|
||||
results = []
|
||||
for match in g.glob(pattern, root_dir=WORKDIR):
|
||||
if (WORKDIR / match).resolve().is_relative_to(WORKDIR):
|
||||
results.append(match)
|
||||
return "\n".join(results) if results else "(no matches)"
|
||||
except Exception as e:
|
||||
return f"Error: {e}"
|
||||
|
||||
def run_todo_write(todos: list) -> str:
|
||||
for i, t in enumerate(todos):
|
||||
if "content" not in t or "status" not in t:
|
||||
return f"Error: todos[{i}] missing 'content' or 'status'"
|
||||
if t["status"] not in ("pending", "in_progress", "completed"):
|
||||
return f"Error: todos[{i}] has invalid status '{t['status']}'"
|
||||
tasks_file = TASKS_DIR / "current_todos.json"
|
||||
tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False))
|
||||
lines = ["\n\033[33m## Current Tasks\033[0m"]
|
||||
for t in todos:
|
||||
icon = {"pending": " ", "in_progress": "\033[36m▸\033[0m", "completed": "\033[32m✓\033[0m"}[t["status"]]
|
||||
lines.append(f" [{icon}] {t['content']}")
|
||||
print("\n".join(lines))
|
||||
return f"Updated {len(todos)} tasks"
|
||||
|
||||
def extract_text(content) -> str:
|
||||
if not isinstance(content, list):
|
||||
return str(content)
|
||||
return "\n".join(getattr(b, "text", "") for b in content if getattr(b, "type", None) == "text")
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
# FROM s06 (unchanged): Subagent
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
|
||||
SUB_TOOLS = [
|
||||
{"name": "bash", "description": "Run a shell command.",
|
||||
"input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
|
||||
{"name": "read_file", "description": "Read file contents.",
|
||||
"input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}},
|
||||
{"name": "write_file", "description": "Write content to a file.",
|
||||
"input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
|
||||
{"name": "edit_file", "description": "Replace exact text in a file once.",
|
||||
"input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
|
||||
{"name": "glob", "description": "Find files matching a glob pattern.",
|
||||
"input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
|
||||
]
|
||||
SUB_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write,
|
||||
"edit_file": run_edit, "glob": run_glob}
|
||||
|
||||
def spawn_subagent(description: str) -> str:
|
||||
print(f"\n\033[35m[Subagent spawned]\033[0m")
|
||||
messages = [{"role": "user", "content": description}]
|
||||
for _ in range(30):
|
||||
response = client.messages.create(model=MODEL, system=SUB_SYSTEM,
|
||||
messages=messages, tools=SUB_TOOLS, max_tokens=8000)
|
||||
messages.append({"role": "assistant", "content": response.content})
|
||||
if response.stop_reason != "tool_use":
|
||||
break
|
||||
results = []
|
||||
for block in response.content:
|
||||
if block.type == "tool_use":
|
||||
blocked = trigger_hooks("PreToolUse", block)
|
||||
if blocked:
|
||||
results.append({"type": "tool_result", "tool_use_id": block.id,
|
||||
"content": str(blocked)})
|
||||
continue
|
||||
handler = SUB_HANDLERS.get(block.name)
|
||||
output = handler(**block.input) if handler else f"Unknown: {block.name}"
|
||||
trigger_hooks("PostToolUse", block, output)
|
||||
print(f" \033[90m[sub] {block.name}: {str(output)[:100]}\033[0m")
|
||||
results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
|
||||
messages.append({"role": "user", "content": results})
|
||||
result = extract_text(messages[-1]["content"])
|
||||
if not result:
|
||||
for msg in reversed(messages):
|
||||
if msg["role"] == "assistant":
|
||||
result = extract_text(msg["content"])
|
||||
if result:
|
||||
break
|
||||
if not result:
|
||||
result = "Subagent stopped after 30 turns without final answer."
|
||||
print(f"\033[35m[Subagent done]\033[0m")
|
||||
return result
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
# NEW in s07: load_skill — runtime full content loading
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
|
||||
def load_skill(name: str) -> str:
|
||||
"""Load full skill content. Lookup via registry — no path traversal."""
|
||||
skill = SKILL_REGISTRY.get(name)
|
||||
if not skill:
|
||||
return f"Skill not found: {name}"
|
||||
return skill["content"]
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
# Tool Registry — all tools from s02-s07
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
|
||||
TOOLS = [
|
||||
{"name": "bash", "description": "Run a shell command.",
|
||||
"input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
|
||||
{"name": "read_file", "description": "Read file contents.",
|
||||
"input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}},
|
||||
{"name": "write_file", "description": "Write content to a file.",
|
||||
"input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
|
||||
{"name": "edit_file", "description": "Replace exact text in a file once.",
|
||||
"input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
|
||||
{"name": "glob", "description": "Find files matching a glob pattern.",
|
||||
"input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
|
||||
{"name": "todo_write", "description": "Create and manage a task list for your current coding session.",
|
||||
"input_schema": {"type": "object", "properties": {"todos": {"type": "array", "items": {"type": "object", "properties": {"content": {"type": "string"}, "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}}, "required": ["content", "status"]}}}, "required": ["todos"]}},
|
||||
{"name": "task", "description": "Launch a subagent to handle a complex subtask. Returns only the final conclusion.",
|
||||
"input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}},
|
||||
# s07: skill tool (catalog is already in SYSTEM prompt, this loads full content)
|
||||
{"name": "load_skill", "description": "Load the full content of a skill by name.",
|
||||
"input_schema": {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}},
|
||||
]
|
||||
|
||||
TOOL_HANDLERS = {
|
||||
"bash": run_bash, "read_file": run_read, "write_file": run_write,
|
||||
"edit_file": run_edit, "glob": run_glob, "todo_write": run_todo_write,
|
||||
"task": spawn_subagent, "load_skill": load_skill,
|
||||
}
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
# FROM s04 (unchanged): Hook System
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
|
||||
HOOKS = {"UserPromptSubmit": [], "PreToolUse": [], "PostToolUse": [], "Stop": []}
|
||||
|
||||
def register_hook(event: str, callback):
|
||||
HOOKS[event].append(callback)
|
||||
|
||||
def trigger_hooks(event: str, *args):
|
||||
for callback in HOOKS[event]:
|
||||
result = callback(*args)
|
||||
if result is not None:
|
||||
return result
|
||||
return None
|
||||
|
||||
DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if="]
|
||||
|
||||
def permission_hook(block):
|
||||
if block.name == "bash":
|
||||
for p in DENY_LIST:
|
||||
if p in block.input.get("command", ""):
|
||||
print(f"\n\033[31m⛔ Blocked: '{p}'\033[0m")
|
||||
return "Permission denied"
|
||||
return None
|
||||
|
||||
def log_hook(block):
|
||||
print(f"\033[90m[HOOK] {block.name}\033[0m")
|
||||
return None
|
||||
|
||||
def context_inject_hook(query: str):
|
||||
print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m")
|
||||
return None
|
||||
|
||||
def summary_hook(messages: list):
|
||||
tool_count = sum(1 for m in messages
|
||||
for b in (m.get("content") if isinstance(m.get("content"), list) else [])
|
||||
if isinstance(b, dict) and b.get("type") == "tool_result")
|
||||
print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m")
|
||||
return None
|
||||
|
||||
register_hook("UserPromptSubmit", context_inject_hook)
|
||||
register_hook("PreToolUse", permission_hook)
|
||||
register_hook("PreToolUse", log_hook)
|
||||
register_hook("Stop", summary_hook)
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
# agent_loop — same as s05-s06 + nag reminder
|
||||
# ═══════════════════════════════════════════════════════════
|
||||
|
||||
rounds_since_todo = 0
|
||||
|
||||
def agent_loop(messages: list):
|
||||
global rounds_since_todo
|
||||
while True:
|
||||
if rounds_since_todo >= 3 and messages:
|
||||
last = messages[-1]
|
||||
if last["role"] == "user" and isinstance(last.get("content"), list):
|
||||
last["content"].append({  # text must come after tool_result blocks, or the API returns 400
|
||||
"type": "text",
|
||||
"text": "<reminder>Update your todos.</reminder>",
|
||||
})
|
||||
|
||||
response = client.messages.create(
|
||||
model=MODEL, system=SYSTEM, messages=messages,
|
||||
tools=TOOLS, max_tokens=8000,
|
||||
)
|
||||
messages.append({"role": "assistant", "content": response.content})
|
||||
|
||||
if response.stop_reason != "tool_use":
|
||||
force = trigger_hooks("Stop", messages)
|
||||
if force:
|
||||
messages.append({"role": "user", "content": force})
|
||||
continue
|
||||
return
|
||||
|
||||
rounds_since_todo += 1
|
||||
results = []
|
||||
for block in response.content:
|
||||
if block.type != "tool_use":
|
||||
continue
|
||||
|
||||
blocked = trigger_hooks("PreToolUse", block)
|
||||
if blocked:
|
||||
results.append({"type": "tool_result", "tool_use_id": block.id,
|
||||
"content": str(blocked)})
|
||||
continue
|
||||
|
||||
handler = TOOL_HANDLERS.get(block.name)
|
||||
output = handler(**block.input) if handler else f"Unknown: {block.name}"
|
||||
|
||||
trigger_hooks("PostToolUse", block, output)
|
||||
|
||||
if block.name == "todo_write":
|
||||
rounds_since_todo = 0
|
||||
|
||||
results.append({"type": "tool_result", "tool_use_id": block.id,
|
||||
"content": output})
|
||||
|
||||
messages.append({"role": "user", "content": results})
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
print("s07: Skill Loading — catalog in SYSTEM, content on demand")
|
||||
print("Type a question, press Enter. Type q to quit.\n")
|
||||
|
||||
history = []
|
||||
while True:
|
||||
try:
|
||||
query = input("\033[36ms07 >> \033[0m")
|
||||
except (EOFError, KeyboardInterrupt):
|
||||
break
|
||||
if query.strip().lower() in ("q", "exit", ""):
|
||||
break
|
||||
trigger_hooks("UserPromptSubmit", query)
|
||||
history.append({"role": "user", "content": query})
|
||||
agent_loop(history)
|
||||
for block in history[-1]["content"]:
|
||||
if getattr(block, "type", None) == "text":
|
||||
print(block.text)
|
||||
print()
|
||||
s07_skill_loading/images/skill-overview.en.svg
@@ -0,0 +1,110 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 380" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
|
||||
</marker>
|
||||
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
|
||||
</marker>
|
||||
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
|
||||
</marker>
|
||||
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
|
||||
</marker>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/>
|
||||
<stop offset="100%" stop-color="#2563eb"/>
|
||||
</linearGradient>
|
||||
</defs>
|
||||
|
||||
<!-- Background -->
|
||||
<rect width="800" height="380" fill="#fafbfc" rx="8"/>
|
||||
|
||||
<!-- Title -->
|
||||
<rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
|
||||
<text x="400" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Skill Loading — catalog at startup, content on demand</text>
|
||||
|
||||
<!-- ===== History preserved ===== -->
|
||||
<text x="50" y="96" fill="#94a3b8" font-size="11" font-weight="600">History preserved</text>
|
||||
|
||||
<!-- messages[] -->
|
||||
<rect x="40" y="108" width="110" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="95" y="135" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
|
||||
|
||||
<!-- → LLM -->
|
||||
<line x1="150" y1="130" x2="198" y2="130" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
|
||||
|
||||
<!-- LLM -->
|
||||
<rect x="200" y="106" width="110" height="48" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="255" y="128" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">LLM</text>
|
||||
<text x="255" y="146" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>
|
||||
|
||||
<!-- No → return -->
|
||||
<line x1="255" y1="154" x2="255" y2="178" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
|
||||
<text x="268" y="172" fill="#2563eb" font-size="9" font-weight="600">No</text>
|
||||
<rect x="200" y="180" width="110" height="28" rx="14" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="255" y="198" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">Return result</text>
|
||||
|
||||
<!-- Yes → PreToolUse -->
|
||||
<line x1="310" y1="130" x2="348" y2="130" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
<text x="325" y="122" fill="#d97706" font-size="9" font-weight="600">Yes</text>
|
||||
|
||||
<!-- PreToolUse (s04) -->
|
||||
<rect x="350" y="108" width="90" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="395" y="128" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">trigger_hooks</text>
|
||||
<text x="395" y="142" fill="#64748b" font-size="8" text-anchor="middle">PreToolUse</text>
|
||||
|
||||
<!-- → TOOL_HANDLERS -->
|
||||
<line x1="440" y1="130" x2="488" y2="130" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<!-- ===== TOOL_HANDLERS ===== -->
|
||||
<rect x="490" y="88" width="120" height="130" rx="10" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="550" y="108" fill="#1e3a5f" font-size="10" font-weight="700" text-anchor="middle">TOOL_HANDLERS</text>
|
||||
|
||||
<!-- s06 tools -->
|
||||
<rect x="500" y="116" width="100" height="18" rx="3" fill="#fff" stroke="#2563eb" stroke-width="0.8"/>
|
||||
<text x="550" y="129" fill="#1e3a5f" font-size="8" text-anchor="middle">bash · read · write</text>
|
||||
<rect x="500" y="138" width="100" height="18" rx="3" fill="#fff" stroke="#2563eb" stroke-width="0.8"/>
|
||||
<text x="550" y="151" fill="#1e3a5f" font-size="8" text-anchor="middle">edit · glob · todo</text>
|
||||
<rect x="500" y="160" width="100" height="18" rx="3" fill="#fff" stroke="#2563eb" stroke-width="0.8"/>
|
||||
<text x="550" y="173" fill="#1e3a5f" font-size="8" text-anchor="middle">task (subagent)</text>
|
||||
|
||||
<rect x="500" y="186" width="100" height="22" rx="3" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
|
||||
<text x="550" y="201" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">load_skill</text>
|
||||
|
||||
<!-- ===== Loop back ===== -->
|
||||
<path d="M 550 218 L 550 270 L 95 270 L 95 152" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
|
||||
<text x="320" y="290" fill="#64748b" font-size="10" text-anchor="middle">Results appended to messages[], loop continues</text>
|
||||
|
||||
<!-- ===== s07 two-level injection labels (right side) ===== -->
|
||||
|
||||
<text x="710" y="220" fill="#16a34a" font-size="11" font-weight="700" text-anchor="middle">s07 new</text>
|
||||
|
||||
<!-- ① Startup SYSTEM injection -->
|
||||
<rect x="640" y="86" width="140" height="56" rx="8" fill="#f0fdf4" stroke="#16a34a" stroke-width="1.5"/>
|
||||
<text x="710" y="106" fill="#166534" font-size="10" font-weight="700" text-anchor="middle">① build_system()</text>
|
||||
<text x="710" y="120" fill="#64748b" font-size="8" text-anchor="middle">Scan skills/ frontmatter at startup</text>
|
||||
<text x="710" y="134" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">→ inject SYSTEM prompt</text>
|
||||
|
||||
<!-- ② Runtime load_skill → tool_result -->
|
||||
<rect x="640" y="148" width="140" height="56" rx="8" fill="#f0fdf4" stroke="#16a34a" stroke-width="1.5"/>
|
||||
<text x="710" y="168" fill="#166534" font-size="10" font-weight="700" text-anchor="middle">② load_skill(name)</text>
|
||||
<text x="710" y="182" fill="#64748b" font-size="8" text-anchor="middle">Read full SKILL.md at runtime</text>
|
||||
<text x="710" y="196" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">→ inject tool_result</text>
|
||||
|
||||
<!-- ① → LLM (top connection) -->
|
||||
<path d="M 710 86 L 710 68 L 255 68 L 255 106" fill="none" stroke="#16a34a" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-green)"/>
|
||||
<text x="480" y="64" fill="#166534" font-size="7" font-weight="600">SYSTEM has skill catalog, carried every turn</text>
|
||||
|
||||
<!-- ② → load_skill (short connection) -->
|
||||
<line x1="640" y1="176" x2="600" y2="197" stroke="#16a34a" stroke-width="1.5" stroke-dasharray="4,3"/>
|
||||
|
||||
<!-- ===== Legend ===== -->
|
||||
<rect x="60" y="308" width="680" height="44" rx="6" fill="#f1f5f9"/>
|
||||
<rect x="80" y="322" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="100" y="332" fill="#334155" font-size="10">History preserved (loop, hooks, TODO, subagent — unchanged)</text>
|
||||
<rect x="80" y="338" width="12" height="12" rx="2" fill="#dcfce7" stroke="#16a34a" stroke-width="1"/>
|
||||
<text x="100" y="348" fill="#334155" font-size="10">s07 new (startup catalog in SYSTEM + load_skill tool)</text>
|
||||
</svg>
|
||||
|
s07_skill_loading/images/skill-overview.ja.svg
@@ -0,0 +1,110 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 380" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
|
||||
</marker>
|
||||
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
|
||||
</marker>
|
||||
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
|
||||
</marker>
|
||||
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
|
||||
</marker>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/>
|
||||
<stop offset="100%" stop-color="#2563eb"/>
|
||||
</linearGradient>
|
||||
</defs>
|
||||
|
||||
<!-- 背景 -->
|
||||
<rect width="800" height="380" fill="#fafbfc" rx="8"/>
|
||||
|
||||
<!-- タイトル -->
|
||||
<rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
|
||||
<text x="400" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Skill Loading — 起動時にカタログ注入、実行時にオンデマンド読み込み</text>
|
||||
|
||||
<!-- ===== 過去章を保持 ===== -->
|
||||
<text x="50" y="96" fill="#94a3b8" font-size="11" font-weight="600">過去章を保持</text>
|
||||
|
||||
<!-- messages[] -->
|
||||
<rect x="40" y="108" width="110" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="95" y="135" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
|
||||
|
||||
<!-- → LLM -->
|
||||
<line x1="150" y1="130" x2="198" y2="130" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
|
||||
|
||||
<!-- LLM -->
|
||||
<rect x="200" y="106" width="110" height="48" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="255" y="128" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">LLM</text>
|
||||
<text x="255" y="146" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>
|
||||
|
||||
<!-- No → 戻る -->
|
||||
<line x1="255" y1="154" x2="255" y2="178" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
|
||||
<text x="268" y="172" fill="#2563eb" font-size="9" font-weight="600">No</text>
|
||||
<rect x="200" y="180" width="110" height="28" rx="14" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="255" y="198" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">結果を返す</text>
|
||||
|
||||
<!-- Yes → PreToolUse -->
|
||||
<line x1="310" y1="130" x2="348" y2="130" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
<text x="325" y="122" fill="#d97706" font-size="9" font-weight="600">Yes</text>
|
||||
|
||||
<!-- PreToolUse (s04) -->
|
||||
<rect x="350" y="108" width="90" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="395" y="128" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">trigger_hooks</text>
|
||||
<text x="395" y="142" fill="#64748b" font-size="8" text-anchor="middle">PreToolUse</text>
|
||||
|
||||
<!-- → TOOL_HANDLERS -->
|
||||
<line x1="440" y1="130" x2="488" y2="130" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
|
||||
<!-- ===== TOOL_HANDLERS ===== -->
|
||||
<rect x="490" y="88" width="120" height="130" rx="10" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="550" y="108" fill="#1e3a5f" font-size="10" font-weight="700" text-anchor="middle">TOOL_HANDLERS</text>
|
||||
|
||||
<!-- s06 ツール -->
|
||||
<rect x="500" y="116" width="100" height="18" rx="3" fill="#fff" stroke="#2563eb" stroke-width="0.8"/>
|
||||
<text x="550" y="129" fill="#1e3a5f" font-size="8" text-anchor="middle">bash · read · write</text>
|
||||
<rect x="500" y="138" width="100" height="18" rx="3" fill="#fff" stroke="#2563eb" stroke-width="0.8"/>
<text x="550" y="151" fill="#1e3a5f" font-size="8" text-anchor="middle">edit · glob · todo</text>
<rect x="500" y="160" width="100" height="18" rx="3" fill="#fff" stroke="#2563eb" stroke-width="0.8"/>
<text x="550" y="173" fill="#1e3a5f" font-size="8" text-anchor="middle">task (subagent)</text>

<rect x="500" y="186" width="100" height="22" rx="3" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="550" y="201" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">load_skill</text>

<!-- ===== Loop back ===== -->
<path d="M 550 218 L 550 270 L 95 270 L 95 152" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="320" y="290" fill="#64748b" font-size="10" text-anchor="middle">Append results to messages[], loop continues</text>

<!-- ===== s07 two-level injection labels (right side) ===== -->

<text x="710" y="220" fill="#16a34a" font-size="11" font-weight="700" text-anchor="middle">New in s07</text>

<!-- ① Inject into SYSTEM at startup -->
<rect x="640" y="86" width="140" height="56" rx="8" fill="#f0fdf4" stroke="#16a34a" stroke-width="1.5"/>
<text x="710" y="106" fill="#166534" font-size="10" font-weight="700" text-anchor="middle">① build_system()</text>
<text x="710" y="120" fill="#64748b" font-size="8" text-anchor="middle">Startup: scan line 1 in skills/</text>
<text x="710" y="134" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">→ injected into SYSTEM prompt</text>

<!-- ② load_skill at runtime → tool_result -->
<rect x="640" y="148" width="140" height="56" rx="8" fill="#f0fdf4" stroke="#16a34a" stroke-width="1.5"/>
<text x="710" y="168" fill="#166534" font-size="10" font-weight="700" text-anchor="middle">② load_skill(name)</text>
<text x="710" y="182" fill="#64748b" font-size="8" text-anchor="middle">Runtime: read full SKILL.md</text>
<text x="710" y="196" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">→ injected into tool_result</text>

<!-- ① → LLM (top connector) -->
<path d="M 710 86 L 710 68 L 255 68 L 255 106" fill="none" stroke="#16a34a" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-green)"/>
<text x="480" y="64" fill="#166534" font-size="7" font-weight="600">Skill catalog in SYSTEM, sent every turn</text>

<!-- ② → load_skill (short connector) -->
<line x1="640" y1="176" x2="600" y2="197" stroke="#16a34a" stroke-width="1.5" stroke-dasharray="4,3"/>

<!-- ===== Legend ===== -->
<rect x="60" y="308" width="680" height="44" rx="6" fill="#f1f5f9"/>
<rect x="80" y="322" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="100" y="332" fill="#334155" font-size="10">Kept from earlier chapters (loop, hooks, TODO, subagent; unchanged)</text>
<rect x="80" y="338" width="12" height="12" rx="2" fill="#dcfce7" stroke="#16a34a" stroke-width="1"/>
<text x="100" y="348" fill="#334155" font-size="10">New in s07 (catalog injected into SYSTEM at startup + load_skill tool)</text>
</svg>
s07_skill_loading/images/skill-overview.svg
@@ -0,0 +1,110 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 800 380" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>

<!-- Background -->
<rect width="800" height="380" fill="#fafbfc" rx="8"/>

<!-- Title -->
<rect x="0" y="0" width="800" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="800" height="8" fill="url(#header)"/>
<text x="400" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Skill Loading — catalog injected at startup, content loaded on demand at runtime</text>

<!-- ===== Kept from earlier chapters ===== -->
<text x="50" y="96" fill="#94a3b8" font-size="11" font-weight="600">Kept from earlier chapters</text>

<!-- messages[] -->
<rect x="40" y="108" width="110" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="95" y="135" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>

<!-- → LLM -->
<line x1="150" y1="130" x2="198" y2="130" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>

<!-- LLM -->
<rect x="200" y="106" width="110" height="48" rx="8" fill="#fff" stroke="#2563eb" stroke-width="1.5"/>
<text x="255" y="128" fill="#1e3a5f" font-size="13" font-weight="700" text-anchor="middle">LLM</text>
<text x="255" y="146" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>

<!-- No → return -->
<line x1="255" y1="154" x2="255" y2="178" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
<text x="268" y="172" fill="#2563eb" font-size="9" font-weight="600">No</text>
<rect x="200" y="180" width="110" height="28" rx="14" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="255" y="198" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">Return result</text>

<!-- Yes → PreToolUse -->
<line x1="310" y1="130" x2="348" y2="130" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
<text x="325" y="122" fill="#d97706" font-size="9" font-weight="600">Yes</text>

<!-- PreToolUse (s04) -->
<rect x="350" y="108" width="90" height="44" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="395" y="128" fill="#1e3a5f" font-size="9" font-weight="600" text-anchor="middle">trigger_hooks</text>
<text x="395" y="142" fill="#64748b" font-size="8" text-anchor="middle">PreToolUse</text>

<!-- → TOOL_HANDLERS -->
<line x1="440" y1="130" x2="488" y2="130" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>

<!-- ===== TOOL_HANDLERS ===== -->
<rect x="490" y="88" width="120" height="130" rx="10" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="550" y="108" fill="#1e3a5f" font-size="10" font-weight="700" text-anchor="middle">TOOL_HANDLERS</text>

<!-- s06 tools -->
<rect x="500" y="116" width="100" height="18" rx="3" fill="#fff" stroke="#2563eb" stroke-width="0.8"/>
<text x="550" y="129" fill="#1e3a5f" font-size="8" text-anchor="middle">bash · read · write</text>
<rect x="500" y="138" width="100" height="18" rx="3" fill="#fff" stroke="#2563eb" stroke-width="0.8"/>
<text x="550" y="151" fill="#1e3a5f" font-size="8" text-anchor="middle">edit · glob · todo</text>
<rect x="500" y="160" width="100" height="18" rx="3" fill="#fff" stroke="#2563eb" stroke-width="0.8"/>
<text x="550" y="173" fill="#1e3a5f" font-size="8" text-anchor="middle">task (subagent)</text>

<rect x="500" y="186" width="100" height="22" rx="3" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="550" y="201" fill="#166534" font-size="9" font-weight="700" text-anchor="middle">load_skill</text>

<!-- ===== Loop back ===== -->
<path d="M 550 218 L 550 270 L 95 270 L 95 152" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="320" y="290" fill="#64748b" font-size="10" text-anchor="middle">Append results to messages[], loop continues</text>

<!-- ===== s07 two-level injection labels (right side) ===== -->

<text x="710" y="220" fill="#16a34a" font-size="11" font-weight="700" text-anchor="middle">New in s07</text>

<!-- ① Inject into SYSTEM at startup -->
<rect x="640" y="86" width="140" height="56" rx="8" fill="#f0fdf4" stroke="#16a34a" stroke-width="1.5"/>
<text x="710" y="106" fill="#166534" font-size="10" font-weight="700" text-anchor="middle">① build_system()</text>
<text x="710" y="120" fill="#64748b" font-size="8" text-anchor="middle">Startup: scan line 1 in skills/</text>
<text x="710" y="134" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">→ injected into SYSTEM prompt</text>

<!-- ② load_skill at runtime → tool_result -->
<rect x="640" y="148" width="140" height="56" rx="8" fill="#f0fdf4" stroke="#16a34a" stroke-width="1.5"/>
<text x="710" y="168" fill="#166534" font-size="10" font-weight="700" text-anchor="middle">② load_skill(name)</text>
<text x="710" y="182" fill="#64748b" font-size="8" text-anchor="middle">Runtime: read full SKILL.md</text>
<text x="710" y="196" fill="#166534" font-size="8" font-weight="600" text-anchor="middle">→ injected into tool_result</text>

<!-- ① → LLM (top connector) -->
<path d="M 710 86 L 710 68 L 255 68 L 255 106" fill="none" stroke="#16a34a" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-green)"/>
<text x="480" y="64" fill="#166534" font-size="7" font-weight="600">Skill catalog in SYSTEM, sent every turn</text>

<!-- ② → load_skill (short connector) -->
<line x1="640" y1="176" x2="600" y2="197" stroke="#16a34a" stroke-width="1.5" stroke-dasharray="4,3"/>

<!-- ===== Legend ===== -->
<rect x="60" y="308" width="680" height="44" rx="6" fill="#f1f5f9"/>
<rect x="80" y="322" width="12" height="12" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="100" y="332" fill="#334155" font-size="10">Kept from earlier chapters (loop, hooks, TODO, subagent; unchanged)</text>
<rect x="80" y="338" width="12" height="12" rx="2" fill="#dcfce7" stroke="#16a34a" stroke-width="1"/>
<text x="100" y="348" fill="#334155" font-size="10">New in s07 (catalog injected into SYSTEM at startup + load_skill tool)</text>
</svg>
s08_context_compact/README.en.md
@@ -0,0 +1,293 @@
|
||||
# s08: Context Compact — Context Will Fill Up, Have a Way to Make Room
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s20
|
||||
> *"Context will fill up — have a way to make room"* — Four-layer compression pipeline: cheap first, expensive last.
|
||||
>
|
||||
> **Harness Layer**: Compression — clean memory, unlimited sessions.
|
||||
|
||||
---
|
||||
|
||||
## The Problem
|
||||
|
||||
The agent is running along, then freezes.
|
||||
|
||||
It has bash, read, write — all the capabilities it needs. But it read a 1000-line file (~4000 tokens), then read 30 more files, ran 20 commands. Every command's output, every file's contents, all pile up in the `messages` list.
|
||||
|
||||
The context window is finite. Once full, the API outright rejects the call: `prompt_too_long`.
|
||||
|
||||
Without compression, an agent simply cannot work on large projects.
|
||||
|
||||
---
|
||||
|
||||
## The Solution
|
||||
|
||||

|
||||
|
||||
The hook structure, skill loading, and sub-Agent from s07 are preserved, with some tools omitted to focus on compaction. The core change: insert three pre-processors (0 API calls) before each LLM call, trigger an LLM summary (1 API call) when tokens still exceed the threshold, and emergency-trim if the API throws an error.
|
||||
|
||||
Core design: cheap first, expensive last.
|
||||
|
||||
---
|
||||
|
||||
## How It Works
|
||||
|
||||

|
||||
|
||||
### L1: snip_compact — Trim Irrelevant Old Conversation
|
||||
|
||||
The agent ran 80 turns of conversation, accumulating 160 `messages`. The very first "help me create hello.py" is barely relevant to current work, yet it still occupies space.
|
||||
|
||||
Message count exceeds 50 → keep the first 3 (initial context) and the last 47 (current work), trim the middle:
|
||||
|
||||
```python
|
||||
def snip_compact(messages, max_messages=50):
|
||||
if len(messages) <= max_messages:
|
||||
return messages
|
||||
keep_head, keep_tail = 3, max_messages - 3
|
||||
snipped = len(messages) - keep_head - keep_tail
|
||||
placeholder = {"role": "user",
|
||||
"content": f"[snipped {snipped} messages from conversation middle]"}
|
||||
return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
|
||||
```
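The cut above can separate an assistant `tool_use` from the user `tool_result` that answers it, and the Messages API rejects a request in which either half is missing. Below is a minimal sketch of a boundary-aware variant. It assumes the dict-based message format used in this chapter. `snip_compact_safe` and `_has_tool_result` are hypothetical names, not functions from code.py. Whenever a `tool_result` would otherwise be stranded, the sketch widens the kept head or tail by one message:

```python
def _has_tool_result(message):
    """True if the message's content list carries at least one tool_result block."""
    content = message.get("content")
    return isinstance(content, list) and any(
        isinstance(b, dict) and b.get("type") == "tool_result" for b in content)


def snip_compact_safe(messages, max_messages=50, keep_head=3):
    if len(messages) <= max_messages:
        return messages
    head_end = keep_head
    # A tool_result right after the head answers a tool_use inside the head: keep it
    while head_end < len(messages) and _has_tool_result(messages[head_end]):
        head_end += 1
    tail_start = len(messages) - (max_messages - keep_head)
    # A tool_result opening the tail answers the tool_use just before it: keep that too
    while tail_start > head_end and _has_tool_result(messages[tail_start]):
        tail_start -= 1
    snipped = tail_start - head_end
    if snipped <= 0:
        return messages  # nothing left to cut safely
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:head_end] + [placeholder] + messages[tail_start:]
```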
This trims whole messages, but the `tool_result` content inside the remaining messages keeps accumulating: message #34 may still hold 30KB of old file contents. → L2.

### L2: micro_compact — Placeholders for Old Tool Results

The agent has read 10 files in a row. The full contents of reads 1–7 are still sitting in context; they are no longer needed but take up a lot of space.

Keep only the 3 most recent `tool_result` blocks intact, and replace each older one longer than 120 characters with a one-line placeholder:

```python
KEEP_RECENT_TOOL_RESULTS = 3

def micro_compact(messages):
    # Helper (not shown): (message_index, block_index, block) for every
    # tool_result block in the conversation, oldest first
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        # Short results are cheap to keep; only long ones are replaced
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages
```
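The snippet above relies on a `collect_tool_result_blocks` helper that isn't shown. Here is a self-contained sketch of the whole layer. It assumes tool results are dict blocks inside list-valued `content` and replaces only string contents. The helper body and the `keep_recent`/`min_len` parameters are this sketch's own choices:

```python
KEEP_RECENT_TOOL_RESULTS = 3
PLACEHOLDER = "[Earlier tool result compacted. Re-run if needed.]"


def collect_tool_result_blocks(messages):
    """Return (message_index, block_index, block) for every tool_result, oldest first."""
    found = []
    for mi, msg in enumerate(messages):
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-text messages carry no tool_result blocks
        for bi, block in enumerate(content):
            if isinstance(block, dict) and block.get("type") == "tool_result":
                found.append((mi, bi, block))
    return found


def micro_compact(messages, keep_recent=KEEP_RECENT_TOOL_RESULTS, min_len=120):
    """Replace all but the newest keep_recent tool_results with a placeholder."""
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= keep_recent:
        return messages
    for _, _, block in tool_results[:-keep_recent]:
        content = block.get("content")
        # Only string contents longer than min_len are worth replacing
        if isinstance(content, str) and len(content) > min_len:
            block["content"] = PLACEHOLDER
    return messages
```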
Old results are now cleared, but a single new result can be 500KB: one `cat` of a large file can fill the context by itself. → L3.

### L3: tool_result_budget — Persist Large Results to Disk

The model reads 5 large files in one turn, and the `tool_result` blocks in the last user message total 500KB.

Sum the sizes of all `tool_result` blocks in the last user message. If the total exceeds 200,000 characters (~200KB), sort the blocks by size and, starting with the largest, persist them to `.task_outputs/tool-results/` until the total fits. Only a `<persisted-output>` marker plus a 2000-character preview stays in context. The marker tells the model that the full content is on disk, so it can re-read it when needed.

```python
def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1]
    # Only a user turn whose content is a list of blocks can carry tool_results
    if last["role"] != "user" or not isinstance(last["content"], list):
        return messages
    blocks = [b for b in last["content"] if b.get("type") == "tool_result"]

    def size(block):
        # Characters, used as a proxy for bytes
        return len(str(block.get("content", "")))

    total = sum(size(b) for b in blocks)
    if total <= max_bytes:
        return messages
    for block in sorted(blocks, key=size, reverse=True):  # largest first
        if total <= max_bytes:
            break
        before = size(block)
        # persist_large_output (not shown): writes the full text to disk and
        # returns the <persisted-output> marker plus a preview
        block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
        total -= before - size(block)
    return messages
```
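The snippet above leaves `persist_large_output` undefined. Here is a self-contained sketch of the whole layer. It assumes results are plain strings inside dict blocks. The `.txt` file naming, the marker's `path`/`chars` attributes, and the `out_dir` parameter are this sketch's own choices, not the format used by code.py or Claude Code:

```python
import os

OUTPUT_DIR = os.path.join(".task_outputs", "tool-results")
PREVIEW_CHARS = 2000


def persist_large_output(tool_use_id, text, out_dir=OUTPUT_DIR):
    """Write the full result to disk; return a marker with a short preview."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{tool_use_id}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Marker layout is this sketch's own choice
    return (f'<persisted-output path="{path}" chars="{len(text)}">\n'
            f"{text[:PREVIEW_CHARS]}\n"
            "</persisted-output>")


def tool_result_budget(messages, max_chars=200_000, out_dir=OUTPUT_DIR):
    """Persist the largest tool_results in the last message until it fits."""
    last = messages[-1]
    if not isinstance(last.get("content"), list):
        return messages
    blocks = [b for b in last["content"] if b.get("type") == "tool_result"]

    def size(block):
        return len(str(block.get("content", "")))

    total = sum(size(b) for b in blocks)
    for block in sorted(blocks, key=size, reverse=True):  # largest first
        if total <= max_chars:
            break
        before = size(block)
        block["content"] = persist_large_output(
            block["tool_use_id"], str(block["content"]), out_dir)
        total -= before - size(block)  # account for the marker that replaced it
    return messages
```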
The first three layers are purely textual or structural operations (0 API calls), but none of them can "understand" the conversation, so the context may still be too large. → L4.

### L4: compact_history — Full LLM Summary

All three previous layers have run, but after 30 minutes of continuous work on a huge project, the token count still exceeds the threshold.

Three-step process:

1. **Save the transcript**: write the full conversation to `.transcripts/` in JSONL format. The transcript is a recoverable record on disk, but the model's active context contains only the summary, so the original details are no longer available to its reasoning. The teaching code does not provide a tool for retrieving transcripts.
2. **Generate a summary with the LLM**: send the conversation history to the LLM and ask it to preserve key information: current goals, important findings, modified files, remaining work, user constraints, and so on.
3. **Replace the message list**: all old messages are replaced by a single summary message. The teaching version keeps only the summary; the real Claude Code also re-attaches some recent files, plans, and agent/skill/tool context after compaction.

```python
def compact_history(messages):
    transcript_path = write_transcript(messages)  # Save the full conversation first
    summary = summarize_history(messages)         # LLM generates the summary (1 API call)
    # transcript_path stays on disk as the recoverable record; only the summary
    # goes back into context
    return [{"role": "user",
             "content": f"[Compacted]\n\n{summary}"}]
```
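`write_transcript` and `summarize_history` are not shown above. Here is a minimal sketch of the save-then-summarize step. It assumes JSON-serializable messages and passes the summarizer in as a callable, so the logic can run without an API key. The timestamp-plus-random-suffix file naming is an assumption, not the naming code.py uses:

```python
import json
import os
import time
import uuid

TRANSCRIPT_DIR = ".transcripts"


def write_transcript(messages, transcript_dir=TRANSCRIPT_DIR):
    """Dump the full conversation as JSONL (one message per line) to a new file."""
    os.makedirs(transcript_dir, exist_ok=True)
    name = f"{time.strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}.jsonl"
    path = os.path.join(transcript_dir, name)
    with open(path, "w", encoding="utf-8") as f:
        for msg in messages:
            # default=str keeps non-JSON values (e.g. SDK objects) from crashing the dump
            f.write(json.dumps(msg, ensure_ascii=False, default=str) + "\n")
    return path


def compact_history(messages, summarize):
    """Save everything first, then replace the history with one summary message."""
    write_transcript(messages)
    summary = summarize(messages)  # in the real loop, this is the 1 LLM call
    return [{"role": "user", "content": f"[Compacted]\n\n{summary}"}]
```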
**Circuit breaker**: after 3 consecutive compaction failures, stop retrying so a persistent failure cannot waste API calls in an endless loop.

### Reactive: reactive_compact

The API can still reject a request with `prompt_too_long`, for example when the context grows faster than the proactive layers are triggered.

This triggers **reactive_compact**, an emergency path taken after the API has already refused the request: it summarizes the history and keeps only the summary plus the last 5 messages verbatim.

```python
def reactive_compact(messages):
    transcript = write_transcript(messages)  # Save the full conversation first
    summary = summarize_history(messages)    # Summarize everything (1 API call)
    # Keep the most recent exchange verbatim. Caveat: the tail must not open
    # with a tool_result whose tool_use was cut off, or the API rejects it
    tail = messages[-5:]
    return [{"role": "user",
             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
```
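Because reactive_compact only runs after the API has already refused a request, it needs a hard cap on retries. Here is a minimal sketch of that bounded retry. It assumes a stand-in `PromptTooLongError` exception and injected `call_model`/`reactive_compact` callables; all of these are hypothetical names used only for illustration:

```python
class PromptTooLongError(Exception):
    """Stand-in for the API rejecting a prompt as too long."""


def call_with_reactive_compact(call_model, messages, reactive_compact, max_retries=1):
    """Call the model; on prompt_too_long, compact and retry at most max_retries times."""
    retries = 0
    while True:
        try:
            return call_model(messages)
        except PromptTooLongError:
            if retries >= max_retries:
                raise  # give up instead of looping forever
            messages[:] = reactive_compact(messages)  # shrink in place, then retry
            retries += 1
```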
Reactive compact has a retry limit (1 by default). If the request still fails, an exception is raised instead of looping forever. Full error recovery is deferred to s11.

### Putting It All Together

```python
def agent_loop(messages):
    reactive_retries = 0
    while True:
        # Three pre-processors (0 API calls)
        # Order: budget first, so large content is persisted before placeholders
        messages[:] = tool_result_budget(messages)  # L3: persist large results
        messages[:] = snip_compact(messages)        # L1: trim middle
        messages[:] = micro_compact(messages)       # L2: old result placeholders

        # Still too much? LLM summary (1 API call)
        if estimate_token_count(messages) > THRESHOLD:
            messages[:] = compact_history(messages)

        try:
            response = client.messages.create(...)
        except PromptTooLongError:
            if reactive_retries < MAX_REACTIVE_RETRIES:
                messages[:] = reactive_compact(messages)  # Emergency
                reactive_retries += 1
                continue
            raise  # retry limit exceeded, raise exception

        # ... tool execution ...
        results = []
        for block in response.content:
            # ... other tools append their tool_result to results ...

            # compact tool: when the model calls it explicitly, run compact_history
            if block.type == "tool_use" and block.name == "compact":
                messages[:] = compact_history(messages)
                results.append({..., "content": "[Compacted. History summarized.]"})
                # Caveat: compaction removed the assistant turn holding this tool_use;
                # the API expects each tool_result to answer a tool_use in the
                # immediately preceding assistant message
                messages.append({"role": "user", "content": results})
                break  # end current turn, start fresh with compacted context
```
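The loop relies on `estimate_token_count` and `THRESHOLD`, which are not shown. The teaching version estimates tokens from character count. Here is a minimal sketch under two assumptions. First, it uses roughly 4 characters per token, a common rule of thumb that is not necessarily the ratio code.py uses. Second, its threshold has the same shape as CC's `contextWindow - maxOutputTokens - 13_000`. With an illustrative 200,000-token window and 20,000 reserved output tokens, compaction would start above 167,000 estimated tokens:

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text and code; an assumption


def estimate_token_count(messages):
    """Approximate the token count from the length of the serialized messages."""
    text = json.dumps(messages, ensure_ascii=False, default=str)
    return len(text) // CHARS_PER_TOKEN


def compaction_threshold(context_window, max_output_tokens, buffer_tokens=13_000):
    """Room left for input once the reply and a safety buffer are reserved."""
    return context_window - max_output_tokens - buffer_tokens
```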
**Do not swap the order.** L3 (budget) runs before L2 (micro) because micro replaces old, large tool_results with one-line placeholders, so budget must persist the full content before that happens. This is also why the CC source runs `applyToolResultBudget` first.

---

## Changes From s07

| Component | Before (s07) | After (s08) |
|-----------|-------------|-------------|
| Context management | None (context grows unbounded) | Four-layer compression pipeline + emergency reactive compact |
| New functions | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |
| Tools | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |
| Loop | LLM call → tool execution | Three pre-processors before each turn + threshold-triggered compact_history |
| Design principle | — | Cheap first, expensive last |

---

## Try It

```sh
cd learn-claude-code
python s08_context_compact/code.py
```

Try these prompts:

1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (reads several files in a row; once more than 3 tool results have accumulated, watch L2 replace the older ones with placeholders)
2. `Read every file in s08_context_compact/` (reads a lot of content at once; if one turn's results exceed the 200,000-character budget, watch L3 persist the largest ones to disk)
3. Keep chatting for 20+ turns and watch for `[auto compact]` or `[reactive compact]` in the output

What to watch for: after each tool execution, are old `tool_result` blocks compacted? When a long conversation pushes the token estimate over the threshold, is summarization triggered automatically?

---

## What's Next

Context compression lets an agent run for a long time without crashing. But each compression can also lose details the user gave it, such as preferences and constraints. Can the agent selectively remember what matters?

s09 Memory → three subsystems: choosing what to remember, extracting key information, and consolidating and organizing it, across compressions and across sessions.

<details>

<summary>Deep Dive Into CC Source Code</summary>

> The following is based on an analysis of the CC source files `compact.ts`, `autoCompact.ts`, `microCompact.ts`, and `query.ts`.

### Execution Order Comparison

The teaching version labels the layers L1/L2/L3/L4 for pedagogical clarity, but the actual execution order does not follow the numbering:

| Dimension | Teaching Version | Claude Code |
|-----------|-----------------|-------------|
| Execution order | budget → snip → micro → auto | budget → snip → micro → collapse → auto (`query.ts:379-468`) |
| snip_compact | Keep head 3 + tail 47 | CC enables it only on the main thread. The implementation is not in the open-source repo (`HISTORY_SNIP` feature gate), but the interface is visible: `snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`; CC also exposes a `SnipTool` for model-initiated snipping. The teaching version's 3/47 split is a simplification |
| micro_compact | Text placeholder replacement | Two paths: time-based clears content directly; cached uses the API's `cache_edits` (the legacy path has been removed) |
| micro_compact whitelist | By position (most recent 3) | Time-based triggers on a time threshold; cached triggers on a count (`microCompact.ts`) |
| tool_result_budget | 200,000 characters (~200KB) | 200,000 characters (`toolLimits.ts:49`) |
| compact_history threshold | Character-count estimate | Precise token count: `contextWindow - maxOutputTokens - 13_000` |
| Summary requirements | 5 categories of information | 9 sections + dual `<analysis>`/`<summary>` tags |
| Compression prompt | Simple prompt | Hard guardrails at both ends forbidding tool calls |
| PTL retry | Yes (simplified) | `truncateHeadForPTLRetry()` drops whole message groups (`compact.ts:243-290`) |
| Post-compaction recovery | None (teaching version keeps only the summary) | Automatically re-attaches recent files, plans, and agent/skill/tool context |
| Circuit breaker | 3 consecutive failures | 3 consecutive failures (`autoCompact.ts:70`) |
| Reactive retry | 1 retry | CC has more granular tiered retries |

### Execution Order Details

The actual order in the CC source `query.ts`:

1. `applyToolResultBudget` (L379): persist large results first, so the full content is saved
2. `snipCompact` (L403): trim middle messages
3. `microcompact` (L414): replace old results with placeholders
4. `contextCollapse` (L441): an independent context management system (not in the teaching version)
5. `autoCompact` (L454): full LLM summary

The teaching version's budget → snip → micro order matches this; it has no contextCollapse step.

### Full Constant Reference

| Constant | Value | Source File |
|----------|-------|-------------|
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
| Time-based micro_compact interval | 60 minutes | `timeBasedMCConfig.ts` |
| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |

### contextCollapse and sessionMemoryCompact

The CC source has two additional mechanisms that this teaching version does not cover:

- **contextCollapse**: an independent context management system. When enabled, it suppresses proactive autocompact (`autoCompact.ts:215-222`) and lets collapse's commit/blocking flow take over context management. Manual `/compact` and the reactive fallback remain independent paths that contextCollapse does not affect.
- **sessionMemoryCompact**: before compact_history, CC first tries a lightweight summary built from existing session memory (covered in s09), without calling the LLM. This mechanism will make more sense after s09.

### What Does the Compression Prompt Look Like?

CC's compression prompt has two hard requirements:

1. **No tool calls, ever**: it opens with `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` and repeats a REMINDER at the end
2. **Analyze first, then summarize**: the model must first reason inside an `<analysis>` tag, then write the formal summary inside a `<summary>` tag. The analysis is stripped out when the summary is formatted
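
Here is a minimal sketch of the formatting step in point 2. It assumes the model emits the tags literally in its text reply. `format_compact_summary` is a hypothetical name, and CC's own formatter may handle more cases:

```python
import re

_ANALYSIS = re.compile(r"<analysis>.*?</analysis>", re.DOTALL)
_SUMMARY = re.compile(r"<summary>(.*?)</summary>", re.DOTALL)


def format_compact_summary(raw):
    """Drop the <analysis> scratchpad and keep only the <summary> body."""
    text = _ANALYSIS.sub("", raw)
    match = _SUMMARY.search(text)
    # Fall back to whatever text remains if the model omitted <summary>
    return (match.group(1) if match else text).strip()
```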
### Teaching Version Simplifications Are Intentional

- micro_compact uses text placeholders → we don't have API-level `cache_edits` access
- Tokens are estimated from character count → precise tokenizers are out of scope
- Post-compaction recovery is omitted → the teaching version keeps only the summary and does not re-attach files
- Two auxiliary mechanisms are not covered → they fall in the 10% detail category

The core design principle, cheap first, expensive last, is fully preserved.

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
s08_context_compact/README.ja.md
@@ -0,0 +1,293 @@
|
||||
# s08: Context Compact — コンテキストはいつか満杯になる、場所を空ける方法が必要
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s20
|
||||
> *"Context will fill up — have a way to make room"* — 4層圧縮戦略、安価なものを先に、高価なものを後に実行。
|
||||
>
|
||||
> **Harness レイヤー**: 圧縮 — クリーンな記憶、無限のセッション。
|
||||
|
||||
---
|
||||
|
||||
## 課題
|
||||
|
||||
Agent が動いている途中で、止まってしまう。
|
||||
|
||||
bash、read、write は揃っており、能力は十分。しかし 1000 行のファイル(~4000 token)を読み、さらに 30 のファイルを読み、20 のコマンドを実行したとします。各コマンドの出力、各ファイルの内容がすべて `messages` リストに蓄積されます。
|
||||
|
||||
コンテキストウィンドウには上限があります。満杯になると、API は即座に拒否します:`prompt_too_long`。
|
||||
|
||||
圧縮しなければ、Agent は大規模プロジェクトではまともに動けません。
|
||||
|
||||
---
|
||||
|
||||
## ソリューション
|
||||
|
||||

|
||||
|
||||
s07 のフック構造、スキルロード、サブ Agent の骨格を維持し、圧縮に焦点を当てるため一部のツールは省略。コアの変更点:各 LLM 呼び出し前に 3 層のプリプロセッサ(0 API)を挿入し、token が閾値を超えた場合は LLM 要約(1 API)をトリガー、API エラー時には緊急トリムを実行。
|
||||
|
||||
コア設計:安価なものを先に、高価なものを後に。
|
||||
|
||||
---
|
||||
|
||||
## 仕組み
|
||||
|
||||

|
||||
|
||||
### L1: snip_compact — 無関係な古い会話を切り捨て
|
||||
|
||||
Agent が 80 ラウンドの会話を実行し、`messages` が 160 件まで溜まった。先頭の「hello.py を作って」は現在の作業とほぼ無関係だが、スペースを占有し続けている。
|
||||
|
||||
メッセージ数が 50 を超えた場合 → 先頭 3 件(初期コンテキスト)と末尾 47 件(現在の作業)を保持し、中間を切り捨て:
|
||||
|
||||
```python
|
||||
def snip_compact(messages, max_messages=50):
|
||||
if len(messages) <= max_messages:
|
||||
return messages
|
||||
keep_head, keep_tail = 3, max_messages - 3
|
||||
snipped = len(messages) - keep_head - keep_tail
|
||||
placeholder = {"role": "user",
|
||||
"content": f"[snipped {snipped} messages from conversation middle]"}
|
||||
return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
|
||||
```
|
||||
|
||||
メッセージ全体は切り捨てたが、残ったメッセージ内の `tool_result` 内容はまだ蓄積され続けている。34 番目のメッセージに 30KB の古いファイル内容が残っているかもしれない。→ L2。
|
||||
|
||||
### L2: micro_compact — 古いツール結果をプレースホルダに置換
|
||||
|
||||

|
||||
|
||||
Agent が連続して 10 個のファイルを読んだ。1〜7 回目の完全な内容はまだコンテキストに残っており、もう不要だが、大量のスペースを占有している。
|
||||
|
||||
直近 3 件の `tool_result` の完全な内容のみを保持し、それより古いものは 1 行のプレースホルダに置換:
|
||||
|
||||
```python
|
||||
KEEP_RECENT_TOOL_RESULTS = 3
|
||||
|
||||
def micro_compact(messages):
|
||||
tool_results = collect_tool_result_blocks(messages)
|
||||
if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
|
||||
return messages
|
||||
for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
|
||||
if len(block.get("content", "")) > 120:
|
||||
block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
|
||||
return messages
|
||||
```
|
||||
|
||||
古い結果はクリーンアップされたが、1 件の新しい結果だけで 500KB の可能性がある。大きなファイルを `cat` するだけでコンテキストがいっぱいになる。→ L3。
|
||||
|
||||
### L3: tool_result_budget — 大きな結果をディスクに退避
|
||||
|
||||

|
||||
|
||||
モデルが一度に 5 つの大きなファイルを読み、1 つの user メッセージ内の全 `tool_result` の合計が 500KB に達した。
|
||||
|
||||
最後の user メッセージ内のすべての `tool_result` の合計サイズを集計。200KB を超えた場合 → サイズ順にソートし、最大のものから順に `.task_outputs/tool-results/` に退避。コンテキストには `<persisted-output>` マーカー + 先頭 2000 文字のプレビューのみを残す。モデルはマーカーを見て完全な内容がディスク上にあることを認識し、必要に応じて再読み込みできる。
|
||||
|
||||
```python
|
||||
def tool_result_budget(messages, max_bytes=200_000):
|
||||
last = messages[-1]
|
||||
blocks = [(i, b) for i, b in enumerate(last["content"])
|
||||
if b.get("type") == "tool_result"]
|
||||
total = sum(len(str(b.get("content", ""))) for _, b in blocks)
|
||||
if total <= max_bytes:
|
||||
return messages
|
||||
ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
|
||||
for idx, block in ranked:
|
||||
if total <= max_bytes:
|
||||
break
|
||||
block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
|
||||
total = recalculate_total(blocks)
|
||||
return messages
|
||||
```
|
||||
|
||||
最初の 3 層はすべて純粋なテキスト/構造操作(0 API 呼び出し)だが、会話内容を「理解」することはできない。コンテキストがまだ大きすぎる可能性がある。→ L4。
|
||||
|
||||
### L4: compact_history — LLM 全量要約
|
||||
|
||||

|
||||
|
||||
最初の 3 層がすべて実行されたが、超大規模プロジェクトで 30 分間連続作業すると、token がまだ閾値を超えている。
|
||||
|
||||
3 ステップのフロー:
|
||||
|
||||
1. **transcript を保存**:完全な会話を `.transcripts/` に JSONL 形式で書き出す。transcript は回復可能な記録として保存されるが、モデルのアクティブなコンテキストには要約しか残らない。モデルの現在の推論にとって、詳細はすでにコンテキストにない。教学コードは transcript 検索ツールを提供しない。
|
||||
2. **LLM で要約を生成**:会話履歴を LLM に送り、現在の目標、重要な発見、変更済みファイル、残りの作業、ユーザーの制約などの重要な情報を保持するよう指示。
|
||||
3. **メッセージリストを置換**:すべての古いメッセージが 1 件の要約に置き換えられる。教学版は要約のみを保持する。実際の Claude Code は compact 後に直近のファイル、計画、agent/skill/tool などのコンテキストを再付加する。
|
||||
|
||||
```python
|
||||
def compact_history(messages):
|
||||
transcript_path = write_transcript(messages) # 先に完全な会話を保存
|
||||
summary = summarize_history(messages) # LLM で要約を生成
|
||||
return [{"role": "user",
|
||||
"content": f"[Compacted]\n\n{summary}"}]
|
||||
```
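The circuit breaker described next can be sketched as a small wrapper around any compaction callable. This minimal sketch assumes that any exception counts as a failure and that any success resets the streak. `CompactCircuitBreaker` is a hypothetical name, and the limit of 3 mirrors the value this chapter gives:

```python
MAX_CONSECUTIVE_FAILURES = 3


class CompactCircuitBreaker:
    """Stop attempting compaction after N consecutive failures."""

    def __init__(self, max_failures=MAX_CONSECUTIVE_FAILURES):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.max_failures

    def run(self, compact, messages):
        """Return compacted messages, or the originals if compaction fails or is disabled."""
        if self.is_open:
            return messages  # breaker tripped: skip the API call entirely
        try:
            result = compact(messages)
        except Exception:  # any failure counts toward the streak
            self.failures += 1
            return messages
        self.failures = 0  # a success resets the streak
        return result
```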
**Circuit breaker**: after 3 consecutive failures, stop retrying so that an endless loop cannot waste API calls.

### Emergency: reactive_compact

The API can still return `prompt_too_long`: this happens when the context grows faster than compression is triggered.

In that case **reactive_compact** is triggered. It is an emergency path taken after the API has already refused the request: it summarizes the history and keeps only the summary plus the last 5 messages verbatim.

```python
def reactive_compact(messages):
    transcript = write_transcript(messages)  # Save the full conversation first
    summary = summarize_history(messages)    # Summarize everything (1 API call)
    # Keep the most recent exchange verbatim. Caveat: the tail must not open
    # with a tool_result whose tool_use was cut off, or the API rejects it
    tail = messages[-5:]
    return [{"role": "user",
             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
```

reactive compact has a retry limit (1 by default). If it fails again, an exception is raised instead of looping forever. The full error-recovery logic is left to s11.

### Running It All Together

```python
def agent_loop(messages):
    reactive_retries = 0
    while True:
        # Three pre-processors (0 API calls)
        # Order: run budget first so large content is offloaded before it becomes a placeholder
        messages[:] = tool_result_budget(messages)  # L3: offload large results
        messages[:] = snip_compact(messages)        # L1: trim the middle
        messages[:] = micro_compact(messages)       # L2: placeholders for old results

        # Still not enough? LLM summary (1 API call)
        if estimate_token_count(messages) > THRESHOLD:
            messages[:] = compact_history(messages)

        try:
            response = client.messages.create(...)
        except PromptTooLongError:
            if reactive_retries < MAX_REACTIVE_RETRIES:
                messages[:] = reactive_compact(messages)  # Emergency
                reactive_retries += 1
                continue
            raise  # retry limit exceeded: raise

        # ... tool execution ...
        results = []
        for block in response.content:
            # ... other tools append their tool_result to results ...

            # compact tool: if the model calls it explicitly, trigger compact_history
            if block.type == "tool_use" and block.name == "compact":
                messages[:] = compact_history(messages)
                results.append({..., "content": "[Compacted. History summarized.]"})
                # Caveat: compaction removed the assistant turn holding this tool_use;
                # the API expects each tool_result to answer a tool_use in the
                # immediately preceding assistant message
                messages.append({"role": "user", "content": results})
                break  # end the current turn and start fresh with the compacted context
```
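The next paragraph explains why the order cannot be changed. The toy reproduction below makes that concrete. It uses simplified stand-ins: `micro` and `budget` are stripped-down, hypothetical versions of micro_compact and tool_result_budget, and a dict replaces the disk. A single turn with one 250,000-character result and four tiny ones is enough to show the difference. If budget runs first, the large result is saved before micro touches it. If micro runs first, the large result becomes a placeholder before budget ever sees it, so nothing is saved:

```python
PLACEHOLDER = "[Earlier tool result compacted. Re-run if needed.]"


def make_turn(sizes):
    """One user message holding tool_result blocks with contents of the given sizes."""
    return {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": f"t{i}", "content": "z" * n}
        for i, n in enumerate(sizes)]}


def micro(messages, keep_recent=3):
    """Toy micro_compact: placeholder every long tool_result except the last keep_recent."""
    blocks = [b for m in messages if isinstance(m["content"], list) for b in m["content"]]
    for b in blocks[:-keep_recent]:
        if len(b["content"]) > 120:
            b["content"] = PLACEHOLDER
    return messages


def budget(messages, saved, max_chars=200_000):
    """Toy tool_result_budget: 'persist' the largest results into `saved` until they fit."""
    blocks = messages[-1]["content"]
    total = sum(len(b["content"]) for b in blocks)
    for b in sorted(blocks, key=lambda blk: len(blk["content"]), reverse=True):
        if total <= max_chars:
            break
        saved[b["tool_use_id"]] = b["content"]  # a dict stands in for the disk
        marker = f"<persisted-output id={b['tool_use_id']}>"
        total -= len(b["content"]) - len(marker)
        b["content"] = marker
    return messages
```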
|
||||
|
||||
**順序は変えられない。** L3(budget)が L2(micro)の前に実行される理由:micro は古い大きな tool_result を 1 行のプレースホルダに置換するため、budget はその前に完全な内容を退避させる必要がある。CC ソースが `applyToolResultBudget` を最初に配置する理由も同じ。
|
||||
|
||||
---

## Changes from s07

| Component | Before (s07) | After (s08) |
|------|-----------|-----------|
| Context management | None (context grows without bound) | Four-layer compaction pipeline + emergency path |
| New functions | (none) | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |
| Tools | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |
| Loop | LLM call → tool execution | Three preprocessors before each round + compact_history when over the threshold |
| Design principle | (none) | Cheap first, expensive last |

---

## Try It

```sh
cd learn-claude-code
python s08_context_compact/code.py
```

Try these prompts:

1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (read several files in a row and watch L2 compact the older results)
2. `Read every file in s08_context_compact/` (pull in a lot of content at once and watch L3 persist it to disk)
3. Keep the conversation going for 20+ rounds and watch for `[auto compact]` or `[reactive compact]`

What to look for: after each tool call, are older tool_results compacted? When the token count crosses the threshold during a long conversation, does the summary trigger automatically?

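---

## Next

Context compaction lets the agent run for a long time without crashing. But every compaction also drops the preferences and constraints the user stated earlier. Can the agent selectively remember what matters?

s09 Memory → three subsystems: choosing what to remember, extracting key information, and organizing and consolidating it. Memory that survives compaction and carries across sessions.
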
<details>
<summary>CC source code deep dive</summary>

> The following is based on an analysis of the CC source files `compact.ts`, `autoCompact.ts`, `microCompact.ts`, and `query.ts`.

### Execution order mapping

The teaching version numbers the layers L1/L2/L3/L4 for ease of explanation, but the actual execution order does not fully match the numbering:

| Aspect | Teaching version | Claude Code |
|------|--------|-------------|
| Execution order | budget → snip → micro → auto | budget → snip → micro → collapse → auto (`query.ts:379-468`) |
| snip_compact | Keeps first 3 + last 47 | Enabled only on the main thread in CC. The implementation is not in the open-source repo (`HISTORY_SNIP` feature gate), but the interface is visible: `snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`, and a `SnipTool` lets the model invoke it directly. The teaching version's 3/47 split is a simplified parameter |
| micro_compact | Replaces with a text placeholder | Two paths: time-based clears the content directly; cached uses the API's `cache_edits` (the legacy path has been removed) |
| micro_compact retention rule | By position (latest 3) | Time-based triggers on a time threshold; cached triggers on a count (`microCompact.ts`) |
| tool_result_budget | 200,000 characters | 200,000 characters (`toolLimits.ts:49`) |
| compact_history threshold | Estimated from character count | Exact token count: `contextWindow - maxOutputTokens - 13_000` |
| Summary requirements | 5 kinds of information | 9 sections + `<analysis>`/`<summary>` dual tags |
| Compaction prompt | Simple prompt | Guards at both the start and the end forbid tool calls |
| PTL retry | Yes (simplified) | `truncateHeadForPTLRetry()` rolls back by message group (`compact.ts:243-290`) |
| Post-compact recovery | None (teaching version keeps only the summary) | Automatically re-attaches recent files, plans, agent/skill/tool context, etc. |
| Circuit breaker | 3 failures | 3 failures (`autoCompact.ts:70`) |
| Reactive retries | 1 | CC has finer-grained, staged retries |

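The threshold row is easy to check with numbers. This is a minimal sketch of the formula exactly as the table states it. The 200,000-token window and 20,000-token output reserve in the example are illustrative assumptions, not values read from CC. `autocompact_threshold` and `should_autocompact` are hypothetical names.

```python
AUTOCOMPACT_BUFFER_TOKENS = 13_000  # value from the constants table (autoCompact.ts:62)


def autocompact_threshold(context_window: int, max_output_tokens: int,
                          buffer: int = AUTOCOMPACT_BUFFER_TOKENS) -> int:
    """contextWindow - maxOutputTokens - 13_000, as the table states it."""
    threshold = context_window - max_output_tokens - buffer
    if threshold <= 0:
        raise ValueError("context window too small for the output reserve plus buffer")
    return threshold


def should_autocompact(token_count: int, context_window: int, max_output_tokens: int) -> bool:
    """True once the conversation is over the threshold and a full summary should run."""
    return token_count > autocompact_threshold(context_window, max_output_tokens)


# Illustrative inputs only (assumed, not CC's configuration)
print(autocompact_threshold(200_000, 20_000))  # prints 167000
```

With those assumed inputs, the summary would fire once the conversation passes 167,000 tokens. That leaves both the output reserve and the 13,000-token buffer free.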
### Execution order in detail

The actual order in the CC source `query.ts`:

1. `applyToolResultBudget` (L379): handles large results first, persisting the full content
2. `snipCompact` (L403): trims middle messages
3. `microcompact` (L414): replaces old results with placeholders
4. `contextCollapse` (L441): a separate context-management system (absent from the teaching version)
5. `autoCompact` (L454): full LLM summary

The teaching version's budget → snip → micro order matches this. The teaching version has no contextCollapse mechanism.

### Full constant reference

| Constant | Value | Source file |
|------|-----|--------|
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
| Time-based micro_compact interval | 60 minutes | `timeBasedMCConfig.ts` |
| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |

### contextCollapse and sessionMemoryCompact

The CC source contains two more mechanisms that this teaching version does not cover:

- **contextCollapse**: a separate context-management system. When enabled, it suppresses proactive autocompact (`autoCompact.ts:215-222`), and collapse's commit/blocking flow takes over context management. Manual `/compact` and the reactive fallback stay independent paths, unaffected by contextCollapse.
- **sessionMemoryCompact**: before compact_history, CC first tries a lightweight summary built from the existing session memory (covered in s09), without calling the LLM. This mechanism is easier to follow after you have worked through s09.

### What the compaction prompt looks like

CC's compaction prompt has two hard requirements:

1. **No tool calls, ever**: it opens with `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` and repeats a REMINDER at the end
2. **Analyze first, then summarize**: the model first organizes its thinking inside `<analysis>` tags, then writes the actual summary inside `<summary>` tags. The analysis is stripped during formatting

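The analysis-stripping step can be sketched with two regular expressions. This is not CC's formatter. It is only an illustration, assuming the model returns well-formed, non-nested tags. `format_compact_summary` is a hypothetical name.

```python
import re

_ANALYSIS = re.compile(r"<analysis>.*?</analysis>", re.DOTALL)
_SUMMARY = re.compile(r"<summary>(.*?)</summary>", re.DOTALL)


def format_compact_summary(raw: str) -> str:
    """Strip the <analysis> scratchpad; return the <summary> body if present."""
    without_analysis = _ANALYSIS.sub("", raw)
    match = _SUMMARY.search(without_analysis)
    body = match.group(1) if match else without_analysis  # no tags: keep the text as-is
    return body.strip()


raw = "<analysis>\nUser wants X; files a.py, b.py touched.\n</analysis>\n<summary>\n1. Goal: X\n</summary>"
print(format_compact_summary(raw))  # prints: 1. Goal: X
```

Only the summary body reaches the compacted context; the reasoning that produced it is discarded.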
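### The teaching version's simplifications are deliberate

- micro_compact uses a text placeholder → we have no API-level `cache_edits` access
- Tokens are estimated from character counts → an exact tokenizer is out of scope for teaching
- Post-compact recovery is omitted → the teaching version keeps only the summary and does not re-attach files automatically
- The two auxiliary mechanisms are not covered → they fall into the 10% of details

The core design idea, cheap first and expensive last, is fully preserved.
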
</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

s08_context_compact/README.md
@@ -0,0 +1,293 @@
# s08: Context Compact: The Context Will Fill Up, So Have a Way to Make Room

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s20

> *"The context will always fill up; you need a way to make room."* A four-layer compaction strategy: cheap layers run first, expensive ones last.
>
> **Harness layer**: compaction. Clean memory, sessions that can keep going.

---

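## The Problem

The agent runs for a while, then stalls.

It has bash, read, and write, so capability is not the issue. But it has read a 1,000-line file (thousands of tokens), then 30 more files, and run 20 commands. Every command's output and every file's contents pile up in the `messages` list.

The context window is finite. Once it is full, the API rejects the request outright: `prompt_too_long`.

Without compaction, the agent cannot work in a large project at all.
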
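---

## The Solution

This chapter keeps the s07 skeleton (hooks, skill loading, subagents) and leaves out some tool details to focus on compaction. The core change has three parts:

- Before every LLM call, run three preprocessors (0 API calls).
- If the token count is still over the threshold, trigger an LLM summary (1 API call).
- If the API still returns an error, trim reactively as an emergency measure.

Core design: cheap first, expensive last.
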
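The pipeline hinges on the threshold check. The agent_loop snippet later in this README calls `estimate_token_count(messages) > THRESHOLD` without defining either name. code.py instead defines `estimate_size(msgs)` as `len(str(msgs))` next to `CONTEXT_LIMIT = 50000`, which is a character count.

Below is a sketch of a token-flavored estimate, assuming the rough rule of thumb of about 4 characters per token for English text. Both that ratio and the 50,000-token `THRESHOLD` are illustrative. Claude Code counts tokens exactly.

```python
import json

THRESHOLD = 50_000  # hypothetical token budget for this sketch


def estimate_token_count(messages: list[dict], chars_per_token: float = 4.0) -> int:
    """Rough estimate: serialized length divided by an assumed chars-per-token ratio."""
    chars = len(json.dumps(messages, default=str, ensure_ascii=False))
    return int(chars / chars_per_token)


def needs_llm_summary(messages: list[dict], threshold: int = THRESHOLD) -> bool:
    """The L4 trigger: only pay for a summary once the cheap layers were not enough."""
    return estimate_token_count(messages) > threshold
```

An estimate is good enough here: the check only decides *when* to spend one API call, and the reactive path catches the cases where the estimate was too optimistic.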
---

## How It Works

### L1: snip_compact (trim stale conversation)

After 80 rounds, `messages` has accumulated 160 entries. The opening "create hello.py for me" has almost nothing to do with the current work, yet it still takes up space.

When there are more than 50 messages, keep the first 3 (initial context) and the last 47 (current work), and cut the middle:

```python
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages:
        return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
```

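The snippet skips one caveat. In a tool-using conversation, `messages[-keep_tail:]` can start on a user turn whose `tool_result` blocks answer a `tool_use` in the assistant turn that was just cut. The Messages API rejects a `tool_result` without its matching `tool_use`.

Below is a sketch of a boundary-safe variant. It assumes code.py's plain-dict message shape, and that the first 3 messages end on a user turn, which holds under the usual user/assistant alternation. `snip_compact_safe` is a hypothetical name.

```python
from typing import Any

Message = dict[str, Any]


def _is_tool_result_turn(msg: Message) -> bool:
    content = msg.get("content")
    return (msg.get("role") == "user" and isinstance(content, list)
            and any(isinstance(b, dict) and b.get("type") == "tool_result" for b in content))


def snip_compact_safe(messages: list[Message], max_messages: int = 50,
                      keep_head: int = 3) -> list[Message]:
    if len(messages) <= max_messages:
        return messages
    start = len(messages) - (max_messages - keep_head)
    # Move the cut forward past tool_result turns whose tool_use was snipped,
    # otherwise the API rejects the unmatched tool_use_id.
    while start < len(messages) and _is_tool_result_turn(messages[start]):
        start += 1
    snipped = start - keep_head
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[start:]
```

Moving the cut forward by one message costs almost nothing, and every remaining `tool_result` stays paired with its `tool_use`.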
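That cuts whole messages, but `tool_result` content keeps piling up in the messages that remain. Message #34 might still hold 30KB of an old file. → L2.

### L2: micro_compact (replace old tool results with placeholders)

The agent has read 10 files in a row. The full contents from reads 1 through 7 are still in context, long after they stopped being needed, and they take up a lot of space.

Keep only the latest 3 `tool_result` blocks in full. Replace older ones longer than 120 characters with a one-line placeholder:
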
```python
KEEP_RECENT_TOOL_RESULTS = 3

def micro_compact(messages):
    # collect_tool_results (defined in code.py) returns (msg_index, block_index, block) per tool_result
    tool_results = collect_tool_results(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(block.get("content", "")) > 120:  # short results are cheap; leave them alone
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages
```

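The old results are gone, but a single new result can reach 500KB. One `read_file` on a large file is enough to fill the context: bash output is already capped at 50,000 characters, but `read_file` is not. → L3.

### L3: tool_result_budget (persist large results to disk)

The model reads 5 large files at once, and the `tool_result` blocks in a single user message add up to 500KB.

Total up the size of every `tool_result` in the last user message. If the total exceeds 200,000 characters, sort the results by size and persist them to `.task_outputs/tool-results/`, largest first, until the total fits. In code.py, only results longer than `PERSIST_THRESHOLD` (30,000 characters) are moved.

Each persisted result leaves behind only a `<persisted-output>` marker with the file path and a 2,000-character preview. The marker tells the model that the full content is on disk, so it can read it again when needed.
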
```python
def tool_result_budget(messages, max_bytes=200_000):  # counts characters, not bytes
    last = messages[-1] if messages else None
    if not last or last.get("role") != "user" or not isinstance(last.get("content"), list):
        return messages  # only a user turn carrying tool_results can be over budget
    blocks = [b for b in last["content"]
              if isinstance(b, dict) and b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for b in blocks)
    if total <= max_bytes:
        return messages
    ranked = sorted(blocks, key=lambda b: len(str(b.get("content", ""))), reverse=True)
    for block in ranked:  # largest first
        if total <= max_bytes:
            break
        block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
        total = sum(len(str(b.get("content", ""))) for b in blocks)  # recompute after each swap
    return messages
```

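The snippet calls `persist_large_output` without showing it. The sketch below mirrors the version in code.py, with one change: the output directory is a parameter `out_dir` (hypothetical) instead of the module-level `TOOL_RESULTS_DIR`. That makes it easy to try against a temporary directory.

```python
from pathlib import Path

PERSIST_THRESHOLD = 30_000  # code.py only persists results longer than this
PREVIEW_CHARS = 2_000       # characters of the result kept in context


def persist_large_output(tool_use_id: str, output: str, out_dir: Path) -> str:
    """Move a large tool result to disk; return a marker with a short preview."""
    if len(output) <= PERSIST_THRESHOLD:
        return output  # small results stay inline
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{tool_use_id}.txt"
    if not path.exists():  # one file per tool_use_id; an existing file is left as is
        path.write_text(output)
    return (f"<persisted-output>\nFull output: {path}\n"
            f"Preview:\n{output[:PREVIEW_CHARS]}\n</persisted-output>")
```

Using the `tool_use_id` as the file name ties each file to exactly one tool call, so the path in the marker is always unambiguous.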
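The first three layers are plain text and structure operations with 0 API calls, but none of them "understands" the conversation. The context may still be too large. → L4.

### L4: compact_history (full LLM summary)

All three earlier layers have run, yet after 30 minutes of continuous work in a very large project, the token count is still over the threshold.

A three-step process:
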
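1. **Save the transcript**: write the full conversation to `.transcripts/` as JSONL. The transcript keeps a recoverable record, but only the summary remains in the model's active context. For the model's current reasoning, the details are no longer in context. The teaching code provides no tool for searching transcripts.
2. **Generate a summary with the LLM**: send the conversation history to the LLM and ask it to preserve the key information, such as the current goal, important findings, files changed, remaining work, and user constraints.
3. **Replace the message list**: all old messages are replaced by a single summary. The teaching version keeps only the summary; the real Claude Code re-attaches some recent files, plans, and agent/skill/tool context after compacting.
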
```python
def compact_history(messages):
    transcript_path = write_transcript(messages)  # save the full conversation first
    summary = summarize_history(messages)         # LLM generates the summary
    return [{"role": "user",
             "content": f"[Compacted]\n\n{summary}"}]
```

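Step 1 is plain JSONL: one JSON object per line. Below is a sketch of writing a transcript and reading it back, assuming the same `.transcripts/`-style directory.

- `read_transcript` is hypothetical; the teaching code ships no transcript reader.
- The nanosecond timestamp plus existence check is a choice made in this sketch so that no transcript is ever overwritten.

```python
import json
import time
from pathlib import Path


def write_transcript(messages: list[dict], transcript_dir: Path) -> Path:
    """Save the full conversation as JSONL (one message per line) before compacting."""
    transcript_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.time_ns()
    path = transcript_dir / f"transcript_{stamp}.jsonl"
    while path.exists():  # never overwrite an earlier transcript
        stamp += 1
        path = transcript_dir / f"transcript_{stamp}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for msg in messages:
            # default=str turns non-JSON objects (e.g. SDK content blocks) into strings
            f.write(json.dumps(msg, default=str, ensure_ascii=False) + "\n")
    return path


def read_transcript(path: Path) -> list[dict]:
    """Hypothetical reader: load a transcript back into a message list."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

One message per line means a transcript can be appended to, streamed, or grepped without parsing the whole file.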
**Circuit breaker**: after 3 consecutive failures, stop retrying, so a broken summary step cannot burn API calls in an endless loop.

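A circuit breaker is a small piece of state wrapped around the summary call. This is a minimal sketch, assuming two things:
- Any success resets the failure count, which is what "consecutive" implies.
- A tripped breaker simply skips compaction.

`AutoCompactBreaker` is a hypothetical name and not necessarily how code.py wires it.

```python
MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3  # same value as the CC constant of that name


class AutoCompactBreaker:
    """Skip auto-compaction after N consecutive failures instead of retrying forever."""

    def __init__(self, max_failures: int = MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def tripped(self) -> bool:
        return self.failures >= self.max_failures

    def run(self, compact, messages):
        """Call compact(messages); on failure keep the old messages and count it."""
        if self.tripped:
            return messages  # breaker open: don't spend another API call
        try:
            result = compact(messages)
        except Exception:
            self.failures += 1
            return messages
        self.failures = 0  # a success resets the consecutive count
        return result
```

After the third failure, `run` returns the messages untouched without calling `compact` again. The reactive path remains the last line of defense.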
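### Emergency: reactive_compact

Sometimes the API still rejects the request as prompt too long, because the context grew faster than the compaction triggers fired. With the Anthropic API this arrives as an HTTP 400 `invalid_request_error` whose message says the prompt is too long; 413 is used for requests over the byte-size limit.

That triggers **reactive_compact**. Unlike compact_history, which runs proactively and keeps only the summary, it runs only after the API has refused the request. It keeps the summary plus the last 5 messages, so the work in flight is not lost.

The teaching version does not trim to an exact byte or token budget. It relies on the summary being much smaller than the history it replaces.
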
```python
def reactive_compact(messages):
    transcript = write_transcript(messages)
    summary = summarize_history(messages)
    tail = messages[-5:]
    return [{"role": "user",
             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
```

Reactive compact has a retry limit (1 by default). If the call fails again, the exception is raised instead of looping forever. The full error-recovery logic is left to s11.

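The retry limit amounts to a counter around the model call. The sketch below is self-contained:
- `PromptTooLongError` is a stand-in exception; the real client reports this condition differently.
- `call_model` and `reactive_compact` are injected, so the control flow can be exercised without the API.
- `call_with_reactive_compact` is a hypothetical name.

```python
MAX_REACTIVE_RETRIES = 1


class PromptTooLongError(Exception):
    """Stand-in for however the client reports a prompt_too_long rejection."""


def call_with_reactive_compact(call_model, messages, reactive_compact,
                               max_retries=MAX_REACTIVE_RETRIES):
    retries = 0
    while True:
        try:
            return call_model(messages)
        except PromptTooLongError:
            if retries >= max_retries:
                raise  # limit reached: surface the error instead of looping forever
            messages[:] = reactive_compact(messages)  # shrink in place, then retry
            retries += 1
```

With the default limit, one failure triggers one compaction and one retry. A second failure propagates to the caller, which is where s11's error recovery takes over.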
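### Putting it all together

```python
def agent_loop(messages):
    reactive_retries = 0
    while True:
        # Three preprocessors (0 API calls)
        # Order: budget runs first so large content reaches disk before placeholders and snipping
        messages[:] = tool_result_budget(messages)  # L3: persist large results to disk
        messages[:] = snip_compact(messages)        # L1: trim the middle
        messages[:] = micro_compact(messages)       # L2: old results become placeholders

        # Still too big? LLM summary (1 API call)
        if estimate_token_count(messages) > THRESHOLD:
            messages[:] = compact_history(messages)

        try:
            response = client.messages.create(...)
        except PromptTooLongError:  # stand-in name for the client's prompt_too_long error
            if reactive_retries < MAX_REACTIVE_RETRIES:
                messages[:] = reactive_compact(messages)  # emergency
                reactive_retries += 1
                continue
            raise  # retry limit exceeded: raise instead of looping

        # ... append the assistant turn; return if stop_reason != "tool_use" ...
        results = []
        for block in response.content:
            # ... run each tool_use block and append its tool_result to `results` ...

            # compact tool: when the model calls it, trigger compact_history
            if block.type == "tool_use" and block.name == "compact":
                results.append({..., "content": "[Compacted. History summarized.]"})
                messages.append({"role": "user", "content": results})
                # Summarize only after the tool_result is paired with its tool_use;
                # compacting first would leave a tool_result whose tool_use_id no
                # longer exists, which the API rejects.
                messages[:] = compact_history(messages)
                break  # end the current turn; the next round starts from the compacted context
        else:
            messages.append({"role": "user", "content": results})  # no compact call this turn
```
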
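**The order cannot be changed.** L3 (budget) runs before L2 (micro) because micro replaces old, large tool_results with a one-line placeholder. Budget therefore has to persist the full content to disk before that happens. This is also why the CC source puts `applyToolResultBudget` first.

---
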
## Changes from s07

| Component | Before (s07) | After (s08) |
|------|-----------|-----------|
| Context management | None (context grows without bound) | Four-layer compaction pipeline + emergency path |
| New functions | (none) | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |
| Tools | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |
| Loop | LLM call → tool execution | Three preprocessors before each round + compact_history when over the threshold |
| Design principle | (none) | Cheap first, expensive last |

---

## Try It

```sh
cd learn-claude-code
python s08_context_compact/code.py
```

Try these prompts:

1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (read several files in a row and watch L2 compact the older results)
2. `Read every file in s08_context_compact/` (pull in a lot of content at once and watch L3 persist it to disk)
3. Keep the conversation going for 20+ rounds and watch for `[auto compact]` or `[reactive compact]`

What to look for: after each tool call, are older tool_results compacted? When the token count crosses the threshold during a long conversation, does the summary trigger automatically?

---

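## Next

Context compaction lets the agent run for a long time without crashing. But every compaction also drops the preferences and constraints the user stated earlier. Can the agent selectively remember what matters?

s09 Memory → three subsystems: choosing what to remember, extracting key information, and organizing and consolidating it. Memory that survives compaction and carries across sessions.
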
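<details>
<summary>Deep dive into the CC source</summary>

> The following is based on an analysis of the CC source files `compact.ts`, `autoCompact.ts`, `microCompact.ts`, and `query.ts`.

### Execution order mapping

The teaching version numbers the layers L1/L2/L3/L4 for ease of explanation, but the actual execution order does not fully match the numbering:

| Aspect | Teaching version | Claude Code |
|------|--------|-------------|
| Execution order | budget → snip → micro → auto | budget → snip → micro → collapse → auto (`query.ts:379-468`) |
| snip_compact | Keeps first 3 + last 47 | Enabled only on the main thread in CC. The implementation is not in the open-source repo (`HISTORY_SNIP` feature gate), but the interface is visible: `snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`, and a `SnipTool` lets the model invoke it directly. The teaching version's 3/47 split is a simplified parameter |
| micro_compact | Replaces with a text placeholder | Two paths: time-based clears the content directly; cached goes through the API's `cache_edits` (the legacy path has been removed) |
| micro_compact retention rule | By position (latest 3) | Time-based triggers on a time threshold; cached triggers on a count (`microCompact.ts`) |
| tool_result_budget | 200,000 characters | 200,000 characters (`toolLimits.ts:49`) |
| compact_history threshold | Estimated from character count | Exact token count: `contextWindow - maxOutputTokens - 13_000` |
| Summary requirements | 5 kinds of information | 9 sections + `<analysis>`/`<summary>` dual tags |
| Compaction prompt | Simple prompt | Guards at both the start and the end forbid tool calls |
| PTL retry | Yes (simplified) | `truncateHeadForPTLRetry()` rolls back by message group (`compact.ts:243-290`) |
| Post-compact recovery | None (teaching version keeps only the summary) | Automatically re-reads recent files and re-attaches plans, agent/skill/tool context, etc. |
| Circuit breaker | 3 failures | 3 failures (`autoCompact.ts:70`) |
| Reactive retries | 1 | CC has finer-grained, staged retries |
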
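### Execution order in detail

The actual order in the CC source `query.ts`:

1. `applyToolResultBudget` (L379): handles large results first, making sure the full content reaches disk
2. `snipCompact` (L403): trims middle messages
3. `microcompact` (L414): replaces old results with placeholders
4. `contextCollapse` (L441): a separate context-management system (absent from the teaching version)
5. `autoCompact` (L454): full LLM summary

The teaching version's budget → snip → micro order matches this. The teaching version has no contextCollapse mechanism.
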
### Full constant reference

| Constant | Value | Source file |
|------|-----|--------|
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
| Time-based micro_compact interval | 60 minutes | `timeBasedMCConfig.ts` |
| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |

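The 60-minute interval belongs to the time-based micro_compact path, which the teaching version does not implement. Below is a hedged sketch of what such a path could look like. It rests on two assumptions that this README does not confirm:
- The trigger is the idle gap since the last assistant reply.
- The newest few results are kept.

Every name in it, including the `CLEARED` marker text, is hypothetical.

```python
import time

TIME_BASED_MC_GAP_SECONDS = 60 * 60  # the 60-minute interval from the table
CLEARED = "[Old tool result content cleared]"  # hypothetical marker text


def time_based_micro_compact(messages, last_assistant_ts, now=None, keep_recent=3):
    """Hypothetical: after a long idle gap, clear all but the newest tool results."""
    now = time.time() if now is None else now
    if now - last_assistant_ts < TIME_BASED_MC_GAP_SECONDS:
        return messages  # recent activity: leave everything in place
    blocks = [b for m in messages
              if m.get("role") == "user" and isinstance(m.get("content"), list)
              for b in m["content"]
              if isinstance(b, dict) and b.get("type") == "tool_result"]
    stale = blocks[:-keep_recent] if keep_recent else blocks
    for b in stale:
        b["content"] = CLEARED
    return messages
```

Unlike the count-based teaching version, nothing here happens while the session is active; the clock alone decides when old results go.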
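### contextCollapse and sessionMemoryCompact

The CC source contains two more mechanisms that this teaching version does not cover:

- **contextCollapse**: a separate context-management system. When enabled, it suppresses proactive autocompact (`autoCompact.ts:215-222`), and collapse's commit/blocking flow takes over context management. Manual `/compact` and the reactive fallback stay independent paths, unaffected by contextCollapse.
- **sessionMemoryCompact**: before compact_history, CC first tries a lightweight summary built from the existing session memory (covered in s09), without calling the LLM. This mechanism is easier to follow after you have worked through s09.
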
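### What does the compaction prompt look like?

CC's compaction prompt has two hard requirements:

1. **No tool calls, ever**: it opens with `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` and repeats a REMINDER at the end
2. **Analyze first, then summarize**: the model first organizes its thinking inside `<analysis>` tags, then writes the actual summary inside `<summary>` tags. The analysis is stripped during formatting
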
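### The teaching version's simplifications are deliberate

- micro_compact uses a text placeholder → we have no API-level `cache_edits` access
- Tokens are estimated from character counts → an exact tokenizer is out of scope for teaching
- Post-compact recovery is omitted → the teaching version keeps only the summary and does not re-attach files automatically
- The two auxiliary mechanisms are not covered → they fall into the 10% of details

The core design idea, cheap first and expensive last, is fully preserved.
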
</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

s08_context_compact/code.py
@@ -0,0 +1,469 @@
#!/usr/bin/env python3
"""
s08_context_compact.py - Context Compact

Four-layer compaction pipeline inserted before LLM calls:

    L1: snip_compact       — trim middle messages when count > 50
    L2: micro_compact      — replace old tool_results with placeholders
    L3: tool_result_budget — persist large results to disk
    L4: compact_history    — LLM full summary (1 API call)

    Emergency: reactive_compact — when API still returns prompt_too_long

    ┌─────────────────────────────────────────────────────────────┐
    │ messages[]                                                  │
    │   ↓                                                         │
    │ L3 budget ─→ L1 snip ─→ L2 micro ─→ [token > threshold?]    │
    │                                     ├─ No  → LLM            │
    │                                     └─ Yes → L4 summary     │
    │                                       ↓                     │
    │                                     LLM call                │
    │                                     [prompt_too_long?]      │
    │                                     └─ Yes → reactive       │
    └─────────────────────────────────────────────────────────────┘

Core principle: cheap first, expensive last.
Execution order matches CC source: budget → snip → micro → auto.

Builds on s07 (skill loading). Usage:

    python s08_context_compact/code.py

Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY and MODEL_ID in .env
"""

import os, subprocess, json, time
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"): os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
SKILLS_DIR = WORKDIR / "skills"
TRANSCRIPT_DIR = WORKDIR / ".transcripts"
TOOL_RESULTS_DIR = WORKDIR / ".task_outputs" / "tool-results"
TASKS_DIR = WORKDIR / ".tasks"; TASKS_DIR.mkdir(exist_ok=True)
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]

# s07: Skill catalog scan (inherited from s07)
def _parse_frontmatter(text: str) -> tuple[dict, str]:
    if not text.startswith("---"):
        return {}, text
    parts = text.split("---", 2)
    if len(parts) < 3:
        return {}, text
    meta = {}
    for line in parts[1].strip().splitlines():
        if ":" in line:
            k, v = line.split(":", 1)
            meta[k.strip()] = v.strip().strip('"').strip("'")
    return meta, parts[2].strip()

SKILL_REGISTRY: dict[str, dict] = {}

def _scan_skills():
    if not SKILLS_DIR.exists():
        return
    for d in sorted(SKILLS_DIR.iterdir()):
        if not d.is_dir():
            continue
        manifest = d / "SKILL.md"
        if manifest.exists():
            raw = manifest.read_text()
            meta, body = _parse_frontmatter(raw)
            name = meta.get("name", d.name)
            desc = meta.get("description", raw.split("\n")[0].lstrip("#").strip())
            SKILL_REGISTRY[name] = {"name": name, "description": desc, "content": raw}

_scan_skills()

def list_skills() -> str:
    if not SKILL_REGISTRY:
        return "(no skills found)"
    return "\n".join(f"- **{s['name']}**: {s['description']}" for s in SKILL_REGISTRY.values())

def load_skill(name: str) -> str:
    skill = SKILL_REGISTRY.get(name)
    if not skill:
        return f"Skill not found: {name}"
    return skill["content"]

# s08: SYSTEM includes skill catalog (inherited from s07 build_system)
def build_system() -> str:
    catalog = list_skills()
    return (
        f"You are a coding agent at {WORKDIR}. "
        f"Skills available:\n{catalog}\n"
        "Use load_skill to get full details when needed."
    )

SYSTEM = build_system()

# s08: subagent gets its own system prompt — no compact, no skill loading
SUB_SYSTEM = (
    f"You are a coding agent at {WORKDIR}. "
    "Complete the task you were given, then return a concise summary. "
    "Do not delegate further."
)


# ═══════════════════════════════════════════════════════════
# FROM s02-s07 (unchanged): Basic Tools
# ═══════════════════════════════════════════════════════════

def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR): raise ValueError(f"Path escapes workspace: {p}")
    return path

def run_bash(command: str) -> str:
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR, capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired: return "Error: Timeout (120s)"

def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines): lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e: return f"Error: {e}"

def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path); file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content); return f"Wrote {len(content)} bytes to {path}"
    except Exception as e: return f"Error: {e}"

def run_edit(path: str, old_text: str, new_text: str) -> str:
    try:
        file_path = safe_path(path)
        text = file_path.read_text()
        if old_text not in text: return f"Error: text not found in {path}"
        file_path.write_text(text.replace(old_text, new_text, 1))
        return f"Edited {path}"
    except Exception as e: return f"Error: {e}"

def run_glob(pattern: str) -> str:
    import glob as g
    try:
        results = []
        for match in g.glob(pattern, root_dir=WORKDIR):
            if (WORKDIR / match).resolve().is_relative_to(WORKDIR):
                results.append(match)
        return "\n".join(results) if results else "(no matches)"
    except Exception as e: return f"Error: {e}"

def run_todo_write(todos: list) -> str:
    for i, t in enumerate(todos):
        if "content" not in t or "status" not in t:
            return f"Error: todos[{i}] missing 'content' or 'status'"
        if t["status"] not in ("pending", "in_progress", "completed"):
            return f"Error: todos[{i}] has invalid status '{t['status']}'"
    tasks_file = TASKS_DIR / "current_todos.json"
    tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False))
    lines = ["\n\033[33m## Current Tasks\033[0m"]
    for t in todos:
        icon = {"pending": " ", "in_progress": "\033[36m▸\033[0m", "completed": "\033[32m✓\033[0m"}[t["status"]]
        lines.append(f" [{icon}] {t['content']}")
    print("\n".join(lines))
    return f"Updated {len(todos)} tasks"

def extract_text(content) -> str:
    if not isinstance(content, list): return str(content)
    return "\n".join(getattr(b, "text", "") for b in content if getattr(b, "type", None) == "text")


# ═══════════════════════════════════════════════════════════
# FROM s06-s07 (unchanged): Subagent
# ═══════════════════════════════════════════════════════════

SUB_TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
]
SUB_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write,
                "edit_file": run_edit, "glob": run_glob}

def spawn_subagent(task: str) -> str:
    print(f"\n\033[35m[Subagent spawned]\033[0m")
    messages = [{"role": "user", "content": task}]
    for _ in range(30):
        response = client.messages.create(model=MODEL, system=SUB_SYSTEM,
                                          messages=messages, tools=SUB_TOOLS, max_tokens=8000)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            break
        results = []
        for block in response.content:
            if block.type == "tool_use":
                blocked = trigger_hooks("PreToolUse", block)
                if blocked:
                    results.append({"type": "tool_result", "tool_use_id": block.id,
                                    "content": str(blocked)})
                    continue
                handler = SUB_HANDLERS.get(block.name)
                output = handler(**block.input) if handler else f"Unknown: {block.name}"
                trigger_hooks("PostToolUse", block, output)
                print(f" \033[90m[sub] {block.name}: {str(output)[:100]}\033[0m")
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})
    result = extract_text(messages[-1]["content"])
    if not result:
        for msg in reversed(messages):
            if msg["role"] == "assistant":
                result = extract_text(msg["content"])
                if result:
                    break
    if not result:
        result = "Subagent stopped after 30 turns without final answer."
    print(f"\033[35m[Subagent done]\033[0m")
    return result


# ═══════════════════════════════════════════════════════════
# NEW in s08: Four-Layer Compaction Pipeline
# ═══════════════════════════════════════════════════════════

CONTEXT_LIMIT = 50000
KEEP_RECENT = 3
PERSIST_THRESHOLD = 30000

def estimate_size(msgs): return len(str(msgs))


# L1: snipCompact — trim middle messages
def _is_tool_result_turn(msg):
    """True for a user message that carries tool_result blocks."""
    content = msg.get("content")
    return msg.get("role") == "user" and isinstance(content, list) and any(
        isinstance(b, dict) and b.get("type") == "tool_result" for b in content)

def _drop_orphan_tool_results(tail):
    # If a cut lands on a tool_result turn, the tool_use it answers was cut away
    # and the API would reject the unmatched tool_use_id, so start after it.
    i = 0
    while i < len(tail) and _is_tool_result_turn(tail[i]): i += 1
    return tail[i:]

def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages: return messages
    keep_head, keep_tail = 3, max_messages - 3
    tail = _drop_orphan_tool_results(messages[-keep_tail:])
    snipped = len(messages) - keep_head - len(tail)
    return messages[:keep_head] + [{"role": "user", "content": f"[snipped {snipped} messages]"}] + tail


# L2: microCompact — old result placeholders
def collect_tool_results(messages):
    blocks = []
    for mi, msg in enumerate(messages):
        if msg.get("role") != "user" or not isinstance(msg.get("content"), list): continue
        for bi, block in enumerate(msg["content"]):
            if isinstance(block, dict) and block.get("type") == "tool_result":
                blocks.append((mi, bi, block))
    return blocks

def micro_compact(messages):
    tool_results = collect_tool_results(messages)
    if len(tool_results) <= KEEP_RECENT: return messages
    for _, _, block in tool_results[:-KEEP_RECENT]:
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages


# L3: toolResultBudget — persist large results to disk
def persist_large_output(tool_use_id, output):
    if len(output) <= PERSIST_THRESHOLD: return output
    TOOL_RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    path = TOOL_RESULTS_DIR / f"{tool_use_id}.txt"
    if not path.exists(): path.write_text(output)
    return f"<persisted-output>\nFull output: {path}\nPreview:\n{output[:2000]}\n</persisted-output>"

def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1] if messages else None
    if not last or last.get("role") != "user" or not isinstance(last.get("content"), list): return messages
    blocks = [(i, b) for i, b in enumerate(last["content"]) if isinstance(b, dict) and b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    if total <= max_bytes: return messages
    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
    for _, block in ranked:
        if total <= max_bytes: break
        content = str(block.get("content", ""))
        if len(content) <= PERSIST_THRESHOLD: continue
        tid = block.get("tool_use_id", "unknown")
        block["content"] = persist_large_output(tid, content)
        total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    return messages


# L4: autoCompact — LLM full summary
def write_transcript(messages):
    TRANSCRIPT_DIR.mkdir(parents=True, exist_ok=True)
    path = TRANSCRIPT_DIR / f"transcript_{int(time.time())}.jsonl"
    with path.open("w") as f:
        for msg in messages: f.write(json.dumps(msg, default=str) + "\n")
    return path

def summarize_history(messages):
    # Only the first 80,000 characters are sent, so the summary request itself stays small
    conversation = json.dumps(messages, default=str)[:80000]
    prompt = ("Summarize this coding-agent conversation so work can continue.\n"
              "Preserve: 1. current goal, 2. key findings/decisions, 3. files read/changed, "
              "4. remaining work, 5. user constraints.\nBe compact but concrete.\n\n" + conversation)
    response = client.messages.create(model=MODEL, messages=[{"role": "user", "content": prompt}], max_tokens=2000)
    return "\n".join(
        getattr(block, "text", "")
        for block in response.content
        if getattr(block, "type", None) == "text").strip() or "(empty summary)"

def compact_history(messages):
    transcript_path = write_transcript(messages)
    print(f"[transcript saved: {transcript_path}]")
    summary = summarize_history(messages)
    return [{"role": "user", "content": f"[Compacted]\n\n{summary}"}]


# Emergency: reactiveCompact — on API error
def reactive_compact(messages):
    write_transcript(messages)  # keep a full record before discarding history
    summary = summarize_history(messages)
    tail = _drop_orphan_tool_results(messages[-5:])
    return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *tail]


# ═══════════════════════════════════════════════════════════
# FROM s07: Tool Definitions
# ═══════════════════════════════════════════════════════════

TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
    {"name": "todo_write", "description": "Create and manage a task list for your current coding session.",
     "input_schema": {"type": "object", "properties": {"todos": {"type": "array", "items": {"type": "object", "properties": {"content": {"type": "string"}, "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}}, "required": ["content", "status"]}}}, "required": ["todos"]}},
    {"name": "task", "description": "Launch a subagent to handle a complex subtask. Returns only the final conclusion.",
     "input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}},
    {"name": "load_skill", "description": "Load the full content of a skill by name.",
     "input_schema": {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}},
    # s08 change: new compact tool — triggers compact_history, not a no-op
    {"name": "compact", "description": "Summarize earlier conversation to free context space.",
     "input_schema": {"type": "object", "properties": {"focus": {"type": "string"}}}},
]

TOOL_HANDLERS = {
    "bash": run_bash, "read_file": run_read, "write_file": run_write,
    "edit_file": run_edit, "glob": run_glob, "todo_write": run_todo_write,
    "task": spawn_subagent, "load_skill": load_skill,
}

# FROM s04 (unchanged): Hooks
|
||||
HOOKS = {"PreToolUse": [], "PostToolUse": []}
|
||||
def trigger_hooks(event, *args):
|
||||
for cb in HOOKS[event]:
|
||||
r = cb(*args)
|
||||
if r is not None: return r
|
||||
return None
|
||||
|
||||
DENY_LIST = ["rm -rf /", "sudo", "shutdown"]
|
||||
def permission_hook(block):
|
||||
if block.name == "bash":
|
||||
for p in DENY_LIST:
|
||||
if p in block.input.get("command", ""): return "Permission denied"
|
||||
return None
|
||||
def log_hook(block):
|
||||
print(f"\033[90m[HOOK] {block.name}\033[0m")
|
||||
return None
|
||||
|
||||
HOOKS["PreToolUse"].append(permission_hook)
|
||||
HOOKS["PreToolUse"].append(log_hook)
|
||||
|
||||
|
||||
# ═══════════════════════════════════════════════════════════
|
# agent_loop — s08 core: run compaction pipeline before LLM
# ═══════════════════════════════════════════════════════════


MAX_REACTIVE_RETRIES = 1  # retry limit for reactive compact


def agent_loop(messages: list):
    reactive_retries = 0
    while True:
        # s08 change: three preprocessors (0 API calls, cheap first)
        # Order matches CC source: budget → snip → micro
        messages[:] = tool_result_budget(messages)  # L3: persist large results first
        messages[:] = snip_compact(messages)        # L1: trim middle
        messages[:] = micro_compact(messages)       # L2: old result placeholders

        # s08 change: tokens still over threshold → LLM summary (1 API call)
        if estimate_size(messages) > CONTEXT_LIMIT:
            print("[auto compact]")
            messages[:] = compact_history(messages)

        try:
            response = client.messages.create(model=MODEL, system=SYSTEM, messages=messages,
                                              tools=TOOLS, max_tokens=8000)
            reactive_retries = 0  # reset on successful API call
        except Exception as e:
            err = str(e).lower()
            if ("prompt_too_long" in err or "too many tokens" in err) and reactive_retries < MAX_REACTIVE_RETRIES:
                print("[reactive compact]")
                messages[:] = reactive_compact(messages)
                reactive_retries += 1
                continue  # retry; the preprocessors run again first
            raise

        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use": return

        results = []
        compact_requested = False
        for block in response.content:
            if block.type != "tool_use": continue
            print(f"\033[36m> {block.name}\033[0m")

            # s08: the compact tool triggers compact_history, not a no-op string.
            # Answer it here, but compact only after every tool_use in this turn
            # has a tool_result: compacting mid-loop would summarize away the
            # assistant message that these tool_use ids belong to.
            if block.name == "compact":
                compact_requested = True
                results.append({"type": "tool_result", "tool_use_id": block.id,
                                "content": "[Compacted. Conversation history has been summarized.]"})
                continue

            blocked = trigger_hooks("PreToolUse", block)
            if blocked:
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": str(blocked)})
                continue
            handler = TOOL_HANDLERS.get(block.name)
            output = handler(**block.input) if handler else f"Unknown: {block.name}"
            trigger_hooks("PostToolUse", block, output)
            print(str(output)[:200])
            results.append({"type": "tool_result", "tool_use_id": block.id, "content": str(output)})

        messages.append({"role": "user", "content": results})
        if compact_requested:
            # End this turn; the next one starts from the compacted context.
            messages[:] = compact_history(messages)


if __name__ == "__main__":
    print("s08: Context Compact — four-layer compaction pipeline")
    print("Type a question and press Enter to send. Type q to quit.\n")
    history = []
    while True:
        try: query = input("\033[36ms08 >> \033[0m")
        except (EOFError, KeyboardInterrupt): break
        if query.strip().lower() in ("q", "exit", ""): break
        history.append({"role": "user", "content": query})
        agent_loop(history)
        for block in history[-1]["content"]:
            if getattr(block, "type", None) == "text": print(block.text)
        print()
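The reactive path in `agent_loop` retries at most `MAX_REACTIVE_RETRIES` times after a `prompt_too_long` error. Below is a minimal, self-contained sketch of that bounded retry. It assumes a hypothetical `FakeAPI` that rejects long message lists and a hypothetical `shrink()` standing in for `reactive_compact`. Neither is part of `code.py`.

```python
MAX_REACTIVE_RETRIES = 1  # same limit as code.py


class PromptTooLong(Exception):
    """Hypothetical stand-in for the API error raised when the prompt is too long."""


class FakeAPI:
    """Hypothetical client that rejects requests with more than `limit` messages."""

    def __init__(self, limit: int):
        self.limit = limit
        self.calls = 0

    def create(self, messages: list) -> str:
        self.calls += 1
        if len(messages) > self.limit:
            raise PromptTooLong("prompt_too_long: too many tokens")
        return "ok"


def shrink(messages: list) -> list:
    # Stand-in for reactive_compact: a summary marker plus the last 5 messages.
    return [{"role": "user", "content": "[Reactive compact]"}, *messages[-5:]]


def call_with_reactive_retry(api: FakeAPI, messages: list) -> str:
    retries = 0
    while True:
        try:
            return api.create(messages)
        except Exception as e:
            if "prompt_too_long" in str(e).lower() and retries < MAX_REACTIVE_RETRIES:
                messages[:] = shrink(messages)  # compact in place, like agent_loop
                retries += 1
                # fall through to the next loop iteration and retry
            else:
                raise  # a second failure (or any other error) propagates
```

In `agent_loop`, the `continue` also sends the shrunken list back through the three preprocessors before retrying; the sketch skips that step. With `MAX_REACTIVE_RETRIES = 1`, a second `prompt_too_long` error propagates instead of looping forever.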
s08_context_compact/images/auto-compact.en.svg
@@ -0,0 +1,72 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 400" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#991b1b"/><stop offset="100%" stop-color="#dc2626"/>
    </linearGradient>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
    </marker>
  </defs>

  <rect width="720" height="400" fill="#fafbfc" rx="8"/>
  <rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
  <rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
  <text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L4: autoCompact — LLM Full Summary</text>

  <!-- Trigger Condition -->
  <rect x="20" y="54" width="680" height="44" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
  <text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">Trigger Condition</text>
  <text x="140" y="70" fill="#991b1b" font-size="11">All three preprocessing layers ran; estimated tokens still > contextWindow - maxOutputTokens - 13_000.</text>
  <text x="140" y="86" fill="#991b1b" font-size="10">Tries sessionMemoryCompact first (lightweight summary from existing memory); calls the LLM only if needed.</text>

  <!-- Steps -->
  <rect x="20" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
  <text x="120" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Step 1: Save transcript</text>
  <text x="40" y="152" fill="#475569" font-size="10">Write full conversation to .transcripts/</text>
  <text x="40" y="168" fill="#475569" font-size="10">JSONL format, one message per line</text>
  <text x="40" y="184" fill="#475569" font-size="10">Filename: transcript_{timestamp}.jsonl</text>
  <text x="40" y="200" fill="#94a3b8" font-size="9">No data is lost, only moved out of context</text>

  <line x1="225" y1="161" x2="265" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>

  <rect x="270" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
  <text x="370" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Step 2: LLM generates summary</text>
  <text x="290" y="152" fill="#475569" font-size="10">Send conversation history to LLM</text>
  <text x="290" y="166" fill="#475569" font-size="9">Summary must include 9 sections:</text>
  <text x="290" y="180" fill="#94a3b8" font-size="8">request · concepts · files · errors · resolutions</text>
  <text x="290" y="192" fill="#94a3b8" font-size="8">user messages · todos · current state · next steps</text>
  <text x="290" y="206" fill="#94a3b8" font-size="9">Generated only once</text>

  <line x1="475" y1="161" x2="515" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>

  <rect x="520" y="106" width="180" height="110" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
  <text x="610" y="130" fill="#991b1b" font-size="12" font-weight="700" text-anchor="middle">Step 3: Replace message list</text>
  <text x="540" y="152" fill="#991b1b" font-size="10">All old messages → 1 summary</text>
  <text x="540" y="168" fill="#991b1b" font-size="10">Model continues from summary</text>
  <text x="540" y="184" fill="#991b1b" font-size="10">Includes recently_read file list</text>
  <text x="540" y="200" fill="#ef4444" font-size="9">⚠ This is an irreversible operation</text>

  <!-- Before/After comparison -->
  <rect x="20" y="234" width="320" height="94" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
  <text x="180" y="256" fill="#64748b" font-size="11" font-weight="600" text-anchor="middle">Before messages</text>
  <rect x="35" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="40" y="276" fill="#475569" font-size="8">user</text>
  <rect x="92" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="97" y="276" fill="#475569" font-size="8">assistant</text>
  <rect x="149" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="154" y="276" fill="#475569" font-size="8">user</text>
  <rect x="206" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="211" y="276" fill="#475569" font-size="8">assistant</text>
  <rect x="263" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="268" y="276" fill="#475569" font-size="8">user</text>
  <text x="180" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~180 messages, ~62K tokens</text>

  <line x1="345" y1="281" x2="375" y2="281" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>

  <rect x="380" y="234" width="320" height="94" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
  <text x="540" y="256" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">After messages</text>
  <rect x="395" y="264" width="290" height="32" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
  <text x="540" y="276" fill="#991b1b" font-size="9" text-anchor="middle">[Compacted] Summary: goal → create hello.py ...</text>
  <text x="540" y="290" fill="#991b1b" font-size="9" text-anchor="middle">Recent files: hello.py, README.md ...</text>
  <text x="540" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~1 message, ~1K tokens</text>

  <!-- Circuit breaker -->
  <rect x="20" y="340" width="680" height="36" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
  <text x="35" y="362" fill="#475569" font-size="11" font-weight="600">Circuit breaker:</text>
  <text x="130" y="362" fill="#475569" font-size="10">3 consecutive autocompact failures → stop retrying. Prevents wasting API calls when context is unrecoverable.</text>
</svg>
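The trigger condition and circuit breaker in this diagram reduce to a few lines. Below is a sketch that assumes the `13_000`-token buffer and the 3-failure limit shown above. `autocompact_threshold`, `should_autocompact` and `AutocompactBreaker` are hypothetical names. The window sizes in the checks are illustrative only, not real model limits.

```python
AUTOCOMPACT_BUFFER_TOKENS = 13_000  # buffer taken from the diagram
MAX_CONSECUTIVE_FAILURES = 3        # circuit-breaker limit taken from the diagram


def autocompact_threshold(context_window: int, max_output_tokens: int) -> int:
    # contextWindow - maxOutputTokens - 13_000
    return context_window - max_output_tokens - AUTOCOMPACT_BUFFER_TOKENS


def should_autocompact(estimated_tokens: int, context_window: int, max_output_tokens: int) -> bool:
    # Strictly greater than, matching the ">" in the trigger condition.
    return estimated_tokens > autocompact_threshold(context_window, max_output_tokens)


class AutocompactBreaker:
    """Hypothetical breaker: opens after N consecutive failures, resets on success."""

    def __init__(self, max_failures: int = MAX_CONSECUTIVE_FAILURES):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self) -> bool:
        # While open, callers should skip autocompact instead of retrying.
        return self.failures >= self.max_failures

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1
```

For example, with an illustrative 200_000-token window and 20_000 output tokens, the threshold is 167_000. Resetting the counter on success means only *consecutive* failures trip the breaker.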
s08_context_compact/images/auto-compact.ja.svg
@@ -0,0 +1,72 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 400" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#991b1b"/><stop offset="100%" stop-color="#dc2626"/>
    </linearGradient>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
    </marker>
  </defs>

  <rect width="720" height="400" fill="#fafbfc" rx="8"/>
  <rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
  <rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
  <text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L4: autoCompact — LLM 完全要約</text>

  <!-- トリガー条件 -->
  <rect x="20" y="54" width="680" height="44" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
  <text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">トリガー条件</text>
  <text x="115" y="70" fill="#991b1b" font-size="11">前 3 層の前処理を全て実行後、推定 token > contextWindow - maxOutputTokens - 13_000。</text>
  <text x="115" y="86" fill="#991b1b" font-size="10">まず sessionMemoryCompact を試行(既存のメモリで軽量要約)、不足時のみ LLM を呼び出し。</text>

  <!-- ステップ -->
  <rect x="20" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
  <text x="120" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">ステップ 1:transcript 保存</text>
  <text x="40" y="152" fill="#475569" font-size="10">完全な対話を .transcripts/ に書き込み</text>
  <text x="40" y="168" fill="#475569" font-size="10">JSONL 形式、1 行 1 メッセージ</text>
  <text x="40" y="184" fill="#475569" font-size="10">ファイル名:transcript_{timestamp}.jsonl</text>
  <text x="40" y="200" fill="#94a3b8" font-size="9">情報は失われていない、アクティブ領域から移動のみ</text>

  <line x1="225" y1="161" x2="265" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>

  <rect x="270" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
  <text x="370" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">ステップ 2:LLM 要約生成</text>
  <text x="290" y="152" fill="#475569" font-size="10">対話履歴を LLM に送信</text>
  <text x="290" y="166" fill="#475569" font-size="9">要約は 9 つのセクションを含む:</text>
  <text x="290" y="180" fill="#94a3b8" font-size="8">リクエスト・概念・ファイル・エラー・解決</text>
  <text x="290" y="192" fill="#94a3b8" font-size="8">ユーザーメッセージ・TODO・現在・次ステップ</text>
  <text x="290" y="206" fill="#94a3b8" font-size="9">1 回のみ生成</text>

  <line x1="475" y1="161" x2="515" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>

  <rect x="520" y="106" width="180" height="110" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
  <text x="610" y="130" fill="#991b1b" font-size="12" font-weight="700" text-anchor="middle">ステップ 3:メッセージリスト置換</text>
  <text x="540" y="152" fill="#991b1b" font-size="10">全旧メッセージ → 1 件の要約に</text>
  <text x="540" y="168" fill="#991b1b" font-size="10">モデルは要約から作業を継続</text>
  <text x="540" y="184" fill="#991b1b" font-size="10">recently_read ファイルリストを付与</text>
  <text x="540" y="200" fill="#ef4444" font-size="9">⚠ これは復元不可能な操作</text>

  <!-- 圧縮前/後 比較 -->
  <rect x="20" y="234" width="320" height="94" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
  <text x="180" y="256" fill="#64748b" font-size="11" font-weight="600" text-anchor="middle">圧縮前 messages</text>
  <rect x="35" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="40" y="276" fill="#475569" font-size="8">user</text>
  <rect x="92" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="97" y="276" fill="#475569" font-size="8">assistant</text>
  <rect x="149" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="154" y="276" fill="#475569" font-size="8">user</text>
  <rect x="206" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="211" y="276" fill="#475569" font-size="8">assistant</text>
  <rect x="263" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="268" y="276" fill="#475569" font-size="8">user</text>
  <text x="180" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~180 件のメッセージ、62K トークンを占有</text>

  <line x1="345" y1="281" x2="375" y2="281" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>

  <rect x="380" y="234" width="320" height="94" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
  <text x="540" y="256" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">圧縮後 messages</text>
  <rect x="395" y="264" width="290" height="32" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
  <text x="540" y="276" fill="#991b1b" font-size="9" text-anchor="middle">[Compacted] 要約:目標 → hello.py を作成 ...</text>
  <text x="540" y="290" fill="#991b1b" font-size="9" text-anchor="middle">最近のファイル:hello.py, README.md ...</text>
  <text x="540" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~1 件のメッセージ、1K トークンを占有</text>

  <!-- サーキットブレーカー -->
  <rect x="20" y="340" width="680" height="36" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
  <text x="35" y="362" fill="#475569" font-size="11" font-weight="600">サーキットブレーカー:</text>
  <text x="145" y="362" fill="#475569" font-size="10">autocompact が連続 3 回失敗 → リトライ停止。コンテキストが復元不可能な場合の API 呼び出しの無駄な反復を防止。</text>
</svg>
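Step 1 of the diagram saves the full conversation as JSONL under `.transcripts/`. It can be sketched as follows. `save_transcript` is a hypothetical helper. Passing `default=str` is an assumption: SDK content-block objects are not JSON-serializable, so this writes their string form instead of raising.

```python
import json
import tempfile
import time
from pathlib import Path


def save_transcript(messages: list, root: Path) -> Path:
    """Write one JSON object per line to <root>/.transcripts/transcript_<ts>.jsonl."""
    transcript_dir = root / ".transcripts"
    transcript_dir.mkdir(parents=True, exist_ok=True)
    path = transcript_dir / f"transcript_{int(time.time())}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for msg in messages:
            # default=str: fall back to str() for objects json cannot encode
            f.write(json.dumps(msg, ensure_ascii=False, default=str) + "\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_transcript([{"role": "user", "content": "hi"}], Path(tmp))
        print(saved.name, saved.read_text(encoding="utf-8").strip())
```

The timestamp has one-second resolution, so two compactions within the same second would overwrite each other. A real implementation would likely need a finer-grained or unique suffix.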
s08_context_compact/images/auto-compact.svg
@@ -0,0 +1,72 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 400" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#991b1b"/><stop offset="100%" stop-color="#dc2626"/>
    </linearGradient>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
    </marker>
  </defs>

  <rect width="720" height="400" fill="#fafbfc" rx="8"/>
  <rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
  <rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
  <text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L4: autoCompact — LLM 全量摘要</text>

  <!-- 触发条件 -->
  <rect x="20" y="54" width="680" height="44" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
  <text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">触发条件</text>
  <text x="105" y="70" fill="#991b1b" font-size="11">前三层预处理全跑完,估算 token > contextWindow - maxOutputTokens - 13_000。</text>
  <text x="105" y="86" fill="#991b1b" font-size="10">先尝试 sessionMemoryCompact(用已有记忆做轻量摘要),不足才调 LLM。</text>

  <!-- 步骤 -->
  <rect x="20" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
  <text x="120" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">步骤 1:保存 transcript</text>
  <text x="40" y="152" fill="#475569" font-size="10">完整对话写入 .transcripts/</text>
  <text x="40" y="168" fill="#475569" font-size="10">JSONL 格式,一行一条消息</text>
  <text x="40" y="184" fill="#475569" font-size="10">文件名:transcript_{timestamp}.jsonl</text>
  <text x="40" y="200" fill="#94a3b8" font-size="9">信息没有丢失,只是移出活跃区</text>

  <line x1="225" y1="161" x2="265" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>

  <rect x="270" y="106" width="200" height="110" rx="8" fill="#fff" stroke="#94a3b8" stroke-width="1.5"/>
  <text x="370" y="130" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">步骤 2:LLM 生成摘要</text>
  <text x="290" y="152" fill="#475569" font-size="10">把对话历史发给 LLM</text>
  <text x="290" y="166" fill="#475569" font-size="9">摘要需包含 9 个部分:</text>
  <text x="290" y="180" fill="#94a3b8" font-size="8">请求·概念·文件·错误·解决</text>
  <text x="290" y="192" fill="#94a3b8" font-size="8">用户消息·待办·当前·下一步</text>
  <text x="290" y="206" fill="#94a3b8" font-size="9">只生成一次</text>

  <line x1="475" y1="161" x2="515" y2="161" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>

  <rect x="520" y="106" width="180" height="110" rx="8" fill="#fef2f2" stroke="#dc2626" stroke-width="2"/>
  <text x="610" y="130" fill="#991b1b" font-size="12" font-weight="700" text-anchor="middle">步骤 3:替换消息列表</text>
  <text x="540" y="152" fill="#991b1b" font-size="10">所有旧消息 → 1 条摘要</text>
  <text x="540" y="168" fill="#991b1b" font-size="10">模型从摘要继续工作</text>
  <text x="540" y="184" fill="#991b1b" font-size="10">附带 recently_read 文件列表</text>
  <text x="540" y="200" fill="#ef4444" font-size="9">⚠ 这是无法恢复的操作</text>

  <!-- Before/After 对比 -->
  <rect x="20" y="234" width="320" height="94" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
  <text x="180" y="256" fill="#64748b" font-size="11" font-weight="600" text-anchor="middle">压缩前 messages</text>
  <rect x="35" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="40" y="276" fill="#475569" font-size="8">user</text>
  <rect x="92" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="97" y="276" fill="#475569" font-size="8">assistant</text>
  <rect x="149" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="154" y="276" fill="#475569" font-size="8">user</text>
  <rect x="206" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="211" y="276" fill="#475569" font-size="8">assistant</text>
  <rect x="263" y="264" width="52" height="16" rx="3" fill="#e2e8f0"/><text x="268" y="276" fill="#475569" font-size="8">user</text>
  <text x="180" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~180 条消息,占 62K token</text>

  <line x1="345" y1="281" x2="375" y2="281" stroke="#dc2626" stroke-width="2" marker-end="url(#arrow)"/>

  <rect x="380" y="234" width="320" height="94" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1"/>
  <text x="540" y="256" fill="#991b1b" font-size="11" font-weight="600" text-anchor="middle">压缩后 messages</text>
  <rect x="395" y="264" width="290" height="32" rx="4" fill="#fee2e2" stroke="#fca5a5" stroke-width="0.5"/>
  <text x="540" y="276" fill="#991b1b" font-size="9" text-anchor="middle">[Compacted] 摘要:目标 → 创建 hello.py ...</text>
  <text x="540" y="290" fill="#991b1b" font-size="9" text-anchor="middle">最近文件:hello.py, README.md ...</text>
  <text x="540" y="318" fill="#94a3b8" font-size="9" text-anchor="middle">~1 条消息,占 1K token</text>

  <!-- 熔断器 -->
  <rect x="20" y="340" width="680" height="36" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
  <text x="35" y="362" fill="#475569" font-size="11" font-weight="600">熔断器:</text>
  <text x="95" y="362" fill="#475569" font-size="10">连续 autocompact 失败 3 次 → 停止重试。防止上下文不可恢复时反复浪费 API 调用。</text>
</svg>
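Steps 2 and 3 of the autoCompact diagram can be sketched as two pieces. The first is a prompt that asks for the nine sections the diagram lists. The second is a function that swaps the whole history for a single summary message plus the `recently_read` file list. The prompt wording, the `SUMMARY_SECTIONS` labels and both function names are hypothetical. Only the nine topics and the `[Compacted]` / recent-files shape come from the diagram.

```python
# The nine topics from the diagram, in the same order.
SUMMARY_SECTIONS = [
    "request", "concepts", "files", "errors", "resolutions",
    "user messages", "todos", "current state", "next steps",
]


def build_summary_prompt() -> str:
    """Hypothetical instruction text sent along with the history in step 2."""
    lines = ["Summarize the conversation so far using these sections:"]
    lines += [f"{i}. {name}" for i, name in enumerate(SUMMARY_SECTIONS, start=1)]
    return "\n".join(lines)


def replace_with_summary(summary: str, recently_read: list) -> list:
    """Step 3: every old message is replaced by one user message."""
    files = ", ".join(recently_read) if recently_read else "(none)"
    content = f"[Compacted] Summary: {summary}\n\nRecent files: {files}"
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    print(build_summary_prompt())
    print(replace_with_summary("goal -> create hello.py", ["hello.py", "README.md"]))
```

Keeping the file list outside the LLM-written summary means the model can see which files to re-read even if the summary omits them.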
s08_context_compact/images/compact-overview.en.svg
@@ -0,0 +1,138 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 520" font-family="system-ui, -apple-system, sans-serif">
|
||||
<defs>
|
||||
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
|
||||
</marker>
|
||||
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
|
||||
</marker>
|
||||
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
|
||||
</marker>
|
||||
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
|
||||
</marker>
|
||||
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
||||
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
|
||||
</marker>
|
||||
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
|
||||
<stop offset="0%" stop-color="#1e3a5f"/>
|
||||
<stop offset="100%" stop-color="#2563eb"/>
|
||||
</linearGradient>
|
||||
</defs>
|
||||
|
||||
<!-- Background -->
|
||||
<rect width="820" height="520" fill="#fafbfc" rx="8"/>
|
||||
|
||||
<!-- Title -->
|
||||
<rect x="0" y="0" width="820" height="48" fill="url(#header)" rx="8"/>
|
||||
<rect x="0" y="40" width="820" height="8" fill="url(#header)"/>
|
||||
<text x="410" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Context Compact — Compression Before LLM Call, Three Trigger Modes</text>
|
||||
|
||||
<!-- Labels -->
|
||||
<text x="50" y="74" fill="#94a3b8" font-size="11" font-weight="600">s07 Preserved</text>
|
||||
<text x="180" y="74" fill="#d97706" font-size="11" font-weight="600">s08 New</text>
|
||||
|
||||
<!-- ===== ① messages[] ===== -->
|
||||
<rect x="40" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="90" y="155" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
|
||||
<text x="90" y="172" fill="#64748b" font-size="9" text-anchor="middle">(s07 preserved)</text>
|
||||
|
||||
<!-- messages → pipeline entry -->
|
||||
<line x1="140" y1="158" x2="168" y2="158" stroke="#d97706" stroke-width="2" marker-end="url(#arrow-amber)"/>
|
||||
|
||||
<!-- ===== ② Compression Pipeline ===== -->
|
||||
<rect x="170" y="82" width="200" height="252" rx="10" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
|
||||
<text x="270" y="102" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">Compression Pipeline</text>
|
||||
|
||||
<!-- ── ① Every Turn Auto ── -->
|
||||
<rect x="186" y="110" width="168" height="16" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="0.8"/>
|
||||
<text x="270" y="122" fill="#92400e" font-size="8" font-weight="700" text-anchor="middle">① Every Turn · Unconditional · 0 API</text>
|
||||
|
||||
<rect x="186" y="130" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="146" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L3 tool_result_budget</text>
|
||||
|
||||
<rect x="186" y="158" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="174" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L1 snip_compact</text>
|
||||
|
||||
<rect x="186" y="186" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="270" y="202" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L2 micro_compact</text>
|
||||
|
||||
<!-- ↓ → ◇ -->
|
||||
<line x1="270" y1="210" x2="270" y2="222" stroke="#555" stroke-width="1.2" marker-end="url(#arrow)"/>
|
||||
|
||||
<!-- ◇ Decision Diamond -->
|
||||
<polygon points="270,226 300,244 270,262 240,244" fill="#f0f4ff" stroke="#ea580c" stroke-width="1.5"/>
|
||||
<text x="270" y="247" fill="#9a3412" font-size="7" font-weight="600" text-anchor="middle">Over threshold?</text>
|
||||
|
||||
<!-- No: right annotation -->
|
||||
<text x="306" y="240" fill="#16a34a" font-size="9" font-weight="700">No → Pass</text>
|
||||
<text x="306" y="252" fill="#94a3b8" font-size="7">Straight to LLM</text>
|
||||
|
||||
<!-- Yes: below annotation -->
|
||||
<text x="284" y="260" fill="#ea580c" font-size="8" font-weight="600">Yes↓</text>
|
||||
|
||||
<!-- ── ② Conditional Trigger ── -->
|
||||
<rect x="186" y="268" width="168" height="16" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="0.8"/>
|
||||
<text x="270" y="280" fill="#9a3412" font-size="8" font-weight="700" text-anchor="middle">② Conditional · Token Over Threshold · 1 API</text>
|
||||
|
||||
<rect x="186" y="288" width="168" height="24" rx="4" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
|
||||
<text x="270" y="304" fill="#9a3412" font-size="10" font-weight="600" text-anchor="middle">L4 compact_history</text>
|
||||
|
||||
<!-- Pipeline exit → LLM -->
|
||||
<line x1="370" y1="158" x2="438" y2="158" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
|
||||
|
||||
<!-- ===== ③ LLM ===== -->
|
||||
<rect x="440" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="490" y="155" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
|
||||
<text x="490" y="172" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>
|
||||
|
||||
<!-- LLM No → Return -->
|
||||
<line x1="490" y1="184" x2="490" y2="278" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
|
||||
<text x="502" y="262" fill="#16a34a" font-size="10" font-weight="600">No</text>
|
||||
<rect x="435" y="280" width="110" height="26" rx="13" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
|
||||
<text x="490" y="297" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">Return Result</text>
|
||||
|
||||
<!-- LLM Yes → TOOL_HANDLERS -->
|
||||
<line x1="540" y1="158" x2="578" y2="158" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
|
||||
<text x="554" y="150" fill="#64748b" font-size="10" font-weight="600">Yes</text>
|
||||
|
||||
<!-- ④ TOOL_HANDLERS -->
|
||||
<rect x="580" y="126" width="130" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
|
||||
<text x="645" y="150" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
|
||||
<text x="645" y="166" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
|
||||
<text x="645" y="180" fill="#64748b" font-size="9" text-anchor="middle">task · load_skill · ...</text>
|
||||
|
||||
<!-- LLM API error → emergency compact → retry next turn -->
|
||||
<path d="M 535 184 L 570 216 L 580 228" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
|
||||
<text x="552" y="204" fill="#991b1b" font-size="8" font-weight="600">API error</text>
|
||||
<path d="M 665 266 L 665 340 L 160 340 L 160 142 L 186 142" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
|
||||
<text x="530" y="328" fill="#991b1b" font-size="8" font-weight="600">retry to compression pipeline</text>
|
||||
|
||||
<!-- ===== ③ Emergency Trigger (after LLM API failure) ===== -->
|
||||
<rect x="580" y="210" width="170" height="56" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
|
||||
<text x="665" y="228" fill="#991b1b" font-size="9" font-weight="700" text-anchor="middle">③ Emergency Trigger</text>
|
||||
<text x="665" y="242" fill="#991b1b" font-size="8" text-anchor="middle">API returns prompt_too_long</text>
|
||||
<text x="665" y="256" fill="#991b1b" font-size="8" text-anchor="middle">→ reactive_compact → retry</text>
|
||||
|
||||
<!-- ===== Loop Back ===== -->
|
||||
<path d="M 710 158 L 760 158 L 760 348 L 90 348 L 90 184" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
|
||||
<text x="410" y="366" fill="#64748b" font-size="10" text-anchor="middle">Tool results appended to messages[] → next turn → compress again → LLM</text>
|
||||
|
||||
<!-- ===== Legend ===== -->
|
||||
<rect x="50" y="390" width="720" height="116" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
|
||||
|
||||
<rect x="70" y="404" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
|
||||
<text x="94" y="414" fill="#334155" font-size="10">s07 Preserved: loop, hooks, skill loading, sub-agents</text>
|
||||
|
||||
<rect x="70" y="426" width="16" height="12" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="1"/>
|
||||
<text x="94" y="436" fill="#334155" font-size="10">① Every Turn Auto: L3→L1→L2 run unconditionally before each LLM call, 0 API</text>
|
||||
|
||||
<rect x="70" y="448" width="16" height="12" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
<text x="94" y="458" fill="#334155" font-size="10">② Conditional: after L3/L1/L2, tokens still over threshold → compact_history, 1 API</text>
<rect x="70" y="470" width="16" height="12" rx="3" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="3,2"/>
<text x="94" y="480" fill="#334155" font-size="10">③ Emergency: API returns prompt_too_long → reactive_compact → retry</text>
<text x="70" y="498" fill="#94a3b8" font-size="9">Three modes with increasing cost: 0 API → 1 API → 1 API + more aggressive trimming</text>
</svg>
s08_context_compact/images/compact-overview.ja.svg
@@ -0,0 +1,138 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 520" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>
<!-- Background -->
<rect width="820" height="520" fill="#fafbfc" rx="8"/>
<!-- Title -->
<rect x="0" y="0" width="820" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="820" height="8" fill="url(#header)"/>
<text x="410" y="31" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Context Compact — LLM 呼び出し前に圧縮、3 つのトリガーモード</text>
<!-- Labels -->
<text x="50" y="74" fill="#94a3b8" font-size="11" font-weight="600">s07 保持</text>
<text x="180" y="74" fill="#d97706" font-size="11" font-weight="600">s08 新規</text>
<!-- ===== ① messages[] ===== -->
<rect x="40" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="90" y="155" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
<text x="90" y="172" fill="#64748b" font-size="9" text-anchor="middle">(s07 保持)</text>
<!-- messages → pipeline entry -->
<line x1="140" y1="158" x2="168" y2="158" stroke="#d97706" stroke-width="2" marker-end="url(#arrow-amber)"/>
<!-- ===== ② Compression pipeline ===== -->
<rect x="170" y="82" width="200" height="252" rx="10" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<text x="270" y="102" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">圧縮パイプライン</text>
<!-- ── ① Automatic, every turn ── -->
<rect x="186" y="110" width="168" height="16" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="0.8"/>
<text x="270" y="122" fill="#92400e" font-size="8" font-weight="700" text-anchor="middle">① 毎ターン自動 · 無条件 · 0 API</text>
<rect x="186" y="130" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="146" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L3 tool_result_budget</text>
<rect x="186" y="158" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="174" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L1 snip_compact</text>
<rect x="186" y="186" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="202" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L2 micro_compact</text>
<!-- ↓ → ◇ -->
<line x1="270" y1="210" x2="270" y2="222" stroke="#555" stroke-width="1.2" marker-end="url(#arrow)"/>
<!-- ◇ Decision diamond -->
<polygon points="270,226 300,244 270,262 240,244" fill="#f0f4ff" stroke="#ea580c" stroke-width="1.5"/>
<text x="270" y="247" fill="#9a3412" font-size="7" font-weight="600" text-anchor="middle">閾値超過?</text>
<!-- No: annotation on the right -->
<text x="306" y="240" fill="#16a34a" font-size="9" font-weight="700">No → 通過</text>
<text x="306" y="252" fill="#94a3b8" font-size="7">直接 LLM へ</text>
<!-- Yes: annotation below -->
<text x="284" y="260" fill="#ea580c" font-size="8" font-weight="600">Yes↓</text>
<!-- ── ② Conditional trigger ── -->
<rect x="186" y="268" width="168" height="16" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="0.8"/>
<text x="270" y="280" fill="#9a3412" font-size="8" font-weight="700" text-anchor="middle">② 条件 · トークン閾値超過 · 1 API</text>
<rect x="186" y="288" width="168" height="24" rx="4" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
<text x="270" y="304" fill="#9a3412" font-size="10" font-weight="600" text-anchor="middle">L4 compact_history</text>
<!-- Pipeline exit → LLM -->
<line x1="370" y1="158" x2="438" y2="158" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
<!-- ===== ③ LLM ===== -->
<rect x="440" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="490" y="155" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
<text x="490" y="172" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>
<!-- LLM No → return -->
<line x1="490" y1="184" x2="490" y2="278" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
<text x="502" y="262" fill="#16a34a" font-size="10" font-weight="600">No</text>
<rect x="435" y="280" width="110" height="26" rx="13" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="490" y="297" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">結果を返す</text>
<!-- LLM Yes → TOOL_HANDLERS -->
<line x1="540" y1="158" x2="578" y2="158" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
<text x="554" y="150" fill="#64748b" font-size="10" font-weight="600">Yes</text>
<!-- ④ TOOL_HANDLERS -->
<rect x="580" y="126" width="130" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="645" y="150" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="645" y="166" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
<text x="645" y="180" fill="#64748b" font-size="9" text-anchor="middle">task · load_skill · ...</text>
<!-- LLM API error → emergency compaction → retry on next turn -->
<path d="M 535 184 L 570 216 L 580 228" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
<text x="552" y="204" fill="#991b1b" font-size="8" font-weight="600">API 例外</text>
<path d="M 665 266 L 665 340 L 160 340 L 160 142 L 186 142" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
<text x="530" y="328" fill="#991b1b" font-size="8" font-weight="600">圧縮パイプラインへ再試行</text>
<!-- ===== ③ Emergency trigger (after LLM API failure) ===== -->
<rect x="580" y="210" width="170" height="56" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
<text x="665" y="228" fill="#991b1b" font-size="9" font-weight="700" text-anchor="middle">③ 緊急トリガー</text>
<text x="665" y="242" fill="#991b1b" font-size="8" text-anchor="middle">API が prompt_too_long を返す</text>
<text x="665" y="256" fill="#991b1b" font-size="8" text-anchor="middle">→ reactive_compact → リトライ</text>
<!-- ===== Loop back ===== -->
<path d="M 710 158 L 760 158 L 760 348 L 90 348 L 90 184" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="410" y="366" fill="#64748b" font-size="10" text-anchor="middle">ツール結果を messages[] に追加 → 次ターン → 再圧縮 → LLM</text>
<!-- ===== Legend ===== -->
<rect x="50" y="390" width="720" height="116" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<rect x="70" y="404" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="94" y="414" fill="#334155" font-size="10">s07 保持:ループ、フック、スキルロード、サブエージェント</text>
<rect x="70" y="426" width="16" height="12" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="1"/>
<text x="94" y="436" fill="#334155" font-size="10">① 毎ターン自動:L3→L1→L2 が各 LLM 呼び出し前に無条件実行、0 API</text>
<rect x="70" y="448" width="16" height="12" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
<text x="94" y="458" fill="#334155" font-size="10">② 条件トリガー:L3/L1/L2 後もトークン超過 → compact_history、1 API</text>
<rect x="70" y="470" width="16" height="12" rx="3" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="3,2"/>
<text x="94" y="480" fill="#334155" font-size="10">③ 緊急トリガー:API が prompt_too_long を返す → reactive_compact → リトライ</text>
<text x="70" y="498" fill="#94a3b8" font-size="9">3 つのモードはコスト増加:0 API → 1 API → 1 API + より積極的なトリム</text>
</svg>
s08_context_compact/images/compact-overview.svg
@@ -0,0 +1,138 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 520" font-family="system-ui, -apple-system, sans-serif">
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-blue" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#2563eb"/>
</marker>
<marker id="arrow-amber" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d97706"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#dc2626"/>
</marker>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/>
<stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
</defs>
<!-- Background -->
<rect width="820" height="520" fill="#fafbfc" rx="8"/>
<!-- Title -->
<rect x="0" y="0" width="820" height="48" fill="url(#header)" rx="8"/>
<rect x="0" y="40" width="820" height="8" fill="url(#header)"/>
<text x="410" y="31" fill="#fff" font-size="16" font-weight="700" text-anchor="middle">Context Compact — 压缩插在 LLM 调用前,三种触发模式</text>
<!-- Labels -->
<text x="50" y="74" fill="#94a3b8" font-size="11" font-weight="600">s07 保留</text>
<text x="180" y="74" fill="#d97706" font-size="11" font-weight="600">s08 新增</text>
<!-- ===== ① messages[] ===== -->
<rect x="40" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="90" y="155" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>
<text x="90" y="172" fill="#64748b" font-size="9" text-anchor="middle">(s07 保留)</text>
<!-- messages → pipeline entry -->
<line x1="140" y1="158" x2="168" y2="158" stroke="#d97706" stroke-width="2" marker-end="url(#arrow-amber)"/>
<!-- ===== ② Compression pipeline (labels only inside; no path lines drawn) ===== -->
<rect x="170" y="82" width="200" height="252" rx="10" fill="#fffbeb" stroke="#d97706" stroke-width="2"/>
<text x="270" y="102" fill="#92400e" font-size="11" font-weight="700" text-anchor="middle">压缩管线</text>
<!-- ── ① Automatic, every turn ── -->
<rect x="186" y="110" width="168" height="16" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="0.8"/>
<text x="270" y="122" fill="#92400e" font-size="8" font-weight="700" text-anchor="middle">① 每轮自动 · 无条件 · 0 API</text>
<rect x="186" y="130" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="146" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L3 tool_result_budget</text>
<rect x="186" y="158" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="174" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L1 snip_compact</text>
<rect x="186" y="186" width="168" height="24" rx="4" fill="#fef3c7" stroke="#d97706" stroke-width="1"/>
<text x="270" y="202" fill="#92400e" font-size="10" font-weight="600" text-anchor="middle">L2 micro_compact</text>
<!-- ↓ → ◇ -->
<line x1="270" y1="210" x2="270" y2="222" stroke="#555" stroke-width="1.2" marker-end="url(#arrow)"/>
<!-- ◇ Decision diamond (compact) -->
<polygon points="270,226 300,244 270,262 240,244" fill="#f0f4ff" stroke="#ea580c" stroke-width="1.5"/>
<text x="270" y="247" fill="#9a3412" font-size="7" font-weight="600" text-anchor="middle">超阈值?</text>
<!-- No: text annotation on the right -->
<text x="306" y="240" fill="#16a34a" font-size="9" font-weight="700">否 → 通过</text>
<text x="306" y="252" fill="#94a3b8" font-size="7">直接进 LLM</text>
<!-- Yes: text annotation below -->
<text x="284" y="260" fill="#ea580c" font-size="8" font-weight="600">是↓</text>
<!-- ── ② Conditional trigger ── -->
<rect x="186" y="268" width="168" height="16" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="0.8"/>
<text x="270" y="280" fill="#9a3412" font-size="8" font-weight="700" text-anchor="middle">② 条件触发 · token 超阈值 · 1 API</text>
<rect x="186" y="288" width="168" height="24" rx="4" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
<text x="270" y="304" fill="#9a3412" font-size="10" font-weight="600" text-anchor="middle">L4 compact_history</text>
<!-- Pipeline exit → LLM -->
<line x1="370" y1="158" x2="438" y2="158" stroke="#2563eb" stroke-width="2" marker-end="url(#arrow-blue)"/>
<!-- ===== ③ LLM ===== -->
<rect x="440" y="132" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="490" y="155" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
<text x="490" y="172" fill="#64748b" font-size="9" text-anchor="middle">stop_reason=tool_use?</text>
<!-- LLM No → return -->
<line x1="490" y1="184" x2="490" y2="278" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow-green)"/>
<text x="502" y="262" fill="#16a34a" font-size="10" font-weight="600">否</text>
<rect x="435" y="280" width="110" height="26" rx="13" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="490" y="297" fill="#166534" font-size="11" font-weight="600" text-anchor="middle">返回结果</text>
<!-- LLM Yes → TOOL_HANDLERS -->
<line x1="540" y1="158" x2="578" y2="158" stroke="#555" stroke-width="2" marker-end="url(#arrow)"/>
<text x="554" y="150" fill="#64748b" font-size="10" font-weight="600">是</text>
<!-- ④ TOOL_HANDLERS -->
<rect x="580" y="126" width="130" height="64" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="645" y="150" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="645" y="166" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
<text x="645" y="180" fill="#64748b" font-size="9" text-anchor="middle">task · load_skill · ...</text>
<!-- LLM API error → emergency compaction → retry on next turn -->
<path d="M 535 184 L 570 216 L 580 228" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
<text x="552" y="204" fill="#991b1b" font-size="8" font-weight="600">API 异常</text>
<path d="M 665 266 L 665 340 L 160 340 L 160 142 L 186 142" fill="none" stroke="#dc2626" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
<text x="530" y="328" fill="#991b1b" font-size="8" font-weight="600">重试回到压缩管线</text>
<!-- ===== ③ Error trigger (after an LLM API call fails) ===== -->
<rect x="580" y="210" width="170" height="56" rx="6" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="4,2"/>
<text x="665" y="228" fill="#991b1b" font-size="9" font-weight="700" text-anchor="middle">③ 异常触发</text>
<text x="665" y="242" fill="#991b1b" font-size="8" text-anchor="middle">API 返回 prompt_too_long</text>
<text x="665" y="256" fill="#991b1b" font-size="8" text-anchor="middle">→ reactive_compact → 重试</text>
<!-- ===== Loop back (y=348 runs below the pipeline box bottom at y=334, so it never crosses the box) ===== -->
<path d="M 710 158 L 760 158 L 760 348 L 90 348 L 90 184" fill="none" stroke="#555" stroke-width="2" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="410" y="366" fill="#64748b" font-size="10" text-anchor="middle">工具结果追加到 messages[] → 下一轮 → 再次压缩 → LLM</text>
<!-- ===== Legend ===== -->
<rect x="50" y="390" width="720" height="116" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<rect x="70" y="404" width="16" height="12" rx="3" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="94" y="414" fill="#334155" font-size="10">s07 保留:循环、hook、技能加载、子 Agent</text>
<rect x="70" y="426" width="16" height="12" rx="3" fill="#fde68a" stroke="#d97706" stroke-width="1"/>
<text x="94" y="436" fill="#334155" font-size="10">① 每轮自动:L3→L1→L2 在每次 LLM 调用前无条件执行,0 API</text>
<rect x="70" y="448" width="16" height="12" rx="3" fill="#fed7aa" stroke="#ea580c" stroke-width="1"/>
<text x="94" y="458" fill="#334155" font-size="10">② 条件触发:L3/L1/L2 跑完 token 仍超阈值 → compact_history,1 API</text>
<rect x="70" y="470" width="16" height="12" rx="3" fill="#fef2f2" stroke="#dc2626" stroke-width="1" stroke-dasharray="3,2"/>
<text x="94" y="480" fill="#334155" font-size="10">③ 异常触发:API 返回 prompt_too_long → reactive_compact → 重试</text>
<text x="70" y="498" fill="#94a3b8" font-size="9">三种模式的代价递增:0 API → 1 API → 1 API + 更激进的裁剪</text>
</svg>
s08_context_compact/images/compaction-layers.en.svg
@@ -0,0 +1,98 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 590" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<linearGradient id="pre" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#dbeafe"/><stop offset="100%" stop-color="#bfdbfe"/>
</linearGradient>
<linearGradient id="auto" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
</linearGradient>
<linearGradient id="emergency" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
</linearGradient>
<marker id="arrow-d" viewBox="0 0 10 10" refX="5" refY="10" markerWidth="6" markerHeight="6" orient="auto">
<path d="M 0 0 L 5 10 L 10 0 z" fill="#94a3b8"/>
</marker>
</defs>
<rect width="760" height="590" fill="#fafbfc" rx="8"/>
<!-- Title bar -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Context Compaction — Pre-processing Pipeline + Auto-compact + Emergency Fallback</text>
<!-- Design principles (left) -->
<rect x="20" y="62" width="220" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="130" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Design Principles</text>
<text x="130" y="100" fill="#475569" font-size="10" text-anchor="middle">Cheap operations first, expensive later</text>
<text x="130" y="116" fill="#475569" font-size="10" text-anchor="middle">Trim text before dropping messages</text>
<text x="130" y="132" fill="#475569" font-size="10" text-anchor="middle">Drop messages before calling LLM</text>
<!-- Cost escalation (right) -->
<rect x="530" y="62" width="210" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="635" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">Increasing Cost</text>
<text x="635" y="104" fill="#475569" font-size="10" text-anchor="middle">Text ops → LLM summary → Emergency trim</text>
<text x="635" y="124" fill="#94a3b8" font-size="9" text-anchor="middle">0 API · 0 API · 0 API · 1 API · 1 API</text>
<!-- ===== Pre-processing pipeline title ===== -->
<rect x="20" y="146" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="163" fill="#64748b" font-size="11" font-weight="600">Pre-processing Pipeline (execution order: L3 → L1 → L2, before every LLM call, 0 API)</text>
<!-- L3: toolResultBudget -->
<rect x="80" y="180" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="200" fill="#1e40af" font-size="12" font-weight="600">L3</text>
<text x="135" y="200" fill="#1e40af" font-size="13" font-weight="700">toolResultBudget</text>
<text x="260" y="200" fill="#1e40af" font-size="11">tool_result total > 200KB → spill largest item</text>
<text x="650" y="200" fill="#1e40af" font-size="10" text-anchor="end">keep full content</text>
<text x="135" y="218" fill="#2563eb" font-size="9">Trigger: every turn, before microCompact can replace full content</text>
<!-- Arrow L3→L1 -->
<line x1="380" y1="226" x2="380" y2="238" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
<!-- L1: snipCompact -->
<rect x="80" y="240" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="260" fill="#1e40af" font-size="12" font-weight="600">L1</text>
<text x="135" y="260" fill="#1e40af" font-size="13" font-weight="700">snipCompact</text>
<text x="260" y="260" fill="#1e40af" font-size="11">messages > 50 → trim middle</text>
<text x="650" y="260" fill="#1e40af" font-size="10" text-anchor="end">keep head/tail</text>
<text x="135" y="278" fill="#2563eb" font-size="9">Trigger: message count exceeds threshold</text>
<!-- Arrow L1→L2 -->
<line x1="380" y1="286" x2="380" y2="298" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
<!-- L2: microCompact -->
<rect x="80" y="300" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="320" fill="#1e40af" font-size="12" font-weight="600">L2</text>
<text x="135" y="320" fill="#1e40af" font-size="13" font-weight="700">microCompact</text>
<text x="260" y="320" fill="#1e40af" font-size="11">old tool_result → placeholder (keep latest 3)</text>
<text x="650" y="320" fill="#1e40af" font-size="10" text-anchor="end">compact old</text>
<text x="135" y="338" fill="#2563eb" font-size="9">Trigger: every turn automatically; tutorial uses text placeholder</text>
<!-- ===== Auto-compact title ===== -->
<rect x="20" y="358" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="70" y="375" fill="#64748b" font-size="11" font-weight="600">Auto-compact Decision (triggered when pre-processing is insufficient, 1 API call)</text>
<!-- L4: autoCompact -->
<rect x="80" y="390" width="600" height="58" rx="7" fill="url(#auto)" stroke="#dc2626" stroke-width="2"/>
<text x="100" y="412" fill="#991b1b" font-size="12" font-weight="600">L4</text>
<text x="135" y="412" fill="#991b1b" font-size="13" font-weight="700">autoCompact</text>
<text x="260" y="412" fill="#991b1b" font-size="11">tokens over threshold → LLM summary</text>
<text x="650" y="412" fill="#991b1b" font-size="10" text-anchor="end">1 API call</text>
<text x="135" y="428" fill="#dc2626" font-size="9">Threshold: contextWindow - maxOutputTokens - 13,000 · Try sessionMemoryCompact first, then LLM</text>
<text x="135" y="442" fill="#dc2626" font-size="9">Circuit breaker: stop retrying after 3 consecutive failures</text>
<!-- ===== Emergency fallback title ===== -->
<rect x="20" y="460" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="477" fill="#64748b" font-size="11" font-weight="600">Emergency Fallback (triggered when API still returns prompt_too_long)</text>
<!-- Emergency: reactiveCompact -->
<rect x="80" y="492" width="600" height="62" rx="7" fill="url(#emergency)" stroke="#c2410c" stroke-width="1.5"/>
<text x="100" y="512" fill="#9a3412" font-size="12" font-weight="600">Emrg</text>
<text x="135" y="512" fill="#9a3412" font-size="13" font-weight="700">reactiveCompact</text>
<text x="135" y="528" fill="#9a3412" font-size="10">API returns 413 / prompt_too_long → byte-level trim</text>
<text x="135" y="544" fill="#c2410c" font-size="9">Keep last 5 + summary; more aggressive than autoCompact</text>
</svg>
s08_context_compact/images/compaction-layers.ja.svg
@@ -0,0 +1,98 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 590" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<linearGradient id="pre" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#dbeafe"/><stop offset="100%" stop-color="#bfdbfe"/>
</linearGradient>
<linearGradient id="auto" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
</linearGradient>
<linearGradient id="emergency" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
</linearGradient>
<marker id="arrow-d" viewBox="0 0 10 10" refX="5" refY="10" markerWidth="6" markerHeight="6" orient="auto">
<path d="M 0 0 L 5 10 L 10 0 z" fill="#94a3b8"/>
</marker>
</defs>
<rect width="760" height="590" fill="#fafbfc" rx="8"/>
<!-- Title bar -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">コンテキスト圧縮 — 前処理パイプライン + 自動圧縮 + 緊急フォールバック</text>
<!-- Design principles (left) -->
<rect x="20" y="62" width="220" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="130" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">設計原則</text>
<text x="130" y="100" fill="#475569" font-size="10" text-anchor="middle">安価な処理を先に、高価な処理を後に</text>
<text x="130" y="116" fill="#475569" font-size="10" text-anchor="middle">テキスト修正 → メッセージ削除の順</text>
<text x="130" y="132" fill="#475569" font-size="10" text-anchor="middle">メッセージ削除 → LLM 呼び出しの順</text>
<!-- Cost escalation (right) -->
<rect x="530" y="62" width="210" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="635" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">コスト増加</text>
<text x="635" y="104" fill="#475569" font-size="10" text-anchor="middle">テキスト操作 → LLM 要約 → 緊急トリム</text>
<text x="635" y="124" fill="#94a3b8" font-size="9" text-anchor="middle">0 API · 0 API · 0 API · 1 API · 1 API</text>
<!-- ===== Pre-processing pipeline title ===== -->
<rect x="20" y="146" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="163" fill="#64748b" font-size="11" font-weight="600">前処理パイプライン(実行順:L3 → L1 → L2、各 LLM 呼び出し前に自動実行、0 API)</text>
<!-- L3: toolResultBudget -->
<rect x="80" y="180" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="200" fill="#1e40af" font-size="12" font-weight="600">L3</text>
<text x="135" y="200" fill="#1e40af" font-size="13" font-weight="700">toolResultBudget</text>
<text x="260" y="200" fill="#1e40af" font-size="11">tool_result 合計 > 200KB → 最大項目を退避</text>
<text x="650" y="200" fill="#1e40af" font-size="10" text-anchor="end">完全内容を保持</text>
<text x="135" y="218" fill="#2563eb" font-size="9">トリガー:毎ターン、microCompact が完全内容を置換する前に実行</text>
<!-- Arrow L3→L1 -->
<line x1="380" y1="226" x2="380" y2="238" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
<!-- L1: snipCompact -->
<rect x="80" y="240" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="260" fill="#1e40af" font-size="12" font-weight="600">L1</text>
<text x="135" y="260" fill="#1e40af" font-size="13" font-weight="700">snipCompact</text>
<text x="260" y="260" fill="#1e40af" font-size="11">メッセージ > 50 → 中間をトリム</text>
<text x="650" y="260" fill="#1e40af" font-size="10" text-anchor="end">先頭/末尾保持</text>
<text x="135" y="278" fill="#2563eb" font-size="9">トリガー:メッセージ数が閾値を超過</text>
<!-- Arrow L1→L2 -->
<line x1="380" y1="286" x2="380" y2="298" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>
<!-- L2: microCompact -->
<rect x="80" y="300" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="320" fill="#1e40af" font-size="12" font-weight="600">L2</text>
<text x="135" y="320" fill="#1e40af" font-size="13" font-weight="700">microCompact</text>
<text x="260" y="320" fill="#1e40af" font-size="11">古い tool_result → プレースホルダー(最新 3 件保持)</text>
<text x="650" y="320" fill="#1e40af" font-size="10" text-anchor="end">旧結果を圧縮</text>
<text x="135" y="338" fill="#2563eb" font-size="9">トリガー:毎ターン自動実行、チュートリアル版はテキストプレースホルダーで模擬</text>
<!-- ===== Auto-compact title ===== -->
<rect x="20" y="358" width="720" height="24" rx="4" fill="#f1f5f9"/>
|
||||
<text x="70" y="375" fill="#64748b" font-size="11" font-weight="600">自動圧縮判定(前処理で不足時にトリガー、1 API 呼び出し)</text>
|
||||
|
||||
<!-- L4: autoCompact -->
|
||||
<rect x="80" y="390" width="600" height="58" rx="7" fill="url(#auto)" stroke="#dc2626" stroke-width="2"/>
|
||||
<text x="100" y="412" fill="#991b1b" font-size="12" font-weight="600">L4</text>
|
||||
<text x="135" y="412" fill="#991b1b" font-size="13" font-weight="700">autoCompact</text>
|
||||
<text x="260" y="412" fill="#991b1b" font-size="11">トークンが閾値超過 → LLM 全量要約</text>
|
||||
<text x="590" y="412" fill="#991b1b" font-size="10" text-anchor="end">1 API 呼び出し</text>
|
||||
<text x="135" y="428" fill="#dc2626" font-size="9">閾値: contextWindow - maxOutputTokens - 13,000 · sessionMemoryCompact を先に試行、不足時のみ LLM 呼び出し</text>
|
||||
<text x="135" y="442" fill="#dc2626" font-size="9">サーキットブレーカー:連続 3 回失敗後にリトライ停止</text>
|
||||
|
||||
<!-- ===== 緊急フォールバックタイトル ===== -->
|
||||
<rect x="20" y="460" width="720" height="24" rx="4" fill="#f1f5f9"/>
|
||||
<text x="55" y="477" fill="#64748b" font-size="11" font-weight="600">緊急フォールバック(API が引き続き prompt_too_long を返す場合にトリガー)</text>
|
||||
|
||||
<!-- 緊急: reactiveCompact -->
|
||||
<rect x="80" y="492" width="600" height="62" rx="7" fill="url(#emergency)" stroke="#c2410c" stroke-width="1.5"/>
|
||||
<text x="100" y="512" fill="#9a3412" font-size="12" font-weight="600">緊急</text>
|
||||
<text x="135" y="512" fill="#9a3412" font-size="13" font-weight="700">reactiveCompact</text>
|
||||
<text x="135" y="528" fill="#9a3412" font-size="10">API が 413 / prompt_too_long を返す → バイト単位でトリム</text>
|
||||
<text x="135" y="544" fill="#c2410c" font-size="9">最後の 5 件 + 要約を保持、autoCompact より積極的</text>
|
||||
|
||||
</svg>
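The snipCompact (L1) box only states the rule: once the conversation exceeds 50 messages, trim the middle and keep the head and tail. Below is a minimal Python sketch of that rule, assuming a plain list of message dicts. `KEEP_HEAD`, `KEEP_TAIL`, and the marker text are hypothetical values that the diagram does not give, and the sketch does not keep tool_use/tool_result pairs together, which a real implementation would have to guarantee.

```python
from typing import Any

SNIP_THRESHOLD = 50  # from the diagram: trim once there are more than 50 messages
KEEP_HEAD = 2        # hypothetical: leading messages to keep (e.g. the original task)
KEEP_TAIL = 20       # hypothetical: most recent messages to keep

Message = dict[str, Any]


def snip_compact(messages: list[Message]) -> list[Message]:
    """Keep the head and tail of the conversation and drop the middle.

    Sketch only: it does not check that tool_use/tool_result pairs stay
    together or that user/assistant roles still alternate after the cut.
    """
    if len(messages) <= SNIP_THRESHOLD:
        return messages  # under the threshold: nothing to do, 0 API calls
    head = messages[:KEEP_HEAD]
    tail = messages[-KEEP_TAIL:]
    dropped = len(messages) - KEEP_HEAD - KEEP_TAIL
    # Hypothetical marker so the model knows that something was removed.
    marker: Message = {"role": "user", "content": f"[{dropped} earlier messages snipped]"}
    return head + [marker] + tail
```

With 60 messages this returns 23: the first 2, a marker reporting 38 snipped messages, and the last 20.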

s08_context_compact/images/compaction-layers.svg
@@ -0,0 +1,98 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 590" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<linearGradient id="pre" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#dbeafe"/><stop offset="100%" stop-color="#bfdbfe"/>
</linearGradient>
<linearGradient id="auto" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fecaca"/><stop offset="100%" stop-color="#fca5a5"/>
</linearGradient>
<linearGradient id="emergency" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#fed7aa"/><stop offset="100%" stop-color="#fdba74"/>
</linearGradient>
<marker id="arrow-d" viewBox="0 0 10 10" refX="5" refY="10" markerWidth="6" markerHeight="6" orient="auto">
<path d="M 0 0 L 5 10 L 10 0 z" fill="#94a3b8"/>
</marker>
</defs>

<rect width="760" height="590" fill="#fafbfc" rx="8"/>

<!-- 标题栏 -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">上下文压缩 — 预处理管线 + 自动压缩 + 应急兜底</text>

<!-- 左侧说明 -->
<rect x="20" y="62" width="220" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="130" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">设计原则</text>
<text x="130" y="100" fill="#475569" font-size="10" text-anchor="middle">便宜的先跑,贵的后跑</text>
<text x="130" y="116" fill="#475569" font-size="10" text-anchor="middle">能改文本 → 不删整条</text>
<text x="130" y="132" fill="#475569" font-size="10" text-anchor="middle">能删整条 → 不调 LLM</text>

<!-- 右侧代价箭头 -->
<rect x="530" y="62" width="210" height="80" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="635" y="82" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">代价递增</text>
<text x="635" y="104" fill="#475569" font-size="10" text-anchor="middle">文本操作 → LLM 摘要 → 应急裁剪</text>
<text x="635" y="124" fill="#94a3b8" font-size="9" text-anchor="middle">0 API · 0 API · 0 API · 1 API · 1 API</text>

<!-- ===== 预处理管线标题 ===== -->
<rect x="20" y="146" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="163" fill="#64748b" font-size="11" font-weight="600">预处理管线(执行顺序:L3 → L1 → L2,每轮 LLM 调用前自动执行,0 API)</text>

<!-- L3: toolResultBudget -->
<rect x="80" y="180" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="200" fill="#1e40af" font-size="12" font-weight="600">L3</text>
<text x="135" y="200" fill="#1e40af" font-size="13" font-weight="700">toolResultBudget</text>
<text x="260" y="200" fill="#1e40af" font-size="11">tool_result 总和 > 200KB → 最大项落盘</text>
<text x="650" y="200" fill="#1e40af" font-size="10" text-anchor="end">保留完整内容</text>
<text x="135" y="218" fill="#2563eb" font-size="9">触发:每轮自动执行,须在 microCompact 替换旧结果之前运行,以保留完整内容</text>

<!-- 箭头 L3→L1 -->
<line x1="380" y1="226" x2="380" y2="238" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>

<!-- L1: snipCompact -->
<rect x="80" y="240" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="260" fill="#1e40af" font-size="12" font-weight="600">L1</text>
<text x="135" y="260" fill="#1e40af" font-size="13" font-weight="700">snipCompact</text>
<text x="260" y="260" fill="#1e40af" font-size="11">消息 > 50 条 → 裁掉中间</text>
<text x="650" y="260" fill="#1e40af" font-size="10" text-anchor="end">保留头尾</text>
<text x="135" y="278" fill="#2563eb" font-size="9">触发:消息数超过阈值</text>

<!-- 箭头 L1→L2 -->
<line x1="380" y1="286" x2="380" y2="298" stroke="#94a3b8" stroke-width="1" marker-end="url(#arrow-d)"/>

<!-- L2: microCompact -->
<rect x="80" y="300" width="600" height="46" rx="7" fill="url(#pre)" stroke="#2563eb" stroke-width="1.5"/>
<text x="100" y="320" fill="#1e40af" font-size="12" font-weight="600">L2</text>
<text x="135" y="320" fill="#1e40af" font-size="13" font-weight="700">microCompact</text>
<text x="260" y="320" fill="#1e40af" font-size="11">旧 tool_result → 占位符(保留最近 3 条)</text>
<text x="650" y="320" fill="#1e40af" font-size="10" text-anchor="end">压旧结果</text>
<text x="135" y="338" fill="#2563eb" font-size="9">触发:每轮自动,教学版用文本占位符模拟</text>

<!-- ===== 自动压缩标题 ===== -->
<rect x="20" y="358" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="70" y="375" fill="#64748b" font-size="11" font-weight="600">自动压缩决策(预处理不够时触发,1 API 调用)</text>

<!-- L4: autoCompact -->
<rect x="80" y="390" width="600" height="58" rx="7" fill="url(#auto)" stroke="#dc2626" stroke-width="2"/>
<text x="100" y="412" fill="#991b1b" font-size="12" font-weight="600">L4</text>
<text x="135" y="412" fill="#991b1b" font-size="13" font-weight="700">autoCompact</text>
<text x="260" y="412" fill="#991b1b" font-size="11">token 超阈值 → LLM 全量摘要</text>
<text x="590" y="412" fill="#991b1b" font-size="10" text-anchor="end">1 API 调用</text>
<text x="135" y="428" fill="#dc2626" font-size="9">阈值: contextWindow - maxOutputTokens - 13,000 · 先尝试 sessionMemoryCompact,不够才调 LLM</text>
<text x="135" y="442" fill="#dc2626" font-size="9">熔断:连续失败 3 次后停止重试</text>

<!-- ===== 应急兜底标题 ===== -->
<rect x="20" y="460" width="720" height="24" rx="4" fill="#f1f5f9"/>
<text x="55" y="477" fill="#64748b" font-size="11" font-weight="600">应急兜底(API 仍然返回 prompt_too_long 时触发)</text>

<!-- 应急: reactiveCompact -->
<rect x="80" y="492" width="600" height="62" rx="7" fill="url(#emergency)" stroke="#c2410c" stroke-width="1.5"/>
<text x="100" y="512" fill="#9a3412" font-size="12" font-weight="600">应急</text>
<text x="135" y="512" fill="#9a3412" font-size="13" font-weight="700">reactiveCompact</text>
<text x="135" y="528" fill="#9a3412" font-size="10">API 返回 413 / prompt_too_long → 字节级裁剪</text>
<text x="135" y="544" fill="#c2410c" font-size="9">保留最后 5 条 + 摘要,比 autoCompact 更激进</text>

</svg>
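The L4 box gives two concrete rules: the trigger threshold `contextWindow - maxOutputTokens - 13,000`, and a circuit breaker that stops retrying after 3 consecutive failures, with sessionMemoryCompact tried before any LLM call. A minimal Python sketch of that decision logic follows. `session_memory_compact` and `llm_summarize` are hypothetical callbacks (the first returns `None` when it cannot free enough space), and resetting the failure counter on any success is an assumption, not something the diagram states.

```python
from dataclasses import dataclass
from typing import Callable, Optional

BUFFER_TOKENS = 13_000        # from the diagram
MAX_CONSECUTIVE_FAILURES = 3  # from the diagram: breaker trips after 3 failures in a row


def auto_compact_threshold(context_window: int, max_output_tokens: int) -> int:
    """Token count at which autoCompact fires: contextWindow - maxOutputTokens - 13,000."""
    return context_window - max_output_tokens - BUFFER_TOKENS


@dataclass
class AutoCompactState:
    consecutive_failures: int = 0

    @property
    def tripped(self) -> bool:
        return self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES


def maybe_auto_compact(
    messages: list,
    token_count: int,
    context_window: int,
    max_output_tokens: int,
    state: AutoCompactState,
    session_memory_compact: Callable[[list], Optional[list]],  # hypothetical, no API call
    llm_summarize: Callable[[list], list],                     # hypothetical, 1 API call
) -> list:
    if token_count < auto_compact_threshold(context_window, max_output_tokens):
        return messages  # still under the threshold
    if state.tripped:
        return messages  # breaker is open: stop retrying autoCompact
    compacted = session_memory_compact(messages)
    if compacted is not None:
        state.consecutive_failures = 0  # assumption: any success resets the counter
        return compacted
    try:
        summary = llm_summarize(messages)
    except Exception:
        state.consecutive_failures += 1
        return messages
    state.consecutive_failures = 0
    return summary
```

As a worked example with illustrative values, a 200,000-token window with 20,000 reserved output tokens gives a threshold of 200,000 - 20,000 - 13,000 = 167,000 tokens.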

s08_context_compact/images/layer1-budget.en.svg
@@ -0,0 +1,50 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 356" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>

<rect width="720" height="356" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L3: toolResultBudget — Large Result Persistence</text>

<!-- Pain Point -->
<rect x="20" y="54" width="680" height="42" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="72" fill="#991b1b" font-size="11" font-weight="600">Pain Point</text>
<text x="105" y="72" fill="#991b1b" font-size="11">The model reads 30 files in one turn; the tool_results total 500KB and flood the context window</text>

<!-- Before -->
<text x="155" y="118" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">Before</text>
<rect x="20" y="128" width="270" height="82" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="35" y="148" fill="#475569" font-size="10" font-family="monospace">tool_result: (78KB) ...</text>
<text x="35" y="164" fill="#475569" font-size="10" font-family="monospace">tool_result: (142KB) ...</text>
<text x="35" y="180" fill="#475569" font-size="10" font-family="monospace">tool_result: (290KB) ...</text>
<text x="155" y="202" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">Total 510KB → over budget</text>

<!-- Arrow -->
<line x1="295" y1="163" x2="360" y2="163" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>

<!-- After -->
<text x="485" y="118" fill="#16a34a" font-size="12" font-weight="600" text-anchor="middle">After</text>
<rect x="365" y="128" width="335" height="82" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="380" y="148" fill="#166534" font-size="10" font-family="monospace">tool_result: &lt;persisted-output&gt;</text>
<text x="395" y="164" fill="#166534" font-size="9">Full output: .task_outputs/t1.txt</text>
<text x="395" y="178" fill="#166534" font-size="9">Preview: (first 2000 chars) ...</text>
<text x="532" y="202" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">Total 18KB → normal</text>

<!-- How it works -->
<rect x="20" y="214" width="680" height="64" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="234" fill="#1e3a5f" font-size="11" font-weight="600">How</text>
<text x="70" y="234" fill="#475569" font-size="10">1. Sum the sizes of all tool_results in the latest turn</text>
<text x="70" y="250" fill="#475569" font-size="10">2. Over 200KB → sort by size and persist the largest first to .task_outputs/tool-results/</text>
<text x="70" y="266" fill="#475569" font-size="10">3. Keep only the &lt;persisted-output&gt; marker plus a preview of the first 2000 chars in context</text>

<!-- Result summary -->
<rect x="20" y="290" width="680" height="36" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="35" y="312" fill="#166534" font-size="11">Result: No data lost (full data on disk), context drops from 510KB to ~18KB, 0 API calls</text>
</svg>
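The three steps in this figure (sum the latest turn's tool_results, persist the largest ones to `.task_outputs/tool-results/` once the total passes 200KB, keep a `<persisted-output>` marker with a 2000-character preview) can be sketched in Python as follows. Assumptions: results arrive as a dict from a hypothetical tool_use id to text, 1KB means 1024 bytes, the closing `</persisted-output>` tag and exact marker layout are illustrative, and persisting stops as soon as the running total fits the budget.

```python
from pathlib import Path

BUDGET_BYTES = 200 * 1024  # 200KB per-turn budget (assumes 1KB = 1024 bytes)
PREVIEW_CHARS = 2000       # preview length kept in context
OUTPUT_DIR = Path(".task_outputs/tool-results")


def apply_tool_result_budget(results: dict[str, str], out_dir: Path = OUTPUT_DIR) -> dict[str, str]:
    """Persist the largest tool_results of one turn to disk until the turn fits the budget.

    `results` maps a (hypothetical) tool_use id to its tool_result text.
    Returns a new dict; the input is not modified.
    """
    sizes = {tid: len(text.encode("utf-8")) for tid, text in results.items()}
    total = sum(sizes.values())
    compacted = dict(results)
    if total <= BUDGET_BYTES:
        return compacted
    out_dir.mkdir(parents=True, exist_ok=True)
    # Largest first, so as few results as possible have to leave the context.
    for tid in sorted(results, key=lambda t: sizes[t], reverse=True):
        if total <= BUDGET_BYTES:
            break
        path = out_dir / f"{tid}.txt"
        path.write_text(results[tid], encoding="utf-8")  # full data stays on disk
        marker = (
            "<persisted-output>\n"
            f"Full output: {path}\n"
            f"Preview: {results[tid][:PREVIEW_CHARS]}\n"
            "</persisted-output>"
        )
        total += len(marker.encode("utf-8")) - sizes[tid]
        compacted[tid] = marker
    return compacted
```

With 78KB, 142KB and 290KB results as in the figure, this greedy loop persists the 290KB and 142KB results and leaves the 78KB one inline, since the total drops under 200KB after the second write.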

s08_context_compact/images/layer1-budget.ja.svg
@@ -0,0 +1,50 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 356" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>

<rect width="720" height="356" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L3: toolResultBudget — 大結果の永続化</text>

<!-- ペインポイント -->
<rect x="20" y="54" width="680" height="42" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="72" fill="#991b1b" font-size="11" font-weight="600">ペインポイント</text>
<text x="120" y="72" fill="#991b1b" font-size="11">モデルが一度に 30 ファイルを読み込み、単一ターンの tool_result が合計 500KB に達し、コンテキストウィンドウを圧迫</text>

<!-- 圧縮前 -->
<text x="155" y="118" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">圧縮前</text>
<rect x="20" y="128" width="270" height="82" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="35" y="148" fill="#475569" font-size="10" font-family="monospace">tool_result: (78KB) ...</text>
<text x="35" y="164" fill="#475569" font-size="10" font-family="monospace">tool_result: (142KB) ...</text>
<text x="35" y="180" fill="#475569" font-size="10" font-family="monospace">tool_result: (290KB) ...</text>
<text x="155" y="202" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">合計 510KB → 予算超過</text>

<!-- 矢印 -->
<line x1="295" y1="163" x2="360" y2="163" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>

<!-- 圧縮後 -->
<text x="485" y="118" fill="#16a34a" font-size="12" font-weight="600" text-anchor="middle">圧縮後</text>
<rect x="365" y="128" width="335" height="82" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="380" y="148" fill="#166534" font-size="10" font-family="monospace">tool_result: &lt;persisted-output&gt;</text>
<text x="395" y="164" fill="#166534" font-size="9">Full output: .task_outputs/t1.txt</text>
<text x="395" y="178" fill="#166534" font-size="9">Preview: (先頭 2000 文字) ...</text>
<text x="532" y="202" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">合計 18KB → 正常</text>

<!-- 原理説明 -->
<rect x="20" y="214" width="680" height="64" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="234" fill="#1e3a5f" font-size="11" font-weight="600">方法</text>
<text x="70" y="234" fill="#475569" font-size="10">1. 最終ターンの全 tool_result の合計サイズを集計</text>
<text x="70" y="250" fill="#475569" font-size="10">2. 200KB 超過 → サイズ順にソートし、最大のものから .task_outputs/tool-results/ に永続化</text>
<text x="70" y="266" fill="#475569" font-size="10">3. コンテキストには &lt;persisted-output&gt; マーカー + 先頭 2000 文字のプレビューのみ残す</text>

<!-- 変更サマリー -->
<rect x="20" y="290" width="680" height="36" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="35" y="312" fill="#166534" font-size="11">結果:情報は失われていない(ディスクに完全なデータあり)、コンテキストは 510KB → ~18KB に削減、API 呼び出しは 0 回</text>
</svg>

s08_context_compact/images/layer1-budget.svg
@@ -0,0 +1,50 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 356" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>

<rect width="720" height="356" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L3: toolResultBudget — 大结果落盘</text>

<!-- 痛点 -->
<rect x="20" y="54" width="680" height="42" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="72" fill="#991b1b" font-size="11" font-weight="600">痛点</text>
<text x="75" y="72" fill="#991b1b" font-size="11">模型一次读了 30 个文件,单轮 tool_result 加起来 500KB,直接把上下文窗口打满</text>

<!-- Before -->
<text x="155" y="118" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">压缩前</text>
<rect x="20" y="128" width="270" height="82" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<text x="35" y="148" fill="#475569" font-size="10" font-family="monospace">tool_result: (78KB) ...</text>
<text x="35" y="164" fill="#475569" font-size="10" font-family="monospace">tool_result: (142KB) ...</text>
<text x="35" y="180" fill="#475569" font-size="10" font-family="monospace">tool_result: (290KB) ...</text>
<text x="155" y="202" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">合计 510KB → 超预算</text>

<!-- Arrow -->
<line x1="295" y1="163" x2="360" y2="163" stroke="#16a34a" stroke-width="2" marker-end="url(#arrow)"/>

<!-- After -->
<text x="485" y="118" fill="#16a34a" font-size="12" font-weight="600" text-anchor="middle">压缩后</text>
<rect x="365" y="128" width="335" height="82" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="380" y="148" fill="#166534" font-size="10" font-family="monospace">tool_result: &lt;persisted-output&gt;</text>
<text x="395" y="164" fill="#166534" font-size="9">Full output: .task_outputs/t1.txt</text>
<text x="395" y="178" fill="#166534" font-size="9">Preview: (前 2000 字符) ...</text>
<text x="532" y="202" fill="#16a34a" font-size="9" font-weight="600" text-anchor="middle">合计 18KB → 正常</text>

<!-- 原理说明 -->
<rect x="20" y="214" width="680" height="64" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="234" fill="#1e3a5f" font-size="11" font-weight="600">怎么做</text>
<text x="85" y="234" fill="#475569" font-size="10">1. 统计最后一轮所有 tool_result 的总大小</text>
<text x="85" y="250" fill="#475569" font-size="10">2. 超过 200KB → 按大小排序,从最大的开始落盘到 .task_outputs/tool-results/</text>
<text x="85" y="266" fill="#475569" font-size="10">3. 上下文里只留 &lt;persisted-output&gt; 标记 + 前 2000 字符预览</text>

<!-- 变化摘要 -->
<rect x="20" y="290" width="680" height="36" rx="6" fill="#f0fdf4" stroke="#16a34a" stroke-width="1"/>
<text x="35" y="312" fill="#166534" font-size="11">结果:信息没丢(磁盘有完整数据),上下文从 510KB 降到 ~18KB,0 次 API 调用</text>
</svg>

s08_context_compact/images/micro-compact.en.svg
@@ -0,0 +1,57 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 300" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#ca8a04"/>
</marker>
</defs>

<rect width="720" height="300" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L2: microCompact — Old Result Placeholder Replacement</text>

<!-- Pain Point -->
<rect x="20" y="54" width="680" height="36" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">Pain Point</text>
<text x="110" y="70" fill="#991b1b" font-size="11">After 10 consecutive reads, the full text of reads 1-7 still sits in context but is no longer useful</text>

<!-- Before -->
<text x="155" y="114" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">Before (all 10 tool_results intact)</text>
<rect x="20" y="122" width="310" height="95" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<rect x="30" y="130" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="138" fill="#94a3b8" font-size="8" font-family="monospace">Read file A: (full content, 3200 chars)...</text>
<rect x="30" y="145" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="153" fill="#94a3b8" font-size="8" font-family="monospace">Read file B: (full content, 1800 chars)...</text>
<rect x="30" y="160" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="168" fill="#94a3b8" font-size="8" font-family="monospace">Read file C: (full content, 4500 chars)...</text>
<rect x="30" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="38" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (full content, 2800 chars)</text>
<text x="175" y="212" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">7 old results waste ~25K chars</text>

<!-- Arrow -->
<line x1="335" y1="170" x2="375" y2="170" stroke="#ca8a04" stroke-width="2" marker-end="url(#arrow)"/>

<!-- After -->
<text x="535" y="114" fill="#ca8a04" font-size="12" font-weight="600" text-anchor="middle">After (only the latest 3 kept intact)</text>
<rect x="390" y="122" width="310" height="95" rx="6" fill="#fefce8" stroke="#ca8a04" stroke-width="1"/>
<rect x="400" y="130" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="138" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="145" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="153" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="160" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="168" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (full content, 2800 chars)</text>
<text x="545" y="212" fill="#ca8a04" font-size="9" font-weight="600" text-anchor="middle">Latest 3 kept; first 7 become placeholders</text>

<!-- How -->
<rect x="20" y="228" width="680" height="62" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="248" fill="#1e3a5f" font-size="11" font-weight="600">How (teaching version)</text>
<text x="155" y="248" fill="#475569" font-size="10">Walk the tool_results, keep the latest 3 intact, and replace older ones with a placeholder.</text>
<text x="35" y="264" fill="#1e3a5f" font-size="11" font-weight="600">Real CC</text>
<text x="95" y="264" fill="#475569" font-size="10">Clears old results via API cache_edits (keeping the prompt cache prefix intact), only for COMPACTABLE_TOOLS:</text>
<text x="95" y="280" fill="#94a3b8" font-size="9">Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write. The teaching version simulates the same effect with text placeholders.</text>
</svg>
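The teaching version in this figure keeps the latest 3 tool_results intact and swaps older ones for a placeholder, while real CC clears them through API cache_edits and only for COMPACTABLE_TOOLS. Below is a minimal Python sketch of the text-placeholder approach. It assumes Anthropic Messages API-style content blocks (`tool_use` with `id`/`name`, `tool_result` with `tool_use_id`/`content`) and borrows the figure's COMPACTABLE_TOOLS list as a filter; `_blocks` is a hypothetical helper, and cache_edits is not modeled at all.

```python
from typing import Any

KEEP_RECENT = 3
PLACEHOLDER = "[Earlier result compacted. Re-run if needed.]"
# Tool names listed in the figure; results from other tools are never compacted here.
COMPACTABLE_TOOLS = {"Read", "Bash", "Grep", "Glob", "WebSearch", "WebFetch", "Edit", "Write"}


def _blocks(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Content blocks of a message, or [] when content is a plain string."""
    content = message.get("content")
    return content if isinstance(content, list) else []


def micro_compact(messages: list[dict[str, Any]]) -> int:
    """Replace all but the latest KEEP_RECENT compactable tool_results in place.

    Returns how many tool_results were newly replaced on this call.
    """
    # tool_use id -> tool name, taken from assistant messages.
    names = {
        block["id"]: block["name"]
        for msg in messages if msg.get("role") == "assistant"
        for block in _blocks(msg) if block.get("type") == "tool_use"
    }
    # Compactable tool_result blocks, in conversation order.
    results = [
        block
        for msg in messages if msg.get("role") == "user"
        for block in _blocks(msg)
        if block.get("type") == "tool_result"
        and names.get(block.get("tool_use_id")) in COMPACTABLE_TOOLS
    ]
    replaced = 0
    for block in results[:-KEEP_RECENT]:  # empty when there are KEEP_RECENT or fewer
        if block.get("content") != PLACEHOLDER:
            block["content"] = PLACEHOLDER
            replaced += 1
    return replaced
```

With ten Read results followed by one TodoWrite result, the first call replaces 7 results and leaves the TodoWrite result alone; a second call replaces nothing, so running it every turn is safe.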

s08_context_compact/images/micro-compact.ja.svg
@@ -0,0 +1,57 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 300" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#ca8a04"/>
</marker>
</defs>

<rect width="720" height="300" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L2: microCompact — 旧結果のプレースホルダー置換</text>

<!-- ペインポイント -->
<rect x="20" y="54" width="680" height="36" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">ペインポイント</text>
<text x="115" y="70" fill="#991b1b" font-size="11">Agent が連続で 10 ファイルを読み込み、1〜7 回目の完全なファイル内容がコンテキストに残ったまま、場所を占有しつつ既に不要</text>

<!-- 圧縮前 -->
<text x="155" y="114" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">圧縮前(10 件の tool_result がすべて完全)</text>
<rect x="20" y="122" width="310" height="95" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<rect x="30" y="130" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="138" fill="#94a3b8" font-size="8" font-family="monospace">Read file A: (完全な内容, 3200 文字)...</text>
<rect x="30" y="145" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="153" fill="#94a3b8" font-size="8" font-family="monospace">Read file B: (完全な内容, 1800 文字)...</text>
<rect x="30" y="160" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="168" fill="#94a3b8" font-size="8" font-family="monospace">Read file C: (完全な内容, 4500 文字)...</text>
<rect x="30" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="38" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完全な内容, 2800 文字)</text>
<text x="175" y="212" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">7 件の旧結果が ~25K 文字を無駄に占有</text>

<!-- 矢印 -->
<line x1="335" y1="170" x2="375" y2="170" stroke="#ca8a04" stroke-width="2" marker-end="url(#arrow)"/>

<!-- 圧縮後 -->
<text x="535" y="114" fill="#ca8a04" font-size="12" font-weight="600" text-anchor="middle">圧縮後(最新 3 件のみ完全保持)</text>
<rect x="390" y="122" width="310" height="95" rx="6" fill="#fefce8" stroke="#ca8a04" stroke-width="1"/>
<rect x="400" y="130" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="138" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="145" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="153" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="160" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="168" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完全な内容, 2800 文字)</text>
<text x="545" y="212" fill="#ca8a04" font-size="9" font-weight="600" text-anchor="middle">最新 3 件のみ保持、前 7 件はプレースホルダー化</text>

<!-- 原理 -->
<rect x="20" y="228" width="680" height="62" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="248" fill="#1e3a5f" font-size="11" font-weight="600">方法(チュートリアル版)</text>
<text x="165" y="248" fill="#475569" font-size="10">tool_result を走査し、最新 3 件のみ完全保持、古いものはプレースホルダーに置換。</text>
<text x="35" y="264" fill="#1e3a5f" font-size="11" font-weight="600">実際の CC</text>
<text x="110" y="264" fill="#475569" font-size="10">API cache_edits で旧結果をクリア(prompt cache プレフィックスを破壊しない)、COMPACTABLE_TOOLS のみ対象:</text>
<text x="110" y="280" fill="#94a3b8" font-size="9">Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write。チュートリアル版はテキストプレースホルダーで同様の効果を模擬。</text>
</svg>

s08_context_compact/images/micro-compact.svg
@@ -0,0 +1,57 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 300" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#2563eb"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#ca8a04"/>
</marker>
</defs>

<rect width="720" height="300" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">L2: microCompact — 旧结果占位替换</text>

<!-- 痛点 -->
<rect x="20" y="54" width="680" height="36" rx="6" fill="#fef2f2" stroke="#fca5a5" stroke-width="1"/>
<text x="35" y="70" fill="#991b1b" font-size="11" font-weight="600">痛点</text>
<text x="75" y="70" fill="#991b1b" font-size="11">Agent 连续读了 10 个文件,第 1-7 次的完整文件内容还躺在上下文里,占着位置但早就没用了</text>

<!-- Before -->
<text x="155" y="114" fill="#64748b" font-size="12" font-weight="600" text-anchor="middle">压缩前(10 条 tool_result 全部完整)</text>
<rect x="20" y="122" width="310" height="95" rx="6" fill="#fff" stroke="#94a3b8" stroke-width="1"/>
<rect x="30" y="130" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="138" fill="#94a3b8" font-size="8" font-family="monospace">Read file A: (完整内容, 3200 字符)...</text>
<rect x="30" y="145" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="153" fill="#94a3b8" font-size="8" font-family="monospace">Read file B: (完整内容, 1800 字符)...</text>
<rect x="30" y="160" width="290" height="10" rx="2" fill="#e2e8f0"/>
<text x="38" y="168" fill="#94a3b8" font-size="8" font-family="monospace">Read file C: (完整内容, 4500 字符)...</text>
<rect x="30" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="38" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完整内容, 2800 字符)</text>
<text x="175" y="212" fill="#ef4444" font-size="9" font-weight="600" text-anchor="middle">7 条旧结果白占 ~25K 字符</text>

<!-- Arrow -->
<line x1="335" y1="170" x2="375" y2="170" stroke="#ca8a04" stroke-width="2" marker-end="url(#arrow)"/>

<!-- After -->
<text x="535" y="114" fill="#ca8a04" font-size="12" font-weight="600" text-anchor="middle">压缩后(只保留最近 3 条完整)</text>
<rect x="390" y="122" width="310" height="95" rx="6" fill="#fefce8" stroke="#ca8a04" stroke-width="1"/>
<rect x="400" y="130" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="138" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="145" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="153" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="160" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="168" fill="#92400e" font-size="8" font-family="monospace">[Earlier result compacted. Re-run if needed.]</text>
<rect x="400" y="175" width="290" height="10" rx="2" fill="#fef3c7"/>
<text x="408" y="183" fill="#92400e" font-size="8" font-family="monospace">Read file J: (完整内容, 2800 字符)</text>
<text x="545" y="212" fill="#ca8a04" font-size="9" font-weight="600" text-anchor="middle">只保留最近 3 条,前 7 条变占位</text>

<!-- 原理 -->
<rect x="20" y="228" width="680" height="62" rx="6" fill="#f8fafc" stroke="#cbd5e1" stroke-width="1"/>
<text x="35" y="248" fill="#1e3a5f" font-size="11" font-weight="600">怎么做(教学版)</text>
<text x="115" y="248" fill="#475569" font-size="10">遍历 tool_result,只保留最近 3 条完整,更旧的替换为占位符。</text>
<text x="35" y="264" fill="#1e3a5f" font-size="11" font-weight="600">真实 CC</text>
<text x="95" y="264" fill="#475569" font-size="10">通过 API cache_edits 清除旧结果(不破坏 prompt cache 前缀),仅对 COMPACTABLE_TOOLS 生效:</text>
<text x="95" y="280" fill="#94a3b8" font-size="9">Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write。教学版用文本占位模拟同样效果。</text>
</svg>
|
||||
|
s09_memory/README.en.md
@@ -0,0 +1,279 @@
|
||||
# s09: Memory — Compression Loses Details, Keep a Layer That Doesn't
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → ... → s07 → s08 → `s09` → [s10](../s10_system_prompt/) → s11 → ... → s20
|
||||
> *"Compression loses details, keep a layer that doesn't"* — File store + index + on-demand loading, across compactions, across sessions.
|
||||
>
|
||||
> **Harness Layer**: Memory — knowledge that survives compaction and sessions.
|
||||
|
||||
---
|
||||
|
||||
## The Problem
|
||||
|
||||
s08's autoCompact preserves current goals, remaining work, and user constraints in the summary, but details get lost: "use tabs not spaces" might get simplified to "user has code style preferences". And when you start a new session, even the summary is gone.
|
||||
|
||||
LLMs have no persistent state; all information lives in the context window. When context fills up, it gets compressed, and compression is lossy. What's needed is a storage layer that doesn't participate in compression and persists across sessions.
|
||||
|
||||
---
|
||||
|
||||
## The Solution
|
||||
|
||||

|
||||
|
||||
The s08 compression pipeline is kept as-is; this chapter focuses on memory. Storage uses the filesystem: a `.memory/` directory where each memory is a `.md` file with YAML frontmatter (`name` / `description` / `type`). Once files accumulate, an index is needed: `MEMORY.md` holds one link per line and is injected into the SYSTEM prompt.
|
||||
|
||||
Key design: the index stays in the SYSTEM prompt (so the prompt cache can reuse it), while file contents are injected on demand (matched against the current conversation by filename/description, so the cached prefix is not disturbed). Memories are written along two paths: the user explicitly says "remember", or a background extraction runs after each turn. As files accumulate, periodic consolidation deduplicates them.
|
||||
|
||||
Four memory types, each answering a different question:
|
||||
|
||||
| Type | Answers | Example |
|
||||
|------|---------|---------|
|
||||
| user | Who you are | "Use tabs not spaces" |
|
||||
| feedback | How to work | "Don't mock the database" |
|
||||
| project | What's happening | "Auth rewrite is compliance-driven" |
|
||||
| reference | Where to find things | "Pipeline bugs are in Linear INGEST" |
|
||||
|
||||
---
|
||||
|
||||
## How It Works
|
||||
|
||||

|
||||
|
||||
### Storage: Markdown Files + Index
|
||||
|
||||
Each memory is a `.md` file with YAML frontmatter for metadata:
|
||||
|
||||
```markdown
---
name: user-preference-tabs
description: User prefers tabs for indentation
type: user
---

User prefers using tabs, not spaces, for indentation.
**Why:** Consistency with existing codebase conventions.
**How to apply:** Always use tabs when writing or editing files.
```
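The frontmatter above is flat enough to read without a YAML library. A minimal sketch, assuming each file opens with a `---` line, holds only `key: value` pairs, and closes its header with a second `---` line; `parse_memory` is a hypothetical helper, not necessarily how code.py parses files:

```python
def parse_memory(text: str) -> tuple[dict, str]:
    """Split a memory file into (frontmatter dict, body).

    Hypothetical helper: assumes flat `key: value` lines between two `---`
    delimiters; nested YAML is not supported.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: treat the whole file as body
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            return meta, body
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # unterminated header: give up on metadata


sample = (
    "---\nname: user-preference-tabs\n"
    "description: User prefers tabs for indentation\ntype: user\n---\n\n"
    "User prefers using tabs, not spaces, for indentation.\n"
)
meta, body = parse_memory(sample)
print(meta["type"], "|", body)
```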
|
||||
|
||||
`MEMORY.md` is the index, one link per line:
|
||||
|
||||
```markdown
- [user-preference-tabs](user-preference-tabs.md) — User prefers tabs for indentation
```
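A sketch of rendering that index from parsed metadata, assuming each entry carries `name`, `description`, and the `filename` it was read from (`build_index` is hypothetical; code.py's `_rebuild_index()` may differ). Sorting by name keeps the output byte-stable across rebuilds, which helps once the index sits in a cached prompt:

```python
def build_index(entries: list[dict]) -> str:
    """Render MEMORY.md: one markdown link line per memory, in the format shown above."""
    lines = [
        f"- [{e['name']}]({e['filename']}) \u2014 {e['description']}"
        for e in sorted(entries, key=lambda e: e["name"])  # stable order across rebuilds
    ]
    return "\n".join(lines) + "\n"


print(build_index([
    {"name": "user-preference-tabs", "filename": "user-preference-tabs.md",
     "description": "User prefers tabs for indentation"},
]), end="")
```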
|
||||
|
||||
Writing a new memory automatically rebuilds the index:
|
||||
|
||||
```python
from pathlib import Path

MEMORY_DIR = Path(".memory")

def write_memory_file(name, mem_type, description, body):
    MEMORY_DIR.mkdir(exist_ok=True)  # first write creates the store
    slug = name.lower().replace(" ", "-")
    filepath = MEMORY_DIR / f"{slug}.md"
    filepath.write_text(
        f"---\nname: {name}\ndescription: {description}\ntype: {mem_type}\n---\n\n{body}\n",
        encoding="utf-8",
    )
    _rebuild_index()  # regenerate MEMORY.md from all memory files
```
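The slug rule above only lowercases and swaps spaces for hyphens, so a name like `../notes` would land outside `MEMORY_DIR`. A hedged sketch of a stricter mapping; the allowed character set and the `memory` fallback name are assumptions, and `safe_memory_path` is hypothetical:

```python
import re
from pathlib import Path


def safe_memory_path(memory_dir: Path, name: str) -> Path:
    """Map a memory name to a file path that cannot escape memory_dir."""
    # Keep lowercase letters, digits, '-' and '_'; collapse every other run to '-'.
    slug = re.sub(r"[^a-z0-9_-]+", "-", name.lower()).strip("-") or "memory"
    path = (memory_dir / f"{slug}.md").resolve()
    # Defensive check, same idea as filtering glob results with is_relative_to(WORKDIR).
    if not path.is_relative_to(memory_dir.resolve()):
        raise ValueError(f"unsafe memory name: {name!r}")
    return path


print(safe_memory_path(Path(".memory"), "User Preference Tabs").name)
print(safe_memory_path(Path(".memory"), "../../etc/passwd").name)
```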
|
||||
|
||||
### Loading: Two Paths
|
||||
|
||||
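**Path 1: Index in SYSTEM.** `build_system()` reads `MEMORY.md` every turn and injects the memory catalog into the SYSTEM prompt. Because the index sits in the SYSTEM prompt, the prompt cache can reuse it: it is still sent every turn, but as long as `MEMORY.md` is unchanged the cached prefix does not have to be reprocessed.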
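A minimal sketch of Path 1, assuming the SYSTEM prompt is a base string followed by the raw contents of `MEMORY.md`. The `# Memory index` heading and the function signature are illustrative, and code.py's `build_system()` assembles more than this. What matters for caching is that the output stays identical from turn to turn until the index changes:

```python
import tempfile
from pathlib import Path


def build_system(base: str, memory_dir: Path) -> str:
    """Append the MEMORY.md index (if any) to the base SYSTEM prompt."""
    index_file = memory_dir / "MEMORY.md"
    if not index_file.exists():
        return base  # no memories yet: nothing to inject
    index = index_file.read_text(encoding="utf-8").strip()
    return f"{base}\n\n# Memory index\n{index}"


with tempfile.TemporaryDirectory() as d:
    mem = Path(d)
    empty = build_system("You are a coding agent.", mem)
    (mem / "MEMORY.md").write_text(
        "- [tabs](tabs.md) \u2014 User prefers tabs\n", encoding="utf-8")
    turn1 = build_system("You are a coding agent.", mem)
    turn2 = build_system("You are a coding agent.", mem)

print(empty == "You are a coding agent.", turn1 == turn2)  # True True
```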
|
||||
|
||||
**Path 2: Relevant memories on demand.** Before each LLM call, `load_memories()` sends the recent conversation and the memory catalog (name + description) to the LLM as a lightweight side-query, which picks the relevant filenames; those files are then read and their contents injected. At most 5 memories are selected, to keep the cost bounded.
|
||||
|
||||
```python
import json
import re

def select_relevant_memories(messages, max_items=5):
    files = list_memory_files()
    if not files:
        return []

    # Build catalog: "0: user-preference-tabs — User prefers tabs..."
    catalog = "\n".join(f"{i}: {f['name']} — {f['description']}" for i, f in enumerate(files))
    recent = format_recent_messages(messages[-10:])  # dialogue the selection is based on

    response = client.messages.create(model=MODEL, messages=[{"role": "user",
        "content": f"Select up to {max_items} relevant memory indices. Return JSON array.\n\n"
                   f"Recent conversation:\n{recent}\n\nMemory catalog:\n{catalog}"}],
        max_tokens=200)
    match = re.search(r'\[.*?\]', response.content[0].text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON array in side-query reply")  # treated as a failed side-query
    indices = json.loads(match.group())
    return [files[i]["filename"] for i in indices
            if isinstance(i, int) and 0 <= i < len(files)][:max_items]
```
|
||||
|
||||
If the side-query fails (API error, JSON parse failure), it falls back to keyword matching on name + description.
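A sketch of such a fallback, assuming "keyword matching" means counting shared words between the recent dialogue and each memory's name + description. The tokenizer, the small stopword list, and `keyword_fallback` itself are assumptions rather than code.py's exact logic:

```python
import re

STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "or", "the", "to"}


def words(text: str) -> set:
    """Lowercase alphanumeric tokens minus stopwords; hyphenated names split apart."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def keyword_fallback(recent: str, files: list, max_items: int = 5) -> list:
    """Rank memory files by how many words they share with the recent dialogue."""
    query = words(recent)
    scored = []
    for f in files:
        overlap = len(query & words(f"{f['name']} {f['description']}"))
        if overlap:
            scored.append((overlap, f["filename"]))
    scored.sort(key=lambda t: -t[0])  # stable sort: ties keep catalog order
    return [filename for _, filename in scored[:max_items]]


files = [
    {"name": "user-preference-tabs", "description": "User prefers tabs for indentation",
     "filename": "user-preference-tabs.md"},
    {"name": "db-testing", "description": "Do not mock the database in tests",
     "filename": "db-testing.md"},
]
print(keyword_fallback("please fix the indentation, use tabs", files))  # ['user-preference-tabs.md']
```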
|
||||
|
||||
### Writing: Extraction After Each Turn
|
||||
|
||||
Users don't always say "remember this". Preferences are usually scattered across normal dialogue: "tabs are better than spaces", "let's use single quotes from now on".
|
||||
|
||||
`extract_memories()` runs at the end of each turn, i.e. when the model stops without requesting a tool (`stop_reason != "tool_use"`), which signals a natural break in the conversation:
|
||||
|
||||
```python
# In agent_loop:
if response.stop_reason != "tool_use":
    extract_memories(messages)  # Extract new memories from recent dialogue
    consolidate_memories()      # Check if consolidation is needed
    return
```
|
||||
|
||||
To avoid duplicates, the existing memories (name + description) are included in the extraction prompt. The prompt asks the LLM to return a JSON array of `{name, type, description, body}`, and files are written only when genuinely new information is found.
|
||||
|
||||
```python
def extract_memories(messages):
    dialogue = format_recent_messages(messages[-10:])
    existing = "\n".join(f"- {m['name']}: {m['description']}" for m in list_memory_files())

    prompt = (
        "Extract user preferences, constraints, or project facts.\n"
        "Return JSON array: [{name, type, description, body}].\n"
        "If nothing new or already covered, return [].\n\n"
        f"Existing memories:\n{existing}\n\nDialogue:\n{dialogue[:4000]}"
    )
    # ... parse response, write files ...
```
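The elided step turns the model's reply into files. A hedged sketch of the parsing half, assuming the reply contains a single JSON array and that entries with a missing field or an unknown `type` should be dropped rather than written; `parse_extracted` is hypothetical:

```python
import json
import re

MEMORY_TYPES = {"user", "feedback", "project", "reference"}
REQUIRED = ("name", "type", "description", "body")


def parse_extracted(reply: str) -> list:
    """Return the well-formed memory entries in an extraction reply ([] on failure)."""
    match = re.search(r"\[.*\]", reply, re.DOTALL)  # greedy: the outermost array
    if match is None:
        return []
    try:
        items = json.loads(match.group())
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        if not all(isinstance(item.get(k), str) and item[k].strip() for k in REQUIRED):
            continue  # a required field is missing or empty
        if item["type"] not in MEMORY_TYPES:
            continue  # not one of the four memory types
        valid.append(item)
    return valid


reply = ('Found one: [{"name": "quotes", "type": "user", '
         '"description": "Prefers single quotes", "body": "Use single quotes."}, '
         '{"name": "x", "type": "mood"}]')
print([m["name"] for m in parse_extracted(reply)])  # ['quotes']
```

Each surviving entry can then be passed to `write_memory_file(name, type, description, body)`.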
|
||||
|
||||
### Consolidation: Low-Frequency Deduplication
|
||||
|
||||
Memory files accumulate. `consolidate_memories()` triggers when the file count reaches a threshold (default 10), asking the LLM to deduplicate, merge contradictions, and prune stale memories:
|
||||
|
||||
```python
CONSOLIDATE_THRESHOLD = 10

def consolidate_memories():
    files = list_memory_files()
    if len(files) < CONSOLIDATE_THRESHOLD:
        return  # Too few, not worth consolidating
    # Send all memories to LLM, get back deduplicated list
    # Replace all files with consolidated results
```
|
||||
|
||||
CC calls this process **Dream** and guards it with four gates: time interval, scan throttle, session count, and file lock. The teaching version simplifies this to a file-count threshold.
|
||||
|
||||
### What Memory Stores
|
||||
|
||||
Memory stores information that remains useful across sessions: user preferences, recurring feedback, project background, common entry points, and investigation clues. It focuses on "what will be useful later" and brings that information back through an index plus on-demand loading.
|
||||
|
||||
Session memory focuses on continuity inside one session: what context should survive after compaction. The two work together: Memory handles long-term knowledge; session memory handles the current session across compaction.
|
||||
|
||||
---
|
||||
|
||||
## Changes From s08
|
||||
|
||||
| Component | Before (s08) | After (s09) |
|
||||
|-----------|-------------|-------------|
|
||||
| Memory capability | None (preferences degrade with compaction) | Storage + loading + extraction + consolidation |
|
||||
| New functions | — | write_memory_file, select_relevant_memories, load_memories, extract_memories, consolidate_memories |
|
||||
| Storage | — | .memory/MEMORY.md index + .memory/*.md files |
|
||||
| Tools | bash, read, write, edit, glob, todo_write, task, load_skill, compact (9) | bash, read_file, write_file, edit_file, glob, task (6) |
|
||||
| Loop | Only compression each turn | Memory injection + compression + post-turn extraction + periodic consolidation |
|
||||
|
||||
---
|
||||
|
||||
## Try It
|
||||
|
||||
```sh
cd learn-claude-code
python s09_memory/code.py
```
|
||||
|
||||
Try these prompts, entered over several turns, and watch memories accumulate and get loaded:
|
||||
|
||||
1. `I prefer using tabs for indentation, not spaces. Remember that.`
|
||||
2. `Create a Python file called test.py` (observe whether the Agent uses tabs)
|
||||
3. `What did I tell you about my preferences?` (observe whether the Agent remembers)
|
||||
4. `I also prefer single quotes over double quotes for strings.`
|
||||
|
||||
What to watch for: Does `[Memory: extracted N new memories]` appear after each turn? Are `.md` files generated in `.memory/`? Is `MEMORY.md` index updated? Does the Agent automatically load previous memories in new conversations?
|
||||
|
||||
---
|
||||
|
||||
## What's Next
|
||||
|
||||
Memory, compression, and tools are all in place. But the system prompt is still a hardcoded string. Adding a new tool means manually adding a description; switching projects means rewriting the whole prompt. Prompts should be assembled at runtime.
|
||||
|
||||
s10 System Prompt → segments + runtime assembly. Different projects, different tools, different prompts.
|
||||
|
||||
<details>
|
||||
<summary>Deep Dive Into CC Source Code</summary>
|
||||
|
||||
> The following is based on an analysis of the CC source under `src/` (`memdir/`, `services/`, `utils/`, `query/`). Line numbers were verified against the source.
|
||||
|
||||
### Source Code Paths
|
||||
|
||||
| File | Lines | Responsibility |
|
||||
|------|-------|---------------|
|
||||
| `memdir/memdir.ts` | 507 | Core: MEMORY.md definition (`34-38`), memory behavior instructions distinguishing memory/plan/tasks (`199-266`), `loadMemoryPrompt()` three paths (`419-490`) |
|
||||
| `memdir/findRelevantMemories.ts` | 141 | Sonnet side-query memory selection (`18-24` system prompt, `97-122` call logic) |
|
||||
| `memdir/memoryTypes.ts` | 271 | Type definitions, frontmatter fields |
|
||||
| `memdir/memoryScan.ts` | — | Scan .md files, exclude MEMORY.md, read frontmatter, max 200 files, sorted by mtime desc (`35-94`) |
|
||||
| `services/extractMemories/extractMemories.ts` | 615 | Forked agent extraction, restricted permissions, `skipTranscript: true`, `maxTurns: 5` (`371-427`) |
|
||||
| `services/autoDream/autoDream.ts` | 324 | Dream consolidation, four-layer gating (`63-66` defaults, `130-190` gating, `224-233` forked agent) |
|
||||
| `services/SessionMemory/sessionMemory.ts` | 495 | Session-level memory management |
|
||||
| `services/compact/sessionMemoryCompact.ts` | — | Session memory lightweight summary, thresholds 10K/5/40K (`56-61`) |
|
||||
| `utils/attachments.ts` | — | Injection budget: 200 lines / 4096 bytes per file, 60KB per session (`269-288`); find relevant memory by query (`2196-2241`) |
|
||||
| `query.ts` | — | Memory prefetch at start of each user turn (`301-304`), non-blocking collection (`1592-1614`) |
|
||||
| `query/stopHooks.ts` | — | Stop hook fire-and-forget triggers extraction and Dream (`141-155`) |
|
||||
|
||||
### Memory Selection: LLM, Not Embedding
|
||||
|
||||
CC uses **Sonnet itself to select** (`findRelevantMemories.ts`), not embedding vector similarity:
|
||||
|
||||
1. `memoryScan.ts` scans all `.md` files in the memory directory (excluding MEMORY.md), up to 200 files, sorted by mtime descending
|
||||
2. Lists all memory files' `name` + `description` as a catalog
|
||||
3. Sends to Sonnet side-query: "Select truly useful memories by name and description (max 5). Skip if unsure."
|
||||
4. Sonnet returns `{ selected_memories: ["file1.md", ...] }`
|
||||
5. Selected files' full contents are read (≤ 200 lines / 4096 bytes per file) and injected. Total session budget: 60KB
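The limits in step 5 can be sketched as follows. Assumptions: lines are cut before bytes, truncation respects UTF-8 character boundaries, and "60KB" is read as 60 × 1024 bytes; CC's `attachments.ts` may count differently, and `clip_memory` / `InjectionBudget` are hypothetical names:

```python
from typing import Optional

MAX_LINES = 200             # per-file line cap from the text
MAX_BYTES = 4096            # per-file byte cap from the text
SESSION_BUDGET = 60 * 1024  # assumption: "60KB" read as 60 * 1024 bytes


def clip_memory(text: str) -> str:
    """Cut one memory file to MAX_LINES lines, then to MAX_BYTES UTF-8 bytes."""
    clipped = "\n".join(text.splitlines()[:MAX_LINES])
    raw = clipped.encode("utf-8")[:MAX_BYTES]
    # errors="ignore" drops a multi-byte character split by the byte cut.
    return raw.decode("utf-8", errors="ignore")


class InjectionBudget:
    """Tracks how many bytes of memory content a session may still inject."""

    def __init__(self, limit: int = SESSION_BUDGET):
        self.remaining = limit

    def take(self, text: str) -> Optional[str]:
        clipped = clip_memory(text)
        size = len(clipped.encode("utf-8"))
        if size > self.remaining:
            return None  # over the session budget: skip this file
        self.remaining -= size
        return clipped


budget = InjectionBudget(limit=5000)
first = budget.take("x" * 10_000)  # clipped to 4096 bytes
second = budget.take("y" * 2000)   # needs 2000 bytes, only 904 left: None
```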
|
||||
|
||||
At the start of each user turn, `query.ts:301-304` kicks off an async memory prefetch; after tool execution, `1592-1614` collects whichever results have already completed, without blocking.
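A sketch of the same start-early, collect-if-ready pattern in Python using `concurrent.futures` (CC itself is TypeScript). `select_memories` is a hypothetical stand-in for the side-query, and the `threading.Event` exists only to make the demo deterministic:

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional

release = threading.Event()  # demo only: controls when the "side-query" finishes


def select_memories(query: str) -> List[str]:
    """Hypothetical stand-in for the memory side-query."""
    release.wait(timeout=5)
    return ["user-preference-tabs.md"]


def collect_if_ready(future: Future) -> Optional[List[str]]:
    """Non-blocking: return the prefetched result only if it already finished."""
    if future.done() and future.exception() is None:
        return future.result()
    return None


with ThreadPoolExecutor(max_workers=1) as pool:
    prefetch = pool.submit(select_memories, "use tabs please")  # start of the user turn
    early = collect_if_ready(prefetch)  # still running: None, and the turn is not blocked
    release.set()                       # ... tools run meanwhile; the query completes
    prefetch.result(timeout=5)          # demo only: make the next check deterministic
    late = collect_if_ready(prefetch)   # after tool execution: the selection
```

If the prefetch has not finished by the time it is checked, the turn simply proceeds without those memories.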
|
||||
|
||||
### Extraction Timing: Stop Hook, Not After autoCompact
|
||||
|
||||
Trigger location (`stopHooks.ts:141-155`): inside `handleStopHooks()`, extraction and Dream are triggered fire-and-forget. The teaching version places extraction in the `stop_reason != "tool_use"` branch, which follows the same idea.
|
||||
|
||||
CC's extraction runs via a forked agent (`extractMemories.ts:371-427`): restricted permissions, `skipTranscript: true`, `maxTurns: 5`. It also has overlap protection: if the main Agent already wrote memory files, extraction is skipped.
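A hedged sketch of that overlap check for the teaching version. It assumes messages are plain dicts in the Anthropic content-block shape and that the write tools receive their target in a `path` input field; both are assumptions, and `main_agent_wrote_memory` is hypothetical:

```python
from pathlib import Path

WRITE_TOOLS = {"write_file", "edit_file"}  # the teaching version's file-writing tools


def main_agent_wrote_memory(messages: list, memory_dir: Path) -> bool:
    """True if any assistant tool_use in `messages` wrote a file under memory_dir."""
    root = memory_dir.resolve()
    for msg in messages:
        if msg.get("role") != "assistant" or not isinstance(msg.get("content"), list):
            continue
        for block in msg["content"]:
            if block.get("type") != "tool_use" or block.get("name") not in WRITE_TOOLS:
                continue
            target = block.get("input", {}).get("path")  # assumption: input key is "path"
            if target and Path(target).resolve().is_relative_to(root):
                return True
    return False
```

With a check like this, `extract_memories(messages)` could return early whenever the main loop has already saved a memory itself.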
|
||||
|
||||
### Memory File Format
|
||||
|
||||
CC uses Markdown + YAML frontmatter, consistent with the teaching version. Four types: `user`, `feedback`, `project`, `reference`.
|
||||
|
||||
`memdir.ts:34-38` defines index constraints: `MEMORY.md` max 200 lines / 25KB. `memdir.ts:199-266` builds memory behavior instructions, explicitly distinguishing memory from plan and tasks. Storage location: `~/.claude/projects/<sanitized-git-root>/memory/`.
|
||||
|
||||
### Dream: Four-Layer Gating
|
||||
|
||||
Dream is neither "triggered when idle" nor "consolidated once there are enough files"; it must pass four gates (`autoDream.ts`, defaults at `63-66`, gating logic at `130-190`):
|
||||
|
||||
1. **Time gate**: ≥ 24 hours since last consolidation
|
||||
2. **Scan throttle**: Avoid frequent filesystem scans
|
||||
3. **Session gate**: ≥ 5 session transcripts modified since last consolidation
|
||||
4. **Lock gate**: No other process currently consolidating (`.consolidate-lock` file)
|
||||
|
||||
The merge itself runs via forked agent (`224-233`): locate → collect recent signals → merge and write files → prune and update index. Lock file mtime serves as lastConsolidatedAt. Crash recovery: lock auto-expires after 1 hour.
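The four gates can be sketched as one predicate. The 24-hour interval, the 5-session minimum, and the 1-hour stale-lock expiry come from the text above. The scan-throttle interval is not stated, so `SCAN_INTERVAL` is a placeholder, and the signature is hypothetical (CC derives these values from files and transcripts rather than taking them as arguments):

```python
from typing import Optional

HOUR = 3600
MIN_INTERVAL = 24 * HOUR      # time gate: at least 24 hours since the last consolidation
MIN_SESSIONS = 5              # session gate: at least 5 transcripts modified since then
LOCK_STALE_AFTER = 1 * HOUR   # a lock older than this is treated as abandoned
SCAN_INTERVAL = 10 * 60       # hypothetical: the real scan-throttle value is not stated


def should_dream(now: float, last_consolidated_at: float, last_scan_at: float,
                 sessions_since: int, lock_acquired_at: Optional[float]) -> bool:
    if now - last_consolidated_at < MIN_INTERVAL:
        return False  # 1. time gate
    if now - last_scan_at < SCAN_INTERVAL:
        return False  # 2. scan throttle
    if sessions_since < MIN_SESSIONS:
        return False  # 3. session gate
    if lock_acquired_at is not None and now - lock_acquired_at < LOCK_STALE_AFTER:
        return False  # 4. lock gate: another process is consolidating
    return True
```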
|
||||
|
||||
### User Memory vs Session Memory
|
||||
|
||||
| | User Memory | Session Memory |
|
||||
|---|---|---|
|
||||
| Persistence | Cross-session | Single session |
|
||||
| Storage | Multiple .md files in `memory/` | `session-memory/<id>/memory.md` |
|
||||
| Loaded into | system prompt | compact summary |
|
||||
| Purpose | Cross-session knowledge accumulation | Cross-compact context continuity |
|
||||
|
||||
sessionMemoryCompact (mentioned in s08) uses Session Memory: before autoCompact, it reads the session memory file and, if sufficient (≥ 10K tokens, ≥ 5 text messages, ≤ 40K tokens, `sessionMemoryCompact.ts:56-61`), uses it as a summary without calling the LLM.
|
||||
|
||||
### Where the Real Implementation Is More Complex
|
||||
|
||||
- **Feature flags**: Memory features have multiple feature gate layers
|
||||
- **Team memory**: Shared team memories, `loadMemoryPrompt()` has a dedicated path (not covered in teaching version)
|
||||
- **KAIROS**: Timing-aware memory extraction strategy, daily-log mode in `loadMemoryPrompt()`
|
||||
- **Prompt cache**: Memory injection must account for prompt cache TTL, avoiding full system prompt rewrites each turn
|
||||
- **File locks**: Concurrency control for multi-process scenarios
|
||||
- **Memory prefetch**: Async prefetch, non-blocking main flow
|
||||
|
||||
### Teaching Version Simplifications Are Intentional
|
||||
|
||||
- LLM side-query → LLM side-query + keyword fallback: teaching version keeps LLM selection, adds fallback path
|
||||
- Memory format: Markdown + frontmatter rather than JSON; here the teaching version matches CC
|
||||
- Stop hook trigger → `stop_reason != "tool_use"` branch: same idea
|
||||
- Four-layer gating → file-count threshold: teaching version lacks transcript system and multi-session concepts
|
||||
- Forked agent + restricted permissions → direct call: teaching version has no subprocess isolation
|
||||
|
||||
</details>
|
||||
|
||||
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
|
||||
s09_memory/README.ja.md
@@ -0,0 +1,279 @@
|
||||
# s09: Memory — 圧縮は詳細を失う、失わない層が必要
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → ... → s07 → s08 → `s09` → [s10](../s10_system_prompt/) → s11 → ... → s20
|
||||
> *"圧縮は詳細を失う、失わない層が必要"* — ファイルストア + インデックス + オンデマンド読み込み。圧縮を越え、セッションを越えて。
|
||||
>
|
||||
> **Harness レイヤー**: 記憶 — 圧縮とセッションを越える知識の蓄積。
|
||||
|
||||
---
|
||||
|
||||
## 課題
|
||||
|
||||
s08 の autoCompact は現在の目標、残りの作業、ユーザーの制約をサマリに保持するが、詳細は失われる:「タブでインデント、スペース不可」が「ユーザーにコードスタイルの好みあり」と簡略化される。そして新しいセッションを開始すると、サマリすらない。
|
||||
|
||||
LLM には永続状態がなく、すべての情報はコンテキストウィンドウ内にある。コンテキストが満杯になれば圧縮され、圧縮は非可逆。圧縮に参加せず、セッションを越えて保持されるストレージ層が必要。
|
||||
|
||||
---
|
||||
|
||||
## ソリューション
|
||||
|
||||

|
||||
|
||||
s08 の圧縮パイプラインを維持し、記憶に焦点を当てる。ストレージにはファイルシステムを採用:`.memory/` ディレクトリに各記憶を `.md` ファイルとして保存、YAML frontmatter(`name` / `description` / `type`)付き。ファイルが増えたらインデックスが必要:`MEMORY.md` に 1 行 1 リンクを記録し、SYSTEM に注入。
|
||||
|
||||
重要な設計:インデックスは SYSTEM prompt に常駐(prompt cache でキャッシュ可能)、ファイル内容はオンデマンド注入(filename/description で現在の会話にマッチ、cache を破壊しない)。書き込みは 2 つのパス:ユーザーが明示的に「覚えて」と言うか、毎ターン終了後にバックグラウンドで抽出。ファイルが蓄積されたら、定期的に整理して重複排除。
|
||||
|
||||
4 種類の記憶、それぞれ異なる質問に答える:
|
||||
|
||||
| タイプ | 何に答えるか | 例 |
|
||||
|--------|-------------|-----|
|
||||
| user | あなたは誰か | "タブでスペース不可" |
|
||||
| feedback | どう作業するか | "DB をモックしない" |
|
||||
| project | 何が起きているか | "auth 書き直しはコンプライアンス主導" |
|
||||
| reference | どこで探すか | "パイプラインのバグは Linear INGEST" |
|
||||
|
||||
---
|
||||
|
||||
## 仕組み
|
||||
|
||||

|
||||
|
||||
### ストレージ:Markdown ファイル + インデックス
|
||||
|
||||
各記憶は `.md` ファイル、YAML frontmatter でメタデータを記録:
|
||||
|
||||
```markdown
---
name: user-preference-tabs
description: User prefers tabs for indentation
type: user
---

User prefers using tabs, not spaces, for indentation.
**Why:** Consistency with existing codebase conventions.
**How to apply:** Always use tabs when writing or editing files.
```
|
||||
|
||||
`MEMORY.md` はインデックス、1 行に 1 リンク:
|
||||
|
||||
```markdown
- [user-preference-tabs](user-preference-tabs.md) — User prefers tabs for indentation
```
|
||||
|
||||
新しい記憶を書き込むとインデックスを自動再構築:
|
||||
|
||||
```python
from pathlib import Path

MEMORY_DIR = Path(".memory")

def write_memory_file(name, mem_type, description, body):
    MEMORY_DIR.mkdir(exist_ok=True)  # first write creates the store
    slug = name.lower().replace(" ", "-")
    filepath = MEMORY_DIR / f"{slug}.md"
    filepath.write_text(
        f"---\nname: {name}\ndescription: {description}\ntype: {mem_type}\n---\n\n{body}\n",
        encoding="utf-8",
    )
    _rebuild_index()  # regenerate MEMORY.md from all memory files
```
|
||||
|
||||
### 読み込み:2 つのパス
|
||||
|
||||
**パス 1:インデックスを SYSTEM に常駐。** `build_system()` は毎ターン SYSTEM を再構築する際に `MEMORY.md` を読み込み、記憶カタログを注入。SYSTEM prompt 内のインデックスは prompt cache で再利用できる:毎ターン送信はされるが、`MEMORY.md` が変わらない限り、キャッシュ済みのプレフィックスは再処理されない。
|
||||
|
||||
**パス 2:関連記憶をオンデマンド注入。** 各 LLM 呼び出し前、`load_memories()` は最近の会話と記憶カタログ(name + description)を LLM に軽量 side-query として送信し、関連するファイル名を選択、ファイル内容を読み込んで注入。上限 5 件でコストを制御。
|
||||
|
||||
```python
import json
import re

def select_relevant_memories(messages, max_items=5):
    files = list_memory_files()
    if not files:
        return []

    # Build catalog: "0: user-preference-tabs — User prefers tabs..."
    catalog = "\n".join(f"{i}: {f['name']} — {f['description']}" for i, f in enumerate(files))
    recent = format_recent_messages(messages[-10:])  # dialogue the selection is based on

    response = client.messages.create(model=MODEL, messages=[{"role": "user",
        "content": f"Select up to {max_items} relevant memory indices. Return JSON array.\n\n"
                   f"Recent conversation:\n{recent}\n\nMemory catalog:\n{catalog}"}],
        max_tokens=200)
    match = re.search(r'\[.*?\]', response.content[0].text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON array in side-query reply")  # treated as a failed side-query
    indices = json.loads(match.group())
    return [files[i]["filename"] for i in indices
            if isinstance(i, int) and 0 <= i < len(files)][:max_items]
```
|
||||
|
||||
side-query が失敗した場合(API エラー、JSON パース失敗)、name + description のキーワードマッチにフォールバック。
|
||||
|
||||
### 書き込み:毎ターン終了後の抽出
|
||||
|
||||
ユーザーが毎回「これを覚えて」と言うわけではない。好みはたいてい普段の会話の中に散らばっている:「タブの方がスペースより良い」「これからはシングルクォートにしよう」。
|
||||
|
||||
`extract_memories()` は各ターン終了時に実行、モデルが tool_use なしで停止した場合にトリガー(会話が自然な区切りに達したことを示す):
|
||||
|
||||
```python
# In agent_loop:
if response.stop_reason != "tool_use":
    extract_memories(messages)  # 最近の会話から新しい記憶を抽出
    consolidate_memories()      # 整理が必要かチェック
    return
```
|
||||
|
||||
抽出前に既存の記憶を確認し、重複を回避。抽出プロンプトは LLM に `{name, type, description, body}` の JSON 配列を要求、本当に新しい情報がある場合のみファイルに書き込む。
|
||||
|
||||
```python
def extract_memories(messages):
    dialogue = format_recent_messages(messages[-10:])
    existing = "\n".join(f"- {m['name']}: {m['description']}" for m in list_memory_files())

    prompt = (
        "Extract user preferences, constraints, or project facts.\n"
        "Return JSON array: [{name, type, description, body}].\n"
        "If nothing new or already covered, return [].\n\n"
        f"Existing memories:\n{existing}\n\nDialogue:\n{dialogue[:4000]}"
    )
    # ... parse response, write files ...
```
|
||||
|
||||
### 整理:低頻度の重複排除
|
||||
|
||||
記憶ファイルは蓄積される。`consolidate_memories()` はファイル数が閾値(デフォルト 10)に達した時にトリガー、LLM に重複排除、矛盾の統合、古い記憶の剪定を依頼:
|
||||
|
||||
```python
CONSOLIDATE_THRESHOLD = 10

def consolidate_memories():
    files = list_memory_files()
    if len(files) < CONSOLIDATE_THRESHOLD:
        return  # 少なすぎる、整理する価値なし
    # Send all memories to LLM, get back deduplicated list
    # Replace all files with consolidated results
```
|
||||
|
||||
CC はこのプロセスを **Dream** と呼び、実際には 4 層のゲートがある:時間間隔、スキャンスロットル、セッション数、ファイルロック。教学版はファイル数閾値に簡略化。
|
||||
|
||||
### Memory に保存するもの
|
||||
|
||||
Memory はセッションを越えて有用な情報を保存する:ユーザーの好み、繰り返し出るフィードバック、プロジェクト背景、よく使う入口、調査の手がかりなど。「あとでまた使うもの」を対象にし、インデックス + オンデマンド読み込みで現在の会話に戻す。
|
||||
|
||||
session memory は 1 つのセッション内の連続性を扱う:compact 後も現在の会話に残すべき文脈を保持する。両者は役割が分かれている。Memory は長期知識を扱い、session memory は現在のセッションを compact 越しにつなぐ。
|
||||
|
||||
---
|
||||
|
||||
## s08 からの変更点
|
||||
|
||||
| コンポーネント | 変更前 (s08) | 変更後 (s09) |
|
||||
|-----------|-------------|-------------|
|
||||
| 記憶能力 | なし(圧縮後、好みはサマリと共に劣化) | ストレージ + 読み込み + 抽出 + 整理 |
|
||||
| 新規関数 | — | write_memory_file, select_relevant_memories, load_memories, extract_memories, consolidate_memories |
|
||||
| ストレージ | — | .memory/MEMORY.md インデックス + .memory/*.md ファイル |
|
||||
| ツール | bash, read, write, edit, glob, todo_write, task, load_skill, compact (9) | bash, read_file, write_file, edit_file, glob, task (6) |
|
||||
| ループ | 毎ターン圧縮のみ | 記憶注入 + 圧縮 + ターン終了後の抽出 + 定期整理 |
|
||||
|
||||
---
|
||||
|
||||
## 試してみよう
|
||||
|
||||
```sh
cd learn-claude-code
python s09_memory/code.py
```
|
||||
|
||||
以下のプロンプトを試してみてください(複数ターンに分けて入力し、記憶の蓄積と読み込みを観察):
|
||||
|
||||
1. `I prefer using tabs for indentation, not spaces. Remember that.`
|
||||
2. `Create a Python file called test.py`(Agent がタブを使用したか観察)
|
||||
3. `What did I tell you about my preferences?`(Agent が覚えているか観察)
|
||||
4. `I also prefer single quotes over double quotes for strings.`
|
||||
|
||||
観察のポイント:各ターン終了後に `[Memory: extracted N new memories]` が表示されるか?`.memory/` ディレクトリに `.md` ファイルが生成されたか?`MEMORY.md` インデックスが更新されたか?新しい会話で Agent が以前の記憶を自動的に読み込んだか?
|
||||
|
||||
---
|
||||
|
||||
## 次へ
|
||||
|
||||
記憶、圧縮、ツールはすべて揃った。しかし system prompt はまだハードコードされた文字列。新しいツールを追加するには手動で説明を書き、プロジェクトを変えるにはプロンプト全体を書き直す。プロンプトは実行時に組み立てられるべき。
|
||||
|
||||
s10 System Prompt → セグメント + 実行時組み立て。異なるプロジェクト、異なるツール、異なるプロンプト。
|
||||
|
||||
<details>
|
||||
<summary>CC ソースコードの詳細</summary>
|
||||
|
||||
> 以下は CC ソースコード `src/` 下の `memdir/`、`services/`、`utils/`、`query/` の分析に基づく。行番号はソースコードと照合済み。
|
||||
|
||||
### ソースコードパス
|
||||
|
||||
| ファイル | 行数 | 職責 |
|
||||
|------|------|------|
|
||||
| `memdir/memdir.ts` | 507 | 核心:MEMORY.md 定義(`34-38`)、記憶動作指示で memory/plan/tasks を区別(`199-266`)、`loadMemoryPrompt()` 3 パス(`419-490`) |
|
||||
| `memdir/findRelevantMemories.ts` | 141 | Sonnet side-query で記憶選択(`18-24` システムプロンプト、`97-122` 呼び出しロジック) |
|
||||
| `memdir/memoryTypes.ts` | 271 | 型定義、frontmatter フィールド |
|
||||
| `memdir/memoryScan.ts` | — | .md ファイルをスキャン、MEMORY.md を除外、frontmatter を読み取り、最大 200 ファイル、mtime 降順(`35-94`) |
|
||||
| `services/extractMemories/extractMemories.ts` | 615 | forked agent で記憶を抽出、制限付き権限、`skipTranscript: true`、`maxTurns: 5`(`371-427`) |
|
||||
| `services/autoDream/autoDream.ts` | 324 | Dream 整理、4 層ゲート(`63-66` デフォルト値、`130-190` ゲート、`224-233` forked agent) |
|
||||
| `services/SessionMemory/sessionMemory.ts` | 495 | セッションレベルの記憶管理 |
|
||||
| `services/compact/sessionMemoryCompact.ts` | — | session memory 軽量サマリ、閾値 10K/5/40K(`56-61`) |
|
||||
| `utils/attachments.ts` | — | 注入予算:200 行 / 4096 バイト/ファイル、60KB/セッション(`269-288`);query で関連記憶を検索(`2196-2241`) |
|
||||
| `query.ts` | — | memory prefetch を毎ターン開始時に起動(`301-304`)、非ブロッキング収集(`1592-1614`) |
|
||||
| `query/stopHooks.ts` | — | stop hook fire-and-forget で抽出と Dream をトリガー(`141-155`) |
|
||||
|
||||
### 記憶選択:embedding ではなく LLM
|
||||
|
||||
CC は **Sonnet 自身で選択**(`findRelevantMemories.ts`)、embedding ベクトル類似度ではない:
|
||||
|
||||
1. `memoryScan.ts` が記憶ディレクトリ下のすべての `.md` ファイルをスキャン(MEMORY.md を除外)、最大 200 ファイル、mtime 降順
|
||||
2. `name` + `description` をカタログとしてリスト化
|
||||
3. Sonnet side-query に送信:「名前と説明から本当に有用な記憶を選択(最大 5 件)。不明ならスキップ。」
|
||||
4. Sonnet が `{ selected_memories: ["file1.md", ...] }` を返却
|
||||
5. 選択されたファイルの完全な内容を読み込み(≤ 200 行 / 4096 バイト/ファイル)、注入。セッション総予算:60KB
|
||||
|
||||
各ユーザーターンの開始時、`query.ts:301-304` が memory prefetch を起動(非同期);ツール実行後、`1592-1614` が完了済みの結果を非ブロッキングで収集。
|
||||
|
||||
### 抽出タイミング:stop hook、autoCompact 後ではない
|
||||
|
||||
トリガー位置(`stopHooks.ts:141-155`):`handleStopHooks()` 内で、fire-and-forget で抽出と Dream をトリガー。教学版は `stop_reason != "tool_use"` 分岐に抽出を配置、方向は一致。
|
||||
|
||||
CC の抽出は forked agent で実行(`extractMemories.ts:371-427`):制限付き権限、`skipTranscript: true`、`maxTurns: 5`。重複保護もある:メイン Agent が既に記憶ファイルを書き込んだ場合、抽出をスキップ。
|
||||
|
||||
### 記憶ファイル形式
|
||||
|
||||
CC は Markdown + YAML frontmatter を使用、教学版と一致。4 種類:`user`、`feedback`、`project`、`reference`。
|
||||
|
||||
`memdir.ts:34-38` がインデックス制約を定義:`MEMORY.md` 最大 200 行 / 25KB。`memdir.ts:199-266` が記憶動作指示を構築、memory と plan と tasks を明確に区別。保存場所:`~/.claude/projects/<sanitized-git-root>/memory/`。
|
||||
|
||||
### Dream:4 層ゲート
|
||||
|
||||
「アイドル時にトリガー」や「数が足りたら統合」ではなく、4 層のゲート(`autoDream.ts`、デフォルト値 `63-66`、ゲートロジック `130-190`):
|
||||
|
||||
1. **時間ゲート**:前回の統合から ≥ 24 時間
|
||||
2. **スキャンスロットル**:頻繁なファイルシステムスキャンを回避
|
||||
3. **セッションゲート**:前回の統合以降 ≥ 5 セッションの transcript が変更された
|
||||
4. **ロックゲート**:他のプロセスが統合中でない(`.consolidate-lock` ファイル)
|
||||
|
||||
統合自体は forked agent で実行(`224-233`):定位 → 直近のシグナル収集 → 統合してファイル書き込み → 剪定してインデックス更新。ロックファイルの mtime が lastConsolidatedAt。クラッシュリカバリ:1 時間後にロックが自動期限切れ。
|
||||
|
||||
### User Memory vs Session Memory
|
||||
|
||||
| | User Memory | Session Memory |
|
||||
|---|---|---|
|
||||
| 永続性 | セッション間 | 単一セッション |
|
||||
| ストレージ | `memory/` 下の複数 .md ファイル | `session-memory/<id>/memory.md` |
|
||||
| 注入先 | system prompt | compact サマリ |
|
||||
| 目的 | セッション間の知識蓄積 | compact を越えたコンテキストの連続性 |
|
||||
|
||||
sessionMemoryCompact(s08 で触れた仕組み)は Session Memory を活用:autoCompact の前に session memory ファイルを読み込み、内容が十分であれば(≥ 10K token、≥ 5 テキストメッセージ、≤ 40K token、`sessionMemoryCompact.ts:56-61`)、LLM を呼び出さずにサマリとして使用。
|
||||
|
||||
### 実際の実装が教学版より複雑な点
|
||||
|
||||
- **Feature flags**:記憶関連機能には複数の feature gate 層がある
|
||||
- **Team memory**:チーム共有記憶、`loadMemoryPrompt()` に専用パスあり(教学版では未カバー)
|
||||
- **KAIROS**:タイミング認識型の記憶抽出戦略、`loadMemoryPrompt()` の daily-log モード
|
||||
- **Prompt cache**:記憶注入は prompt cache の TTL を考慮する必要があり、毎ターン system prompt の大部分を書き直すことを避ける
|
||||
- **ファイルロック**:マルチプロセス時の並行制御
|
||||
- **Memory prefetch**:非同期プレフェッチ、メインフローをブロックしない
|
||||
|
||||
### 教学版の簡略化は意図的
|
||||
|
||||
- LLM side-query → LLM side-query + キーワードフォールバック:教学版は LLM 選択を維持し、フォールバックパスを追加
|
||||
- 記憶の形式:JSON ではなく Markdown + frontmatter。ここは教学版と CC が一致
|
||||
- stop hook トリガー → `stop_reason != "tool_use"` 分岐:方向は一致
|
||||
- 4 層ゲート → ファイル数閾値:教学版には transcript システムやマルチセッションの概念がない
|
||||
- forked agent + 制限付き権限 → 直接呼び出し:教学版にはサブプロセス分離がない
|
||||
|
||||
</details>
|
||||
|
||||
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
|
||||
s09_memory/README.md
@@ -0,0 +1,279 @@
|
||||
# s09: Memory — 压缩会丢细节,要有一层不丢的
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → ... → s07 → s08 → `s09` → [s10](../s10_system_prompt/) → s11 → ... → s20
|
||||
> *"压缩会丢细节, 要有一层不丢的"* — 文件仓库 + 索引 + 按需加载,跨压缩、跨会话。
|
||||
>
|
||||
> **Harness 层**: 记忆 — 跨压缩、跨会话的知识积累。
|
||||
|
||||
---
|
||||
|
||||
## 问题
|
||||
|
||||
s08 的 autoCompact 会把当前目标、剩余工作、用户约束写进摘要,但细节会丢失:"用 tab 缩进不要用空格"可能被简化成"用户有代码风格偏好"。而且新开一个会话,连摘要也没了。
|
||||
|
||||
LLM 没有持久状态,所有信息都在上下文窗口里。上下文满了要压缩,压缩就有损。需要一层不参与压缩、跨会话保留的存储。
|
||||
|
||||
---
|
||||
|
||||
## 解决方案
|
||||
|
||||

|
||||
|
||||
s08 的压缩管线保留,聚焦记忆。存储选文件系统:`.memory/` 目录下,每个记忆一个 `.md` 文件,带 YAML frontmatter(`name` / `description` / `type`)。文件多了需要索引:`MEMORY.md` 一行一个链接,注入 SYSTEM。
|
||||
|
||||
关键设计:索引常驻 SYSTEM prompt(可被 prompt cache 缓存),文件内容按需注入(按 filename/description 匹配当前对话,不破坏 cache)。写入分两条路径:用户显式说"记住",或者每轮结束后后台提取。文件积累多了,定期整理去重。
|
||||
|
||||
四类记忆,各有用途:
|
||||
|
||||
| 类型 | 回答什么 | 示例 |
|
||||
|------|---------|------|
|
||||
| user | 你是谁 | "用 tab 不用空格" |
|
||||
| feedback | 怎么做事 | "别 mock 数据库" |
|
||||
| project | 正在发生什么 | "auth 重写是合规驱动" |
|
||||
| reference | 东西在哪找 | "pipeline bug 在 Linear INGEST" |
|
||||
|
||||
---
|
||||
|
||||
## 工作原理
|
||||
|
||||

|
||||
|
||||
### 存储:Markdown 文件 + 索引
|
||||
|
||||
每个记忆是一个 `.md` 文件,YAML frontmatter 记录元数据:
|
||||
|
||||
```markdown
---
name: user-preference-tabs
description: User prefers tabs for indentation
type: user
---

User prefers using tabs, not spaces, for indentation.
**Why:** Consistency with existing codebase conventions.
**How to apply:** Always use tabs when writing or editing files.
```
|
||||
|
||||
`MEMORY.md` 是索引,一行一个链接:
|
||||
|
||||
```markdown
|
||||
- [user-preference-tabs](user-preference-tabs.md) — User prefers tabs for indentation
|
||||
```
|
||||
|
||||
写入新记忆时自动重建索引:
|
||||
|
||||
```python
|
||||
def write_memory_file(name, mem_type, description, body):
|
||||
slug = name.lower().replace(" ", "-")
|
||||
filepath = MEMORY_DIR / f"{slug}.md"
|
||||
filepath.write_text(
|
||||
f"---\nname: {name}\ndescription: {description}\ntype: {mem_type}\n---\n\n{body}\n"
|
||||
)
|
||||
_rebuild_index()
|
||||
```
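
The snippet above calls `_rebuild_index()` without showing it. Here is a minimal, self-contained sketch of such a rebuild. It assumes the frontmatter consists of flat `key: value` lines, as in the example; the parser is deliberately naive and is not a YAML parser. `parse_frontmatter` and `rebuild_index` are illustrative names, and they take the directory as a parameter instead of reading a global.

```python
import tempfile
from pathlib import Path


def parse_frontmatter(text: str) -> tuple[dict, str]:
    """Naive parser: assumes flat 'key: value' lines between two '---' markers."""
    if not text.startswith("---"):
        return {}, text
    parts = text.split("---", 2)
    if len(parts) < 3:
        return {}, text
    meta = {}
    for line in parts[1].strip().splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta, parts[2].strip()


def rebuild_index(memory_dir: Path) -> str:
    """Regenerate MEMORY.md with one '- [name](file) — description' line per memory file."""
    lines = []
    for f in sorted(memory_dir.glob("*.md")):
        if f.name == "MEMORY.md":
            continue  # never index the index itself
        meta, body = parse_frontmatter(f.read_text())
        name = meta.get("name", f.stem)
        # Fall back to the first body line when there is no description
        desc = meta.get("description", body.split("\n")[0][:80])
        lines.append(f"- [{name}]({f.name}) — {desc}")
    index = "\n".join(lines) + "\n" if lines else ""
    (memory_dir / "MEMORY.md").write_text(index)
    return index


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "user-preference-tabs.md").write_text(
            "---\nname: user-preference-tabs\ndescription: User prefers tabs for indentation\n"
            "type: user\n---\n\nUser prefers using tabs, not spaces, for indentation.\n"
        )
        print(rebuild_index(d), end="")
```

Because the index is rebuilt from the files on every write, each write leaves `MEMORY.md` consistent with what is actually stored.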
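
### Loading: two paths

**Path 1: the index stays in SYSTEM.** Every turn, `build_system()` rebuilds SYSTEM by reading `MEMORY.md` and injecting the list of memories. Because the index sits in the SYSTEM prompt, the prompt cache can serve it. It is still sent with every request, but a cache hit avoids reprocessing it at full cost each turn.

**Path 2: relevant memories are injected on demand.** Before each call, `load_memories()` runs a lightweight side-query. It sends the recent conversation plus the memory catalog (name + description) to the LLM, gets back the relevant filenames, then reads those files and injects their contents into the context. At most 5 are selected, which keeps the cost down.

```python
import json, re

def select_relevant_memories(messages, max_items=5):
    files = list_memory_files()
    if not files:
        return []

    # Recent user text (simplified: plain-string user messages only)
    recent = " ".join(m["content"] for m in messages[-6:]
                      if m["role"] == "user" and isinstance(m["content"], str))[:2000]

    # Build catalog: "0: user-preference-tabs — User prefers tabs..."
    catalog = "\n".join(f"{i}: {f['name']} — {f['description']}" for i, f in enumerate(files))

    response = client.messages.create(model=MODEL, messages=[{"role": "user",
        "content": f"Select relevant memory indices. Return JSON array.\n\n"
                   f"Recent conversation:\n{recent}\n\nMemory catalog:\n{catalog}"}],
        max_tokens=200)

    match = re.search(r'\[.*?\]', response.content[0].text, re.DOTALL)
    indices = json.loads(match.group()) if match else []
    return [files[i]["filename"] for i in indices
            if isinstance(i, int) and 0 <= i < len(files)][:max_items]
```

If the side-query fails (an API error or a JSON parse failure), selection falls back to keyword matching on name + description.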

### Writing: extraction after each turn

Users won't say "remember this" every time. Preferences are usually scattered through ordinary conversation: "tabs are better than spaces", "use single quotes from now on".

`extract_memories()` runs at the end of each turn. Its trigger is the model stopping without a tool_use, which means the exchange has reached a natural pause:

```python
# In agent_loop:
if response.stop_reason != "tool_use":
    extract_memories(messages)   # extract new memories from the recent dialogue
    consolidate_memories()       # check whether consolidation is due
    return
```

Before extracting, it checks the existing memories to avoid duplicates. The extraction prompt asks the LLM to return a JSON array of `{name, type, description, body}` objects, and files are written only when there really is new information.

```python
def extract_memories(messages):
    # format_recent_messages (not shown): flattens messages into "role: text" lines
    dialogue = format_recent_messages(messages[-10:])
    existing = "\n".join(f"- {m['name']}: {m['description']}" for m in list_memory_files())

    prompt = (
        "Extract user preferences, constraints, or project facts.\n"
        "Return JSON array: [{name, type, description, body}].\n"
        "If nothing new or already covered, return [].\n\n"
        f"Existing memories:\n{existing}\n\nDialogue:\n{dialogue[-4000:]}"  # keep the newest text
    )
    # ... parse response, write files ...
```
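
The `# ... parse response, write files ...` step could look like the sketch below. It assumes the reply may wrap the JSON array in prose, so it uses a greedy `\[.*\]` search, because the objects may themselves contain brackets. It also assumes that incomplete entries should be dropped and that an unknown `type` is coerced to `user`; both are choices made for this sketch. `parse_extracted` is a hypothetical name, and writing the files is left to `write_memory_file`.

```python
import json
import re
import time

MEMORY_TYPES = ["user", "feedback", "project", "reference"]


def parse_extracted(reply: str) -> list[dict]:
    """Turn an extraction reply into clean {name, type, description, body} dicts."""
    match = re.search(r"\[.*\]", reply, re.DOTALL)  # greedy: first '[' to last ']'
    if not match:
        return []
    try:
        items = json.loads(match.group())
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    memories = []
    for item in items:
        if not isinstance(item, dict):
            continue
        desc, body = item.get("description", ""), item.get("body", "")
        if not (desc and body):
            continue  # incomplete entries are dropped rather than written
        mem_type = item.get("type", "user")
        memories.append({
            "name": item.get("name") or f"memory_{int(time.time())}",
            "type": mem_type if mem_type in MEMORY_TYPES else "user",
            "description": desc,
            "body": body,
        })
    return memories


if __name__ == "__main__":
    reply = ('Here you go:\n[{"name": "quotes", "type": "style", '
             '"description": "Prefers single quotes", "body": "Use single quotes."}, {"name": "x"}]')
    for mem in parse_extracted(reply):
        print(mem)
```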

### Consolidation: infrequent merging and deduplication

Memory files accumulate. `consolidate_memories()` fires once the number of files reaches a threshold (10 by default). It has the LLM deduplicate, reconcile contradictions, and retire outdated memories:

```python
CONSOLIDATE_THRESHOLD = 10

def consolidate_memories():
    files = list_memory_files()
    if len(files) < CONSOLIDATE_THRESHOLD:
        return  # too few files to be worth consolidating
    # Send all memories to the LLM, get back a deduplicated list
    # Replace all files with the consolidated results
```

CC calls this process Dream, and it actually has four gates: time interval, scan throttling, session count, and a file lock. The teaching version reduces them to a file-count threshold.

### What Memory should hold

Memory holds information that stays useful in later sessions: user preferences, recurring feedback, project background, frequently used entry points, and debugging leads. It asks "what will we need again later?" and brings that information back into the current conversation through the index + on-demand loading.

Session memory is about continuity within a single session: after a compact, which context does the current session still need? The two work together. Memory manages long-term knowledge; session memory lets the current session carry on after compaction.

---

## Changes from s08

| Component | Before (s08) | After (s09) |
|------|-----------|-----------|
| Memory | None (preferences degrade along with the summary after compaction) | Storage + loading + extraction + consolidation |
| New functions | — | write_memory_file, select_relevant_memories, load_memories, extract_memories, consolidate_memories |
| Storage | — | .memory/MEMORY.md index + .memory/*.md files |
| Tools | bash, read, write, edit, glob, todo_write, task, load_skill, compact (9) | bash, read_file, write_file, edit_file, glob, task (6) |
| Loop | Compaction only, every turn | Inject memories + compact every turn, extract when each turn ends, consolidate periodically |

---

## Try It

```sh
cd learn-claude-code
python s09_memory/code.py
```

Try these prompts, entered over several turns, and watch memories accumulate and load:

1. `I prefer using tabs for indentation, not spaces. Remember that.`
2. `Create a Python file called test.py` (check whether the Agent uses tabs)
3. `What did I tell you about my preferences?` (check whether the Agent remembers)
4. `I also prefer single quotes over double quotes for strings.`

What to look for: does `[Memory: extracted N new memories]` appear after each turn? Are `.md` files created under `.memory/`? Is the `MEMORY.md` index updated? In a new conversation, does the Agent load the earlier memories automatically?

---

## What's Next

Memory, compaction, and tools are all in place. But the system prompt is still one big hard-coded string. Adding a tool means writing its description by hand; switching projects means rewriting the whole prompt. The prompt should be assembled at runtime.

s10 System Prompt → split into sections and assembled at runtime. Different projects and different tools yield different prompts.

<details>
<summary>Deep dive: CC source code</summary>

> The following is based on an analysis of `memdir/`, `services/`, `utils/`, and `query/` under CC's `src/`; line numbers have been verified against the source.

### Source paths

| File | Lines | Responsibility |
|------|------|------|
| `memdir/memdir.ts` | 507 | Core: MEMORY.md definition (`34-38`), memory behavior instructions distinguishing memory/plan/tasks (`199-266`), the three paths of `loadMemoryPrompt()` (`419-490`) |
| `memdir/findRelevantMemories.ts` | 141 | Sonnet side-query that selects memories (`18-24` system prompt, `97-122` call logic) |
| `memdir/memoryTypes.ts` | 271 | Type definitions, frontmatter fields |
| `memdir/memoryScan.ts` | — | Scans .md files, excludes MEMORY.md, reads frontmatter, at most 200 files, sorted by mtime descending (`35-94`) |
| `services/extractMemories/extractMemories.ts` | 615 | Forked agent extracts memories with restricted permissions, `skipTranscript: true`, `maxTurns: 5` (`371-427`) |
| `services/autoDream/autoDream.ts` | 324 | Dream consolidation with four gates (`63-66` defaults, `130-190` gating, `224-233` forked agent) |
| `services/SessionMemory/sessionMemory.ts` | 495 | Session-level memory management |
| `services/compact/sessionMemoryCompact.ts` | — | Lightweight session-memory summary, thresholds 10K/5/40K (`56-61`) |
| `utils/attachments.ts` | — | Injection budget: 200 lines / 4096 bytes per file, 60KB per session (`269-288`); finding relevant memories per query (`2196-2241`) |
| `query.ts` | — | Memory prefetch started every turn (`301-304`), non-blocking collection (`1592-1614`) |
| `query/stopHooks.ts` | — | Stop hook launches extraction and Dream, fire-and-forget (`141-155`) |

### Memory selection: chosen by an LLM, not by embeddings

CC has **Sonnet itself do the choosing** (`findRelevantMemories.ts`) rather than using embedding vector similarity:

1. `memoryScan.ts` scans every `.md` file in the memory directory (excluding MEMORY.md), at most 200, sorted by mtime descending
2. Each file's `name` + `description` goes into a catalog
3. The catalog is sent to a Sonnet side-query: "Based on the names and descriptions, select the memories that are genuinely useful (at most 5). If unsure, don't select it."
4. Sonnet returns `{ selected_memories: ["file1.md", ...] }`
5. The selected files are read in full (each ≤ 200 lines / 4096 bytes) and injected into the context, with a total budget of 60KB per session
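
Step 5's limits can be illustrated as plain truncation. This is a sketch only, not CC's code. It uses the numbers quoted above, assumes "60KB" means 60 × 1024 bytes, and assumes a memory that no longer fits in the session budget is skipped rather than clipped further. `truncate_memory` and `apply_injection_budget` are hypothetical names.

```python
MAX_LINES_PER_FILE = 200
MAX_BYTES_PER_FILE = 4096
SESSION_BUDGET_BYTES = 60 * 1024  # assumption: "60KB" read as 60 KiB


def truncate_memory(text: str) -> str:
    """Clip one memory file to at most 200 lines and 4096 UTF-8 bytes."""
    clipped = "\n".join(text.splitlines()[:MAX_LINES_PER_FILE])
    data = clipped.encode("utf-8")[:MAX_BYTES_PER_FILE]
    # errors="ignore" drops a multi-byte character cut in half by the byte limit
    return data.decode("utf-8", errors="ignore")


def apply_injection_budget(contents: list[str], used_so_far: int = 0) -> tuple[list[str], int]:
    """Clip each selected memory, then skip any that would overflow the session budget."""
    injected = []
    used = used_so_far
    for text in contents:
        piece = truncate_memory(text)
        size = len(piece.encode("utf-8"))
        if used + size > SESSION_BUDGET_BYTES:
            continue  # assumption: skip what no longer fits
        injected.append(piece)
        used += size
    return injected, used


if __name__ == "__main__":
    long_file = "\n".join(f"line {i}" for i in range(500))
    pieces, used = apply_injection_budget([long_file, "short memory"])
    print(len(pieces[0].splitlines()), used)
```

Because the running total (`used`) is returned, a caller can carry it across turns, which is what makes the second limit a per-session budget rather than a per-turn one.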
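
At the start of each user turn, `query.ts:301-304` kicks off the memory prefetch asynchronously. After tool execution, `1592-1614` collects the result without blocking, so the main flow never waits on it.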
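
### Extraction timing: a stop hook, not after autoCompact

Trigger point (`stopHooks.ts:141-155`): inside `handleStopHooks()`, extraction and Dream are launched fire-and-forget. The teaching version puts extraction in the `stop_reason != "tool_use"` branch, which follows the same idea.

CC runs extraction through a forked agent (`extractMemories.ts:371-427`) with restricted permissions, `skipTranscript: true`, and `maxTurns: 5`. There is also overlap protection: if the main Agent has already written memory files, extraction is skipped.

### Memory file format

CC uses Markdown + YAML frontmatter, the same as the teaching version, with four types: `user`, `feedback`, `project`, `reference`.

`memdir.ts:34-38` defines the index limits: `MEMORY.md` is capped at 200 lines / 25KB. `memdir.ts:199-266` builds the memory behavior instructions, explicitly distinguishing memory, plan, and tasks. Storage location: `~/.claude/projects/<sanitized-git-root>/memory/`.

### Dream: four gates

It is neither "trigger when idle" nor "merge once there are enough". Dream has four gates (`autoDream.ts`, defaults at `63-66`, gating logic at `130-190`):

1. **Time gate**: ≥ 24 hours since the last consolidation
2. **Scan throttle**: avoids scanning the file system too often
3. **Session gate**: ≥ 5 session transcripts modified since the last consolidation
4. **Lock gate**: no other process is currently consolidating (`.consolidate-lock` file)

The consolidation itself runs through a forked agent (`224-233`): orient → gather recent signals → merge and write files → prune and update the index. The lock file's mtime doubles as lastConsolidatedAt. For crash recovery, the lock expires automatically after 1 hour.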
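
### User Memory vs Session Memory

| | User Memory | Session Memory |
|---|---|---|
| Persistence | Across sessions | Single session |
| Storage | Multiple .md files under `memory/` | `session-memory/<id>/memory.md` |
| Loaded into | system prompt | compact summary |
| Purpose | Knowledge accumulated across sessions | Context continuity across compacts |

sessionMemoryCompact (the mechanism mentioned in s08) is what uses Session Memory. Before autoCompact, it reads the session memory file. If the content is sufficient (≥ 10K tokens, ≥ 5 text messages, ≤ 40K tokens, `sessionMemoryCompact.ts:56-61`), it uses that file as the summary instead of calling the LLM.

### Where the real implementation is more complex than the teaching version

- **Feature flags**: memory features are controlled by several layers of feature gates
- **Team memory**: team-shared memory, with a dedicated path in `loadMemoryPrompt()` (not covered in the teaching version)
- **KAIROS**: a timing-aware memory-extraction strategy, the daily-log mode in `loadMemoryPrompt()`
- **Prompt cache**: memory injection has to account for the prompt cache TTL, so that large parts of the system prompt aren't rewritten every time
- **File locks**: locking for concurrent access from multiple processes
- **Memory prefetch**: asynchronous prefetching that doesn't block the main flow

### The teaching version's simplifications are deliberate

- LLM side-query → LLM side-query + keyword fallback: the teaching version keeps LLM selection and adds a fallback path
- Memory JSON → Markdown + frontmatter: the teaching version matches CC
- Stop hook trigger → `stop_reason != "tool_use"` branch: same idea
- Four-layer gating → file-count threshold: the teaching version has no transcript system and no multi-session concept
- Forked agent + restricted permissions → direct call: the teaching version has no subprocess isolation

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->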

s09_memory/code.py
@@ -0,0 +1,616 @@
#!/usr/bin/env python3
"""
s09_memory.py - Memory System

Persistent, cross-session knowledge for the coding agent.

Storage:
    .memory/
      MEMORY.md          ← index (one line per memory, ≤200 lines)
      feedback_tabs.md   ← individual memory files (Markdown + YAML frontmatter)
      user_profile.md
      project_facts.md

Flow in agent_loop:
    1. Load MEMORY.md index into SYSTEM prompt (cheap, always present)
    2. Select relevant memories by filename/description → inject content
    3. Run compression pipeline from s08
    4. After each turn ends → extract new memories from original messages
    5. Periodically consolidate (Dream)

Builds on s08 (context compact). Usage:

    python s09_memory/code.py
Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY and MODEL_ID in .env
"""

import os, subprocess, json, time, re
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"): os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
MEMORY_DIR = WORKDIR / ".memory"; MEMORY_DIR.mkdir(exist_ok=True)
MEMORY_INDEX = MEMORY_DIR / "MEMORY.md"
SKILLS_DIR = WORKDIR / "skills"
TRANSCRIPT_DIR = WORKDIR / ".transcripts"
TOOL_RESULTS_DIR = WORKDIR / ".task_outputs" / "tool-results"
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]


# ═══════════════════════════════════════════════════════════
# NEW in s09: Memory System
# ═══════════════════════════════════════════════════════════

MEMORY_TYPES = ["user", "feedback", "project", "reference"]

def _parse_frontmatter(text: str) -> tuple[dict, str]:
    if not text.startswith("---"):
        return {}, text
    parts = text.split("---", 2)
    if len(parts) < 3:
        return {}, text
    meta = {}
    for line in parts[1].strip().splitlines():
        if ":" in line:
            k, v = line.split(":", 1)
            meta[k.strip()] = v.strip().strip('"').strip("'")
    return meta, parts[2].strip()


def write_memory_file(name: str, mem_type: str, description: str, body: str):
    """Write a single memory file with YAML frontmatter."""
    if mem_type not in MEMORY_TYPES:
        mem_type = "user"  # fall back for unknown types returned by the LLM
    slug = name.lower().replace(" ", "-").replace("/", "-")
    filename = f"{slug}.md"
    filepath = MEMORY_DIR / filename
    filepath.write_text(
        f"---\nname: {name}\ndescription: {description}\ntype: {mem_type}\n---\n\n{body}\n"
    )
    _rebuild_index()
    return filepath


def _rebuild_index():
    """Rebuild MEMORY.md index from all memory files."""
    lines = []
    for f in sorted(MEMORY_DIR.glob("*.md")):
        if f.name == "MEMORY.md":
            continue
        raw = f.read_text()
        meta, body = _parse_frontmatter(raw)
        name = meta.get("name", f.stem)
        desc = meta.get("description", body.split("\n")[0][:80])
        lines.append(f"- [{name}]({f.name}) — {desc}")
    MEMORY_INDEX.write_text("\n".join(lines) + "\n" if lines else "")


def read_memory_index() -> str:
    """Read MEMORY.md index (injected into SYSTEM every turn)."""
    if not MEMORY_INDEX.exists():
        return ""
    text = MEMORY_INDEX.read_text().strip()
    return text if text else ""


def read_memory_file(filename: str) -> str | None:
    """Read a single memory file's full content."""
    path = MEMORY_DIR / filename
    if not path.exists():
        return None
    return path.read_text()


def list_memory_files() -> list[dict]:
    """List all memory files with metadata."""
    result = []
    for f in sorted(MEMORY_DIR.glob("*.md")):
        if f.name == "MEMORY.md":
            continue
        raw = f.read_text()
        meta, body = _parse_frontmatter(raw)
        result.append({
            "filename": f.name,
            "name": meta.get("name", f.stem),
            "description": meta.get("description", ""),
            "type": meta.get("type", "user"),
            "body": body,
        })
    return result


def select_relevant_memories(messages: list, max_items: int = 5) -> list[str]:
    """Select relevant memory filenames by matching recent conversation against
    memory names/descriptions. Uses a simple LLM call (or falls back to keyword
    matching on name+description)."""
    files = list_memory_files()
    if not files:
        return []

    # Collect recent user text for context
    recent_texts = []
    for msg in reversed(messages):
        if msg.get("role") == "user":
            content = msg.get("content", "")
            if isinstance(content, list):
                content = " ".join(
                    str(getattr(b, "text", "")) for b in content
                    if getattr(b, "type", None) == "text"
                )
            # Skip tool-result-only turns (they flatten to empty text), so that
            # memories are still selected during multi-step tool use
            if isinstance(content, str) and content.strip():
                recent_texts.append(content)
            if len(recent_texts) >= 3:
                break
    recent = " ".join(reversed(recent_texts))[:2000]

    if not recent.strip():
        return []

    # Build catalog of name + description for LLM to choose from
    catalog_lines = []
    for i, f in enumerate(files):
        catalog_lines.append(f"{i}: {f['name']} — {f['description']}")
    catalog = "\n".join(catalog_lines)

    prompt = (
        "Given the recent conversation and the memory catalog below, "
        "select the indices of memories that are clearly relevant. "
        "Return ONLY a JSON array of integers, e.g. [0, 3]. "
        "If none are relevant, return [].\n\n"
        f"Recent conversation:\n{recent}\n\n"
        f"Memory catalog:\n{catalog}"
    )

    try:
        response = client.messages.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=200,
        )
        text = response.content[0].text.strip()
        # Extract JSON array from response
        match = re.search(r'\[.*?\]', text, re.DOTALL)
        if match:
            indices = json.loads(match.group())
            selected = []
            for idx in indices:
                if isinstance(idx, int) and 0 <= idx < len(files):
                    selected.append(files[idx]["filename"])
                if len(selected) >= max_items:
                    break
            return selected
    except Exception:
        pass

    # Fallback: keyword matching on name + description
    keywords = [w.lower() for w in recent.split() if len(w) > 3]
    selected = []
    for f in files:
        text = (f["name"] + " " + f["description"]).lower()
        if any(kw in text for kw in keywords):
            selected.append(f["filename"])
        if len(selected) >= max_items:
            break
    return selected


def load_memories(messages: list) -> str:
    """Load relevant memory content for injection into context."""
    selected_files = select_relevant_memories(messages)
    if not selected_files:
        return ""

    parts = ["<relevant_memories>"]
    for filename in selected_files:
        content = read_memory_file(filename)
        if content:
            parts.append(content)
    parts.append("</relevant_memories>")
    return "\n\n".join(parts)


def extract_memories(messages: list):
    """Extract new memories from recent dialogue. Runs after each turn."""
    # Collect recent conversation text
    dialogue_parts = []
    for msg in messages[-10:]:
        role = msg.get("role", "?")
        content = msg.get("content", "")
        if isinstance(content, list):
            content = " ".join(
                str(getattr(b, "text", "")) for b in content
                if getattr(b, "type", None) == "text"
            )
        if isinstance(content, str) and content.strip():
            dialogue_parts.append(f"{role}: {content}")
    dialogue = "\n".join(dialogue_parts)

    if not dialogue.strip():
        return

    # Check existing memories to avoid duplicates
    existing = list_memory_files()
    existing_desc = "\n".join(f"- {m['name']}: {m['description']}" for m in existing) if existing else "(none)"

    prompt = (
        "Extract user preferences, constraints, or project facts from this dialogue.\n"
        "Return a JSON array. Each item: {name, type, description, body}.\n"
        "- name: short kebab-case identifier (e.g. 'user-preference-tabs')\n"
        "- type: one of 'user' (user preference), 'feedback' (guidance), "
        "'project' (project fact), 'reference' (external pointer)\n"
        "- description: one-line summary for index lookup\n"
        "- body: full detail in markdown\n"
        "If nothing new or already covered by existing memories, return [].\n\n"
        f"Existing memories:\n{existing_desc}\n\n"
        f"Dialogue:\n{dialogue[-4000:]}"  # keep the newest text if the dialogue is long
    )

    try:
        response = client.messages.create(
            model=MODEL, messages=[{"role": "user", "content": prompt}], max_tokens=800
        )
        text = response.content[0].text.strip()
        # Extract JSON array from response
        match = re.search(r'\[.*\]', text, re.DOTALL)
        if not match:
            return
        items = json.loads(match.group())
        if not items:
            return
        count = 0
        for mem in items:
            name = mem.get("name", f"memory_{int(time.time())}")
            mem_type = mem.get("type", "user")
            desc = mem.get("description", "")
            body = mem.get("body", "")
            if desc and body:
                write_memory_file(name, mem_type, desc, body)
                count += 1
        if count:
            print(f"\n\033[33m[Memory: extracted {count} new memories]\033[0m")
    except Exception:
        pass


CONSOLIDATE_THRESHOLD = 10

def consolidate_memories():
    """Merge duplicate/stale memories. Triggered when file count ≥ threshold."""
    files = list_memory_files()
    if len(files) < CONSOLIDATE_THRESHOLD:
        return

    catalog = "\n\n".join(
        f"## {f['filename']}\nname: {f['name']}\ndescription: {f['description']}\n{f['body']}"
        for f in files
    )

    prompt = (
        "Consolidate the following memory files. Rules:\n"
        "1. Merge duplicates into one\n"
        "2. Remove outdated/contradicted memories\n"
        "3. Keep the total under 30 memories\n"
        "4. Preserve important user preferences above all\n"
        "Return a JSON array. Each item: {name, type, description, body}.\n\n"
        f"{catalog[:16000]}"
    )

    try:
        response = client.messages.create(
            model=MODEL, messages=[{"role": "user", "content": prompt}], max_tokens=3000
        )
        text = response.content[0].text.strip()
        match = re.search(r'\[.*\]', text, re.DOTALL)
        if not match:
            return
        items = json.loads(match.group())
        # Keep only complete entries; never wipe the store because of an empty/invalid reply
        items = [m for m in items if isinstance(m, dict) and m.get("description") and m.get("body")]
        if not items:
            return

        # Remove old memory files (keep MEMORY.md)
        for f in MEMORY_DIR.glob("*.md"):
            if f.name != "MEMORY.md":
                f.unlink()

        for mem in items:
            name = mem.get("name", f"memory_{int(time.time())}")
            mem_type = mem.get("type", "user")
            desc = mem.get("description", "")
            body = mem.get("body", "")
            if desc and body:
                write_memory_file(name, mem_type, desc, body)

        print(f"\n\033[33m[Memory: consolidated {len(files)} → {len(items)} memories]\033[0m")
    except Exception:
        pass


# Build SYSTEM with memory index
def build_system() -> str:
    index = read_memory_index()
    memories_section = f"\n\nMemories available:\n{index}" if index else ""
    return (
        f"You are a coding agent at {WORKDIR}."
        f"{memories_section}\n"
        "Relevant memories are injected below. Respect user preferences from memory.\n"
        "When the user says 'remember' or expresses a clear preference, extract it as a memory."
    )

SYSTEM = build_system()

SUB_SYSTEM = (
    f"You are a coding agent at {WORKDIR}. "
    "Complete the task you were given, then return a concise summary. "
    "Do not delegate further."
)


# ═══════════════════════════════════════════════════════════
# FROM s02-s08 (skeleton): Basic tools
# ═══════════════════════════════════════════════════════════

def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR): raise ValueError(f"Path escapes workspace: {p}")
    return path

def run_bash(command: str) -> str:
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR, capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired: return "Error: Timeout (120s)"

def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines): lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e: return f"Error: {e}"

def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path); file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content); return f"Wrote {len(content)} bytes to {path}"
    except Exception as e: return f"Error: {e}"

def run_edit(path: str, old_text: str, new_text: str) -> str:
    try:
        file_path = safe_path(path)
        text = file_path.read_text()
        if old_text not in text: return f"Error: text not found in {path}"
        file_path.write_text(text.replace(old_text, new_text, 1))
        return f"Edited {path}"
    except Exception as e: return f"Error: {e}"

def run_glob(pattern: str) -> str:
    import glob as g
    try:
        results = []
        for match in g.glob(pattern, root_dir=WORKDIR):
            if (WORKDIR / match).resolve().is_relative_to(WORKDIR):
                results.append(match)
        return "\n".join(results) if results else "(no matches)"
    except Exception as e: return f"Error: {e}"

def extract_text(content) -> str:
    if not isinstance(content, list): return str(content)
    return "\n".join(getattr(b, "text", "") for b in content if getattr(b, "type", None) == "text")

# Subagent (simplified from s06-s07)
SUB_TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
]
SUB_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write}

def spawn_subagent(task: str) -> str:
    print(f"\n\033[35m[Subagent spawned]\033[0m")
    messages = [{"role": "user", "content": task}]
    for _ in range(30):
        response = client.messages.create(model=MODEL, system=SUB_SYSTEM,
                                          messages=messages, tools=SUB_TOOLS, max_tokens=8000)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use": break
        results = []
        for block in response.content:
            if block.type == "tool_use":
                handler = SUB_HANDLERS.get(block.name)
                output = handler(**block.input) if handler else f"Unknown: {block.name}"
                print(f" \033[90m[sub] {block.name}: {str(output)[:100]}\033[0m")
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})
    result = extract_text(messages[-1]["content"])
    if not result:
        for msg in reversed(messages):
            if msg["role"] == "assistant":
                result = extract_text(msg["content"])
                if result: break
    if not result: result = "Subagent stopped after 30 turns without final answer."
    print(f"\033[35m[Subagent done]\033[0m")
    return result


# ═══════════════════════════════════════════════════════════
# FROM s08 (skeleton): Compaction pipeline
# ═══════════════════════════════════════════════════════════

CONTEXT_LIMIT = 50000; KEEP_RECENT = 3; PERSIST_THRESHOLD = 30000

def estimate_size(msgs): return len(str(msgs))

def snip_compact(msgs, mx=50):
    if len(msgs) <= mx: return msgs
    return msgs[:3] + [{"role": "user", "content": f"[snipped {len(msgs)-mx} msgs]"}] + msgs[-(mx-3):]

def collect_tool_results(msgs):
    blocks = []
    for mi, msg in enumerate(msgs):
        if msg.get("role") != "user" or not isinstance(msg.get("content"), list): continue
        for bi, block in enumerate(msg["content"]):
            if isinstance(block, dict) and block.get("type") == "tool_result": blocks.append((mi, bi, block))
    return blocks

def micro_compact(msgs):
    tr = collect_tool_results(msgs)
    if len(tr) <= KEEP_RECENT: return msgs
    for _, _, b in tr[:-KEEP_RECENT]:
        if len(b.get("content", "")) > 120: b["content"] = "[Earlier tool result compacted.]"
    return msgs

def persist_large(tid, out):
    if len(out) <= PERSIST_THRESHOLD: return out
    TOOL_RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    p = TOOL_RESULTS_DIR / f"{tid}.txt"
    if not p.exists(): p.write_text(out)
    return f"<persisted-output>\nFull: {p}\nPreview:\n{out[:2000]}\n</persisted-output>"

def tool_result_budget(msgs, mx=200_000):
    last = msgs[-1] if msgs else None
    if not last or last.get("role") != "user" or not isinstance(last.get("content"), list): return msgs
    blocks = [(i, b) for i, b in enumerate(last["content"]) if isinstance(b, dict) and b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    if total <= mx: return msgs
    for _, block in sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True):
        if total <= mx: break
        c = str(block.get("content", ""))
        if len(c) <= PERSIST_THRESHOLD: continue
        block["content"] = persist_large(block.get("tool_use_id", "?"), c)
        total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    return msgs

def write_transcript(msgs):
    TRANSCRIPT_DIR.mkdir(parents=True, exist_ok=True)
    p = TRANSCRIPT_DIR / f"transcript_{int(time.time())}.jsonl"
    with p.open("w") as f:
        for m in msgs: f.write(json.dumps(m, default=str) + "\n")
    return p

def summarize_history(msgs):
    conv = json.dumps(msgs, default=str)[:80000]
    r = client.messages.create(model=MODEL, messages=[{"role": "user", "content":
        "Summarize this coding-agent conversation so work can continue.\n"
        "Preserve: 1. current goal, 2. key findings, 3. files changed, 4. remaining work, 5. user constraints.\n\n" + conv}],
        max_tokens=2000)
    return r.content[0].text.strip()

def compact_history(msgs):
    write_transcript(msgs)
    summary = summarize_history(msgs)
    return [{"role": "user", "content": f"[Compacted]\n\n{summary}"}]

def reactive_compact(msgs):
    write_transcript(msgs)
    summary = summarize_history(msgs)
    return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *msgs[-5:]]


# ═══════════════════════════════════════════════════════════
# Tool Definitions (skeleton — fewer tools to focus on memory)
# ═══════════════════════════════════════════════════════════

TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}},
    {"name": "edit_file", "description": "Replace exact text in a file once.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}},
    {"name": "glob", "description": "Find files matching a glob pattern.",
     "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}},
    {"name": "task", "description": "Launch a subagent to handle a subtask.",
     "input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}},
]

TOOL_HANDLERS = {
    "bash": run_bash, "read_file": run_read, "write_file": run_write,
    "edit_file": run_edit, "glob": run_glob,
    # The "task" schema names its argument "description"; map it onto spawn_subagent(task)
    # so that handler(**block.input) does not raise TypeError
    "task": lambda description: spawn_subagent(description),
}


# ═══════════════════════════════════════════════════════════
# agent_loop — s09: inject memories + extract after each turn
# ═══════════════════════════════════════════════════════════

MAX_REACTIVE_RETRIES = 1

def agent_loop(messages: list):
    reactive_retries = 0
    while True:
        # s09: rebuild system with current memory index + relevant memories
        system = build_system()
        memories_content = load_memories(messages)
        if memories_content:
            system += "\n\n" + memories_content

        # s09: save pre-compression snapshot for accurate memory extraction.
        # A separate list, so the messages[:] reassignments below don't shrink it.
        pre_compress = list(messages)

        # s08: compression pipeline (budget → snip → micro)
        messages[:] = tool_result_budget(messages)
        messages[:] = snip_compact(messages)
        messages[:] = micro_compact(messages)

        if estimate_size(messages) > CONTEXT_LIMIT:
            print("[auto compact]")
            messages[:] = compact_history(messages)

        try:
            response = client.messages.create(
                model=MODEL, system=system, messages=messages, tools=TOOLS, max_tokens=8000
            )
            reactive_retries = 0
        except Exception as e:
            if ("prompt_too_long" in str(e).lower() or "too many tokens" in str(e).lower()) and reactive_retries < MAX_REACTIVE_RETRIES:
                print("[reactive compact]")
                messages[:] = reactive_compact(messages)
                reactive_retries += 1
                continue
            raise

        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            # s09: extract from the pre-compression snapshot plus the final reply for full fidelity
            extract_memories(pre_compress + [messages[-1]])
            consolidate_memories()
            return

        results = []
        for block in response.content:
            if block.type != "tool_use": continue
            print(f"\033[36m> {block.name}\033[0m")
            handler = TOOL_HANDLERS.get(block.name)
            output = handler(**block.input) if handler else f"Unknown: {block.name}"
            print(str(output)[:200])
            results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})


if __name__ == "__main__":
    print("s09: Memory — persistent cross-session knowledge")
    print("Type a question and press Enter to send. Type q to quit.\n")
    history = []
    while True:
        try: query = input("\033[36ms09 >> \033[0m")
        except (EOFError, KeyboardInterrupt): break
        if query.strip().lower() in ("q", "exit", ""): break
        history.append({"role": "user", "content": query})
        agent_loop(history)
        for block in history[-1]["content"]:
            if getattr(block, "type", None) == "text": print(block.text)
|
||||
print()
|
||||
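The reactive-compact branch in `agent_loop` retries a rejected request at most `MAX_REACTIVE_RETRIES` times after shrinking the history, and any other error (or a second overflow) is re-raised. A minimal standalone sketch of just that control flow, assuming a hypothetical `call_model` callable that raises an exception whose message contains `prompt_too_long`, and a hypothetical `shrink` function standing in for `reactive_compact`:

```python
MAX_REACTIVE_RETRIES = 1


def is_prompt_too_long(err: Exception) -> bool:
    """Match the same error substrings agent_loop checks (case-insensitive)."""
    text = str(err).lower()
    return "prompt_too_long" in text or "too many tokens" in text


def call_with_reactive_compact(call_model, shrink, messages: list):
    """Call the model; on a context-overflow error, shrink and retry (bounded).

    call_model and shrink are hypothetical stand-ins for
    client.messages.create and reactive_compact.
    """
    retries = 0
    while True:
        try:
            return call_model(messages)
        except Exception as e:
            if is_prompt_too_long(e) and retries < MAX_REACTIVE_RETRIES:
                messages[:] = shrink(messages)  # mutate in place, like agent_loop
                retries += 1
                continue
            raise  # unrelated error, or retry budget exhausted


# Usage: a fake model that rejects histories longer than 3 messages.
def fake_model(msgs):
    if len(msgs) > 3:
        raise RuntimeError(f"prompt_too_long: {len(msgs)} messages")
    return f"ok with {len(msgs)} messages"


def keep_last_two(msgs):
    return msgs[-2:]


history = [{"role": "user", "content": str(i)} for i in range(5)]
print(call_with_reactive_compact(fake_model, keep_last_two, history))  # ok with 2 messages
```

Capping retries matters because a compaction that fails to shrink the prompt enough would otherwise loop forever against the same error.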
s09_memory/images/memory-overview.en.svg
@@ -0,0 +1,104 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 430" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#7c3aed"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-purple" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#7c3aed"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>

<rect width="760" height="430" fill="#fafbfc" rx="8"/>

<!-- Title -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Memory — adds loading, extraction, and consolidation to the s08 compression pipeline</text>

<!-- Legend -->
<rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">s08 preserved</text>
<rect x="160" y="56" width="12" height="10" rx="2" fill="#f3e8ff" stroke="#7c3aed" stroke-width="1"/>
<text x="178" y="66" fill="#7c3aed" font-size="10" font-weight="600">s09 new</text>

<!-- ===== messages[] ===== -->
<rect x="30" y="96" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="80" y="126" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>

<!-- arrow → compression -->
<line x1="130" y1="122" x2="152" y2="122" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== Compression pipeline (s08) ===== -->
<rect x="155" y="86" width="135" height="72" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="222" y="108" fill="#1e3a5f" font-size="11" font-weight="700" text-anchor="middle">Compression</text>
<text x="222" y="124" fill="#64748b" font-size="9" text-anchor="middle">budget → snip → micro</text>
<text x="222" y="138" fill="#64748b" font-size="9" text-anchor="middle">→ autoCompact</text>
<text x="222" y="152" fill="#94a3b8" font-size="8" text-anchor="middle">(s08)</text>

<!-- arrow → Loading (purple) -->
<line x1="290" y1="122" x2="317" y2="122" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>

<!-- ===== Loading (s09) ===== -->
<rect x="320" y="86" width="120" height="72" rx="8" fill="#f3e8ff" stroke="#7c3aed" stroke-width="2"/>
<text x="380" y="108" fill="#5b21b6" font-size="11" font-weight="700" text-anchor="middle">Loading</text>
<text x="380" y="124" fill="#7c3aed" font-size="9" text-anchor="middle">LLM side-query select</text>
<text x="380" y="138" fill="#7c3aed" font-size="9" text-anchor="middle">inject file contents</text>
<text x="380" y="152" fill="#a78bfa" font-size="8" text-anchor="middle">≤ 5 items</text>

<!-- arrow → LLM -->
<line x1="440" y1="122" x2="472" y2="122" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== LLM (s08) ===== -->
<rect x="475" y="96" width="80" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="515" y="114" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
<text x="515" y="132" fill="#64748b" font-size="9" text-anchor="middle">stop_reason</text>
<text x="515" y="144" fill="#64748b" font-size="9" text-anchor="middle">=tool_use?</text>

<!-- LLM → no → return result -->
<line x1="515" y1="148" x2="515" y2="178" stroke="#16a34a" stroke-width="1.5" marker-end="url(#arrow-green)"/>
<text x="528" y="168" fill="#16a34a" font-size="9" font-weight="600">no, stop</text>
<rect x="460" y="180" width="110" height="24" rx="12" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="515" y="196" fill="#166534" font-size="10" font-weight="600" text-anchor="middle">return result</text>

<!-- LLM → yes → TOOL_HANDLERS -->
<line x1="555" y1="122" x2="587" y2="122" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="568" y="114" fill="#64748b" font-size="9" font-weight="600">yes</text>

<!-- ===== TOOL_HANDLERS (s08) ===== -->
<rect x="590" y="88" width="130" height="68" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="655" y="112" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="655" y="128" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
<text x="655" y="142" fill="#94a3b8" font-size="8" text-anchor="middle">edit · glob · task</text>

<!-- ===== Memory Files (s09) ===== -->
<rect x="155" y="232" width="430" height="36" rx="6" fill="#faf5ff" stroke="#7c3aed" stroke-width="1.5" stroke-dasharray="4,2"/>
<text x="370" y="255" fill="#5b21b6" font-size="11" font-weight="600" text-anchor="middle">.memory/ — MEMORY.md index + *.md files (cross-session persistent)</text>

<!-- Arrow: Memory Files → Loading -->
<path d="M 395 232 L 395 162" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>
<text x="408" y="200" fill="#7c3aed" font-size="9">read</text>

<!-- Arrow: return result → Extraction → Memory Files -->
<path d="M 515 204 L 515 232" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>
<text x="528" y="222" fill="#7c3aed" font-size="9">Extraction (after each turn)</text>

<!-- Consolidation note -->
<text x="222" y="284" fill="#a78bfa" font-size="9">Consolidation: triggers at ≥ 10 files, dedup·merge·prune</text>

<!-- ===== Loop back ===== -->
<path d="M 720 122 L 748 122 Q 756 122 756 130 L 756 310 Q 756 318 748 318 L 88 318 Q 80 318 80 310 L 80 148" fill="none" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="400" y="340" fill="#64748b" font-size="10" text-anchor="middle">tool results → messages[] → compress → load memories → LLM → extract after each turn</text>

<!-- ===== Bottom notes ===== -->
<rect x="40" y="358" width="680" height="56" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<rect x="60" y="372" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="80" y="382" fill="#475569" font-size="10">s08 preserved: compression pipeline (budget → snip → micro → auto) + emergency trim + loop</text>
<rect x="60" y="392" width="12" height="10" rx="2" fill="#f3e8ff" stroke="#7c3aed" stroke-width="1"/>
<text x="80" y="402" fill="#475569" font-size="10">s09 new: Loading (index in SYSTEM + on-demand inject) + Extraction (after each turn) + Consolidation (threshold)</text>
</svg>
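The overview diagram says consolidation triggers once `.memory/` holds at least 10 files, then deduplicates, merges, and prunes. A minimal sketch of the trigger plus the simplest form of deduplication only, assuming memory files are `*.md` under the memory directory, that the `MEMORY.md` index is not counted, and that "duplicate" means identical text after whitespace normalization. Merging near-duplicates (which presumably needs an LLM) is not attempted here; `CONSOLIDATE_THRESHOLD` and the helper names are hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path

CONSOLIDATE_THRESHOLD = 10  # from the diagram: triggers at >= 10 files
INDEX_NAME = "MEMORY.md"


def memory_files(memory_dir: Path) -> list[Path]:
    """All memory files except the MEMORY.md index, in a stable order."""
    return sorted(p for p in memory_dir.glob("*.md") if p.name != INDEX_NAME)


def should_consolidate(memory_dir: Path) -> bool:
    return len(memory_files(memory_dir)) >= CONSOLIDATE_THRESHOLD


def prune_exact_duplicates(memory_dir: Path) -> list[str]:
    """Delete files whose whitespace-normalized text matches an earlier file.

    Returns the names of the removed files.
    """
    seen: set[str] = set()
    removed: list[str] = []
    for path in memory_files(memory_dir):
        normalized = " ".join(path.read_text(encoding="utf-8").split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            path.unlink()
            removed.append(path.name)
        else:
            seen.add(digest)
    return removed


# Usage with a throwaway directory: 9 unique files + 2 duplicates = 11.
with tempfile.TemporaryDirectory() as tmp:
    mem = Path(tmp)
    (mem / INDEX_NAME).write_text("- index\n", encoding="utf-8")
    for i in range(9):
        (mem / f"note_{i}.md").write_text(f"fact {i}\n", encoding="utf-8")
    (mem / "note_9.md").write_text("fact  0", encoding="utf-8")  # same as note_0 once normalized
    (mem / "note_x.md").write_text("fact 1\n\n", encoding="utf-8")  # same as note_1
    print(should_consolidate(mem))       # True (11 files)
    print(prune_exact_duplicates(mem))   # ['note_9.md', 'note_x.md']
    print(should_consolidate(mem))       # False (9 files left)
```

Hashing the normalized text keeps the duplicate check linear in the number of files, which is enough for a count-based trigger like this one.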
s09_memory/images/memory-overview.ja.svg
@@ -0,0 +1,104 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 430" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#7c3aed"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-purple" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#7c3aed"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>

<rect width="760" height="430" fill="#fafbfc" rx="8"/>

<!-- Title -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Memory — s08 圧縮パイプラインに記憶の読み込み・抽出・整理を挿入</text>

<!-- Legend -->
<rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">s08 維持</text>
<rect x="130" y="56" width="12" height="10" rx="2" fill="#f3e8ff" stroke="#7c3aed" stroke-width="1"/>
<text x="148" y="66" fill="#7c3aed" font-size="10" font-weight="600">s09 追加</text>

<!-- ===== messages[] ===== -->
<rect x="30" y="96" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="80" y="126" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>

<!-- arrow → compression -->
<line x1="130" y1="122" x2="152" y2="122" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== Compression pipeline (s08) ===== -->
<rect x="155" y="86" width="135" height="72" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="222" y="108" fill="#1e3a5f" font-size="11" font-weight="700" text-anchor="middle">圧縮パイプライン</text>
<text x="222" y="124" fill="#64748b" font-size="9" text-anchor="middle">budget → snip → micro</text>
<text x="222" y="138" fill="#64748b" font-size="9" text-anchor="middle">→ autoCompact</text>
<text x="222" y="152" fill="#94a3b8" font-size="8" text-anchor="middle">(s08)</text>

<!-- arrow → Loading (purple) -->
<line x1="290" y1="122" x2="317" y2="122" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>

<!-- ===== Loading (s09) ===== -->
<rect x="320" y="86" width="120" height="72" rx="8" fill="#f3e8ff" stroke="#7c3aed" stroke-width="2"/>
<text x="380" y="108" fill="#5b21b6" font-size="11" font-weight="700" text-anchor="middle">Loading</text>
<text x="380" y="124" fill="#7c3aed" font-size="9" text-anchor="middle">LLM side-query 選択</text>
<text x="380" y="138" fill="#7c3aed" font-size="9" text-anchor="middle">ファイル内容を注入</text>
<text x="380" y="152" fill="#a78bfa" font-size="8" text-anchor="middle">≤ 5 件</text>

<!-- arrow → LLM -->
<line x1="440" y1="122" x2="472" y2="122" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== LLM (s08) ===== -->
<rect x="475" y="96" width="80" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="515" y="114" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
<text x="515" y="132" fill="#64748b" font-size="9" text-anchor="middle">stop_reason</text>
<text x="515" y="144" fill="#64748b" font-size="9" text-anchor="middle">=tool_use?</text>

<!-- LLM → no → return result -->
<line x1="515" y1="148" x2="515" y2="178" stroke="#16a34a" stroke-width="1.5" marker-end="url(#arrow-green)"/>
<text x="528" y="168" fill="#16a34a" font-size="9" font-weight="600">なし、停止</text>
<rect x="460" y="180" width="110" height="24" rx="12" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="515" y="196" fill="#166534" font-size="10" font-weight="600" text-anchor="middle">結果を返す</text>

<!-- LLM → yes → TOOL_HANDLERS -->
<line x1="555" y1="122" x2="587" y2="122" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="568" y="114" fill="#64748b" font-size="9" font-weight="600">あり</text>

<!-- ===== TOOL_HANDLERS (s08) ===== -->
<rect x="590" y="88" width="130" height="68" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="655" y="112" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="655" y="128" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
<text x="655" y="142" fill="#94a3b8" font-size="8" text-anchor="middle">edit · glob · task</text>

<!-- ===== Memory Files (s09) ===== -->
<rect x="155" y="232" width="430" height="36" rx="6" fill="#faf5ff" stroke="#7c3aed" stroke-width="1.5" stroke-dasharray="4,2"/>
<text x="370" y="255" fill="#5b21b6" font-size="11" font-weight="600" text-anchor="middle">.memory/ — MEMORY.md インデックス + *.md ファイル(セッション間永続化)</text>

<!-- Arrow: Memory Files → Loading -->
<path d="M 395 232 L 395 162" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>
<text x="408" y="200" fill="#7c3aed" font-size="9">読み込み</text>

<!-- Arrow: return result → Extraction → Memory Files -->
<path d="M 515 204 L 515 232" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>
<text x="528" y="222" fill="#7c3aed" font-size="9">Extraction(毎ターン終了後)</text>

<!-- Consolidation note -->
<text x="222" y="284" fill="#a78bfa" font-size="9">Consolidation: ファイル ≥ 10 でトリガー、重複排除・統合・剪定</text>

<!-- ===== Loop back ===== -->
<path d="M 720 122 L 748 122 Q 756 122 756 130 L 756 310 Q 756 318 748 318 L 88 318 Q 80 318 80 310 L 80 148" fill="none" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="400" y="340" fill="#64748b" font-size="10" text-anchor="middle">ツール結果 → messages[] → 圧縮 → 記憶読み込み → LLM → 毎ターン終了後に抽出</text>

<!-- ===== Bottom notes ===== -->
<rect x="40" y="358" width="680" height="56" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<rect x="60" y="372" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="80" y="382" fill="#475569" font-size="10">s08 維持:圧縮パイプライン(budget → snip → micro → auto)+ 緊急トリム + ループ</text>
<rect x="60" y="392" width="12" height="10" rx="2" fill="#f3e8ff" stroke="#7c3aed" stroke-width="1"/>
<text x="80" y="402" fill="#475569" font-size="10">s09 追加:Loading(インデックス常駐 + オンデマンド注入)+ Extraction(毎ターン終了後)+ Consolidation(閾値トリガー)</text>
</svg>
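In the memory-overview diagrams, the Loading step picks at most 5 memory files with an LLM side-query, and the subsystems diagram notes a fallback to keyword matching. A minimal sketch of that keyword fallback only, assuming each memory is represented by its file name and a one-line description, and scoring by the count of shared lowercase words of 3+ characters. The scoring rule and the names `keyword_select` and `tokens` are hypothetical, not the chapter's code:

```python
import re

MAX_SELECTED = 5  # the diagram's "<= 5 items"


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words of 3+ characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) >= 3}


def keyword_select(query: str, memories: dict[str, str], limit: int = MAX_SELECTED) -> list[str]:
    """Rank memory files by word overlap with the query.

    memories maps file name -> description. Files with zero overlap are
    dropped; ties keep file-name order so results are deterministic.
    """
    q = tokens(query)
    scored = []
    for name in sorted(memories):
        score = len(q & tokens(memories[name]))
        if score > 0:
            scored.append((score, name))
    # Highest score first; Python's sort is stable even with reverse=True.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:limit]]


memories = {
    "user_role.md": "User is a backend engineer who prefers Python",
    "feedback_tests.md": "Always run pytest before committing",
    "project_deadline.md": "Release freeze starts next Friday",
    "reference_dashboards.md": "Grafana dashboards live under ops/grafana",
}
print(keyword_select("which python tests should I run?", memories))
# ['feedback_tests.md', 'user_role.md']
```

Note that "tests" does not match "pytest": plain token overlap misses such near matches, which is one reason an LLM side-query is the primary selector and keywords only the fallback.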
s09_memory/images/memory-overview.svg
@@ -0,0 +1,104 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 430" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#7c3aed"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
<marker id="arrow-purple" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#7c3aed"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#16a34a"/>
</marker>
</defs>

<rect width="760" height="430" fill="#fafbfc" rx="8"/>

<!-- Title -->
<rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
<rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
<text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">Memory — 在 s08 压缩管线上,插入记忆加载、提取与整理</text>

<!-- Legend -->
<rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">s08 保留</text>
<rect x="140" y="56" width="12" height="10" rx="2" fill="#f3e8ff" stroke="#7c3aed" stroke-width="1"/>
<text x="158" y="66" fill="#7c3aed" font-size="10" font-weight="600">s09 新增</text>

<!-- ===== messages[] ===== -->
<rect x="30" y="96" width="100" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="80" y="126" fill="#1e3a5f" font-size="12" font-weight="600" text-anchor="middle">messages[]</text>

<!-- arrow → compression -->
<line x1="130" y1="122" x2="152" y2="122" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== Compression pipeline (s08) ===== -->
<rect x="155" y="86" width="135" height="72" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="222" y="108" fill="#1e3a5f" font-size="11" font-weight="700" text-anchor="middle">压缩管线</text>
<text x="222" y="124" fill="#64748b" font-size="9" text-anchor="middle">budget → snip → micro</text>
<text x="222" y="138" fill="#64748b" font-size="9" text-anchor="middle">→ autoCompact</text>
<text x="222" y="152" fill="#94a3b8" font-size="8" text-anchor="middle">(s08)</text>

<!-- arrow → Loading (purple) -->
<line x1="290" y1="122" x2="317" y2="122" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>

<!-- ===== Loading (s09) ===== -->
<rect x="320" y="86" width="120" height="72" rx="8" fill="#f3e8ff" stroke="#7c3aed" stroke-width="2"/>
<text x="380" y="108" fill="#5b21b6" font-size="11" font-weight="700" text-anchor="middle">Loading</text>
<text x="380" y="124" fill="#7c3aed" font-size="9" text-anchor="middle">LLM side-query 选文件</text>
<text x="380" y="138" fill="#7c3aed" font-size="9" text-anchor="middle">注入文件内容</text>
<text x="380" y="152" fill="#a78bfa" font-size="8" text-anchor="middle">≤ 5 条</text>

<!-- arrow → LLM -->
<line x1="440" y1="122" x2="472" y2="122" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- ===== LLM (s08) ===== -->
<rect x="475" y="96" width="80" height="52" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="515" y="114" fill="#1e3a5f" font-size="14" font-weight="700" text-anchor="middle">LLM</text>
<text x="515" y="132" fill="#64748b" font-size="9" text-anchor="middle">stop_reason</text>
<text x="515" y="144" fill="#64748b" font-size="9" text-anchor="middle">=tool_use?</text>

<!-- LLM → 否 → 返回结果 -->
<line x1="515" y1="148" x2="515" y2="178" stroke="#16a34a" stroke-width="1.5" marker-end="url(#arrow-green)"/>
<text x="528" y="168" fill="#16a34a" font-size="9" font-weight="600">否,停止</text>
<rect x="460" y="180" width="110" height="24" rx="12" fill="#dcfce7" stroke="#16a34a" stroke-width="1.5"/>
<text x="515" y="196" fill="#166534" font-size="10" font-weight="600" text-anchor="middle">返回结果</text>

<!-- LLM → 是 → TOOL_HANDLERS -->
<line x1="555" y1="122" x2="587" y2="122" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="568" y="114" fill="#64748b" font-size="9" font-weight="600">是</text>

<!-- ===== TOOL_HANDLERS (s08) ===== -->
<rect x="590" y="88" width="130" height="68" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
<text x="655" y="112" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
<text x="655" y="128" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
<text x="655" y="142" fill="#94a3b8" font-size="8" text-anchor="middle">edit · glob · task</text>

<!-- ===== Memory Files (s09) ===== -->
<rect x="155" y="232" width="430" height="36" rx="6" fill="#faf5ff" stroke="#7c3aed" stroke-width="1.5" stroke-dasharray="4,2"/>
<text x="370" y="255" fill="#5b21b6" font-size="11" font-weight="600" text-anchor="middle">.memory/ — MEMORY.md 索引 + *.md 文件(跨会话持久化)</text>

<!-- Arrow: Memory Files → Loading -->
<path d="M 395 232 L 395 162" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>
<text x="408" y="200" fill="#7c3aed" font-size="9">读取</text>

<!-- Arrow: 返回结果 → Extraction → Memory Files -->
<path d="M 515 204 L 515 232" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow-purple)"/>
<text x="528" y="222" fill="#7c3aed" font-size="9">Extraction(每轮结束后)</text>

<!-- Consolidation note -->
<text x="222" y="284" fill="#a78bfa" font-size="9">Consolidation: 文件数 ≥ 10 时触发,去重·合并·剪枝</text>

<!-- ===== Loop back ===== -->
<path d="M 720 122 L 748 122 Q 756 122 756 130 L 756 310 Q 756 318 748 318 L 88 318 Q 80 318 80 310 L 80 148" fill="none" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
<text x="400" y="340" fill="#64748b" font-size="10" text-anchor="middle">工具结果追加到 messages[] → 压缩 → 加载记忆 → LLM → 每轮结束后提取</text>

<!-- ===== Bottom notes ===== -->
<rect x="40" y="358" width="680" height="56" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<rect x="60" y="372" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
<text x="80" y="382" fill="#475569" font-size="10">s08 保留:压缩管线(budget → snip → micro → auto)+ 应急裁剪 + 循环</text>
<rect x="60" y="392" width="12" height="10" rx="2" fill="#f3e8ff" stroke="#7c3aed" stroke-width="1"/>
<text x="80" y="402" fill="#475569" font-size="10">s09 新增:Loading(索引常驻 + 按需注入)+ Extraction(每轮结束后)+ Consolidation(阈值触发)</text>
</svg>
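The diagrams describe `.memory/` as a `MEMORY.md` index plus individual `*.md` files, each with YAML frontmatter (`name` / `description` / `type`), where `type` is one of user, feedback, project, or reference. A minimal sketch of parsing that frontmatter and rebuilding the index, assuming flat `key: value` lines between `---` fences (a real YAML parser would be more robust) and a hypothetical one-line-per-file index format; `parse_frontmatter` and `build_index` are illustrative names:

```python
import tempfile
from pathlib import Path

MEMORY_TYPES = {"user", "feedback", "project", "reference"}


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read `key: value` pairs between a leading '---' line and the next '---'.

    Returns {} when the frontmatter is missing or unterminated. Only flat
    string values are supported; this is not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return {}


def build_index(memory_dir: Path) -> str:
    """Render one index line per memory file with usable frontmatter."""
    entries = []
    for path in sorted(memory_dir.glob("*.md")):
        if path.name == "MEMORY.md":
            continue
        meta = parse_frontmatter(path.read_text(encoding="utf-8"))
        if meta.get("type") not in MEMORY_TYPES or not meta.get("name"):
            continue  # skip files without valid name/type
        entries.append(f"- [{meta['type']}] {meta['name']} ({path.name}): {meta.get('description', '')}")
    return "\n".join(entries) + ("\n" if entries else "")


with tempfile.TemporaryDirectory() as tmp:
    mem = Path(tmp)
    (mem / "prefers_uv.md").write_text(
        "---\nname: prefers uv\ndescription: use uv, not pip\ntype: feedback\n---\nUser asked twice.\n",
        encoding="utf-8",
    )
    (mem / "scratch.md").write_text("no frontmatter here\n", encoding="utf-8")
    index = build_index(mem)
    (mem / "MEMORY.md").write_text(index, encoding="utf-8")
    print(index, end="")
    # - [feedback] prefers uv (prefers_uv.md): use uv, not pip
```

Keeping the index to one short line per file is what lets it sit in the system prompt on every turn while the full file bodies are injected only when selected.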
s09_memory/images/memory-subsystems.en.svg
@@ -0,0 +1,78 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 380" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#7c3aed"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#7c3aed"/>
</marker>
</defs>

<rect width="720" height="380" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Memory System — Store · Load · Extract · Consolidate</text>

<!-- Storage -->
<rect x="40" y="58" width="145" height="80" rx="8" fill="#ede9fe" stroke="#7c3aed" stroke-width="2"/>
<text x="112" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">Storage</text>
<line x1="55" y1="90" x2="170" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="55" y="108" fill="#5b21b6" font-size="10">.memory/*.md files</text>
<text x="55" y="124" fill="#5b21b6" font-size="10">MEMORY.md index</text>

<line x1="190" y1="98" x2="218" y2="98" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- Loading -->
<rect x="222" y="58" width="200" height="80" rx="8" fill="#ede9fe" stroke="#7c3aed" stroke-width="2"/>
<text x="322" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">Load</text>
<line x1="237" y1="90" x2="407" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="237" y="108" fill="#5b21b6" font-size="10">Index in SYSTEM (always)</text>
<text x="237" y="124" fill="#5b21b6" font-size="10">LLM side-query select files</text>
<text x="237" y="134" fill="#a78bfa" font-size="9">≤ 5 items, fallback to keyword</text>

<line x1="425" y1="98" x2="453" y2="98" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- Extraction -->
<rect x="457" y="58" width="130" height="80" rx="8" fill="#f3e8ff" stroke="#7c3aed" stroke-width="2"/>
<text x="522" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">Extract</text>
<line x1="472" y1="90" x2="572" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="472" y="108" fill="#5b21b6" font-size="10">After each turn ends</text>
<text x="472" y="124" fill="#5b21b6" font-size="10">LLM extracts prefs/constraints</text>
<text x="472" y="134" fill="#a78bfa" font-size="9">Check existing, avoid duplicates</text>

<!-- Consolidation -->
<rect x="600" y="58" width="100" height="80" rx="8" fill="#f5f3ff" stroke="#7c3aed" stroke-width="2"/>
<text x="650" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">Consolidate</text>
<line x1="615" y1="90" x2="685" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="615" y="108" fill="#5b21b6" font-size="10">Triggers at ≥ 10 files</text>
<text x="615" y="124" fill="#5b21b6" font-size="10">Dedup · merge · prune</text>
<text x="615" y="134" fill="#a78bfa" font-size="9">CC: 3-layer gating</text>

<!-- Memory Files -->
<rect x="40" y="180" width="660" height="36" rx="6" fill="#f8fafc" stroke="#94a3b8" stroke-width="1" stroke-dasharray="4,2"/>
<text x="370" y="203" fill="#475569" font-size="11" text-anchor="middle">.memory/ — MEMORY.md index + *.md files (YAML frontmatter: name / description / type)</text>

<!-- Arrow: Storage → Memory Files -->
<path d="M 112 138 L 112 174 Q 112 180 118 180" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="120" y="162" fill="#7c3aed" font-size="9">read/write</text>

<!-- Arrow: Extraction → Memory Files -->
<path d="M 522 138 L 522 180" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="536" y="164" fill="#7c3aed" font-size="9">write</text>

<!-- Arrow: Consolidation → Memory Files -->
<path d="M 650 138 L 650 180" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="662" y="164" fill="#7c3aed" font-size="9">overwrite</text>

<!-- Four types -->
<rect x="40" y="240" width="660" height="40" rx="6" fill="#faf5ff" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="60" y="260" fill="#5b21b6" font-size="10" font-weight="600">Four types:</text>
<text x="140" y="260" fill="#475569" font-size="10">user (who you are) · feedback (how to work) · project (what's happening) · reference (where to find things)</text>

<!-- CC source comparison -->
<rect x="40" y="296" width="660" height="72" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<text x="60" y="316" fill="#5b21b6" font-size="11" font-weight="600">CC Source Comparison</text>
<text x="60" y="334" fill="#475569" font-size="10">• Selection: LLM side-query (Sonnet selects), not embedding vector similarity</text>
<text x="60" y="350" fill="#475569" font-size="10">• Extraction timing: stop hook (after each turn ends), not after autoCompact</text>
<text x="60" y="366" fill="#475569" font-size="10">• Dream consolidation: 3-layer gating (time ≥ 24h + sessions ≥ 5 + file lock), not simple count</text>
</svg>
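The subsystems diagram contrasts the teaching version's file-count trigger with Claude Code's dream consolidation, which it describes as gated on three conditions: at least 24 hours since the last run, at least 5 sessions since then, and a file lock. A minimal sketch of such a three-gate check, assuming a hypothetical JSON state file holding `last_run` (epoch seconds) and `sessions_since`, and an `O_CREAT | O_EXCL` lock file. None of these names or file formats come from CC's source:

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Callable, Optional

MIN_HOURS = 24     # time gate: hours since the last consolidation
MIN_SESSIONS = 5   # session gate: sessions recorded since then


def gates_open(state_path: Path, now: Optional[float] = None) -> bool:
    """Time and session gates, read from a hypothetical JSON state file."""
    now = time.time() if now is None else now
    try:
        state = json.loads(state_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        state = {}  # no usable state: treat as never run, zero sessions
    hours = (now - state.get("last_run", 0)) / 3600
    return hours >= MIN_HOURS and state.get("sessions_since", 0) >= MIN_SESSIONS


def try_lock(lock_path: Path) -> bool:
    """Third gate: atomically create the lock file; False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return True


def maybe_consolidate(state_path: Path, lock_path: Path,
                      consolidate: Callable[[], None], now: Optional[float] = None) -> bool:
    """Run consolidate() only if all three gates pass, then reset the state."""
    if not gates_open(state_path, now):
        return False
    if not try_lock(lock_path):
        return False  # another process is consolidating
    try:
        consolidate()
        ts = time.time() if now is None else now
        state_path.write_text(json.dumps({"last_run": ts, "sessions_since": 0}), encoding="utf-8")
    finally:
        lock_path.unlink(missing_ok=True)  # always release the lock
    return True


with tempfile.TemporaryDirectory() as tmp:
    state, lock = Path(tmp) / "state.json", Path(tmp) / "consolidate.lock"
    state.write_text(json.dumps({"last_run": 0, "sessions_since": 6}), encoding="utf-8")
    print(maybe_consolidate(state, lock, lambda: None, now=100 * 3600))  # True
    print(maybe_consolidate(state, lock, lambda: None, now=101 * 3600))  # False: only 1h later
```

The cheap gates run first so the lock is touched only when consolidation is actually due; incrementing `sessions_since` at session start is left out of this sketch.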
s09_memory/images/memory-subsystems.ja.svg
@@ -0,0 +1,78 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 380" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#7c3aed"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#7c3aed"/>
</marker>
</defs>

<rect width="720" height="380" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Memory System — ストレージ · 読み込み · 抽出 · 整理</text>

<!-- ストレージ -->
<rect x="40" y="58" width="145" height="80" rx="8" fill="#ede9fe" stroke="#7c3aed" stroke-width="2"/>
<text x="112" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">ストレージ</text>
<line x1="55" y1="90" x2="170" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="55" y="108" fill="#5b21b6" font-size="10">.memory/*.md ファイル</text>
<text x="55" y="124" fill="#5b21b6" font-size="10">MEMORY.md インデックス</text>

<line x1="190" y1="98" x2="218" y2="98" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- 読み込み -->
<rect x="222" y="58" width="200" height="80" rx="8" fill="#ede9fe" stroke="#7c3aed" stroke-width="2"/>
<text x="322" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">読み込み</text>
<line x1="237" y1="90" x2="407" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="237" y="108" fill="#5b21b6" font-size="10">インデックスを SYSTEM に常駐</text>
<text x="237" y="124" fill="#5b21b6" font-size="10">LLM side-query でファイル選択</text>
<text x="237" y="134" fill="#a78bfa" font-size="9">≤ 5 件、失敗時はキーワードに降格</text>

<line x1="425" y1="98" x2="453" y2="98" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- 抽出 -->
<rect x="457" y="58" width="130" height="80" rx="8" fill="#f3e8ff" stroke="#7c3aed" stroke-width="2"/>
<text x="522" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">抽出</text>
<line x1="472" y1="90" x2="572" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="472" y="108" fill="#5b21b6" font-size="10">毎ターン終了後にトリガー</text>
<text x="472" y="124" fill="#5b21b6" font-size="10">LLM が好み/制約を抽出</text>
<text x="472" y="134" fill="#a78bfa" font-size="9">既存を確認、重複回避</text>

<!-- 整理 -->
<rect x="600" y="58" width="100" height="80" rx="8" fill="#f5f3ff" stroke="#7c3aed" stroke-width="2"/>
<text x="650" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">整理</text>
<line x1="615" y1="90" x2="685" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="615" y="108" fill="#5b21b6" font-size="10">ファイル ≥ 10 でトリガー</text>
<text x="615" y="124" fill="#5b21b6" font-size="10">重複排除・統合・剪定</text>
<text x="615" y="134" fill="#a78bfa" font-size="9">CC: 3 層ゲート</text>

<!-- Memory Files -->
<rect x="40" y="180" width="660" height="36" rx="6" fill="#f8fafc" stroke="#94a3b8" stroke-width="1" stroke-dasharray="4,2"/>
|
||||
<text x="370" y="203" fill="#475569" font-size="11" text-anchor="middle">.memory/ — MEMORY.md インデックス + *.md ファイル(YAML frontmatter: name / description / type)</text>
|
||||
|
||||
<!-- Arrow: ストレージ → Memory Files -->
|
||||
<path d="M 112 138 L 112 174 Q 112 180 118 180" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>
|
||||
<text x="120" y="162" fill="#7c3aed" font-size="9">読み/書き</text>
|
||||
|
||||
<!-- Arrow: 抽出 → Memory Files -->
|
||||
<path d="M 522 138 L 522 180" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>
|
||||
<text x="536" y="164" fill="#7c3aed" font-size="9">書き込み</text>
|
||||
|
||||
<!-- Arrow: 整理 → Memory Files -->
|
||||
<path d="M 650 138 L 650 180" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>
|
||||
<text x="662" y="164" fill="#7c3aed" font-size="9">上書き</text>
|
||||
|
||||
<!-- 4 種類の記憶 -->
|
||||
<rect x="40" y="240" width="660" height="40" rx="6" fill="#faf5ff" stroke="#c4b5fd" stroke-width="0.5"/>
|
||||
<text x="60" y="260" fill="#5b21b6" font-size="10" font-weight="600">4 種類の記憶:</text>
|
||||
<text x="148" y="260" fill="#475569" font-size="10">user(あなたは誰か)· feedback(どう作業するか)· project(何が起きているか)· reference(どこで探すか)</text>
|
||||
|
||||
<!-- CC ソースコード対照 -->
|
||||
<rect x="40" y="296" width="660" height="72" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
|
||||
<text x="60" y="316" fill="#5b21b6" font-size="11" font-weight="600">CC ソースコード対照</text>
|
||||
<text x="60" y="334" fill="#475569" font-size="10">• 記憶選択:LLM side-query(Sonnet が選択)、embedding ベクトル類似度ではない</text>
|
||||
<text x="60" y="350" fill="#475569" font-size="10">• 抽出タイミング:stop hook(毎ターン終了後)、autoCompact 後ではない</text>
|
||||
<text x="60" y="366" fill="#475569" font-size="10">• Dream 整理:3 層ゲート(時間 ≥ 24h + セッション ≥ 5 + ファイルロック)、単純な計数ではない</text>
|
||||
</svg>
|
||||
|
After Width: | Height: | Size: 5.5 KiB |
78
s09_memory/images/memory-subsystems.svg
Normal file
@@ -0,0 +1,78 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 720 380" font-family="system-ui, -apple-system, sans-serif">
<defs>
<linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#7c3aed"/>
</linearGradient>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#7c3aed"/>
</marker>
</defs>

<rect width="720" height="380" fill="#fafbfc" rx="8"/>
<rect x="0" y="0" width="720" height="38" fill="url(#header)" rx="8"/>
<rect x="0" y="30" width="720" height="8" fill="url(#header)"/>
<text x="360" y="25" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">Memory System — 存储 · 加载 · 提取 · 整理</text>

<!-- 存储 -->
<rect x="40" y="58" width="145" height="80" rx="8" fill="#ede9fe" stroke="#7c3aed" stroke-width="2"/>
<text x="112" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">存储</text>
<line x1="55" y1="90" x2="170" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="55" y="108" fill="#5b21b6" font-size="10">.memory/*.md 文件</text>
<text x="55" y="124" fill="#5b21b6" font-size="10">MEMORY.md 索引</text>

<line x1="190" y1="98" x2="218" y2="98" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- 加载 -->
<rect x="222" y="58" width="200" height="80" rx="8" fill="#ede9fe" stroke="#7c3aed" stroke-width="2"/>
<text x="322" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">加载</text>
<line x1="237" y1="90" x2="407" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="237" y="108" fill="#5b21b6" font-size="10">索引常驻 SYSTEM</text>
<text x="237" y="124" fill="#5b21b6" font-size="10">LLM side-query 选文件</text>
<text x="237" y="134" fill="#a78bfa" font-size="9">≤ 5 条,失败降级到关键词</text>

<line x1="425" y1="98" x2="453" y2="98" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>

<!-- 提取 -->
<rect x="457" y="58" width="130" height="80" rx="8" fill="#f3e8ff" stroke="#7c3aed" stroke-width="2"/>
<text x="522" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">提取</text>
<line x1="472" y1="90" x2="572" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="472" y="108" fill="#5b21b6" font-size="10">每轮结束后触发</text>
<text x="472" y="124" fill="#5b21b6" font-size="10">LLM 提取偏好/约束</text>
<text x="472" y="134" fill="#a78bfa" font-size="9">检查已有,避免重复</text>

<!-- 整理 -->
<rect x="600" y="58" width="100" height="80" rx="8" fill="#f5f3ff" stroke="#7c3aed" stroke-width="2"/>
<text x="650" y="80" fill="#5b21b6" font-size="13" font-weight="700" text-anchor="middle">整理</text>
<line x1="615" y1="90" x2="685" y2="90" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="615" y="108" fill="#5b21b6" font-size="10">文件 ≥ 10 触发</text>
<text x="615" y="124" fill="#5b21b6" font-size="10">去重·合并·剪枝</text>
<text x="615" y="134" fill="#a78bfa" font-size="9">CC: 三层门控</text>

<!-- Memory Files -->
<rect x="40" y="180" width="660" height="36" rx="6" fill="#f8fafc" stroke="#94a3b8" stroke-width="1" stroke-dasharray="4,2"/>
<text x="370" y="203" fill="#475569" font-size="11" text-anchor="middle">.memory/ — MEMORY.md 索引 + *.md 文件(YAML frontmatter: name / description / type)</text>

<!-- Arrow: 存储 → Memory Files -->
<path d="M 112 138 L 112 174 Q 112 180 118 180" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="120" y="162" fill="#7c3aed" font-size="9">写入/读取</text>

<!-- Arrow: Extraction → Memory Files -->
<path d="M 522 138 L 522 180" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="536" y="164" fill="#7c3aed" font-size="9">写入</text>

<!-- Arrow: 整理 → Memory Files -->
<path d="M 650 138 L 650 180" fill="none" stroke="#7c3aed" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="662" y="164" fill="#7c3aed" font-size="9">覆写</text>

<!-- 四类记忆 -->
<rect x="40" y="240" width="660" height="40" rx="6" fill="#faf5ff" stroke="#c4b5fd" stroke-width="0.5"/>
<text x="60" y="260" fill="#5b21b6" font-size="10" font-weight="600">四类记忆:</text>
<text x="140" y="260" fill="#475569" font-size="10">user(你是谁)· feedback(怎么做事)· project(正在发生什么)· reference(东西在哪找)</text>

<!-- CC 源码对照 -->
<rect x="40" y="296" width="660" height="72" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
<text x="60" y="316" fill="#5b21b6" font-size="11" font-weight="600">CC 源码对照</text>
<text x="60" y="334" fill="#475569" font-size="10">• 记忆选择:LLM side-query(Sonnet 选),不是 embedding 向量相似度</text>
<text x="60" y="350" fill="#475569" font-size="10">• 提取时机:stop hook 中触发(每轮结束后),不是 autoCompact 后</text>
<text x="60" y="366" fill="#475569" font-size="10">• Dream 整理:三层门控(时间 ≥ 24h + 会话 ≥ 5 + 文件锁),不是简单计数</text>
</svg>

s10_system_prompt/README.en.md
@@ -0,0 +1,254 @@
# s10: System Prompt — Assembled at Runtime, Never Hardcoded

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → ... → s08 → s09 → `s10` → [s11](../s11_error_recovery/) → s12 → ... → s20

> *"prompt is assembled, not hardcoded"* — Sections + on-demand assembly + caching.
>
> **Harness Layer**: Prompt — assembled at runtime, never hardcoded.

---

## The Problem

From s01 to s09, the system prompt was always a single hardcoded line:

```python
SYSTEM = f"You are a coding agent at {WORKDIR}. Use tools to solve tasks."
```

That was enough for s01, which had only three tools: bash, read, and write. By s09, though, the agent has memory, context compression, and skill loading, and the prompt has to describe more and more capabilities:

```python
SYSTEM = (
    f"You are a coding agent at {WORKDIR}. "
    "Use tools to solve tasks. Act, don't explain. "
    "Before starting any multi-step task, use todo_write. "
    "Skills are available via list_skills and load_skill. "
    "Relevant memories are injected below when available. "
    # ... every new capability adds another line
)
```

Three problems:

1. **Switching projects means rewriting the whole prompt**: nothing tells you which parts to change and which to keep.
2. **One change can break another**: a new tool description may contradict earlier instructions.
3. **Every request carries everything**: sections the current conversation doesn't need still cost tokens.

The system prompt should be a configuration assembled at runtime from the current state: which tools are enabled, which context is visible, which memories are relevant, and which content must stay stable so that it keeps hitting the prompt cache.

---

## The Solution

s10 focuses on prompt assembly. It assumes the s08-s09 capabilities as background but doesn't re-implement compression or memory. The core change: split the hardcoded `SYSTEM` into independent sections, assemble them at runtime based on real state, and cache the result so an unchanged context isn't re-assembled.

Four sections, two loading strategies:

| Section | Strategy | Content | Condition |
|---------|----------|---------|-----------|
| identity | always | who you are, how to work | always present |
| tools | always | available tool list | `enabled_tools` |
| workspace | always | working directory | always present |
| memory | on-demand | relevant memory content | `.memory/MEMORY.md` exists and is non-empty |

Key design: whether a section loads depends on real state (which tools are registered, which files exist), not on keywords in the messages.

---

## How It Works

### PROMPT_SECTIONS: Topic-Keyed Fragments

Split the monolithic string into a dictionary in which each key is a topic:

```python
PROMPT_SECTIONS = {
    "identity": "You are a coding agent. Act, don't explain.",
    "tools": "Available tools: bash, read_file, write_file.",
    "workspace": f"Working directory: {WORKDIR}",
    "memory": "Relevant memories are injected below when available.",
}
```

Each section is maintained independently. Changing `tools` doesn't affect `identity`; adding `memory` doesn't touch `workspace`.

### assemble_system_prompt: On-Demand Assembly

Not every section is needed every turn. With no memory files, loading the memory section just wastes tokens. Assembly is driven by the real state recorded in `context`:

```python
def assemble_system_prompt(context: dict) -> str:
    sections = []

    # Always loaded
    sections.append(PROMPT_SECTIONS["identity"])
    sections.append(PROMPT_SECTIONS["tools"])
    sections.append(PROMPT_SECTIONS["workspace"])

    # On-demand: based on real state, not keywords
    memories = context.get("memories", "")
    if memories:
        sections.append(f"Relevant memories:\n{memories}")

    return "\n\n".join(sections)
```

"Always loaded" sections are needed every turn: identity, tools, workspace. "On-demand" sections are only useful under specific conditions.

Why not load everything? Tokens cost money (the system prompt is billed on every request), and fewer instructions keep the output focused (irrelevant instructions are noise).

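
As more on-demand sections appear, the single `if memories:` branch generalizes into a table of rules. The sketch below is one possible design, not what code.py does: a hypothetical `SECTION_RULES` list pairs each section name with a predicate over `context` and a render function, so the "when to load" decision lives next to the section it controls.

```python
from typing import Callable

# Hypothetical registry (not part of code.py): each entry is
# (section name, should_load(context), render(context)).
# List order is the order of sections in the assembled prompt.
SECTION_RULES: list[tuple[str, Callable[[dict], bool], Callable[[dict], str]]] = [
    ("identity", lambda ctx: True,
     lambda ctx: "You are a coding agent. Act, don't explain."),
    ("tools", lambda ctx: bool(ctx.get("enabled_tools")),
     lambda ctx: "Available tools: " + ", ".join(ctx["enabled_tools"]) + "."),
    ("workspace", lambda ctx: True,
     lambda ctx: f"Working directory: {ctx.get('workspace', '.')}"),
    ("memory", lambda ctx: bool(ctx.get("memories")),
     lambda ctx: f"Relevant memories:\n{ctx['memories']}"),
]


def assemble_from_rules(context: dict) -> tuple[str, list[str]]:
    """Return the assembled prompt and the names of the sections that loaded."""
    loaded, parts = [], []
    for name, should_load, render in SECTION_RULES:
        if should_load(context):
            loaded.append(name)
            parts.append(render(context))
    return "\n\n".join(parts), loaded


ctx = {"enabled_tools": ["bash", "read_file"], "workspace": "/tmp/demo", "memories": ""}
prompt, loaded = assemble_from_rules(ctx)
print(loaded)  # ['identity', 'tools', 'workspace']
```

Adding a section then means appending one triple; the assembly loop itself never changes.
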
### get_system_prompt: Cache to Avoid Re-Assembly

When the context hasn't changed (for example, several LLM calls within one turn), re-assembling is wasteful. Serialize the context deterministically to detect changes, and return the cached prompt when nothing changed:

```python
import json

_last_context_key = None  # serialized context from the last assembly
_last_prompt = ""         # prompt produced by that assembly


def get_system_prompt(context: dict) -> str:
    global _last_context_key, _last_prompt
    # sort_keys makes the key independent of dict insertion order
    key = json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)
    if key == _last_context_key and _last_prompt:
        return _last_prompt  # cache hit: context unchanged
    _last_context_key = key
    _last_prompt = assemble_system_prompt(context)
    return _last_prompt
```

Why `json.dumps` instead of `hash()`: calling `hash()` on a dict (or on anything containing a list or dict) raises `TypeError: unhashable type`, and `str` hashes are randomized per process, so they can't serve as keys that stay stable across runs.

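
Both points can be checked directly. This small demo (the `context_key` helper simply names the same `json.dumps` call used above) shows that `hash()` rejects the context dict, that two dicts with the same content in different insertion order produce the same key, and that `default=str` serializes a path object the same way as its string form.

```python
import json
from pathlib import PurePosixPath


def context_key(ctx: dict) -> str:
    """Deterministic cache key: dict key order is normalized, non-JSON values go through str()."""
    return json.dumps(ctx, sort_keys=True, ensure_ascii=False, default=str)


ctx_a = {"workspace": "/repo", "enabled_tools": ["bash", "read_file"], "memories": ""}
# Same content as ctx_a, different insertion order
ctx_b = {"memories": "", "enabled_tools": ["bash", "read_file"], "workspace": "/repo"}

try:
    hash(ctx_a)
except TypeError as exc:
    print(exc)  # unhashable type: 'dict'

print(context_key(ctx_a) == context_key(ctx_b))  # True
# default=str turns the path object into the same string a plain str would give
print(context_key({"workspace": PurePosixPath("/repo")}) == context_key({"workspace": "/repo"}))  # True
```

Note that `sort_keys` only normalizes dict key order; list order still matters, so `["bash", "read_file"]` and `["read_file", "bash"]` yield different keys. Building `enabled_tools` in a stable order (as `list(TOOL_HANDLERS.keys())` does, since dicts preserve insertion order) avoids spurious cache misses.
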
Note: this cache only avoids redundant string assembly within one process. It is not CC's API prompt cache, which uses `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` to separate the static and dynamic parts: the static parts hit the global cache and are not invalidated when dynamic content changes.

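
To see what a static/dynamic split buys at the API level, here is a sketch (not CC's implementation) that splits a section list at a hypothetical `DYNAMIC_BOUNDARY` marker and places the Messages API's `cache_control` field on the static block. Anthropic prompt caching caches the request prefix up to and including a block marked with `cache_control`, so edits to the dynamic block after it leave that prefix reusable. Prompts shorter than the model's minimum cacheable length are simply not cached.

```python
DYNAMIC_BOUNDARY = "__DYNAMIC_BOUNDARY__"  # hypothetical marker value


def build_system_blocks(sections: list[str]) -> list[dict]:
    """Turn a section list into Messages API `system` text blocks.

    Sections before the marker are joined into one block carrying a cache
    breakpoint; sections after it go into a separate, uncached block.
    """
    if DYNAMIC_BOUNDARY in sections:
        i = sections.index(DYNAMIC_BOUNDARY)
        static, dynamic = sections[:i], sections[i + 1:]
    else:
        static, dynamic = sections, []
    blocks = []
    if static:
        blocks.append({
            "type": "text",
            "text": "\n\n".join(static),
            "cache_control": {"type": "ephemeral"},  # cache the prefix up to here
        })
    if dynamic:
        blocks.append({"type": "text", "text": "\n\n".join(dynamic)})
    return blocks


sections = ["You are a coding agent.", "Available tools: bash.", DYNAMIC_BOUNDARY,
            "Relevant memories:\n- [test](test.md)"]
blocks = build_system_blocks(sections)
# The list is passed as the `system` argument:
# client.messages.create(model=MODEL, system=blocks, messages=messages, max_tokens=8000)
```
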
### context: Real State, Not Keyword Guessing

`context` reflects the actual runtime state:

```python
# MEMORY_INDEX (the path of .memory/MEMORY.md), TOOL_HANDLERS and WORKDIR
# are defined elsewhere in code.py.
def update_context(context: dict, messages: list) -> dict:
    # Rebuilt from scratch every time; context and messages are not read here.
    memories = ""
    if MEMORY_INDEX.exists():
        content = MEMORY_INDEX.read_text().strip()
        if content:
            memories = content
    return {
        "enabled_tools": list(TOOL_HANDLERS.keys()),
        "workspace": str(WORKDIR),
        "memories": memories,
    }
```

`enabled_tools` lists the tools that are actually registered, and `memories` holds the contents of `.memory/MEMORY.md` when that file exists and is non-empty. Section loading follows this real state instead of searching the messages for keywords.

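
In the README excerpts above, the `tools` section is a fixed string, so `enabled_tools` only feeds the cache key. If you want the section text itself to follow the registered tools, one option is to render it from `context`. The sketch below uses two hypothetical helpers, `render_tools_section` and `read_memory_index`, and a temporary directory in place of `WORKDIR` to show both state checks.

```python
import tempfile
from pathlib import Path


def render_tools_section(context: dict) -> str:
    """Build the tools section from the registered tool names in context."""
    tools = context.get("enabled_tools") or []
    return ("Available tools: " + ", ".join(tools) + ".") if tools else "No tools are available."


def read_memory_index(workdir: Path) -> str:
    """Return MEMORY.md contents, or "" when the file is missing or blank."""
    index = workdir / ".memory" / "MEMORY.md"
    return index.read_text(encoding="utf-8").strip() if index.exists() else ""


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    before = read_memory_index(workdir)  # no index yet
    (workdir / ".memory").mkdir()
    (workdir / ".memory" / "MEMORY.md").write_text(
        "- [test](test.md) - test memory\n", encoding="utf-8")
    after = read_memory_index(workdir)   # index now present

print(repr(before))  # ''
print(render_tools_section({"enabled_tools": ["bash", "read_file", "write_file"]}))
# Available tools: bash, read_file, write_file.
```
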
### Putting It Together

```python
def agent_loop(messages: list, context: dict):
    system = get_system_prompt(context)
    while True:
        response = client.messages.create(
            model=MODEL, system=system, messages=messages,
            tools=TOOLS, max_tokens=8000)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return  # no tool calls: the turn is over
        # ... tool execution: append the tool_result blocks to messages ...
        # Refresh the real state; get_system_prompt re-assembles only if it changed
        context = update_context(context, messages)
        system = get_system_prompt(context)
```

The prompt is fetched once before the loop and again after every round of tool execution. If the context changed (for example, a tool just wrote `.memory/MEMORY.md`), it is re-assembled; otherwise the cached version is returned.

---

## Changes From s09

| Component | Before (s09) | After (s10) |
|-----------|-------------|-------------|
| prompt | Hardcoded SYSTEM string | PROMPT_SECTIONS + assemble_system_prompt |
| caching | None | get_system_prompt (json.dumps change detection + cache) |
| new functions | — | assemble_system_prompt, get_system_prompt, update_context |
| tools | bash, read_file, write_file (3) | bash, read_file, write_file (3), unchanged |
| loop | Uses fixed SYSTEM | Uses get_system_prompt(context) |

---

## Try It

```sh
cd learn-claude-code
python s10_system_prompt/code.py
```

What to watch for:

1. The output shows which sections were loaded (the `[assembled] sections: ...` label)
2. As the conversation continues, cache hits show `[cache hit]`
3. Creating `.memory/MEMORY.md` makes the memory section appear on the next turn
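
The three observations can be reproduced without an API key. This sketch uses a hypothetical `PromptCache` class, a simplified self-contained copy of the caching logic, and records labels shaped like the ones above; the exact wording of code.py's output may differ.

```python
import json
import tempfile
from pathlib import Path


class PromptCache:
    """Assemble sections from context, reusing the last result when context is unchanged."""

    def __init__(self) -> None:
        self.key = None
        self.prompt = ""
        self.events: list[str] = []

    def get(self, context: dict) -> str:
        key = json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)
        if key == self.key and self.prompt:
            self.events.append("[cache hit]")
            return self.prompt
        names = ["identity", "tools", "workspace"]
        parts = ["You are a coding agent.",
                 "Available tools: " + ", ".join(context["enabled_tools"]),
                 f"Working directory: {context['workspace']}"]
        if context["memories"]:  # on-demand section
            names.append("memory")
            parts.append(f"Relevant memories:\n{context['memories']}")
        self.events.append("[assembled] sections: " + ", ".join(names))
        self.key, self.prompt = key, "\n\n".join(parts)
        return self.prompt


def snapshot(workdir: Path) -> dict:
    """Read the real state, like update_context does."""
    index = workdir / ".memory" / "MEMORY.md"
    memories = index.read_text(encoding="utf-8").strip() if index.exists() else ""
    return {"enabled_tools": ["bash", "read_file", "write_file"],
            "workspace": str(workdir), "memories": memories}


with tempfile.TemporaryDirectory() as tmp:
    workdir, cache = Path(tmp), PromptCache()
    cache.get(snapshot(workdir))  # first call: assemble
    cache.get(snapshot(workdir))  # nothing changed: cache hit
    (workdir / ".memory").mkdir()
    (workdir / ".memory" / "MEMORY.md").write_text(
        "- [test](test.md) test memory\n", encoding="utf-8")
    cache.get(snapshot(workdir))  # memory index appeared: re-assemble

print("\n".join(cache.events))
```

Running it prints one line per call: an assembly without `memory`, a cache hit, then an assembly that includes `memory`.
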

Try these prompts:

1. `Read the file README.md` (observe the three always-loaded sections)
2. `Create a file called .memory/MEMORY.md with content "- [test](test.md) — test memory"` (write a memory index)
3. `Read the file code.py` (observe whether the memory section appears)

---

## What's Next

System prompts can now be assembled at runtime, but the agent still crashes on errors. Network hiccups, API rate limits, truncated output, and context overflow aren't bugs; they're normal operating conditions.

s11 Error Recovery → four recovery paths: raise the output-token limit, compress the context, retry with exponential backoff, and switch models.

<details>
<summary>Deep Dive Into CC Source Code</summary>

> The following is based on analysis of CC source code `constants/prompts.ts` (914 lines), `constants/systemPromptSections.ts` (68 lines), `context.ts` (189 lines), `utils/api.ts` (718 lines), `utils/systemPrompt.ts` (123 lines), and `bootstrap/state.ts`.

### How many sections does CC's system prompt have?

The count isn't fixed: it varies with feature flags, output style, KAIROS/Proactive mode, user type, token budget, and more. The sections fall roughly into two categories:

**Static sections** (always loaded): identity, system, doing_tasks, actions, using_tools, tone_style, output_efficiency, etc.

**Dynamic sections** (loaded based on state): session_guidance, memory, ant_model_override, env_info_simple, language, output_style, mcp_instructions, scratchpad, frc, summarize_tool_results, numeric_length_anchors, token_budget, brief, etc.

`mcp_instructions` is the only volatile section (created via `DANGEROUS_uncachedSystemPromptSection()`), because MCP servers can connect and disconnect between turns.

### Assembly Function

```typescript
getSystemPrompt(tools, model, additionalWorkingDirs?, mcpClients?): Promise<string[]>
```

Returns a `string[]` (one element per section), with a `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` marker separating the static and dynamic parts.

### Cache Scope

When the global cache boundary is enabled, the static sections are merged into a single global cache block, and the dynamic sections don't use the global cache (`cacheScope: null`). Only code paths without the boundary, or that skip global caching, fall back to org scope.

The teaching version's cache only avoids redundant string assembly. CC caches at three layers:

1. **lodash memoize**: `getSystemContext` and `getUserContext` are cached per session (`context.ts`)
2. **Section registry cache**: `STATE.systemPromptSectionCache` caches dynamic section results and is cleared on `/clear` or `/compact`
3. **API-level cache**: `splitSysPromptPrefix()` (`api.ts`) uses the boundary to split the prompt into blocks with different cache scopes
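
Layer 2, together with the volatile `mcp_instructions` section described earlier, can be approximated in Python as a small memo table. This is a sketch under assumptions: the `SectionCache` class, its `resolve`/`clear` methods, and the `volatile` flag are hypothetical stand-ins for the behavior described above, not CC's TypeScript API.

```python
from typing import Callable


class SectionCache:
    """Memoize dynamic section results until clear() (e.g. on /clear or /compact)."""

    def __init__(self) -> None:
        self._cache: dict[str, str] = {}

    def resolve(self, name: str, compute: Callable[[], str], volatile: bool = False) -> str:
        if volatile:
            return compute()  # never cached: e.g. MCP servers may (dis)connect between turns
        if name not in self._cache:
            self._cache[name] = compute()
        return self._cache[name]

    def clear(self) -> None:
        self._cache.clear()


calls = {"memory": 0, "mcp": 0}


def load_memory() -> str:
    calls["memory"] += 1
    return "Relevant memories: ..."


def load_mcp() -> str:
    calls["mcp"] += 1
    return "MCP server instructions ..."


cache = SectionCache()
for _ in range(3):  # three turns
    cache.resolve("memory", load_memory)
    cache.resolve("mcp_instructions", load_mcp, volatile=True)
print(calls)  # {'memory': 1, 'mcp': 3}

cache.clear()  # what /clear or /compact would trigger
cache.resolve("memory", load_memory)
print(calls)  # {'memory': 2, 'mcp': 3}
```
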

### getUserContext vs getSystemContext

| | getSystemContext | getUserContext |
|---|---|---|
| Content | gitStatus, cacheBreaker | CLAUDE.md content, currentDate |
| Injection | appended to the system prompt array | prepended as a `<system-reminder>` user message |
| When skipped | when a custom system prompt is used | never (always runs) |
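
The two injection paths differ in where the text lands in the API request. A minimal Python sketch, assuming a hypothetical `inject_contexts` helper, plain string values, and a made-up `# key` layout inside the reminder (CC's real formatting differs), shows the shape. The Messages API combines consecutive user turns into one, so the prepended reminder and the user's own first message end up in the same turn.

```python
def inject_contexts(
    system_sections: list[str],
    messages: list[dict],
    system_context: dict[str, str],
    user_context: dict[str, str],
    custom_prompt: bool = False,
) -> tuple[list[str], list[dict]]:
    """Return new (system_sections, messages) with both contexts injected."""
    sections = list(system_sections)
    if not custom_prompt:  # system context is skipped under a custom system prompt
        sections += [f"{key}: {value}" for key, value in system_context.items()]
    # User context always runs: it becomes a user message placed before the conversation
    reminder = "\n".join(f"# {key}\n{value}" for key, value in user_context.items())
    meta = {"role": "user", "content": f"<system-reminder>\n{reminder}\n</system-reminder>"}
    return sections, [meta, *messages]


sections, msgs = inject_contexts(
    ["You are a coding agent."],
    [{"role": "user", "content": "Fix the failing test"}],
    system_context={"gitStatus": "On branch main"},
    user_context={"currentDate": "2025-01-01"},
)
print(sections)  # ['You are a coding agent.', 'gitStatus: On branch main']
```
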

### How modes change the prompt

- **CLAUDE_CODE_SIMPLE**: the entire prompt is 2 lines
- **Proactive/KAIROS**: a compact prompt replaces all standard sections
- **Coordinator**: a coordinator-specific prompt fully replaces the default
- **Agent mode**: the agent-defined prompt replaces or is appended to the default

### Total size

In standard interactive mode, the core system prompt is roughly 20-30 KB of text; under CLAUDE_CODE_SIMPLE it is about 150 characters. User context (CLAUDE.md) and system context (git status) come on top of that.

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

s10_system_prompt/README.ja.md
@@ -0,0 +1,254 @@
# s10: System Prompt — 実行時アセンブリ、ハードコードなし

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → ... → s08 → s09 → `s10` → [s11](../s11_error_recovery/) → s12 → ... → s20

> *"prompt は組み立てるもの、固定するものではない"* — セグメント + オンデマンド結合 + キャッシュ。
>
> **Harness レイヤー**: プロンプト — 実行時組み立て、ハードコードなし。

---

## 課題

s01 から s09 まで、system prompt は常に 1 行のハードコード:

```python
SYSTEM = f"You are a coding agent at {WORKDIR}. Use tools to solve tasks."
```

s01 では十分だった。bash、read、write の 3 ツールのみ。しかし s09 では、Agent に記憶、圧縮、スキル読み込みがある。prompt が説明すべき能力が増え続ける:

```python
SYSTEM = (
    f"You are a coding agent at {WORKDIR}. "
    "Use tools to solve tasks. Act, don't explain. "
    "Before starting any multi-step task, use todo_write. "
    "Skills are available via list_skills and load_skill. "
    "Relevant memories are injected below when available. "
    # ... 能力を追加するたびに 1 行増える
)
```

3 つの問題:

1. **プロジェクトを変えるには prompt 全体を書き直す**必要がある。何を変え、何を残すべきか不明
2. **一箇所の変更が全体に影響する**。ツール説明を追加すると、前の指示と矛盾する可能性
3. **毎回のリクエストが全内容を送信する**。現在の会話で不要なセクションも token を無駄に消費

System prompt は、実行時の現在状態に基づいて組み立てられる設定であるべき:どのツールが有効か、どのコンテキストが可視か、どの記憶が関連するか、どの内容を prompt cache に命中させるために安定させるべきか。

---

## ソリューション

s10 は prompt アセンブリ機構に焦点を当てる。s08-s09 の能力を背景とするが、圧縮や記憶システムは再実装しない。核心の変更:ハードコードされた `SYSTEM` を独立セクションに分割し、実行時に実際の状態に基づいてオンデマンドで組み立て、結果をキャッシュして再組み立てを回避。

4 つのセクション、2 つの読み込み戦略:

| セクション | 戦略 | 内容 | 判断基準 |
|-----------|------|------|---------|
| identity | 常に | あなたは誰か、どう作業するか | 常に存在 |
| tools | 常に | 利用可能ツール一覧 | `enabled_tools` |
| workspace | 常に | 作業ディレクトリ | 常に存在 |
| memory | オンデマンド | 関連記憶内容 | `.memory/MEMORY.md` が存在し、空でないか |

重要な設計:セクションをロードするかどうかは実際の状態(ツールが存在するか、ファイルが存在するか)で決まり、メッセージ内のキーワードではない。

---

## 仕組み

### PROMPT_SECTIONS: トピック別フラグメント

単一の文字列を辞書に分割、各キーがトピック:

```python
PROMPT_SECTIONS = {
    "identity": "You are a coding agent. Act, don't explain.",
    "tools": "Available tools: bash, read_file, write_file.",
    "workspace": f"Working directory: {WORKDIR}",
    "memory": "Relevant memories are injected below when available.",
}
```

各セクションは独立して管理。`tools` を変更しても `identity` に影響しない。`memory` を追加しても `workspace` はそのまま。

### assemble_system_prompt: オンデマンド組み立て

すべてのセクションが毎ターン必要なわけではない。記憶ファイルがなければ、memory セクションをロードしても token の無駄。context の実際の状態に基づいて組み立てる:

```python
def assemble_system_prompt(context: dict) -> str:
    sections = []

    # 常にロード
    sections.append(PROMPT_SECTIONS["identity"])
    sections.append(PROMPT_SECTIONS["tools"])
    sections.append(PROMPT_SECTIONS["workspace"])

    # オンデマンド — 実際の状態に基づく、キーワードではない
    memories = context.get("memories", "")
    if memories:
        sections.append(f"Relevant memories:\n{memories}")

    return "\n\n".join(sections)
```

「常にロード」は毎ターン必要なもの:アイデンティティ、ツール、作業ディレクトリ。「オンデマンド」は特定条件下でのみ有用。

なぜ全部ロードしないのか?token にはコストがあり(system prompt はリクエストごとに課金)、情報が少ないほど LLM は集中する(無関係な指示はノイズ)。

### get_system_prompt: キャッシュで再組み立てを回避

コンテキストが変わっていない時(同じターン内で複数の LLM 呼び出し、context が同じ)、再組み立ては無駄。確定的シリアライズで変化を検出し、キャッシュヒット時は即座に返却:

```python
import json

_last_context_key = None  # 前回組み立て時の context のシリアライズ結果
_last_prompt = ""         # そのとき組み立てた prompt


def get_system_prompt(context: dict) -> str:
    global _last_context_key, _last_prompt
    # sort_keys で dict のキー順に依存しないキーにする
    key = json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)
    if key == _last_context_key and _last_prompt:
        return _last_prompt  # キャッシュヒット:context 不変
    _last_context_key = key
    _last_prompt = assemble_system_prompt(context)
    return _last_prompt
```

`hash()` ではなく `json.dumps` を使用:dict(または list/dict を含む値)に `hash()` を呼ぶと `TypeError: unhashable type` になる。また str のハッシュはプロセスごとにランダム化されるため、実行をまたいで安定したキャッシュキーにならない。

注意:このキャッシュは「プロセス内での文字列再組み立ての回避」のみ。CC の API prompt cache とは別物。CC の prompt cache は `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` で静的/動的部分を分離し、静的部分が global cache に命中する。動的内容が変化しても静的部分は無効化されない。

### context: 実際の状態、キーワード推測ではない

context は実行時の実際の状態を反映する:

```python
# MEMORY_INDEX(.memory/MEMORY.md のパス)、TOOL_HANDLERS、WORKDIR は code.py の別の箇所で定義
def update_context(context: dict, messages: list) -> dict:
    # 毎回ゼロから再構築する(引数の context と messages はここでは参照しない)
    memories = ""
    if MEMORY_INDEX.exists():
        content = MEMORY_INDEX.read_text().strip()
        if content:
            memories = content
    return {
        "enabled_tools": list(TOOL_HANDLERS.keys()),
        "workspace": str(WORKDIR),
        "memories": memories,
    }
```

`enabled_tools` は実際に登録されたツールを一覧。`memories` は `.memory/MEMORY.md` が存在し空でなければその内容を保持。セクションの読み込みはこの実際の状態に基づき、メッセージ内のキーワード検索ではない。

### 組み合わせて実行

```python
def agent_loop(messages: list, context: dict):
    system = get_system_prompt(context)
    while True:
        response = client.messages.create(
            model=MODEL, system=system, messages=messages,
            tools=TOOLS, max_tokens=8000)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return  # ツール呼び出しなし:このターンは終了
        # ... ツール実行:tool_result を messages に追加 ...
        # 実際の状態を更新し、変化があったときだけ再組み立て
        context = update_context(context, messages)
        system = get_system_prompt(context)
```

system prompt はループ開始前に 1 回取得し、ツール実行のたびに再取得する。context が変わっていれば(例:ツールが `.memory/MEMORY.md` を書き込んだ)再組み立て、変わっていなければキャッシュを返却。

---

## s09 からの変更点

| コンポーネント | 変更前 (s09) | 変更後 (s10) |
|-----------|-------------|-------------|
| prompt | ハードコード SYSTEM 文字列 | PROMPT_SECTIONS + assemble_system_prompt |
| キャッシュ | なし | get_system_prompt(json.dumps 検出 + キャッシュ) |
| 新規関数 | — | assemble_system_prompt, get_system_prompt, update_context |
| ツール | bash, read_file, write_file (3) | bash, read_file, write_file (3) — 変更なし |
| ループ | 固定 SYSTEM を使用 | get_system_prompt(context) を使用 |

---

## 試してみよう

```sh
cd learn-claude-code
python s10_system_prompt/code.py
```

観察のポイント:

1. 出力にロードされたセクションが表示される(`[assembled] sections: ...` ラベル)
2. 継続会話でキャッシュヒット時は `[cache hit]` と表示
3. `.memory/MEMORY.md` を作成すると、次のターンで memory セクションが自動ロード

以下のプロンプトを試してみてください:

1. `Read the file README.md`(常にロードされる 3 つのセクションを観察)
2. `Create a file called .memory/MEMORY.md with content "- [test](test.md) — test memory"`(記憶インデックスを書き込み)
3. `Read the file code.py`(memory セクションが表示されるか観察)

---

## 次へ

System prompt を実行時に組み立てられるようになった。しかし Agent はエラーでまだクラッシュする。ネットワークの不安定性、API レート制限、出力の切り詰め、コンテキスト超過、これらはバグではなく日常。

s11 Error Recovery → 4 つのリカバリパス。出力 token 上限の引き上げ、コンテキスト圧縮、指数バックオフ、モデル切り替え。

<details>
<summary>CC ソースコードの詳細</summary>

> 以下は CC ソースコード `constants/prompts.ts`(914 行)、`constants/systemPromptSections.ts`(68 行)、`context.ts`(189 行)、`utils/api.ts`(718 行)、`utils/systemPrompt.ts`(123 行)、`bootstrap/state.ts` の分析に基づく。

### CC の system prompt にはいくつのセクションがあるか?

数は固定されておらず、feature flag、output style、KAIROS/Proactive モード、ユーザータイプ、token 予算などに影響される。大まかに 2 つのカテゴリ:

**静的セクション**(常にロード):identity、system、doing_tasks、actions、using_tools、tone_style、output_efficiency など。

**動的セクション**(状態に応じてロード):session_guidance、memory、ant_model_override、env_info_simple、language、output_style、mcp_instructions、scratchpad、frc、summarize_tool_results、numeric_length_anchors、token_budget、brief など。

`mcp_instructions` は唯一の揮発性セクション(`DANGEROUS_uncachedSystemPromptSection()` で作成)。MCP server はターン間で接続・切断可能なため。

### 組み立て関数

```typescript
getSystemPrompt(tools, model, additionalWorkingDirs?, mcpClients?): Promise<string[]>
```

`string[]`(各要素がセクション)を返却。`SYSTEM_PROMPT_DYNAMIC_BOUNDARY` で静的/動的部分を分離。

### cache scope

global cache boundary が有効な場合、静的セクションは 1 つの global cache block にマージされ、動的セクションは global cache を使用しない(`cacheScope: null`)。boundary なしまたは global cache をスキップするパスでのみ org scope にフォールバック。

教学版のキャッシュは文字列の再組み立てを回避するのみ。CC の 3 層キャッシュ:

1. **lodash memoize**: `getSystemContext` と `getUserContext` がセッション中キャッシュ(`context.ts`)
2. **セクション登録キャッシュ**: `STATE.systemPromptSectionCache` が動的セクションの結果をキャッシュ、`/clear` や `/compact` でクリア
3. **API レベルキャッシュ**: `splitSysPromptPrefix()`(`api.ts`)が boundary を通じて異なる cache scope のブロックに分割

### getUserContext vs getSystemContext

| | getSystemContext | getUserContext |
|---|---|---|
| 内容 | gitStatus、cacheBreaker | CLAUDE.md 内容、currentDate |
| 注入方式 | system prompt 配列に追加 | `<system-reminder>` ユーザーメッセージとして先頭に配置 |
| スキップ条件 | カスタム system prompt 時 | 常に実行 |

### モードによる prompt の変化

- **CLAUDE_CODE_SIMPLE**: prompt 全体が 2 行のみ
- **Proactive/KAIROS**: コンパクト版 prompt が標準セクション全体を置換
- **Coordinator**: コーディネータ専用 prompt がデフォルトを完全に置換
- **Agent モード**: Agent 定義の prompt がデフォルトを置換または追加

### 総サイズ

標準インタラクティブモードの system prompt コアは約 20-30KB テキスト。CLAUDE_CODE_SIMPLE は約 150 文字。ユーザーコンテキスト(CLAUDE.md)とシステムコンテキスト(git status)がこれに加算。

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->

s10_system_prompt/README.md
@@ -0,0 +1,254 @@
|
||||
# s10: System Prompt — 运行时组装,不硬编码
|
||||
|
||||
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
|
||||
|
||||
s01 → ... → s08 → s09 → `s10` → [s11](../s11_error_recovery/) → s12 → ... → s20
|
||||
> *"prompt 是组装出来的, 不是写死的"* — 分段 + 按需拼接 + 缓存。
|
||||
>
|
||||
> **Harness 层**: 提示 — 运行时组装, 不硬编码。
|
||||
|
||||
---
|
||||
|
||||
## 问题
|
||||
|
||||
从 s01 到 s09,system prompt 都是一行硬编码:
|
||||
|
||||
```python
|
||||
SYSTEM = f"You are a coding agent at {WORKDIR}. Use tools to solve tasks."
|
||||
```
|
||||
|
||||
s01 够用,只有 bash、read、write 三个工具。但到 s09,Agent 已经有记忆、有压缩、有技能加载。prompt 该提的能力越来越多:
|
||||
|
||||
```python
|
||||
SYSTEM = (
|
||||
f"You are a coding agent at {WORKDIR}. "
|
||||
"Use tools to solve tasks. Act, don't explain. "
|
||||
"Before starting any multi-step task, use todo_write. "
|
||||
"Skills are available via list_skills and load_skill. "
|
||||
"Relevant memories are injected below when available. "
|
||||
# ... 加一个能力就多一段
|
||||
)
|
||||
```
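
This causes three problems:

1. **Switching projects means rewriting the whole prompt**, with no clear sense of which parts to change and which to keep.
2. **One edit can affect everything**: adding a tool description may conflict with instructions earlier in the prompt.
3. **Every request carries the full text**, spending tokens on paragraphs the current conversation never needs.

The system prompt should instead be configuration assembled at runtime from current state: which tools are enabled, which context is visible, which memories are relevant, and which content must stay stable so the prompt cache keeps hitting.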
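
---

## Solution



s10 focuses on the prompt assembly mechanism. It takes the capabilities of s08-s09 as background but does not reimplement compaction or the memory system. The core change: split the hardcoded `SYSTEM` into independent sections, join them on demand based on real runtime state, and cache the result so it is not reassembled needlessly.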
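
Four sections, two loading strategies:

| Section | Loading | Content | Decided by |
|---------|---------|---------|------------|
| identity | Always | Who the agent is and how it works | Always present |
| tools | Always | List of available tools | `enabled_tools` |
| workspace | Always | Working directory | Always present |
| memory | On demand | Relevant memory content | Whether `.memory/MEMORY.md` exists and is non-empty |

Key design point: whether a section is loaded depends on real state (whether a tool is registered, whether a file exists), not on keywords found in the messages.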
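
The rule in the table can be written down as data. The sketch below is a hypothetical illustration, not the chapter's code: `LOAD_RULES` and `sections_to_load` are names invented here, and each rule reads only the dict that `update_context` returns, never the message text.

```python
from typing import Callable

# Hypothetical registry: section name -> predicate over the real-state context.
# dict order (Python 3.7+) doubles as the section order in the final prompt.
LOAD_RULES: dict = {
    "identity": lambda ctx: True,                      # always
    "tools": lambda ctx: True,                         # always; text comes from enabled_tools
    "workspace": lambda ctx: True,                     # always
    "memory": lambda ctx: bool(ctx.get("memories")),   # only when MEMORY.md had content
}

def sections_to_load(context: dict) -> list:
    """Return section names, in registry order, whose rule holds for context."""
    rules: dict = LOAD_RULES
    return [name for name, rule in rules.items() if rule(context)]

ctx = {"enabled_tools": ["bash", "read_file", "write_file"],
       "workspace": "/tmp/project", "memories": ""}
print(sections_to_load(ctx))   # ['identity', 'tools', 'workspace']
print(sections_to_load({**ctx, "memories": "- [test](test.md)"}))
# ['identity', 'tools', 'workspace', 'memory']
```

Adding a section then means adding one entry with its own predicate, rather than editing an `if` chain.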

---

## How it works

### PROMPT_SECTIONS: section definitions

Break the one long string into a dict with one key per topic:

```python
PROMPT_SECTIONS = {
    "identity": "You are a coding agent. Act, don't explain.",
    "tools": "Available tools: bash, read_file, write_file.",
    "workspace": f"Working directory: {WORKDIR}",
    "memory": "Relevant memories are injected below when available.",
}
```

Each section is maintained independently: editing `tools` does not touch `identity`, and adding `memory` leaves `workspace` alone. (In this teaching version the `memory` entry is a placeholder; `assemble_system_prompt` emits the memory content itself under a `Relevant memories:` header.)
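
One thing the dict above does not do is tie the `tools` string to what is actually registered: it lists bash, read_file, and write_file by hand. A minimal sketch, assuming you want the tools line rendered from `context["enabled_tools"]` instead (`build_sections` is a hypothetical helper, not part of code.py):

```python
def build_sections(context: dict) -> dict:
    """Hypothetical per-call variant of PROMPT_SECTIONS (not in code.py)."""
    tools = context.get("enabled_tools", [])
    if tools:
        tools_line = "Available tools: " + ", ".join(tools) + "."
    else:
        tools_line = "No tools are available."
    return {
        "identity": "You are a coding agent. Act, don't explain.",
        "tools": tools_line,  # derived from registered handlers, never hand-edited
        "workspace": f"Working directory: {context.get('workspace', '')}",
    }

sections = build_sections({"enabled_tools": ["bash", "read_file", "write_file"],
                           "workspace": "/tmp/project"})
print(sections["tools"])   # Available tools: bash, read_file, write_file.
```

With this variant, registering a new handler changes `enabled_tools`, which changes the cache key in `get_system_prompt`, so the prompt is reassembled with the new tool listed.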

### assemble_system_prompt: on-demand joining

Not every section is needed every time. With no memory file present, loading the memory section just wastes tokens. Decide what to load from the real state in `context`:

```python
def assemble_system_prompt(context: dict) -> str:
    sections = []

    # Always loaded
    sections.append(PROMPT_SECTIONS["identity"])
    sections.append(PROMPT_SECTIONS["tools"])
    sections.append(PROMPT_SECTIONS["workspace"])

    # Loaded on demand, based on real state rather than keywords
    memories = context.get("memories", "")
    if memories:
        sections.append(f"Relevant memories:\n{memories}")

    return "\n\n".join(sections)
```
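
The "always loaded" sections are the ones every turn needs: identity, tools, and working directory. "On-demand" sections are only useful under specific conditions.

Why not load everything? Tokens cost money (the system prompt is sent, and billed, with every request), and instructions that don't apply to the current task are noise the model has to read past.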
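
To put a rough number on the token argument: because the system prompt is re-sent with every request, an unused section is paid for once per call. The back-of-the-envelope helper below assumes about 4 characters per token, a common heuristic for English text rather than a tokenizer count, and ignores any prompt-cache discount; the 2,000-character and 50-call figures are made up for illustration.

```python
def resent_tokens(section_chars: int, calls: int, chars_per_token: float = 4.0) -> int:
    """Rough tokens spent re-sending one section on every API call.

    chars_per_token=4.0 is a rule of thumb, not an exact tokenizer count;
    use a real token counter before making billing decisions.
    """
    return round(section_chars / chars_per_token * calls)

# Hypothetical: a 2,000-character section the conversation never uses,
# carried through a 50-call session.
print(resent_tokens(2_000, 50))   # 25000
```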
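
### get_system_prompt: caching to skip re-joining

When the context has not changed (for example, several LLM calls within the same turn with identical context), re-joining the sections is wasted work. Detect changes with deterministic serialization and return the cached prompt on a hit:

```python
import json

_last_context_key = None
_last_prompt = None

def get_system_prompt(context: dict) -> str:
    global _last_context_key, _last_prompt
    key = json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)
    if key == _last_context_key and _last_prompt:
        return _last_prompt
    _last_context_key = key
    _last_prompt = assemble_system_prompt(context)
    return _last_prompt
```

Why `json.dumps` rather than `hash()`: `hash()` raises `TypeError: unhashable type` on a dict (and on the lists inside it, such as `enabled_tools`), and Python's built-in string hashing is randomized per process, so `hash()` values would not be a stable key anyway. `sort_keys=True` makes the key independent of dict insertion order, and `default=str` stringifies values JSON cannot encode natively.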
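
A quick check of those claims, using a context shaped like the one `update_context` returns (the values are made up; `context_key` is a hypothetical helper that wraps the same `json.dumps` call):

```python
import json

def context_key(context: dict) -> str:
    # Same serialization call as get_system_prompt
    return json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)

# Same data, different key insertion order
ctx_a = {"workspace": "/tmp/project", "enabled_tools": ["bash", "read_file"], "memories": ""}
ctx_b = {"memories": "", "enabled_tools": ["bash", "read_file"], "workspace": "/tmp/project"}

try:
    hash(ctx_a)
except TypeError as exc:
    print(exc)                                   # unhashable type: 'dict'

print(context_key(ctx_a) == context_key(ctx_b))  # True: sort_keys normalizes key order
print(context_key({"memories": "café"}))         # {"memories": "café"}
```

Note that list order still matters: `["read_file", "bash"]` would produce a different key, which is fine here because `TOOL_HANDLERS` iterates in a fixed order.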
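
Note: this cache only avoids re-joining strings inside one process; it is not the same thing as CC's API-level prompt cache. CC uses `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` to separate the static and dynamic parts of the prompt, so the static part hits the global cache and is not invalidated when dynamic content changes.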

### context: real state, not keyword guessing

`context` reflects the actual current runtime state:

```python
def update_context(context: dict, messages: list) -> dict:
    memories = ""
    if MEMORY_INDEX.exists():
        content = MEMORY_INDEX.read_text().strip()
        if content:
            memories = content
    return {
        "enabled_tools": list(TOOL_HANDLERS.keys()),
        "workspace": str(WORKDIR),
        "memories": memories,
    }
```

`enabled_tools` lists the tools actually registered in `TOOL_HANDLERS`. `memories` holds the contents of `.memory/MEMORY.md` when that file exists and is non-empty, and is an empty string otherwise. (The `context` and `messages` arguments are not used yet.) Section loading is driven by this real state, not by searching the messages for keywords.
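
`update_context` reads the module-level `MEMORY_INDEX` and `TOOL_HANDLERS`, which makes it awkward to try in isolation. A hypothetical path-parameterized variant, `derive_context` (not in code.py), shows the same real-state rule against a throwaway directory:

```python
import tempfile
from pathlib import Path

def derive_context(workdir: Path, tool_names: list) -> dict:
    """Hypothetical path-parameterized variant of update_context (not in code.py)."""
    index = workdir / ".memory" / "MEMORY.md"
    memories = ""
    if index.exists():
        memories = index.read_text().strip()   # stays "" for a blank file
    return {
        "enabled_tools": list(tool_names),
        "workspace": str(workdir),
        "memories": memories,
    }

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(derive_context(root, ["bash"])["memories"] == "")  # True: no index yet
    (root / ".memory").mkdir()
    (root / ".memory" / "MEMORY.md").write_text("- [test](test.md) - test memory\n")
    print(derive_context(root, ["bash"])["memories"])       # - [test](test.md) - test memory
```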

### Putting it together

```python
def agent_loop(messages: list, context: dict):
    system = get_system_prompt(context)
    while True:
        response = client.messages.create(
            model=MODEL, system=system, messages=messages,
            tools=TOOLS, max_tokens=8000)
        # ... append the reply; return unless stop_reason == "tool_use";
        # ... run the tools and append their results ...
        context = update_context(context, messages)
        system = get_system_prompt(context)
```

The system prompt is fetched once before the first call and again after every tool round. If the context changed, the prompt is reassembled; otherwise the cached copy is returned.
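
The hit/miss pattern the loop produces can be simulated without an API key. `PromptCache` below is a hypothetical, counter-instrumented restatement of `get_system_prompt` (it tests `self.prompt is not None` rather than the original's truthiness check), and `fake_assemble` stands in for `assemble_system_prompt`:

```python
import json

class PromptCache:
    """Hypothetical counter-instrumented version of get_system_prompt."""

    def __init__(self, assemble):
        self.assemble = assemble
        self.key = None
        self.prompt = None
        self.hits = 0
        self.misses = 0

    def get(self, context: dict) -> str:
        key = json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)
        if key == self.key and self.prompt is not None:
            self.hits += 1
            return self.prompt
        self.misses += 1
        self.key, self.prompt = key, self.assemble(context)
        return self.prompt

def fake_assemble(context: dict) -> str:
    parts = ["identity", "tools", "workspace"]
    if context.get("memories"):
        parts.append("memory")
    return "\n\n".join(parts)

# Before the first call, then after three tool rounds; a write_file call
# in round 2 is assumed to have created the memory index.
contexts = [{"memories": ""}, {"memories": ""},
            {"memories": "- [test](test.md)"}, {"memories": "- [test](test.md)"}]
cache = PromptCache(fake_assemble)
for ctx in contexts:
    cache.get(ctx)
print(cache.hits, cache.misses)   # 2 2
```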

---

## Changes from s09

| Component | Before (s09) | After (s10) |
|-----------|--------------|-------------|
| Prompt | Hardcoded `SYSTEM` string | `PROMPT_SECTIONS` + `assemble_system_prompt` |
| Caching | None | `get_system_prompt` (`json.dumps` change detection + cache) |
| New functions | (none) | `assemble_system_prompt`, `get_system_prompt`, `update_context` |
| Tools | bash, read_file, write_file (3) | bash, read_file, write_file (3), unchanged |
| Loop | Uses the fixed `SYSTEM` | Uses `get_system_prompt(context)` |

---

## Try it

```sh
cd learn-claude-code
python s10_system_prompt/code.py
```

What to watch for:

1. The output shows which sections were loaded (the `[assembled] sections: ...` tag).
2. When the context has not changed between calls, the cache hit is shown as `[cache hit]`.
3. Once `.memory/MEMORY.md` exists with content, the memory section is loaded automatically on the next assembly.

Prompts to try:

1. `Read the file README.md` (observe the three always-loaded sections)
2. `Create a file called .memory/MEMORY.md with content "- [test](test.md) - test memory"` (writes a memory index)
3. `Read the file code.py` (check whether the memory section appears)
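
---

## What's next

The system prompt is now assembled at runtime, but the agent still crashes when it hits an error. Network blips, API rate limits, truncated output, and context overflow are not bugs; they are the normal state of affairs.

s11 Error Recovery → four recovery paths: raise the output token limit, compact the context, back off exponentially, and switch models.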

<details>
<summary>Deep dive into the CC source</summary>

> The following is based on an analysis of the CC source: `constants/prompts.ts` (914 lines), `constants/systemPromptSections.ts` (68 lines), `context.ts` (189 lines), `utils/api.ts` (718 lines), `utils/systemPrompt.ts` (123 lines), and `bootstrap/state.ts`.

### How many sections does CC's system prompt have?

The number is not fixed. It depends on feature flags, output style, KAIROS/Proactive mode, user type, token budget, and more. Broadly, sections fall into two groups:

**Static sections** (always loaded): identity, system, doing_tasks, actions, using_tools, tone_style, output_efficiency, and others.

**Dynamic sections** (loaded based on state): session_guidance, memory, ant_model_override, env_info_simple, language, output_style, mcp_instructions, scratchpad, frc, summarize_tool_results, numeric_length_anchors, token_budget, brief, and others.

`mcp_instructions` is the only volatile section (created via `DANGEROUS_uncachedSystemPromptSection()`), because MCP servers can connect and disconnect between turns.

### The assembly function

```typescript
getSystemPrompt(tools, model, additionalWorkingDirs?, mcpClients?): Promise<string[]>
```

It returns a `string[]` (one element per section), with `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` separating the static part from the dynamic part.

### Cache scope

When the global cache boundary is enabled, the static sections are merged into a single global cache block, and the dynamic sections do not use the global cache (`cacheScope: null`). Only paths with no boundary, or paths that skip the global cache, fall back to org scope.
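
The split described above can be sketched in a few lines. Everything here is an assumption modeled on the prose, not CC code: `DYNAMIC_BOUNDARY` is a placeholder string rather than the real value of `SYSTEM_PROMPT_DYNAMIC_BOUNDARY`, the scope labels are plain strings, section names stand in for section text, and all dynamic sections are folded into one block for simplicity.

```python
DYNAMIC_BOUNDARY = "__DYNAMIC_BOUNDARY__"  # placeholder, not CC's real marker value

def split_by_scope(sections: list) -> list:
    """Group prompt sections into blocks tagged with a cache scope (sketch).

    - boundary present: sections before it -> one "global" block,
      sections after it -> one block with scope None (no global cache)
    - boundary absent: the whole prompt -> one "org" block
    """
    if DYNAMIC_BOUNDARY not in sections:
        return [{"text": "\n\n".join(sections), "cache_scope": "org"}]
    i = sections.index(DYNAMIC_BOUNDARY)
    static, dynamic = sections[:i], sections[i + 1:]
    blocks = []
    if static:
        blocks.append({"text": "\n\n".join(static), "cache_scope": "global"})
    if dynamic:
        blocks.append({"text": "\n\n".join(dynamic), "cache_scope": None})
    return blocks

blocks = split_by_scope(["identity", "using_tools", DYNAMIC_BOUNDARY,
                         "memory", "mcp_instructions"])
for b in blocks:
    print(b["cache_scope"], repr(b["text"]))
# global 'identity\n\nusing_tools'
# None 'memory\n\nmcp_instructions'
```

In the public Anthropic Messages API, the closest lever is marking a `system` text block with `cache_control: {"type": "ephemeral"}` so the prefix up to that block can be cached; the global versus org distinction is CC-internal and has no counterpart in this sketch.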

The teaching version's cache only avoids re-joining strings. CC has three cache layers:

1. **lodash memoize**: `getSystemContext` and `getUserContext` are cached for the session (`context.ts`)
2. **Section registry cache**: `STATE.systemPromptSectionCache` caches dynamic section results and is cleared on `/clear` or `/compact`
3. **API-level cache**: `splitSysPromptPrefix()` (`api.ts`) splits the prompt at the boundary into blocks with different cache scopes
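
Layer 2, together with the volatile `mcp_instructions` section mentioned earlier, amounts to per-section memoization with a reset. A hypothetical sketch: `SectionRegistry` and its methods are invented here, and the section functions are stand-ins; CC's actual cache lives in `STATE.systemPromptSectionCache`.

```python
from typing import Callable

class SectionRegistry:
    """Hypothetical per-section memoization with a reset (not CC code).

    Cached sections are computed once and reused until clear(), the analogue
    of /clear or /compact. Volatile sections are recomputed on every
    resolve() because their inputs can change between turns.
    """

    def __init__(self):
        self._compute = {}      # name -> zero-arg function returning section text
        self._volatile = set()  # names that bypass the cache
        self._cache = {}        # name -> cached text

    def register(self, name: str, compute: Callable[[], str], volatile: bool = False):
        self._compute[name] = compute
        if volatile:
            self._volatile.add(name)

    def resolve(self) -> list:
        out = []
        for name, compute in self._compute.items():
            if name in self._volatile:
                out.append(compute())
                continue
            if name not in self._cache:
                self._cache[name] = compute()
            out.append(self._cache[name])
        return out

    def clear(self):
        self._cache.clear()

calls = {"memory": 0, "mcp": 0}

def memory_section() -> str:
    calls["memory"] += 1
    return "Relevant memories: ..."

def mcp_section() -> str:
    calls["mcp"] += 1
    return f"MCP instructions, build #{calls['mcp']}"

reg = SectionRegistry()
reg.register("memory", memory_section)
reg.register("mcp_instructions", mcp_section, volatile=True)
reg.resolve()
reg.resolve()
print(calls)   # {'memory': 1, 'mcp': 2}
reg.clear()
reg.resolve()
print(calls)   # {'memory': 2, 'mcp': 3}
```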

### getUserContext vs getSystemContext

| | getSystemContext | getUserContext |
|---|---|---|
| Contents | gitStatus, cacheBreaker | CLAUDE.md contents, currentDate |
| Injected as | Appended to the system prompt array | Prepended as a `<system-reminder>` user message |
| Skipped when | A custom system prompt is in use | Never; always runs |
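
A minimal sketch of where each kind of context lands in a request, assuming plain `key: value` formatting. `build_request`, the reminder layout, and the dict inputs are illustrative, not CC's actual functions or output format:

```python
def build_request(system_prompt: list, messages: list, system_context: dict,
                  user_context: dict, custom_prompt: bool = False):
    """Hypothetical sketch: route the two context kinds into a request.

    system context (e.g. gitStatus): appended to the system prompt array,
        skipped when a custom system prompt is in use
    user context (e.g. CLAUDE.md, currentDate): always prepended as a
        <system-reminder> user message ahead of the conversation
    """
    system = list(system_prompt)
    if not custom_prompt:
        system += [f"{key}: {value}" for key, value in system_context.items()]
    body = "\n".join(f"{key}: {value}" for key, value in user_context.items())
    reminder = {"role": "user",
                "content": f"<system-reminder>\n{body}\n</system-reminder>"}
    return system, [reminder] + list(messages)

system, msgs = build_request(
    ["identity", "using_tools"],
    [{"role": "user", "content": "Fix the failing test"}],
    system_context={"gitStatus": "clean"},
    user_context={"currentDate": "2025-01-01"},
)
print(system)                               # ['identity', 'using_tools', 'gitStatus: clean']
print(msgs[0]["content"].splitlines()[0])   # <system-reminder>
```

Keeping user context out of the system array also keeps it away from the cached system-prompt prefix.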

### How modes change the prompt

- **CLAUDE_CODE_SIMPLE**: the entire prompt is just 2 lines
- **Proactive/KAIROS**: a compact prompt replaces all of the standard sections
- **Coordinator**: a coordinator-specific prompt completely replaces the default
- **Agent mode**: the agent definition's prompt replaces or is appended to the default
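
The four mode rules reduce to "use the default, replace it, or append to it". The sketch below is hypothetical: the mode names are plain strings chosen here, and `override` stands in for the mode-specific prompt text, which this document does not reproduce.

```python
from typing import Optional

def prompt_for_mode(mode: str, default: list, override: Optional[list] = None,
                    append: bool = False) -> list:
    """Hypothetical mode switch mirroring the list above (not CC code).

    mode: "default", "simple", "proactive", "coordinator", or "agent"
    override: the mode-specific prompt sections, supplied by the caller
    append: only meaningful for "agent", whose prompt may extend the default
    """
    if mode == "default" or override is None:
        return list(default)
    if mode == "agent" and append:
        return list(default) + list(override)  # agent prompt added after the default
    return list(override)  # simple, proactive, coordinator, or agent in replace mode

default = ["identity", "doing_tasks", "using_tools"]
print(prompt_for_mode("simple", default, ["line 1", "line 2"]))  # ['line 1', 'line 2']
print(prompt_for_mode("agent", default, ["agent def"], append=True))
# ['identity', 'doing_tasks', 'using_tools', 'agent def']
```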

### Total size

In standard interactive mode, the core system prompt is roughly 20-30 KB of text; with CLAUDE_CODE_SIMPLE it is about 150 characters. User context (CLAUDE.md) and system context (git status) are added on top of that.

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
218
s10_system_prompt/code.py
Normal file
@@ -0,0 +1,218 @@
#!/usr/bin/env python3
"""
s10: System Prompt - runtime prompt assembly with caching.

Run: python s10_system_prompt/code.py
Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY and MODEL_ID

Changes from s09:
- PROMPT_SECTIONS: topic-keyed dict of prompt fragments
- assemble_system_prompt(context): select + join sections by real state
- get_system_prompt(context): deterministic cache via json.dumps
- agent_loop uses get_system_prompt(context) instead of hardcoded SYSTEM

Memory section loads when .memory/MEMORY.md exists and is non-empty
(real state, not keywords).
"""

import os, subprocess, json
from pathlib import Path

try:
    import readline
    readline.parse_and_bind('set bind-tty-special-chars off')
except ImportError:
    pass

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv(override=True)
if os.getenv("ANTHROPIC_BASE_URL"):
    os.environ.pop("ANTHROPIC_AUTH_TOKEN", None)

WORKDIR = Path.cwd()
MEMORY_DIR = WORKDIR / ".memory"
MEMORY_INDEX = MEMORY_DIR / "MEMORY.md"
client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
MODEL = os.environ["MODEL_ID"]  # required; raises KeyError if unset


# ── Prompt Sections ──

PROMPT_SECTIONS = {
    "identity": "You are a coding agent. Act, don't explain.",
    "tools": "Available tools: bash, read_file, write_file.",
    "workspace": f"Working directory: {WORKDIR}",
    "memory": "Relevant memories are injected below when available.",
}


def assemble_system_prompt(context: dict) -> str:
    """Select and join prompt sections based on current context."""
    sections = []

    # Always loaded: identity, tools, workspace
    sections.append(PROMPT_SECTIONS["identity"])
    sections.append(PROMPT_SECTIONS["tools"])
    sections.append(PROMPT_SECTIONS["workspace"])

    # Conditional: memory loaded when MEMORY.md exists and has content
    memories = context.get("memories", "")
    if memories:
        sections.append(f"Relevant memories:\n{memories}")

    return "\n\n".join(sections)


_last_context_key = None
_last_prompt = None


def get_system_prompt(context: dict) -> str:
    """Cache wrapper: reassemble only when context changes.

    Uses json.dumps for deterministic serialization, not Python's hash(),
    which is randomized per process for strings and raises TypeError on
    dicts and lists (they are unhashable).
    This cache only avoids redundant string assembly within a process.
    Real Claude Code additionally protects the API-level prompt cache via
    stable section ordering and SYSTEM_PROMPT_DYNAMIC_BOUNDARY.
    """
    global _last_context_key, _last_prompt
    key = json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)
    if key == _last_context_key and _last_prompt:
        print(" \033[90m[cache hit] system prompt unchanged\033[0m")
        return _last_prompt
    _last_context_key = key
    _last_prompt = assemble_system_prompt(context)

    loaded = ["identity", "tools", "workspace"]
    if context.get("memories"):
        loaded.append("memory")
    print(f" \033[32m[assembled] sections: {', '.join(loaded)}\033[0m")
    return _last_prompt


# ── Tools ──

def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path


def run_bash(command: str) -> str:
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR,
                           capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"


def run_read(path: str, limit: int | None = None) -> str:
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines):
            lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"]
        return "\n".join(lines)
    except Exception as e:
        return f"Error: {e}"


def run_write(path: str, content: str) -> str:
    try:
        file_path = safe_path(path)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content)
        return f"Wrote {len(content)} bytes to {path}"
    except Exception as e:
        return f"Error: {e}"


TOOLS = [
    {"name": "bash", "description": "Run a shell command.",
     "input_schema": {"type": "object",
                      "properties": {"command": {"type": "string"}},
                      "required": ["command"]}},
    {"name": "read_file", "description": "Read file contents.",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"},
                                     "limit": {"type": "integer"}},
                      "required": ["path"]}},
    {"name": "write_file", "description": "Write content to a file.",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"},
                                     "content": {"type": "string"}},
                      "required": ["path", "content"]}},
]

TOOL_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write}


# ── Context ──

def update_context(context: dict, messages: list) -> dict:
    """Derive context from real state: which tools exist, whether memory files exist."""
    memories = ""
    if MEMORY_INDEX.exists():
        content = MEMORY_INDEX.read_text().strip()
        if content:
            memories = content
    return {
        "enabled_tools": list(TOOL_HANDLERS.keys()),
        "workspace": str(WORKDIR),
        "memories": memories,
    }


# ── Agent Loop ──

def agent_loop(messages: list, context: dict):
    """Main loop: uses the assembled system prompt instead of a hardcoded SYSTEM."""
    system = get_system_prompt(context)
    while True:
        response = client.messages.create(
            model=MODEL, system=system, messages=messages,
            tools=TOOLS, max_tokens=8000)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return

        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            print(f"\033[36m> {block.name}\033[0m")
            handler = TOOL_HANDLERS.get(block.name)
            output = handler(**block.input) if handler else f"Unknown: {block.name}"
            print(str(output)[:200])
            results.append({"type": "tool_result",
                            "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})

        # Re-evaluate context and prompt after each tool round
        context = update_context(context, messages)
        system = get_system_prompt(context)


if __name__ == "__main__":
    print("s10: system prompt (runtime assembly)")
    print("Enter a question, press Enter to send. Type q to quit.\n")
    history = []
    context = update_context({}, [])
    while True:
        try:
            query = input("\033[36ms10 >> \033[0m")
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        history.append({"role": "user", "content": query})
        agent_loop(history, context)
        context = update_context(context, history)
        for block in history[-1]["content"]:
            if getattr(block, "type", None) == "text":
                print(block.text)
        print()
107
s10_system_prompt/images/system-prompt-overview.en.svg
Normal file
@@ -0,0 +1,107 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 420" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#059669"/>
    </linearGradient>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#059669"/>
    </marker>
  </defs>

  <rect width="760" height="420" fill="#fafbfc" rx="8"/>

  <!-- Title -->
  <rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
  <rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
  <text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">System Prompt: PROMPT_SECTIONS + On-Demand Assembly + Cache</text>

  <!-- Legend -->
  <rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">s09 Preserved</text>
  <rect x="160" y="56" width="12" height="10" rx="2" fill="#ecfdf5" stroke="#059669" stroke-width="1"/>
  <text x="178" y="66" fill="#059669" font-size="10" font-weight="600">s10 New</text>

  <!-- ===== Prompt Assembly (green, s10) ===== -->

  <!-- PROMPT_SECTIONS -->
  <rect x="40" y="82" width="170" height="88" rx="8" fill="#ecfdf5" stroke="#059669" stroke-width="2"/>
  <text x="125" y="100" fill="#065f46" font-size="11" font-weight="700" text-anchor="middle">PROMPT_SECTIONS</text>
  <text x="55" y="116" fill="#065f46" font-size="9">✓ identity (always)</text>
  <text x="55" y="130" fill="#065f46" font-size="9">✓ tools (always)</text>
  <text x="55" y="144" fill="#065f46" font-size="9">✓ workspace (always)</text>
  <text x="55" y="158" fill="#6b7280" font-size="9">○ memory</text>

  <!-- arrow → assemble -->
  <line x1="210" y1="126" x2="235" y2="126" stroke="#059669" stroke-width="1.5" marker-end="url(#arrow-green)"/>

  <!-- assemble_system_prompt -->
  <rect x="238" y="82" width="170" height="88" rx="8" fill="#ecfdf5" stroke="#059669" stroke-width="2"/>
  <text x="323" y="100" fill="#065f46" font-size="11" font-weight="700" text-anchor="middle">assemble_system_prompt</text>
  <text x="253" y="118" fill="#065f46" font-size="9">Input: context dict</text>
  <text x="253" y="132" fill="#065f46" font-size="9">Always: identity + tools + workspace</text>
  <text x="253" y="146" fill="#065f46" font-size="9">On-demand: memory</text>
  <text x="253" y="160" fill="#6b7280" font-size="9">Output: "\n\n".join(selected)</text>

  <!-- arrow → cache -->
  <line x1="408" y1="126" x2="433" y2="126" stroke="#059669" stroke-width="1.5" marker-end="url(#arrow-green)"/>

  <!-- get_system_prompt -->
  <rect x="436" y="82" width="170" height="88" rx="8" fill="#ecfdf5" stroke="#059669" stroke-width="2"/>
  <text x="521" y="100" fill="#065f46" font-size="11" font-weight="700" text-anchor="middle">get_system_prompt</text>
  <text x="451" y="118" fill="#065f46" font-size="9">json.dumps(context)</text>
  <text x="451" y="132" fill="#065f46" font-size="9">Hit → return cached</text>
  <text x="451" y="146" fill="#065f46" font-size="9">Miss → assemble + store</text>
  <text x="451" y="160" fill="#6b7280" font-size="9">(s10 new)</text>

  <!-- Arrow: cache → LLM -->
  <path d="M 521 170 L 521 195 L 410 195 L 410 212" fill="none" stroke="#059669" stroke-width="1.5" marker-end="url(#arrow-green)"/>
  <text x="462" y="189" fill="#059669" font-size="9">system=get_system_prompt(context)</text>

  <!-- ===== s09 Agent Loop (blue) ===== -->

  <!-- messages[] -->
  <rect x="30" y="214" width="100" height="46" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="80" y="241" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- arrow → compression+loading -->
  <line x1="130" y1="237" x2="155" y2="237" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

  <!-- compression + loading -->
  <rect x="158" y="206" width="170" height="62" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="243" y="228" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">Compression + Loading</text>
  <text x="243" y="242" fill="#64748b" font-size="9" text-anchor="middle">snip → micro → budget → auto</text>
  <text x="243" y="256" fill="#94a3b8" font-size="8" text-anchor="middle">→ load memory (s09)</text>

  <!-- arrow → LLM -->
  <line x1="328" y1="237" x2="358" y2="237" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

  <!-- LLM -->
  <rect x="360" y="214" width="100" height="46" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="410" y="231" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">LLM</text>
  <text x="410" y="246" fill="#64748b" font-size="8" text-anchor="middle">stop_reason=tool_use?</text>
  <text x="410" y="258" fill="#059669" font-size="8" text-anchor="middle">system assembled</text>

  <!-- arrow → TOOLS -->
  <line x1="460" y1="237" x2="490" y2="237" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="466" y="229" fill="#64748b" font-size="8">yes</text>

  <!-- TOOL_HANDLERS -->
  <rect x="493" y="206" width="130" height="62" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="558" y="228" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
  <text x="558" y="242" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
  <text x="558" y="256" fill="#94a3b8" font-size="8" text-anchor="middle">(s09 preserved)</text>

  <!-- ===== Loop back ===== -->
  <path d="M 623 237 L 660 237 L 660 312 L 80 312 L 80 260" fill="none" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
  <text x="370" y="328" fill="#64748b" font-size="10" text-anchor="middle">Tool results → messages[] → compress → load memory → assemble prompt → LLM</text>

  <!-- ===== Bottom notes ===== -->
  <rect x="40" y="350" width="680" height="56" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
  <rect x="60" y="362" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="80" y="372" fill="#475569" font-size="10">s09 Preserved: loop, compression pipeline, memory loading, tool execution</text>
  <rect x="60" y="382" width="12" height="10" rx="2" fill="#ecfdf5" stroke="#059669" stroke-width="1"/>
  <text x="80" y="392" fill="#475569" font-size="10">s10 New: PROMPT_SECTIONS (4 sections) + assemble_system_prompt + get_system_prompt (cache)</text>
</svg>
|
107
s10_system_prompt/images/system-prompt-overview.ja.svg
Normal file
@@ -0,0 +1,107 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 420" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#059669"/>
    </linearGradient>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#059669"/>
    </marker>
  </defs>

  <rect width="760" height="420" fill="#fafbfc" rx="8"/>

  <!-- Title -->
  <rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
  <rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
  <text x="380" y="28" fill="#fff" font-size="14" font-weight="700" text-anchor="middle">System Prompt: PROMPT_SECTIONS + On-Demand Assembly + Cache</text>

  <!-- Legend -->
  <rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">s09 Preserved</text>
  <rect x="160" y="56" width="12" height="10" rx="2" fill="#ecfdf5" stroke="#059669" stroke-width="1"/>
  <text x="178" y="66" fill="#059669" font-size="10" font-weight="600">s10 New</text>

  <!-- ===== Prompt Assembly (green, s10) ===== -->

  <!-- PROMPT_SECTIONS -->
  <rect x="40" y="82" width="170" height="88" rx="8" fill="#ecfdf5" stroke="#059669" stroke-width="2"/>
  <text x="125" y="100" fill="#065f46" font-size="11" font-weight="700" text-anchor="middle">PROMPT_SECTIONS</text>
  <text x="55" y="116" fill="#065f46" font-size="9">✓ identity (always)</text>
  <text x="55" y="130" fill="#065f46" font-size="9">✓ tools (always)</text>
  <text x="55" y="144" fill="#065f46" font-size="9">✓ workspace (always)</text>
  <text x="55" y="158" fill="#6b7280" font-size="9">○ memory</text>

  <!-- arrow → assemble -->
  <line x1="210" y1="126" x2="235" y2="126" stroke="#059669" stroke-width="1.5" marker-end="url(#arrow-green)"/>

  <!-- assemble_system_prompt -->
  <rect x="238" y="82" width="170" height="88" rx="8" fill="#ecfdf5" stroke="#059669" stroke-width="2"/>
  <text x="323" y="100" fill="#065f46" font-size="11" font-weight="700" text-anchor="middle">assemble_system_prompt</text>
  <text x="253" y="118" fill="#065f46" font-size="9">Input: context dict</text>
  <text x="253" y="132" fill="#065f46" font-size="9">Always: identity + tools + workspace</text>
  <text x="253" y="146" fill="#065f46" font-size="9">On-demand: memory</text>
  <text x="253" y="160" fill="#6b7280" font-size="9">Output: "\n\n".join(selected)</text>

  <!-- arrow → cache -->
  <line x1="408" y1="126" x2="433" y2="126" stroke="#059669" stroke-width="1.5" marker-end="url(#arrow-green)"/>

  <!-- get_system_prompt -->
  <rect x="436" y="82" width="170" height="88" rx="8" fill="#ecfdf5" stroke="#059669" stroke-width="2"/>
  <text x="521" y="100" fill="#065f46" font-size="11" font-weight="700" text-anchor="middle">get_system_prompt</text>
  <text x="451" y="118" fill="#065f46" font-size="9">json.dumps(context)</text>
  <text x="451" y="132" fill="#065f46" font-size="9">Hit → return cached</text>
  <text x="451" y="146" fill="#065f46" font-size="9">Miss → assemble + store</text>
  <text x="451" y="160" fill="#6b7280" font-size="9">(s10 new)</text>

  <!-- Arrow: cache → LLM -->
  <path d="M 521 170 L 521 195 L 410 195 L 410 212" fill="none" stroke="#059669" stroke-width="1.5" marker-end="url(#arrow-green)"/>
  <text x="462" y="189" fill="#059669" font-size="9">system=get_system_prompt(context)</text>

  <!-- ===== s09 Agent Loop (blue) ===== -->

  <!-- messages[] -->
  <rect x="30" y="214" width="100" height="46" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="80" y="241" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- arrow → compression+loading -->
  <line x1="130" y1="237" x2="155" y2="237" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

  <!-- compression + loading -->
  <rect x="158" y="206" width="170" height="62" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="243" y="228" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">Compression + Loading</text>
  <text x="243" y="242" fill="#64748b" font-size="9" text-anchor="middle">snip → micro → budget → auto</text>
  <text x="243" y="256" fill="#94a3b8" font-size="8" text-anchor="middle">→ load memory (s09)</text>

  <!-- arrow → LLM -->
  <line x1="328" y1="237" x2="358" y2="237" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

  <!-- LLM -->
  <rect x="360" y="214" width="100" height="46" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="410" y="231" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">LLM</text>
  <text x="410" y="246" fill="#64748b" font-size="8" text-anchor="middle">stop_reason=tool_use?</text>
  <text x="410" y="258" fill="#059669" font-size="8" text-anchor="middle">system assembled</text>

  <!-- arrow → TOOLS -->
  <line x1="460" y1="237" x2="490" y2="237" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="466" y="229" fill="#64748b" font-size="8">yes</text>

  <!-- TOOL_HANDLERS -->
  <rect x="493" y="206" width="130" height="62" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="558" y="228" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
  <text x="558" y="242" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
  <text x="558" y="256" fill="#94a3b8" font-size="8" text-anchor="middle">(s09 preserved)</text>

  <!-- ===== Loop back ===== -->
  <path d="M 623 237 L 660 237 L 660 312 L 80 312 L 80 260" fill="none" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
  <text x="370" y="328" fill="#64748b" font-size="10" text-anchor="middle">Tool results → messages[] → compress → load memory → assemble prompt → LLM</text>

  <!-- ===== Bottom notes ===== -->
  <rect x="40" y="350" width="680" height="56" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
  <rect x="60" y="362" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="80" y="372" fill="#475569" font-size="10">s09 Preserved: loop, compression pipeline, memory loading, tool execution</text>
  <rect x="60" y="382" width="12" height="10" rx="2" fill="#ecfdf5" stroke="#059669" stroke-width="1"/>
  <text x="80" y="392" fill="#475569" font-size="10">s10 New: PROMPT_SECTIONS (4 sections) + assemble_system_prompt + get_system_prompt (cache)</text>
</svg>
|
107
s10_system_prompt/images/system-prompt-overview.svg
Normal file
@@ -0,0 +1,107 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 760 420" font-family="system-ui, -apple-system, sans-serif">
  <defs>
    <linearGradient id="header" x1="0" y1="0" x2="1" y2="0">
      <stop offset="0%" stop-color="#1e3a5f"/><stop offset="100%" stop-color="#059669"/>
    </linearGradient>
    <marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
    </marker>
    <marker id="arrow-green" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#059669"/>
    </marker>
  </defs>

  <rect width="760" height="420" fill="#fafbfc" rx="8"/>

  <!-- Title -->
  <rect x="0" y="0" width="760" height="44" fill="url(#header)" rx="8"/>
  <rect x="0" y="36" width="760" height="8" fill="url(#header)"/>
  <text x="380" y="28" fill="#fff" font-size="15" font-weight="700" text-anchor="middle">System Prompt: PROMPT_SECTIONS + On-Demand Assembly + Cache</text>

  <!-- Legend -->
  <rect x="40" y="56" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="58" y="66" fill="#2563eb" font-size="10" font-weight="600">s09 kept</text>
  <rect x="140" y="56" width="12" height="10" rx="2" fill="#ecfdf5" stroke="#059669" stroke-width="1"/>
  <text x="158" y="66" fill="#059669" font-size="10" font-weight="600">s10 new</text>

  <!-- ===== Prompt Assembly (green, s10) ===== -->

  <!-- PROMPT_SECTIONS -->
  <rect x="40" y="82" width="170" height="88" rx="8" fill="#ecfdf5" stroke="#059669" stroke-width="2"/>
  <text x="125" y="100" fill="#065f46" font-size="11" font-weight="700" text-anchor="middle">PROMPT_SECTIONS</text>
  <text x="55" y="116" fill="#065f46" font-size="9">✓ identity (always)</text>
  <text x="55" y="130" fill="#065f46" font-size="9">✓ tools (always)</text>
  <text x="55" y="144" fill="#065f46" font-size="9">✓ workspace (always)</text>
  <text x="55" y="158" fill="#6b7280" font-size="9">○ memory</text>

  <!-- arrow → assemble -->
  <line x1="210" y1="126" x2="235" y2="126" stroke="#059669" stroke-width="1.5" marker-end="url(#arrow-green)"/>

  <!-- assemble_system_prompt -->
  <rect x="238" y="82" width="170" height="88" rx="8" fill="#ecfdf5" stroke="#059669" stroke-width="2"/>
  <text x="323" y="100" fill="#065f46" font-size="11" font-weight="700" text-anchor="middle">assemble_system_prompt</text>
  <text x="253" y="118" fill="#065f46" font-size="9">input: context dict</text>
  <text x="253" y="132" fill="#065f46" font-size="9">always: identity + tools + workspace</text>
  <text x="253" y="146" fill="#065f46" font-size="9">on demand: memory</text>
  <text x="253" y="160" fill="#6b7280" font-size="9">output: "\n\n".join(selected)</text>

  <!-- arrow → cache -->
  <line x1="408" y1="126" x2="433" y2="126" stroke="#059669" stroke-width="1.5" marker-end="url(#arrow-green)"/>

  <!-- get_system_prompt -->
  <rect x="436" y="82" width="170" height="88" rx="8" fill="#ecfdf5" stroke="#059669" stroke-width="2"/>
  <text x="521" y="100" fill="#065f46" font-size="11" font-weight="700" text-anchor="middle">get_system_prompt</text>
  <text x="451" y="118" fill="#065f46" font-size="9">json.dumps(context)</text>
  <text x="451" y="132" fill="#065f46" font-size="9">hit → return cached</text>
  <text x="451" y="146" fill="#065f46" font-size="9">miss → assemble + store</text>
  <text x="451" y="160" fill="#6b7280" font-size="9">(new in s10)</text>

  <!-- Arrow: cache → LLM -->
  <path d="M 521 170 L 521 195 L 410 195 L 410 212" fill="none" stroke="#059669" stroke-width="1.5" marker-end="url(#arrow-green)"/>
  <text x="462" y="189" fill="#059669" font-size="9">system=get_system_prompt(context)</text>

  <!-- ===== s09 Agent Loop (blue) ===== -->

  <!-- messages[] -->
  <rect x="30" y="214" width="100" height="46" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="80" y="241" fill="#1e3a5f" font-size="11" font-weight="600" text-anchor="middle">messages[]</text>

  <!-- arrow → compression+loading -->
  <line x1="130" y1="237" x2="155" y2="237" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

  <!-- compression + loading -->
  <rect x="158" y="206" width="170" height="62" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="243" y="228" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">Compaction + Loading</text>
  <text x="243" y="242" fill="#64748b" font-size="9" text-anchor="middle">snip → micro → budget → auto</text>
  <text x="243" y="256" fill="#94a3b8" font-size="8" text-anchor="middle">→ load memory (s09)</text>

  <!-- arrow → LLM -->
  <line x1="328" y1="237" x2="358" y2="237" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>

  <!-- LLM -->
  <rect x="360" y="214" width="100" height="46" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="410" y="231" fill="#1e3a5f" font-size="12" font-weight="700" text-anchor="middle">LLM</text>
  <text x="410" y="246" fill="#64748b" font-size="8" text-anchor="middle">stop_reason=tool_use?</text>
  <text x="410" y="258" fill="#059669" font-size="8" text-anchor="middle">system assembled</text>

  <!-- arrow → TOOLS -->
  <line x1="460" y1="237" x2="490" y2="237" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)"/>
  <text x="466" y="229" fill="#64748b" font-size="8">yes</text>

  <!-- TOOL_HANDLERS -->
  <rect x="493" y="206" width="130" height="62" rx="8" fill="#f0f4ff" stroke="#2563eb" stroke-width="1.5"/>
  <text x="558" y="228" fill="#1e3a5f" font-size="10" font-weight="600" text-anchor="middle">TOOL_HANDLERS</text>
  <text x="558" y="242" fill="#64748b" font-size="9" text-anchor="middle">bash · read · write</text>
  <text x="558" y="256" fill="#94a3b8" font-size="8" text-anchor="middle">(from s09)</text>

  <!-- ===== Loop back ===== -->
  <path d="M 623 237 L 660 237 L 660 312 L 80 312 L 80 260" fill="none" stroke="#555" stroke-width="1.5" marker-end="url(#arrow)" stroke-dasharray="6,3"/>
  <text x="370" y="328" fill="#64748b" font-size="10" text-anchor="middle">tool results → messages[] → compact → load memory → assemble prompt → LLM</text>

  <!-- ===== Bottom notes ===== -->
  <rect x="40" y="350" width="680" height="56" rx="6" fill="#f8fafc" stroke="#e2e8f0" stroke-width="1"/>
  <rect x="60" y="362" width="12" height="10" rx="2" fill="#f0f4ff" stroke="#2563eb" stroke-width="1"/>
  <text x="80" y="372" fill="#475569" font-size="10">s09 kept: loop, compaction pipeline, memory loading, tool execution</text>
  <rect x="60" y="382" width="12" height="10" rx="2" fill="#ecfdf5" stroke="#059669" stroke-width="1"/>
  <text x="80" y="392" fill="#475569" font-size="10">s10 new: PROMPT_SECTIONS (4 sections) + assemble_system_prompt + get_system_prompt (cache)</text>
</svg>

s11_error_recovery/README.en.md
@@ -0,0 +1,277 @@
# s11: Error Recovery — Errors aren't the end, they're the start of a retry

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → ... → s09 → s10 → `s11` → [s12](../s12_task_system/) → s13 → ... → s20

> *"Errors aren't the end, they're the start of a retry"* — escalate tokens, compact context, switch models.
>
> **Harness layer**: Resilience — classify and recover when the main loop hits errors.

---

## The Problem

The Agent is running along when it suddenly hits an error:

```
Error: 529 overloaded
```

The Agent crashes. It doesn't retry, doesn't switch models, and doesn't shrink its context; it simply dies.

In production, API errors are the norm. The three most common failure modes are **truncated output** (the model runs out of tokens mid-sentence), **context overflow** (the prompt is still too long even after compaction), and **transient failures** (429 rate limiting / 529 overload). An Agent that doesn't handle errors is like a car that stalls at the slightest bump.

---

## Solution

The loop and prompt assembly from s10 are kept intact. What changes is how each LLM call is handled: the call is wrapped in try/except, its `stop_reason` is checked, and each error type is routed to its own recovery path. After recovering, `continue` jumps back to the top of the loop to call the LLM again.

The three most common recovery patterns are listed below. For transient failures, the teaching version only handles 429/529. Real systems also cover connection errors, timeouts, cached cloud-provider credentials, and more. CC itself distinguishes more than a dozen reason/transition codes, and the Deep Dive covers the rest.

| Pattern | Trigger | Recovery Action |
|----------|---------|-----------------|
| Output truncated | `max_tokens` | Escalate 8K→64K, then continuation prompt |
| Context overflow | `prompt_too_long` | Reactive compact → retry |
| Transient failure | 429 / 529 | Exponential backoff + jitter; fallback model after consecutive 529s |

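The routing in the table can be sketched as a small classifier. This is an illustration under stated assumptions, not code from `code.py` or CC. `classify_failure` and the path labels are hypothetical. The sketch also assumes the caller has already extracted three things from whatever the SDK returned: the HTTP status (or `None` on success), the response's `stop_reason`, and a prompt-too-long flag.

```python
from typing import Optional

# Hypothetical labels for the three recovery paths in the table above.
ESCALATE_OR_CONTINUE = "escalate_or_continue"
REACTIVE_COMPACT = "reactive_compact"
BACKOFF = "backoff"
FAIL = "fail"
NO_RECOVERY = "ok"


def classify_failure(status: Optional[int], stop_reason: Optional[str],
                     prompt_too_long: bool = False) -> str:
    """Map the outcome of one LLM call to a recovery path.

    status: HTTP status of a failed call, or None if the call succeeded.
    stop_reason: the response's stop_reason when the call succeeded.
    prompt_too_long: True if the API error reported an oversized prompt.
    """
    if status is None:
        # The call succeeded; only truncation needs recovery.
        return ESCALATE_OR_CONTINUE if stop_reason == "max_tokens" else NO_RECOVERY
    if prompt_too_long:
        return REACTIVE_COMPACT
    if status in (429, 529):
        return BACKOFF
    # Everything else is outside what the teaching version recovers from.
    return FAIL
```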
---

## How It Works

### Path 1: Output Truncated

The model stops mid-sentence because it has used up `max_tokens`. The default budget of 8000 tokens isn't enough for a complete response.

On the first occurrence, escalate `max_tokens` from 8K to 64K (8x the room) and retry the same request. The truncated output is NOT appended to messages, so the original request stays intact. If 64K is still not enough, the loop saves the truncated output and injects a continuation prompt telling the model to pick up where it left off. It does this up to 3 times:

```python
# Inside the agent loop's `while True:` body, right after the LLM call
if response.stop_reason == "max_tokens":
    # First escalation: don't append truncated output, retry same request
    if not state.has_escalated:
        max_tokens = ESCALATED_MAX_TOKENS
        state.has_escalated = True
        continue  # messages unchanged, same request with more tokens
    # 64K still truncated: save output + continuation prompt
    messages.append({"role": "assistant", "content": response.content})
    if state.recovery_count < MAX_RECOVERY_RETRIES:
        messages.append({"role": "user", "content":
            "Output token limit hit. Resume directly — "
            "no apology, no recap. Pick up mid-thought."})
        state.recovery_count += 1
        continue
    return  # still truncated after 3 continuations
# Normal: append after max_tokens check
messages.append({"role": "assistant", "content": response.content})
```

Escalation gets one chance and continuation gets up to 3. After that the loop exits, because further continuations are unlikely to produce meaningful output.

### Path 2: Context Overflow

The API rejects the request because the context is too long (`prompt_too_long`). This happens even though all four compaction layers from s08 have already run.

This triggers a reactive compact, which is more aggressive than auto compact. The teaching version simulates it by keeping only the last 5 messages. Real CC has the LLM generate a compact summary and then retries with the compacted message list. Reactive compact is attempted only once. If the request is still over the limit afterwards, the only option is to exit, because compacting again won't make it any smaller:

```python
except PromptTooLongError:
    if not state.has_attempted_reactive_compact:
        messages[:] = reactive_compact(messages)  # replace history in place
        state.has_attempted_reactive_compact = True
        continue
    return  # Already compacted and still over limit: must exit
```

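The teaching version's "keep the last 5 messages" compaction has one sharp edge. A naive slice can begin with a `tool_result` whose matching `tool_use` was cut off. The sketch below is a slightly more defensive variant. It assumes Anthropic-style message dicts and an API that wants the history to open with a user turn. It trims the slice so it begins at an assistant message, then prepends a short synthetic user note. `COMPACT_NOTE` and the trimming rule are assumptions of this sketch, not CC's behavior; CC summarizes with the LLM instead.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]

# Hypothetical placeholder that stands in for the dropped history.
COMPACT_NOTE = "[Earlier conversation was dropped to fit the context window.]"


def reactive_compact(messages: List[Message], keep: int = 5) -> List[Message]:
    """Keep at most `keep` of the newest messages, plus a leading user note.

    The kept slice is trimmed until it starts with an assistant message, then a
    short synthetic user note is prepended. Roles keep alternating, and every
    remaining tool_result still follows the assistant tool_use it answers.
    """
    tail = messages[-keep:]
    # Leading user turns may carry tool_results for a tool_use we just dropped.
    while tail and tail[0]["role"] != "assistant":
        tail = tail[1:]
    if not tail:
        return list(messages)  # nothing safe to drop; let the caller give up
    return [{"role": "user", "content": COMPACT_NOTE}] + tail


def tool_turn(i: int) -> List[Message]:
    """One assistant tool_use plus the user tool_result answering it."""
    return [
        {"role": "assistant",
         "content": [{"type": "tool_use", "id": f"t{i}", "name": "bash", "input": {}}]},
        {"role": "user",
         "content": [{"type": "tool_result", "tool_use_id": f"t{i}", "content": "ok"}]},
    ]


history = [{"role": "user", "content": "refactor the repo"}]
for i in range(1, 4):
    history += tool_turn(i)

compacted = reactive_compact(history)
print([m["role"] for m in compacted])
```

With 7 messages of history, the last 5 start with the `t1` tool_result. That result is dropped, so the compacted list opens with the note followed by the `t2` tool_use.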
### Path 3: Transient Failures

Network blips, 429 rate limiting, and 529 overloads aren't bugs; they are routine in distributed systems.

Both 429 and 529 use exponential backoff with jitter. The loop waits about 0.5 seconds before the first retry, 1 second before the second, and 2 seconds before the third, for up to 10 attempts. Random jitter keeps concurrent requests from all retrying at the same instant. After three consecutive 529 overload errors, the loop switches to the fallback model, if the `FALLBACK_MODEL_ID` environment variable is set:

```python
import random
import time

# RateLimitError, OverloadedError, MaxRetriesExceeded and FALLBACK_MODEL
# are assumed to be defined elsewhere in the chapter's code.


def retry_delay(attempt, retry_after=None):
    """Seconds to wait before retrying; `attempt` counts from 0."""
    if retry_after:
        return retry_after  # server-provided Retry-After wins
    base = min(500 * (2 ** attempt), 32000) / 1000
    return base + random.uniform(0, base * 0.25)  # up to 25% jitter


def with_retry(fn, state, max_retries=10):
    for attempt in range(max_retries):
        try:
            return fn()
        except (RateLimitError, OverloadedError) as e:
            if isinstance(e, OverloadedError):  # 529
                state.consecutive_529 += 1
                if state.consecutive_529 >= 3 and FALLBACK_MODEL:
                    state.current_model = FALLBACK_MODEL  # fn() reads it on the next call
            time.sleep(retry_delay(attempt))
    raise MaxRetriesExceeded()
```

Backoff formula: `min(500 × 2^attempt, 32000)` ms plus `random(0~25%)` of that base, with `attempt` counted from 0. If the server returns a `Retry-After` header, that value takes priority.

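For a version you can run without an API key, the sketch below makes three substitutions. It replaces the SDK exceptions with local stand-ins (`RateLimitError`, `OverloadedError`). It makes `sleep` injectable so the demo finishes instantly. It keeps the fallback model on a hypothetical `RetryState`. The sketch also assumes two behaviors the text above does not specify: a 429 resets the run of consecutive 529s, and the counter restarts after switching models.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

BASE_DELAY_MS = 500
MAX_DELAY_MS = 32_000
MAX_CONSECUTIVE_529 = 3


class RateLimitError(Exception):
    """Local stand-in for an SDK 429 error."""


class OverloadedError(Exception):
    """Local stand-in for an SDK 529 error."""


class MaxRetriesExceeded(Exception):
    pass


@dataclass
class RetryState:
    current_model: str
    fallback_model: Optional[str] = None
    consecutive_529: int = 0


def retry_delay(attempt: int, retry_after: Optional[float] = None) -> float:
    """Seconds to wait after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        return retry_after
    base = min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS) / 1000
    return base + random.uniform(0, base * 0.25)


def with_retry(fn: Callable[[], T], state: RetryState, max_retries: int = 10,
               sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(max_retries):
        try:
            result = fn()
            state.consecutive_529 = 0
            return result
        except OverloadedError:
            state.consecutive_529 += 1
            if state.consecutive_529 >= MAX_CONSECUTIVE_529 and state.fallback_model:
                state.current_model = state.fallback_model  # next fn() call uses it
                state.consecutive_529 = 0
        except RateLimitError:
            state.consecutive_529 = 0  # a 429 breaks the run of 529s (assumption)
        if attempt < max_retries - 1:
            sleep(retry_delay(attempt))  # no point sleeping after the last try
    raise MaxRetriesExceeded(f"gave up after {max_retries} attempts")


# Demo: the primary model is always overloaded, the fallback answers.
calls: list = []


def fake_call() -> str:
    calls.append(state.current_model)
    if state.current_model == "primary":
        raise OverloadedError("529 overloaded")
    return f"ok from {state.current_model}"


state = RetryState(current_model="primary", fallback_model="fallback")
delays: list = []
print(with_retry(fake_call, state, sleep=delays.append))
print(calls)
```

The third 529 flips `current_model`, so the fourth call goes to the fallback and succeeds after three backoff sleeps of roughly 0.5, 1 and 2 seconds. In real code, `fn` would be the `client.messages.create` lambda, and the stand-in exceptions would be replaced by the SDK's own 429/529 error types.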
### Putting It All Together

```python
def agent_loop(messages, context):
    system = get_system_prompt(context)
    state = RecoveryState()
    max_tokens = 8000

    while True:
        try:
            # with_retry absorbs 429/529; anything else reaches the except below
            response = with_retry(
                lambda: client.messages.create(
                    model=state.current_model, system=system,
                    messages=messages, tools=TOOLS,
                    max_tokens=max_tokens),
                state)
        except Exception as e:
            if is_prompt_too_long_error(e):
                if not state.has_attempted_reactive_compact:
                    messages[:] = reactive_compact(messages)
                    state.has_attempted_reactive_compact = True
                    continue
                return  # already compacted once; give up
            log_error(e)
            return

        # max_tokens check BEFORE appending to messages
        if response.stop_reason == "max_tokens":
            if not state.has_escalated:
                max_tokens = ESCALATED_MAX_TOKENS  # 8K -> 64K
                state.has_escalated = True
                continue  # retry same request, messages unchanged
            # save truncated output, then ask the model to resume (up to 3 times)
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count >= MAX_RECOVERY_RETRIES:
                return  # still truncated after 3 continuations
            messages.append({"role": "user", "content": CONTINUATION_PROMPT})
            state.recovery_count += 1
            continue

        # Normal completion
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return

        # ... tool execution ...
```

The outer try/except catches API exceptions such as prompt_too_long. `with_retry` absorbs transient errors (429/529) before they ever reach it, and the `stop_reason` check handles truncation. That gives three recovery mechanisms, each responsible for one error type.

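To watch the three mechanisms interact without calling the API, the sketch below replays a scripted sequence of outcomes through the same decision logic. Everything here is a stand-in:

- `PromptTooLongError` is a local exception class.
- Responses are `SimpleNamespace` objects.
- `with_retry` is left out.
- Compaction is the plain keep-the-last-5 slice.
- Tool execution is omitted, so any stop reason other than `max_tokens` ends the run.

`run_scripted_loop` is a hypothetical name. Its `log` list records which branch fired on each turn. The demo script triggers a reactive compact, one escalation and one continuation before the run finishes.

```python
from dataclasses import dataclass
from types import SimpleNamespace

DEFAULT_MAX_TOKENS = 8_000
ESCALATED_MAX_TOKENS = 64_000
MAX_RECOVERY_RETRIES = 3


class PromptTooLongError(Exception):
    """Local stand-in for the API's prompt-too-long error."""


@dataclass
class RecoveryState:  # pared-down version: only the fields this demo needs
    has_escalated: bool = False
    recovery_count: int = 0
    has_attempted_reactive_compact: bool = False


def run_scripted_loop(script, messages, log):
    """Replay outcomes: an Exception instance is raised, a string is a stop_reason."""
    state = RecoveryState()
    max_tokens = DEFAULT_MAX_TOKENS
    outcomes = iter(script)
    while True:
        try:
            outcome = next(outcomes)  # stands in for the LLM call
            if isinstance(outcome, Exception):
                raise outcome
            response = SimpleNamespace(stop_reason=outcome, content=f"<{outcome}>")
        except PromptTooLongError:
            if not state.has_attempted_reactive_compact:
                messages[:] = messages[-5:]
                state.has_attempted_reactive_compact = True
                log.append("reactive_compact")
                continue
            log.append("exit: still too long")
            return

        if response.stop_reason == "max_tokens":
            if not state.has_escalated:
                max_tokens = ESCALATED_MAX_TOKENS
                state.has_escalated = True
                log.append(f"escalate to {max_tokens}")
                continue
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count >= MAX_RECOVERY_RETRIES:
                log.append("exit: still truncated")
                return
            messages.append({"role": "user", "content": "Output token limit hit. Resume directly."})
            state.recovery_count += 1
            log.append("continuation")
            continue

        messages.append({"role": "assistant", "content": response.content})
        log.append(f"done: {response.stop_reason}")
        return


log = []
history = [{"role": "user", "content": "write a long module"}]
run_scripted_loop([PromptTooLongError(), "max_tokens", "max_tokens", "end_turn"], history, log)
print(log)
```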
---

## Changes from s10

| Component | Before (s10) | After (s11) |
|-----------|-------------|-------------|
| Error handling | None (crashes on any error) | Three recovery patterns + exponential backoff |
| New constants | — | ESCALATED_MAX_TOKENS=64000, MAX_RETRIES=10, BASE_DELAY_MS=500, FALLBACK_MODEL |
| New functions/classes | — | with_retry, retry_delay, reactive_compact, is_prompt_too_long_error, RecoveryState (class) |
| Tools | bash, read_file, write_file (3) | bash, read_file, write_file (3), unchanged |
| Loop | Bare LLM call | Wrapped in try/except, with `continue` to retry |

---

## Try It

```sh
cd learn-claude-code
python s11_error_recovery/code.py
```

Try these prompts:

1. Ask the Agent to generate a very long piece of code, and check whether it automatically continues after truncation (look for the `[max_tokens] escalating` log line)
2. Have it read many files in a row to bloat the context, and watch reactive compact kick in
3. If you hit a 429/529, watch the exponential backoff log output

---

## What's Next

The Agent can now recover from errors on its own. But the tasks it handles are still one-shot: you give it a task, it finishes, and that's it.

What if the Agent could manage a **task list** with dependencies, persisted to disk and resumable across sessions? A TODO list is not a task system.

s12 Task System → Tasks form a dependency graph with state and persistence. This is the foundation for multi-Agent collaboration.

<details>
<summary>Deep Dive into CC Source</summary>

> The following is based on CC source code: `query.ts` (1729 lines), `services/api/withRetry.ts` (822 lines), `query/tokenBudget.ts` (93 lines), and `utils/tokenBudget.ts` (73 lines).

### 1. A Dozen-Plus Reason/Transition Codes (Not Just 3)

The teaching version covers the 3 most common recovery patterns. CC actually has more than a dozen reason/transition codes, evaluated after every LLM call:

| Reason/Transition | Teaching Version | CC Behavior |
|---|---|---|
| `completed` | Normal completion | Return result |
| `next_turn` | Normal tool call | Continue to the next tool execution round |
| `max_output_tokens_escalate` | Path 1 | 8K→64K escalation |
| `max_output_tokens_recovery` | Path 1 continuation | Continuation prompt (up to 3 times) |
| `reactive_compact_retry` | Path 2 | Reactive compact → retry |
| `prompt_too_long` | Path 2 | Same as above |
| `collapse_drain_retry` | Not covered | Context collapse: commit staged content first |
| `model_error` | Not covered | Retry |
| `image_error` | Not covered | `ImageSizeError` / `ImageResizeError` handled specifically |
| `aborted_streaming` | Not covered | Streaming abort recovery |
| `aborted_tools` | Not covered | Tool abort |
| `stop_hook_blocking` | Not covered | Inject blocking error → model self-corrects |
| `stop_hook_prevented` | Not covered | Hooks prevent execution |
| `hook_stopped` | Not covered | Hook stopped execution |
| `token_budget_continuation` | Not covered | Continue when token usage < 90% |
| `blocking_limit` | Not covered | Blocking limit reached |
| `max_turns` | Not covered | Maximum turns reached |

The teaching version only implements the first six rows (the most common). Each of the rest has its own dedicated handling logic.

### 2. Precise Exponential Backoff Formula

CC's backoff delay (`withRetry.ts:530-548`), with `attempt` counted from 1:

```
delay = min(500 × 2^(attempt-1), 32000) + random(0~25%)
```

| Attempt | Base Delay | + Jitter |
|---------|-----------|----------|
| 1 | 500ms | 0-125ms |
| 2 | 1000ms | 0-250ms |
| 3 | 2000ms | 0-500ms |
| 4 | 4000ms | 0-1000ms |
| 5 | 8000ms | 0-2000ms |
| 6 | 16000ms | 0-4000ms |
| 7+ | 32000ms (cap) | 0-8000ms |

If the server returns a `Retry-After` header, that value takes priority.

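To sanity-check the table, a few lines reproduce the base delay and the jitter ceiling from the formula. This only restates the formula above; it isn't CC's code, and `cc_backoff_bounds` and the constant names are made up for the sketch.

```python
BASE_DELAY_MS = 500
MAX_DELAY_MS = 32_000
JITTER_FRACTION = 0.25


def cc_backoff_bounds(attempt: int) -> tuple:
    """(base delay, max jitter) in ms for a 1-based attempt number."""
    base = min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS)
    return base, int(base * JITTER_FRACTION)


for attempt in range(1, 9):
    base, jitter = cc_backoff_bounds(attempt)
    print(f"{attempt}: {base}ms + 0-{jitter}ms")
```

From attempt 7 on, `500 × 2^(attempt-1)` reaches or exceeds 32000, so the cap applies and the row stays at 32000ms + 0-8000ms.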
### 3. Original CONTINUATION Prompt

CC's continuation prompt (`query.ts:1225-1227`):

```
Output token limit hit. Resume directly — no apology, no recap of what
you were doing. Pick up mid-thought if that is where the cut happened.
Break remaining work into smaller pieces.
```

Token budget nudge prompt (`tokenBudget.ts:72`):

```
Stopped at {pct}% of token target. Keep working — do not summarize.
```

### 4. Streaming Error Handling

In CC's streaming path, recoverable errors (413, max_tokens, media errors) are **withheld from display** while streaming (`query.ts:788-822`). SDK consumers never see them; only the recovery logic does. Once streaming ends, the loop decides whether recovery is needed.

### 5. 529 → Fallback Model Switch

After 3 consecutive 529 overload errors (`MAX_529_RETRIES = 3`), CC automatically switches to the fallback model (e.g., Opus → Sonnet). On the switch, all pending messages and tool results are cleared, and the user sees "Switched to {model} due to high demand".

### 6. Diminishing Returns Detection

Token budget "continuations" aren't unlimited. When there are 3 consecutive continuations with a token increment < 500, the system concludes that continuing won't produce meaningful output and stops continuing (`tokenBudget.ts:60-62`).

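One reading of this rule is "stop once the last three continuations each added fewer than 500 tokens". The sketch below encodes that reading. Treat it as an interpretation: CC's exact condition in `tokenBudget.ts` may combine the counts differently, and `should_stop_continuing` is a made-up name.

```python
from typing import Sequence

DIMINISHING_THRESHOLD = 500  # tokens
DIMINISHING_STREAK = 3       # consecutive continuations


def should_stop_continuing(deltas: Sequence[int]) -> bool:
    """deltas[i] = tokens produced by continuation i, oldest first."""
    recent = list(deltas[-DIMINISHING_STREAK:])
    return (len(recent) == DIMINISHING_STREAK
            and all(d < DIMINISHING_THRESHOLD for d in recent))


print(should_stop_continuing([1200, 300, 200, 100]))  # last three all under 500
print(should_stop_continuing([300, 200]))             # not enough history yet
```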
</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->
s11_error_recovery/README.ja.md
@@ -0,0 +1,277 @@
# s11: Error Recovery — Errors aren't the end, they're the start of a retry

[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)

s01 → ... → s09 → s10 → `s11` → [s12](../s12_task_system/) → s13 → ... → s20

> *"Errors aren't the end, they're the start of a retry"* — escalate tokens, compact context, switch models.
>
> **Harness layer**: Resilience — classify the main loop's errors and recover from them.

---

## The Problem

The Agent hit an error partway through its run:

```
Error: 529 overloaded
```

The Agent crashed. No retry, no model switch, no context reduction; it just crashed.

In production, API errors are an everyday occurrence. The three most common failure patterns are **truncated output** (the model ran out of tokens partway through its output), **context overflow** (still too long after compaction), and **transient failures** (429 rate limiting / 529 overload). An Agent that doesn't handle errors is like a car that stalls the moment you touch it.

---

## Solution

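The s10 loop and prompt assembly stay exactly as they were. The key change is that the LLM call is wrapped in try/except and its `stop_reason` is checked, so each error type is routed to a different recovery path. After recovery, `continue` returns to the top of the loop and calls the LLM again.

The three most common recovery patterns are below. For transient failures, the teaching version handles only 429/529. Real systems also cover connection errors, timeouts, cloud-vendor credential caches, and so on. CC actually has more than a dozen reason codes, and the rest are explained in the Deep Dive.

| Pattern | Trigger | Recovery Action |
|----------|----------|---------------|
| Output truncated | `max_tokens` | Escalate 8K→64K / inject continuation prompt |
| Context overflow | `prompt_too_long` | Reactive compact → retry |
| Transient failure | 429 / 529 | Exponential backoff + jitter; can switch to a fallback model on consecutive 529s |
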
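---

## How It Works

### Path 1: Output Truncated

The model got partway through its output and hit `max_tokens`. The default 8000 tokens isn't enough to produce a complete answer.

On the first occurrence, raise `max_tokens` from 8K to 64K (8x the room) and retry the same request. At this point the truncated output is not appended to messages, so the original request is kept as-is. Only if 64K is still not enough does the loop save the truncated output and inject a continuation prompt, so the model carries on from where it stopped. It does this up to 3 times:
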
```python
# Inside the agent loop's `while True:` body, right after the LLM call
if response.stop_reason == "max_tokens":
    # First escalation: don't append truncated output, retry same request
    if not state.has_escalated:
        max_tokens = ESCALATED_MAX_TOKENS
        state.has_escalated = True
        continue  # messages unchanged, same request with more tokens
    # 64K still truncated: save output + continuation prompt
    messages.append({"role": "assistant", "content": response.content})
    if state.recovery_count < MAX_RECOVERY_RETRIES:
        messages.append({"role": "user", "content":
            "Output token limit hit. Resume directly — "
            "no apology, no recap. Pick up mid-thought."})
        state.recovery_count += 1
        continue
    return  # still truncated after 3 continuations
# Normal: append after max_tokens check
messages.append({"role": "assistant", "content": response.content})
```

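Escalation happens only once, and continuation at most 3 times. Beyond that the loop exits, since continuing further won't yield meaningful output.

### Path 2: Context Overflow

The LLM reports that the context is too long (`prompt_too_long`). This happens even though all four compaction layers from s08 have already run.

This triggers a reactive compact, which is more aggressive than auto compact. The teaching version simulates compaction by keeping only the last 5 messages. Real CC generates a compact summary with the LLM and then retries. If the request is still over the limit after one compaction, the only option is to exit, because compacting again won't make it any smaller:
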
```python
except PromptTooLongError:
    if not state.has_attempted_reactive_compact:
        messages[:] = reactive_compact(messages)  # replace history in place
        state.has_attempted_reactive_compact = True
        continue
    return  # already compacted and still over the limit: nothing left but to exit
```

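The loop relies on an `is_prompt_too_long_error` helper that this chapter doesn't spell out. One possible shape is below. It assumes two things you should verify against the SDK and API version you use: that the SDK exception exposes a `status_code` attribute, and that the API's error message contains the phrase "prompt is too long". `FakeAPIError` exists only to exercise the check.

```python
def is_prompt_too_long_error(exc: BaseException) -> bool:
    """Heuristic: does this API exception mean the prompt exceeded the context window?"""
    status = getattr(exc, "status_code", None)
    # Assumption: prompt overflow surfaces as a 400 (or 413) whose message
    # mentions "prompt is too long".
    return status in (400, 413) and "prompt is too long" in str(exc).lower()


class FakeAPIError(Exception):
    """Test double carrying a status_code the way an SDK error would."""

    def __init__(self, status_code: int, message: str):
        super().__init__(message)
        self.status_code = status_code


print(is_prompt_too_long_error(
    FakeAPIError(400, "prompt is too long: 210000 tokens > 200000 maximum")))
print(is_prompt_too_long_error(FakeAPIError(429, "rate limited")))
```

Matching on message text is brittle. If the SDK exposes a structured error type or code for this case, checking that is the sturdier choice.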
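### Path 3: Transient Failures

Network hiccups, 429 rate limiting, and 529 overloads aren't bugs; they are the everyday reality of distributed systems.

429 and 529 are both handled with exponential backoff plus jitter. The loop waits 0.5 seconds before the first retry, 1 second before the second, and 2 seconds before the third, for up to 10 attempts. Adding random jitter prevents concurrent requests from retrying at the same moment. After 3 consecutive 529 overloads, the loop switches to the fallback model, if the `FALLBACK_MODEL_ID` environment variable is set:
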
```python
import random
import time

# RateLimitError, OverloadedError, MaxRetriesExceeded and FALLBACK_MODEL
# are assumed to be defined elsewhere in the chapter's code.


def retry_delay(attempt, retry_after=None):
    """Seconds to wait before retrying; `attempt` counts from 0."""
    if retry_after:
        return retry_after  # server-provided Retry-After wins
    base = min(500 * (2 ** attempt), 32000) / 1000
    return base + random.uniform(0, base * 0.25)  # up to 25% jitter


def with_retry(fn, state, max_retries=10):
    for attempt in range(max_retries):
        try:
            return fn()
        except (RateLimitError, OverloadedError) as e:
            if isinstance(e, OverloadedError):  # 529
                state.consecutive_529 += 1
                if state.consecutive_529 >= 3 and FALLBACK_MODEL:
                    state.current_model = FALLBACK_MODEL  # fn() reads it on the next call
            time.sleep(retry_delay(attempt))
    raise MaxRetriesExceeded()
```

Backoff formula: `min(500 × 2^attempt, 32000)` ms plus `random(0~25%)` of that base, with `attempt` counted from 0. If the server returns a `Retry-After` header, that value takes priority.

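The text says a `Retry-After` value takes priority, but `with_retry` never actually reads one. Per the HTTP specification, the header carries either a number of seconds or an HTTP date. A stdlib-only parser could look like the sketch below. `parse_retry_after` is a hypothetical helper, and how you get the header off an SDK exception depends on the SDK.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str],
                      now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait according to a Retry-After header, or None if absent/invalid.

    Accepts delay-seconds ("120") or an HTTP date ("Wed, 21 Oct 2015 07:28:00 GMT").
    """
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # a past date means "retry now"
```

The result would be passed as `retry_after` to `retry_delay`. Because `retry_delay` tests `if retry_after:`, a parsed value of `0.0` falls back to the computed backoff, which is a harmless outcome here.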
### Putting It All Together

```python
def agent_loop(messages, context):
    system = get_system_prompt(context)
    state = RecoveryState()
    max_tokens = 8000

    while True:
        try:
            # with_retry absorbs 429/529; anything else reaches the except below
            response = with_retry(
                lambda: client.messages.create(
                    model=state.current_model, system=system,
                    messages=messages, tools=TOOLS,
                    max_tokens=max_tokens),
                state)
        except Exception as e:
            if is_prompt_too_long_error(e):
                if not state.has_attempted_reactive_compact:
                    messages[:] = reactive_compact(messages)
                    state.has_attempted_reactive_compact = True
                    continue
                return  # already compacted once; give up
            log_error(e)
            return

        # max_tokens check BEFORE appending to messages
        if response.stop_reason == "max_tokens":
            if not state.has_escalated:
                max_tokens = ESCALATED_MAX_TOKENS  # 8K -> 64K
                state.has_escalated = True
                continue  # retry same request, messages unchanged
            # save truncated output, then ask the model to resume (up to 3 times)
            messages.append({"role": "assistant", "content": response.content})
            if state.recovery_count >= MAX_RECOVERY_RETRIES:
                return  # still truncated after 3 continuations
            messages.append({"role": "user", "content": CONTINUATION_PROMPT})
            state.recovery_count += 1
            continue

        # Normal completion
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return

        # ... tool execution ...
```

The outer try/except catches API exceptions (prompt_too_long, etc.), `with_retry` handles transient errors (429/529), and the `stop_reason` check handles truncation. That gives three recovery mechanisms, each responsible for a different error type.

---

## Changes from s10

| Component | Before (s10) | After (s11) |
|---------------|-------------|-------------|
| Error handling | None (crashes immediately on an error) | Three recovery patterns + exponential backoff |
| New constants | — | ESCALATED_MAX_TOKENS=64000, MAX_RETRIES=10, BASE_DELAY_MS=500, FALLBACK_MODEL |
| New functions/classes | — | with_retry, retry_delay, reactive_compact, is_prompt_too_long_error, RecoveryState (class) |
| Tools | bash, read_file, write_file (3) | bash, read_file, write_file (3), unchanged |
| Loop | Calls the LLM directly | Wrapped in try/except, with `continue` to retry |

---

## Try It

```sh
cd learn-claude-code
python s11_error_recovery/code.py
```

Try the following prompts:

1. Have the Agent generate a long piece of code, and check whether it automatically continues after truncation (look for the `[max_tokens] escalating` log line)
2. Have it read a large number of files in a row to bloat the context, and watch reactive compact in action
3. If a 429/529 occurs, watch the exponential backoff log output

---

## Next Steps

The Agent can now recover from errors on its own. But the tasks it handles are still disposable: you give it a task, it runs it, and it's done.

Could the Agent manage a **task list** with dependencies, persisted to disk and recoverable across sessions? A TODO list is not a task system.

s12 Task System → A task is a graph with dependencies, state, and persistence. This is the foundation for multi-Agent collaboration.

<details>
<summary>Deep Dive into the CC Source</summary>

> The following is based on an analysis of the CC source files `query.ts` (1729 lines), `services/api/withRetry.ts` (822 lines), `query/tokenBudget.ts` (93 lines), and `utils/tokenBudget.ts` (73 lines).

### 1. More Than a Dozen Reasons/Transitions (Not Just 3)

The teaching version explained the 3 most common recovery patterns. CC actually has more than a dozen reasons/transitions, evaluated after every LLM call:

| reason/transition | Teaching version | CC behavior |
|---|---|---|
| `completed` | Normal completion | Return the result |
| `next_turn` | Normal tool call | On to the next tool execution round |
| `max_output_tokens_escalate` | Path 1 | Escalate 8K→64K |
| `max_output_tokens_recovery` | Path 1 continuation | Inject continuation prompt (up to 3 times) |
| `reactive_compact_retry` | Path 2 | Reactive compact → retry |
| `prompt_too_long` | Path 2 | Same as above |
| `collapse_drain_retry` | Not covered | On context collapse, commit pending content first |
| `model_error` | Not covered | Retry |
| `image_error` | Not covered | Dedicated handling for `ImageSizeError` / `ImageResizeError` |
| `aborted_streaming` | Not covered | Recovery from a streaming interruption |
| `aborted_tools` | Not covered | Tool interruption |
| `stop_hook_blocking` | Not covered | Inject a blocking error → model self-corrects |
| `stop_hook_prevented` | Not covered | Blocked by hooks |
| `hook_stopped` | Not covered | Execution stopped by a hook |
| `token_budget_continuation` | Not covered | Continue while token usage < 90% |
| `blocking_limit` | Not covered | Blocking limit |
| `max_turns` | Not covered | Maximum number of turns reached |

The teaching version only expands on the first six rows (the most common ones). Each of the rest has its own dedicated handling logic.

### 2. The Exact Exponential Backoff Formula

CC's backoff delay (`withRetry.ts:530-548`), with `attempt` counted from 1:

```
delay = min(500 × 2^(attempt-1), 32000) + random(0~25%)
```

| Attempt | Base delay | + Jitter |
|------|---------|-----------|
| 1 | 500ms | 0-125ms |
| 2 | 1000ms | 0-250ms |
| 3 | 2000ms | 0-500ms |
| 4 | 4000ms | 0-1000ms |
| 5 | 8000ms | 0-2000ms |
| 6 | 16000ms | 0-4000ms |
| 7+ | 32000ms (cap) | 0-8000ms |

If the server returns a `Retry-After` header, that value takes priority.

### 3. The Original CONTINUATION Prompt

CC's continuation prompt (`query.ts:1225-1227`):

```
Output token limit hit. Resume directly — no apology, no recap of what
you were doing. Pick up mid-thought if that is where the cut happened.
Break remaining work into smaller pieces.
```

Token budget nudge prompt (`tokenBudget.ts:72`):

```
Stopped at {pct}% of token target. Keep working — do not summarize.
```

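The deep-dive table says `token_budget_continuation` keeps going while usage is under 90% of the target, and the nudge above is the message it sends. A hypothetical helper combining the two might look like the sketch below. The 90% threshold and the template text come from this section. Rounding `pct` to a whole number and the name `budget_nudge` are assumptions.

```python
from typing import Optional

NUDGE_TEMPLATE = "Stopped at {pct}% of token target. Keep working — do not summarize."
CONTINUE_BELOW_FRACTION = 0.9


def budget_nudge(used_tokens: int, target_tokens: int) -> Optional[str]:
    """Return the nudge text if the turn stopped well short of its token target."""
    if target_tokens <= 0:
        return None  # no budget configured
    fraction = used_tokens / target_tokens
    if fraction >= CONTINUE_BELOW_FRACTION:
        return None  # close enough to the target: let the turn end
    return NUDGE_TEMPLATE.format(pct=round(fraction * 100))


print(budget_nudge(4_500, 10_000))
print(budget_nudge(9_500, 10_000))
```

A real implementation would pair this with a diminishing-returns check, so the nudge cannot keep a stalled model looping.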
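### 4. Streaming Error Handling

In CC's streaming path, recoverable errors (413, max_tokens, media errors) are **held back from display** while streaming (`query.ts:788-822`). SDK consumers can't see them; only the recovery logic is aware of them. After streaming ends, the loop decides whether recovery is needed.
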
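The hold-back idea can be sketched as a generator that forwards stream events to the consumer but diverts recoverable errors into a side list. The recovery logic inspects that list once the stream is drained. The event shapes, the `RECOVERABLE_KINDS` labels and `forward_stream` are all hypothetical; this is not how CC's stream events are actually structured.

```python
from typing import Iterable, Iterator

# Hypothetical error kinds treated as recoverable, mirroring the list above.
RECOVERABLE_KINDS = {"prompt_too_long", "max_tokens", "media_error"}


def forward_stream(events: Iterable[dict], withheld: list) -> Iterator[dict]:
    """Yield stream events to the consumer, diverting recoverable errors into `withheld`."""
    for event in events:
        if event.get("type") == "error" and event.get("kind") in RECOVERABLE_KINDS:
            withheld.append(event)  # only the recovery logic will look at these
            continue
        yield event


events = [
    {"type": "text", "text": "Refactoring "},
    {"type": "text", "text": "the module"},
    {"type": "error", "kind": "max_tokens"},
]
withheld: list = []
shown = list(forward_stream(events, withheld))
needs_recovery = bool(withheld)  # decided only after the stream is drained
print(len(shown), needs_recovery)
```

Errors that are not on the recoverable list still pass through to the consumer unchanged.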
### 5. 529 → Fallback Model Switch

After 3 consecutive 529 overload errors (`MAX_529_RETRIES = 3`), CC automatically switches to the fallback model (e.g., Opus → Sonnet). On the switch it clears all pending messages and tool results and shows the user "Switched to {model} due to high demand".

### 6. Detecting Diminishing Returns

Token budget "continuations" aren't unlimited. If the token increment stays under 500 across 3 consecutive continuations, the system concludes that continuing won't produce meaningful output and stops continuing (`tokenBudget.ts:60-62`).

</details>

<!-- translation-sync: zh@v1, en@v1, ja@v1 -->