Mirror of https://github.com/shareAI-lab/analysis_claude_code.git
Synced 2026-03-22 02:15:42 +08:00

the model is the agent, the code is the harness

Comprehensive rewrite establishing the harness-engineering narrative across the entire repository. README (EN/ZH/JA): added the "The Model IS the Agent" manifesto with historical proof (DQN, OpenAI Five, AlphaStar, Tencent Jueyu), the "What an Agent Is NOT" critique, the harness-engineer role definition, "Why Claude Code" as a masterclass in harness design, and the universe vision. Consistent framing: model = driver, harness = vehicle. docs (36 files, 3 languages): injected a one-line "Harness layer" callout after the motto in every session document (s01-s12). agents (13 Python files): added a harness-framing comment before each module docstring. skills/agent-philosophy.md: full rewrite aligned with the harness narrative.

This commit is contained in:
parent e57ced7d07
commit a9c71002d2

README-ja.md (178)
@@ -1,7 +1,142 @@
- # Learn Claude Code -- Build a nano Claude Code-like agent from 0 to 1
+ # Learn Claude Code -- Harness Engineering for Real Agents

[English](./README.md) | [中文](./README-zh.md) | [日本語](./README-ja.md)

## The Model IS the Agent

Before we talk about code, let's get one thing absolutely straight.

**An agent is a model. Not a framework. Not a prompt chain. Not a drag-and-drop workflow.**

### What an Agent IS

An agent is a neural network -- a Transformer, an RNN, a learned function -- trained through billions of gradient updates on action-sequence data to perceive an environment, reason about goals, and take actions. The word "agent" in AI has meant this from the very beginning. Always.

A human is an agent: a biological neural network shaped by millions of years of evolutionary training, perceiving the world through senses, reasoning with a brain, acting through a body. When DeepMind, OpenAI, or Anthropic say "agent," they mean what the field has meant since its inception: **a model that has learned to act.**

The proof is written in history:

- **2013 -- DeepMind DQN plays Atari.** A single neural network, receiving only raw pixels and game scores, learned to play 7 Atari 2600 games -- surpassing all prior algorithms and beating human experts on 3 of them. By 2015 the same architecture scaled to [49 games and matched professional human testers](https://www.nature.com/articles/nature14236), published in *Nature*. No game-specific rules. No decision trees. One model, learning from experience. That model was the agent.

- **2019 -- OpenAI Five conquers Dota 2.** Five neural networks played [45,000 years of Dota 2](https://openai.com/index/openai-five-defeats-dota-2-world-champions/) against themselves in 10 months, then defeated **OG** -- the reigning TI8 world champions -- 2-0 on a San Francisco livestream. In a subsequent public arena the AI won 99.4% of 42,729 games. No scripted strategies. No meta-programmed team coordination. The models learned teamwork, tactics, and real-time adaptation entirely through self-play.

- **2019 -- DeepMind AlphaStar masters StarCraft II.** AlphaStar [beat professional players 10-1](https://deepmind.google/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii/) in closed-door matches, then reached [Grandmaster](https://www.nature.com/articles/d41586-019-03298-6) on the European servers -- the top 0.15% of 90,000 players. A game of imperfect information, real-time decisions, and a combinatorial action space far beyond chess or Go. The agent? The model. Trained, not scripted.

- **2019 -- Tencent Jueyu dominates Honor of Kings.** Tencent AI Lab's "Jueyu" [defeated KPL professional players 5v5](https://www.jiemian.com/article/3371171.html) at the World Champion Cup on August 2, 2019. In 1v1 mode, pros [won only 1 of 15 games and never survived past 8 minutes](https://developer.aliyun.com/article/851058). Training intensity: one day equaled 440 human years. By 2021 Jueyu surpassed KPL pros across the full hero pool. No handcrafted matchup tables. No scripted team compositions. A model that learned the entire game from scratch through self-play.

- **2024-2025 -- LLM agents reshape software engineering.** Claude, GPT, Gemini -- large language models trained on the full breadth of human code and reasoning -- are deployed as coding agents. They read codebases, write implementations, debug failures, and coordinate in teams. The architecture is identical to every agent before them: a trained model, placed in an environment, given tools to perceive and act. The only difference is the scale of what they have learned and the generality of the tasks they solve.

Every one of these milestones shares the same truth: **the "agent" is never the surrounding code. The agent is always the model itself.**

### What an Agent Is NOT

The word "agent" has been hijacked by an entire industry of prompt plumbing.

Drag-and-drop workflow builders. No-code "AI agent" platforms. Prompt-chain orchestration libraries. They all share the same delusion: that wiring LLM API calls together with if-else branches, node graphs, and hardcoded routing logic constitutes "building an agent."

It doesn't. What they build is a Rube Goldberg machine -- an over-engineered, brittle pipeline of procedural rules, with an LLM wedged in as a glorified text-completion node. That is not an agent. That is a shell script with delusions of grandeur.

**Prompt-plumbing "agents" are the fantasy of programmers who don't train models.** They try to brute-force intelligence by stacking procedural logic -- massive rule trees, node graphs, chain-of-prompt waterfalls -- and pray that enough glue code will somehow produce autonomous behavior. It won't. You cannot engineer your way to agency. Agency is learned, not programmed.

Those systems are dead on arrival: fragile, unscalable, fundamentally incapable of generalization. They are the modern resurrection of GOFAI (Good Old-Fashioned AI) -- the symbolic rule systems academia abandoned decades ago, back again with a fresh coat of LLM paint. Different packaging, same dead end.

### The Mind Shift: From "Developing Agents" to Developing the Harness

When someone says "I'm developing an agent," they can mean only one of two things:

**1. Training the model.** Adjusting weights through reinforcement learning, fine-tuning, RLHF, or other gradient-based methods. Collecting task-process data -- actual sequences of perception, reasoning, and action in real domains -- to shape the model's behavior. This is what DeepMind, OpenAI, Tencent AI Lab, and Anthropic do. This is agent development in the truest sense.

**2. Building the harness.** Writing the code that gives the model an environment to operate in. This is what most of us do, and it is the focus of this repository.

A harness is everything the agent needs to function in a specific domain:

```
Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions

Tools:        file I/O, shell, network, database, browser
Knowledge:    product docs, domain references, API specs, style guides
Observation:  git diff, error logs, browser state, sensor data
Action:       CLI commands, API calls, UI interactions
Permissions:  sandboxing, approval workflows, trust boundaries
```

The model decides. The harness executes. The model reasons. The harness provides context. The model is the driver. The harness is the vehicle.

**A coding agent's harness is its IDE, terminal, and filesystem.** A farm agent's harness is its sensor array, irrigation controls, and weather data feeds. A hotel agent's harness is its booking system, guest communication channels, and facility management APIs. The agent -- the intelligence, the decision-maker -- is always the model. The harness changes per domain. The agent generalizes across them.

This repository teaches you to build vehicles. Vehicles for coding. But the design patterns generalize to any domain: farm management, hotel operations, factory manufacturing, logistics, healthcare, education, scientific research. Anywhere a task needs to be perceived, reasoned about, and acted upon -- an agent needs a harness.

### What Harness Engineers Actually Do

If you are reading this repository, you are likely a harness engineer -- and that is a powerful thing to be. Here is your real job:

- **Implement tools.** Give the agent hands. File read/write, shell execution, API calls, browser control, database queries. Each tool is an action the agent can take in its environment. Design them to be atomic, composable, and well-described.

- **Curate knowledge.** Give the agent domain expertise. Product documentation, architecture decision records, style guides, regulatory requirements. Load them on demand (s05), not upfront. The agent should know what's available and pull what it needs.

- **Manage context.** Give the agent clean memory. Subagent isolation (s04) prevents noise from leaking. Context compression (s06) prevents history from flooding. The task system (s07) persists goals beyond any single conversation.

- **Control permissions.** Give the agent boundaries. Sandbox file access. Require approval for destructive operations. Enforce trust boundaries between the agent and external systems. This is where safety engineering meets harness engineering.

- **Collect task-process data.** Every action sequence the agent executes in your harness is training signal. Perception-reasoning-action traces from real deployments are the raw material for fine-tuning the next generation of agent models. Your harness doesn't just serve the agent -- it helps evolve the agent.

You are not writing the intelligence. You are building the world the intelligence inhabits. The quality of that world -- how clearly the agent can perceive, how precisely it can act, how rich its available knowledge is -- directly determines how effectively the intelligence can express itself.

**Build great harnesses. The agent will do the rest.**

### Why Claude Code -- A Masterclass in Harness Engineering

Why does this repository dissect Claude Code specifically?

Because Claude Code is the most elegant and fully realized agent harness we have seen. Not because of any single clever trick, but because of what it *doesn't* do: it doesn't try to be the agent itself. It doesn't impose rigid workflows. It doesn't second-guess the model with elaborate decision trees. It gives the model tools, knowledge, context management, and permission boundaries -- then gets out of the way.

Stripped to its essence, Claude Code is:

```
Claude Code = one agent loop
            + tools (bash, read, write, edit, glob, grep, browser...)
            + on-demand skill loading
            + context compression
            + subagent spawning
            + task system with dependency graph
            + team coordination via async mailboxes
            + worktree isolation for parallel execution
            + permission governance
```

That's it. That's the entire architecture. Every component is a harness mechanism -- part of the world the agent inhabits. The agent itself? Claude. A model, trained by Anthropic on the full breadth of human reasoning and code. The harness didn't make Claude smart. Claude was already smart. The harness gave Claude hands, eyes, and a workspace.

This is why Claude Code is the ideal teaching subject: **it demonstrates what happens when you trust the model and focus your engineering on the harness.** Each session in this repository (s01-s12) reverse-engineers one harness mechanism from Claude Code's architecture. By the end, you understand not just how Claude Code works, but the universal principles of harness engineering that apply to any agent in any domain.

The lesson is not "copy Claude Code." The lesson is: **the best agent products are built by engineers who understand that their job is the harness, not the intelligence.**

---

## The Vision: Fill the Universe with Real Agents

This is not just about coding agents.

Every domain where humans do complex, multi-step, judgment-heavy work is a domain where an agent can operate -- given the right harness. The patterns in this repository are universal:

```
Property management agent = model + property sensors + maintenance tools + tenant comms
Farm agent                = model + soil/weather data + irrigation controls + crop knowledge
Hotel operations agent    = model + booking systems + guest channels + facility APIs
Medical research agent    = model + literature search + lab instruments + protocol docs
Manufacturing agent       = model + production-line sensors + quality control + logistics
Education agent           = model + curriculum knowledge + student progress + assessment tools
```

The loop is always the same. The tools change. The knowledge changes. The permissions change. The agent -- the model -- generalizes across all of it.

Every harness engineer reading this repository is learning patterns that reach far beyond software engineering. You are learning to build the infrastructure for an intelligent, automated future. Every good harness deployed in a real domain is another foothold where an agent can perceive, reason, and act.

First we fill the workshop. Then the farms, hospitals, and factories. Then the cities. Then the planet.

**Bash is all you need. Real agents are all the universe needs.**

---

```
THE AGENT PATTERN
=================
@@ -17,12 +152,15 @@
   loop back -----------------> messages[]

- This is the minimal loop. It is the foundation every AI coding agent needs.
- Production agents add policy, permission, and lifecycle layers on top.
+ The minimal loop. Every AI agent needs this loop.
+ The model decides when to call tools and when to stop.
+ The code simply executes what the model asks.
+ This repository teaches you to build everything around this loop --
+ the harness that makes the agent effective in its domain.
```
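The minimal loop above can be sketched in a few lines of Python. This is a hedged illustration, not the repository's actual implementation: `call_model` and `run_tool` are hypothetical stand-ins for a real LLM API client and a tool executor.

```python
def agent_loop(messages, call_model, run_tool):
    """Minimal agent loop: the model decides, the code executes.

    `call_model` returns a dict with "content" and optional "tool_calls";
    `run_tool` executes one tool call and returns its result.
    """
    while True:
        reply = call_model(messages)                 # model perceives full history
        messages.append({"role": "assistant", "content": reply["content"]})
        if not reply.get("tool_calls"):              # model chose to stop
            return messages
        results = [run_tool(c) for c in reply["tool_calls"]]
        messages.append({"role": "user", "content": results})
```

Note that the loop contains no routing logic at all: whether to call a tool, which tool, and when to stop are entirely the model's decisions.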
**12 progressive sessions, from a simple loop to isolated autonomous execution.**
- **Each session adds one mechanism. Each mechanism has one motto.**
+ **Each session adds one harness mechanism. Each mechanism has one motto.**

> **s01** *"One loop & Bash is all you need"* — one tool + one loop = an agent
>

@@ -77,20 +215,20 @@ def agent_loop(messages):

```
        messages.append({"role": "user", "content": results})
```

- Each session layers one mechanism on top of this loop -- the loop itself never changes.
+ Each session layers one harness mechanism on top of this loop -- the loop itself never changes. The loop belongs to the agent. The mechanisms belong to the harness.

## Scope (Important)

- This repository is a teaching project for building and studying a nano Claude Code-like agent from 0 to 1.
- To keep the learning path clear, the following production mechanisms are intentionally simplified or omitted.
+ This repository is a 0->1 study project in harness engineering -- learning to build the environment that surrounds an agent model.
+ To keep the learning path clear, the following production mechanisms are intentionally simplified or omitted:

- A full event / hook bus (e.g. PreToolUse, SessionStart/End, ConfigChange).
  s12 implements only a minimal append-only lifecycle event stream, for teaching purposes.
- Rule-based permission governance and trust flows
- Session lifecycle control (resume/fork) and advanced worktree lifecycle control
- MCP runtime details (transport/OAuth/resource subscriptions/polling)

This repository's JSONL mailbox scheme is a teaching implementation and does not claim to match any specific production internals.

## Quick Start

@@ -181,7 +319,7 @@ learn-claude-code/

## Next Steps -- From Understanding to Shipping

- Finish the 12 sessions and you understand an agent's internals completely. Two ways to put that knowledge to work:
+ Finish the 12 sessions and you understand the internals of harness engineering completely. Two ways to put that knowledge to work:

### Kode Agent CLI -- an open-source Coding Agent CLI

@@ -201,16 +339,16 @@ GitHub: **[shareAI-lab/Kode-agent-sdk](https://github.com/shareAI-lab/Kode-agent

## Sister Tutorial: From *On-Demand Sessions* to an *Always-On Assistant*

- The agent this repository teaches is **disposable** -- open a terminal, give it a task, close it when done. The next session starts from a blank slate. This is the Claude Code model.
+ The harness this repository teaches is **disposable** -- open a terminal, give the agent a task, close it when done. The next session starts from a blank slate. The Claude Code model.

[OpenClaw](https://github.com/openclaw/openclaw) proved another possibility: add two harness mechanisms on top of the same agent core, and the agent goes from "moves only when poked" to "wakes itself every 30 seconds to look for work":

- **Heartbeat** -- every 30 seconds the harness sends the agent a message, prompting it to check for work. Nothing to do: keep sleeping. Something to do: act immediately.
- **Cron** -- the agent schedules its own future tasks, which run automatically when due.
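The two mechanisms can be sketched together as a tiny scheduler. This is a hedged sketch of the idea only, not OpenClaw's actual code: `check_inbox`, `run_agent`, and the fake-clock `now` parameter are hypothetical names introduced for illustration.

```python
import time
import heapq

def heartbeat_scheduler(check_inbox, run_agent, now=time.time):
    """Sketch of heartbeat + cron (illustrative, not OpenClaw's real API).

    `check_inbox` returns pending work or None; `run_agent` handles one item.
    Cron jobs are (due_time, task) pairs kept in a min-heap.
    """
    cron: list[tuple[float, str]] = []

    def schedule(delay, task):
        # The agent schedules its own future work.
        heapq.heappush(cron, (now() + delay, task))

    def tick():
        # One heartbeat: fire every due cron task, then check for new work.
        while cron and cron[0][0] <= now():
            _, task = heapq.heappop(cron)
            run_agent(task)
        work = check_inbox()
        if work is not None:
            run_agent(work)          # act immediately
        # else: keep sleeping until the next tick

    return schedule, tick
```

A driver would simply call `tick()` every 30 seconds; between ticks the agent is idle, exactly as described above.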
Add multi-channel IM routing (13+ platforms: WhatsApp / Telegram / Slack / Discord, etc.), persistent context memory, and the Soul personality system, and the agent transforms from a disposable tool into an always-on personal AI assistant.

**[claw0](https://github.com/shareAI-lab/claw0)** is the sister tutorial repository that dissects these harness mechanisms from scratch:

```
claw agent = agent core + heartbeat + cron + IM chat + memory + soul
```

@@ -218,7 +356,7 @@ claw agent = agent core + heartbeat + cron + IM chat + memory + soul

```
learn-claude-code                      claw0
- (agent runtime core:                 (proactive always-on assistant:
+ (agent harness core:                 (proactive always-on harness:
   loop, tools, planning,               heartbeat, cron, IM channels,
   teams, worktree isolation)           memory, Soul personality)
```

@@ -229,4 +367,6 @@ MIT

---

- **The model is the agent. Our job is to give it tools and get out of the way.**
+ **The model is the agent. The code is the harness. Build great harnesses; the agent will do the rest.**

**Bash is all you need. Real agents are all the universe needs.**
README-zh.md (168)

@@ -1,7 +1,142 @@
- # Learn Claude Code -- Build a nano Claude Code-like agent from 0 to 1
+ # Learn Claude Code -- Harness Engineering for Real Agents

[English](./README.md) | [中文](./README-zh.md) | [日本語](./README-ja.md)

## The Model IS the Agent

Before we discuss code, let's get one thing completely clear.

**An agent is a model. Not a framework. Not a prompt chain. Not a drag-and-drop workflow.**

### What an Agent Actually IS

An agent is a neural network -- a Transformer, an RNN, a trained function -- that, through billions of gradient updates on action-sequence data, learned to perceive an environment, reason about goals, and take actions. The word "agent" in AI has meant this since the day the field was born. Always.

A human is an agent: a biological neural network trained by millions of years of evolution, perceiving the world through senses, reasoning through a brain, acting through a body. When DeepMind, OpenAI, or Anthropic say "agent," they mean exactly what the field has always meant: **a model that has learned to act.**

History has already written the proof:

- **2013 -- DeepMind DQN plays Atari.** A single neural network, receiving only raw pixels and game scores, learned to play 7 Atari 2600 games -- surpassing all prior algorithms and beating human experts on 3 of them. By 2015 the same architecture scaled to [49 games and matched professional human testers](https://www.nature.com/articles/nature14236), published in *Nature*. No game-specific rules. No decision trees. One model, learning from experience. That model was the agent.

- **2019 -- OpenAI Five conquers Dota 2.** Five neural networks played [45,000 years of Dota 2](https://openai.com/index/openai-five-defeats-dota-2-world-champions/) against themselves in 10 months, then beat **OG** -- the reigning TI8 world champions -- 2-0 on a San Francisco livestream. In a subsequent public arena the AI won 99.4% of 42,729 games against all comers. No scripted strategies. No meta-programmed team coordination logic. The models learned teamwork, tactics, and real-time adaptation entirely through self-play.

- **2019 -- DeepMind AlphaStar dominates StarCraft II.** AlphaStar [beat professional players 10-1](https://deepmind.google/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii/) in closed-door matches, then reached [Grandmaster](https://www.nature.com/articles/d41586-019-03298-6) on the European servers -- the top 0.15% of 90,000 players. A game of imperfect information, real-time decisions, and a combinatorial action space far beyond chess or Go. What is the agent? The model. Trained, not programmed.

- **2019 -- Tencent Jueyu dominates Honor of Kings.** Tencent AI Lab's "Jueyu" [defeated KPL professional players 5v5](https://www.jiemian.com/article/3371171.html) in the World Champion Cup semifinals on August 2, 2019. In 1v1 mode, pros [won only 1 of 15 games and lasted under 8 minutes at best](https://developer.aliyun.com/article/851058). Training intensity: one day equaled 440 human years. By 2021 Jueyu comprehensively surpassed KPL pro level across the full hero pool in best-of-five. No handwritten hero-matchup tables. No scripted team compositions. A model that learned the entire game from scratch through self-play.

- **2024-2025 -- LLM agents reshape software engineering.** Claude, GPT, Gemini -- large language models trained on the full breadth of human code and reasoning -- are deployed as coding agents. They read codebases, write implementations, debug failures, and collaborate in teams. The architecture is identical to every agent before them: a trained model, placed in an environment, given tools to perceive and act. The only difference is the scale of what they have learned and the generality of the tasks they solve.

Every one of these milestones shares the same truth: **the "agent" is never the code around the model. The agent is always the model itself.**

### What an Agent Is NOT

The word "agent" has been hijacked by an entire prompt-plumbing industry.

Drag-and-drop workflow builders. No-code "AI agent" platforms. Prompt-chain orchestration libraries. They share the same illusion: that stringing LLM API calls together with if-else branches, node graphs, and hardcoded routing logic counts as "building an agent."

It doesn't. What they build is a Rube Goldberg machine -- an over-engineered, brittle pipeline of procedural rules, with an LLM wedged inside as a glorified text-completion node. That is not an agent. That is a shell script with delusions of grandeur.

**Prompt-plumbing "agents" are the fantasy of programmers who don't train models.** They try to brute-force intelligence by stacking procedural logic -- huge rule trees, node graphs, cascading chains of prompts -- and pray that enough glue code will make autonomous behavior emerge. It won't. You cannot engineer your way to agency. Agency is learned, not programmed.

Those systems are dead the day they are born: fragile, unscalable, fundamentally incapable of generalization. They are the modern reincarnation of GOFAI (Good Old-Fashioned AI) -- the symbolic rule systems academia abandoned decades ago, back on stage with a fresh coat of LLM paint. New packaging, same dead end.

### The Mind Shift: From "Developing Agents" to Developing the Harness

When someone says "I'm developing an agent," they can mean only one of two things:

**1. Training the model.** Adjusting weights through reinforcement learning, fine-tuning, RLHF, or other gradient-based methods. Collecting task-process data -- actual sequences of perception, reasoning, and action in real domains -- to shape the model's behavior. This is what DeepMind, OpenAI, Tencent AI Lab, and Anthropic do. This is agent development in the truest sense.

**2. Building the harness.** Writing the code that gives the model an operable environment. This is what most of us do, and it is the core of this repository.

The harness is everything the agent needs to work in a specific domain:

```
Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions

Tools:        file I/O, shell, network, database, browser
Knowledge:    product docs, domain references, API specs, style guides
Observation:  git diff, error logs, browser state, sensor data
Action:       CLI commands, API calls, UI interactions
Permissions:  sandboxing, approval workflows, trust boundaries
```

The model decides. The harness executes. The model reasons. The harness provides context. The model is the driver. The harness is the vehicle.

**A coding agent's harness is its IDE, terminal, and filesystem.** A farm agent's harness is its sensor arrays, irrigation controls, and weather data. A hotel agent's harness is its booking system, guest communication channels, and facility management APIs. The agent -- the intelligence, the decision-maker -- is always the model. The harness changes per domain. The agent generalizes across domains.

This repository teaches you to build vehicles. Vehicles for coding. But the design patterns generalize to any domain: estate management, farm operations, hotel operations, factory manufacturing, logistics, healthcare, education, scientific research. Wherever a task needs to be perceived, reasoned about, and executed -- the agent needs a harness.

### What Harness Engineers Actually Do

If you are reading this repository, you are most likely a harness engineer -- and that is a powerful identity. Here is your real job:

- **Implement tools.** Give the agent a pair of hands. File read/write, shell execution, API calls, browser control, database queries. Each tool is an action the agent can take in its environment. Design them to be atomic, composable, and clearly described.

- **Curate knowledge.** Give the agent domain expertise. Product documentation, architecture decision records, style guides, compliance requirements. Load them on demand (s05), not upfront. The agent should know what's available and pull what it needs.

- **Manage context.** Give the agent clean memory. Subagent isolation (s04) prevents noise from leaking. Context compression (s06) prevents history from drowning it. The task system (s07) persists goals beyond a single conversation.

- **Control permissions.** Give the agent boundaries. Sandbox file access. Require approval for destructive operations. Enforce trust boundaries between the agent and external systems. This is where safety engineering meets harness engineering.

- **Collect task-process data.** Every action sequence the agent executes in your harness is training signal. Perception-reasoning-action traces from real deployments are the raw material for fine-tuning the next generation of agent models. Your harness doesn't just serve the agent -- it can help evolve the agent.

You are not writing the intelligence. You are building the world the intelligence inhabits. The quality of that world -- how clearly the agent can see, how precisely it can act, how rich its available knowledge is -- directly determines how effectively the intelligence can express itself.

**Build great harnesses. The agent will do the rest.**

### Why Claude Code -- A Masterclass in Harness Engineering

Why does this repository dissect Claude Code specifically?

Because Claude Code is the most elegant, most complete agent harness implementation we have seen. Not because of one clever trick, but because of what it *doesn't* do: it doesn't try to be the agent itself. It doesn't impose rigid workflows. It doesn't use elaborate decision trees to make judgments on the model's behalf. It gives the model tools, knowledge, context management, and permission boundaries -- then steps aside.

Stripped to its essence, Claude Code is:

```
Claude Code = one agent loop
            + tools (bash, read, write, edit, glob, grep, browser...)
            + on-demand skill loading
            + context compression
            + subagent spawning
            + task system with dependency graph
            + team coordination via async mailboxes
            + worktree isolation for parallel execution
            + permission governance
```

That's all. That's the entire architecture. Every component is a harness mechanism -- a piece of the world built for the agent to inhabit. The agent itself? Claude. A model, trained by Anthropic on the full breadth of human reasoning and code. The harness didn't make Claude smart. Claude was already smart. The harness gave Claude hands, eyes, and a workspace.

That is what makes Claude Code a teaching specimen: **it shows what happens when you trust the model and concentrate your engineering on the harness.** Each session in this repository (s01-s12) reverse-engineers one harness mechanism from Claude Code's architecture. By the end you understand not just how Claude Code works, but the universal principles of harness engineering that apply to any agent in any domain.

The takeaway is not "copy Claude Code." The takeaway is: **the best agent products come from engineers who understand that their job is the harness, not the intelligence.**

---

## The Vision: Pave the Universe with Real Agents

This is not just about coding agents.

Every domain where humans do complex, multi-step, judgment-heavy work is a domain where an agent can operate -- given the right harness. The patterns in this repository are universal:

```
Estate management agent  = model + property sensors + maintenance tools + tenant comms
Farm agent               = model + soil/weather data + irrigation controls + crop knowledge
Hotel operations agent   = model + booking systems + guest channels + facility APIs
Medical research agent   = model + literature search + lab instruments + protocol docs
Manufacturing agent      = model + production-line sensors + quality control + logistics
Education agent          = model + curriculum knowledge + student progress + assessment tools
```

The loop never changes. The tools change. The knowledge changes. The permissions change. The agent -- the model -- generalizes across everything.

Every harness engineer reading this repository is learning patterns that go far beyond software engineering. You are learning to build infrastructure for an intelligent, automated future. Every good harness deployed in a real domain is another foothold where an agent can perceive, reason, and act.

First we pave the workshop. Then the farms, hospitals, and factories. Then the cities. Then the planet.

**Bash is all you need. Real agents are all the universe needs.**

---

```
THE AGENT PATTERN
=================
@@ -17,12 +152,15 @@
   loop back -----------------> messages[]

- This is the minimal loop. Every AI coding agent needs this loop.
- Production agents also layer on policy, permission, and lifecycle layers.
+ This is the minimal loop. Every AI agent needs this loop.
+ The model decides when to call tools and when to stop.
+ The code simply executes what the model asks.
+ This repository teaches you to build everything around this loop --
+ the harness that makes the agent effective in its domain.
```

**12 progressive sessions, from a simple loop to isolated autonomous execution.**
- **Each session adds one mechanism. Each mechanism has one motto.**
+ **Each session adds one harness mechanism. Each mechanism has one motto.**

> **s01** *"One loop & Bash is all you need"* — one tool + one loop = an agent
>

@@ -77,14 +215,14 @@ def agent_loop(messages):

```
        messages.append({"role": "user", "content": results})
```

- Each session layers one mechanism on top of this loop -- the loop itself never changes.
+ Each session layers one harness mechanism on top of this loop -- the loop itself never changes. The loop belongs to the agent. The mechanisms belong to the harness.

## Scope (Important)

- This repository is a 0->1 study project for building a nano Claude Code-like agent from scratch.
+ This repository is a 0->1 study project in harness engineering -- building the working environment around an agent model.
To keep the learning path clear, some production mechanisms are intentionally simplified or omitted:

- A full event / hook bus (e.g. PreToolUse, SessionStart/End, ConfigChange).
  s12 provides only a minimal append-only lifecycle event stream for teaching purposes.
- Rule-based permission governance and trust flows
- Session lifecycle control (resume/fork) and fuller worktree lifecycle control

@@ -181,7 +319,7 @@ learn-claude-code/

## After the Course -- From Understanding to Shipping

- After the 12 sessions you understand how agents work, inside and out. Two ways to turn that knowledge into products:
+ After the 12 sessions you understand how harness engineering works, inside and out. Two ways to turn that knowledge into products:

### Kode Agent CLI -- an open-source Coding Agent CLI

@@ -201,16 +339,16 @@ GitHub: **[shareAI-lab/Kode-agent-sdk](https://github.com/shareAI-lab/Kode-agent

## Sister Tutorial: From *Passive Ad-hoc Sessions* to a *Proactive Always-On Assistant*

- The agent this repository teaches is **use-and-discard** -- open a terminal, give it a task, close it when done; the next time you open it, the session is brand new. Claude Code works this way.
+ The harness this repository teaches is **use-and-discard** -- open a terminal, give the agent a task, close it when done; the next time you open it, the session is brand new. Claude Code works this way.

But [OpenClaw](https://github.com/openclaw/openclaw) proved another possibility: on top of the same agent core, adding two harness mechanisms turns the agent from "moves only when poked" into "wakes itself every 30 seconds to look for work":

- **Heartbeat** -- every 30 seconds the harness sends the agent a message, prompting it to check whether there is anything to do. Nothing: keep sleeping. Something: act immediately.
- **Cron** -- the agent can schedule its own future tasks, which run automatically when due.

Add multi-channel IM routing (13+ platforms: WhatsApp/Telegram/Slack/Discord, etc.), context memory that never clears, and the Soul personality system, and the agent turns from a throwaway tool into an always-online personal AI assistant.

**[claw0](https://github.com/shareAI-lab/claw0)** is our sister tutorial repository, dissecting these harness mechanisms from scratch:

```
claw agent = agent core + heartbeat + cron + IM chat + memory + soul
```

@@ -218,7 +356,7 @@ claw agent = agent core + heartbeat + cron + IM chat + memory + soul

```
learn-claude-code                      claw0
- (agent runtime core:                 (proactive always-on AI assistant:
+ (agent harness core:                 (proactive always-on harness:
   loop, tools, planning,               heartbeat, cron, IM channels,
   teams, worktree isolation)           memory, Soul personality)
```

@@ -229,4 +367,6 @@ MIT

---

- **The model is the agent. Our job is to give it tools, then get out of the way.**
+ **The model is the agent. The code is the harness. Build great harnesses, and the agent will do the rest.**

**Bash is all you need. Real agents are all the universe needs.**
README.md (174)

@@ -1,5 +1,140 @@
[English](./README.md) | [中文](./README-zh.md) | [日本語](./README-ja.md)
- # Learn Claude Code -- A nano Claude Code-like agent, built from 0 to 1
+ # Learn Claude Code -- Harness Engineering for Real Agents

## The Model IS the Agent

Before we talk about code, let's get one thing absolutely straight.

**An agent is a model. Not a framework. Not a prompt chain. Not a drag-and-drop workflow.**

### What an Agent IS

An agent is a neural network -- a Transformer, an RNN, a learned function -- that has been trained, through billions of gradient updates on action-sequence data, to perceive an environment, reason about goals, and take actions to achieve them. The word "agent" in AI has always meant this. Always.

A human is an agent. A biological neural network, shaped by millions of years of evolutionary training, perceiving the world through senses, reasoning through a brain, acting through a body. When DeepMind, OpenAI, or Anthropic say "agent," they mean the same thing the field has meant since its inception: **a model that has learned to act.**

The proof is written in history:

- **2013 -- DeepMind DQN plays Atari.** A single neural network, receiving only raw pixels and game scores, learned to play 7 Atari 2600 games -- surpassing all prior algorithms and beating human experts on 3 of them. By 2015, the same architecture scaled to [49 games and matched professional human testers](https://www.nature.com/articles/nature14236), published in *Nature*. No game-specific rules. No decision trees. One model, learning from experience. That model was the agent.

- **2019 -- OpenAI Five conquers Dota 2.** Five neural networks, having played [45,000 years of Dota 2](https://openai.com/index/openai-five-defeats-dota-2-world-champions/) against themselves in 10 months, defeated **OG** -- the reigning TI8 world champions -- 2-0 on a San Francisco livestream. In a subsequent public arena, the AI won 99.4% of 42,729 games against all comers. No scripted strategies. No meta-programmed team coordination. The models learned teamwork, tactics, and real-time adaptation entirely through self-play.

- **2019 -- DeepMind AlphaStar masters StarCraft II.** AlphaStar [beat professional players 10-1](https://deepmind.google/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii/) in a closed-door match, and later achieved [Grandmaster status](https://www.nature.com/articles/d41586-019-03298-6) on European servers -- top 0.15% of 90,000 players. A game with imperfect information, real-time decisions, and a combinatorial action space that dwarfs chess and Go. The agent? A model. Trained. Not scripted.

- **2019 -- Tencent Jueyu dominates Honor of Kings.** Tencent AI Lab's "Jueyu" [defeated KPL professional players](https://www.jiemian.com/article/3371171.html) in a full 5v5 match at the World Champion Cup. In 1v1 mode, pros won only [1 out of 15 games and never survived past 8 minutes](https://developer.aliyun.com/article/851058). Training intensity: one day equaled 440 human years. By 2021, Jueyu surpassed KPL pros across the full hero pool. No handcrafted matchup tables. No scripted compositions. A model that learned the entire game from scratch through self-play.

- **2024-2025 -- LLM agents reshape software engineering.** Claude, GPT, Gemini -- large language models trained on the entirety of human code and reasoning -- are deployed as coding agents. They read codebases, write implementations, debug failures, coordinate in teams. The architecture is identical to every agent before them: a trained model, placed in an environment, given tools to perceive and act. The only difference is the scale of what they've learned and the generality of the tasks they solve.

Every one of these milestones shares the same truth: **the "agent" is never the surrounding code. The agent is always the model.**

### What an Agent Is NOT

The word "agent" has been hijacked by an entire cottage industry of prompt plumbing.

Drag-and-drop workflow builders. No-code "AI agent" platforms. Prompt-chain orchestration libraries. They all share the same delusion: that wiring together LLM API calls with if-else branches, node graphs, and hardcoded routing logic constitutes "building an agent."

It doesn't. What they build is a Rube Goldberg machine -- an over-engineered, brittle pipeline of procedural rules, with an LLM wedged in as a glorified text-completion node. That is not an agent. That is a shell script with delusions of grandeur.

**Prompt plumbing "agents" are the fantasy of programmers who don't train models.** They attempt to brute-force intelligence by stacking procedural logic -- massive rule trees, node graphs, chain-of-prompt waterfalls -- and praying that enough glue code will somehow emergently produce autonomous behavior. It won't. You cannot engineer your way to agency. Agency is learned, not programmed.

Those systems are dead on arrival: fragile, unscalable, fundamentally incapable of generalization. They are the modern resurrection of GOFAI (Good Old-Fashioned AI) -- the symbolic rule systems the field abandoned decades ago, now spray-painted with an LLM veneer. Different packaging, same dead end.

### The Mind Shift: From "Developing Agents" to Developing the Harness

When someone says "I'm developing an agent," they can only mean one of two things:

**1. Training the model.** Adjusting weights through reinforcement learning, fine-tuning, RLHF, or other gradient-based methods. Collecting task-process data -- the actual sequences of perception, reasoning, and action in real domains -- and using it to shape the model's behavior. This is what DeepMind, OpenAI, Tencent AI Lab, and Anthropic do. This is agent development in the truest sense.

**2. Building the harness.** Writing the code that gives the model an environment to operate in. This is what most of us do, and it is the focus of this repository.

A harness is everything the agent needs to function in a specific domain:

```
Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions

Tools:        file I/O, shell, network, database, browser
Knowledge:    product docs, domain references, API specs, style guides
Observation:  git diff, error logs, browser state, sensor data
Action:       CLI commands, API calls, UI interactions
Permissions:  sandboxing, approval workflows, trust boundaries
```
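The composition above can be sketched as a small Python structure. This is a hedged illustration under assumptions: the `Harness` class, its field names, and the `act` method are hypothetical, not the repository's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Harness:
    """Sketch of a harness: the environment around the model, not the intelligence."""
    tools: dict[str, Callable] = field(default_factory=dict)      # actions the agent can take
    knowledge: dict[str, str] = field(default_factory=dict)       # docs loaded on demand
    observers: dict[str, Callable] = field(default_factory=dict)  # e.g. git diff, error logs
    allowed: set[str] = field(default_factory=set)                # permission boundary

    def act(self, tool: str, *args):
        # Permissions gate every action before the tool runs.
        if tool not in self.allowed:
            raise PermissionError(f"{tool} requires approval")
        return self.tools[tool](*args)
```

The model never appears in this structure: it only emits tool names and arguments, and the harness executes or refuses them.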
The model decides. The harness executes. The model reasons. The harness provides context. The model is the driver. The harness is the vehicle.

**A coding agent's harness is its IDE, terminal, and filesystem access.** A farm agent's harness is its sensor array, irrigation controls, and weather data feeds. A hotel agent's harness is its booking system, guest communication channels, and facility management APIs. The agent -- the intelligence, the decision-maker -- is always the model. The harness changes per domain. The agent generalizes across them.

This repo teaches you to build vehicles. Vehicles for coding. But the design patterns generalize to any domain: farm management, hotel operations, manufacturing, logistics, healthcare, education, scientific research. Anywhere a task needs to be perceived, reasoned about, and acted upon -- an agent needs a harness.

### What Harness Engineers Actually Do

If you are reading this repository, you are likely a harness engineer -- and that is a powerful thing to be. Here is your real job:

- **Implement tools.** Give the agent hands. File read/write, shell execution, API calls, browser control, database queries. Each tool is an action the agent can take in its environment. Design them to be atomic, composable, and well-described.

- **Curate knowledge.** Give the agent domain expertise. Product documentation, architectural decision records, style guides, regulatory requirements. Load them on-demand (s05), not upfront. The agent should know what's available and pull what it needs.

- **Manage context.** Give the agent clean memory. Subagent isolation (s04) prevents noise from leaking. Context compression (s06) prevents history from overwhelming. Task systems (s07) persist goals beyond any single conversation.

- **Control permissions.** Give the agent boundaries. Sandbox file access. Require approval for destructive operations. Enforce trust boundaries between the agent and external systems. This is where safety engineering meets harness engineering.

- **Collect task-process data.** Every action sequence the agent executes in your harness is training signal. The perception-reasoning-action traces from real deployments are the raw material for fine-tuning the next generation of agent models. Your harness doesn't just serve the agent -- it can help improve the agent.
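The context-management job above can be made concrete with a minimal compression sketch. Hedged illustration only: `compress_context` and its `summarize` callback are hypothetical names, and a real implementation would call the model itself to write the summary.

```python
def compress_context(messages, keep_last=6, summarize=lambda ms: "..."):
    """Sketch of context compression: summarize old turns, keep recent ones verbatim.

    `summarize` stands in for an LLM summarization call over the old messages.
    """
    if len(messages) <= keep_last:
        return messages                      # nothing to compress yet
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {
        "role": "user",
        "content": f"[summary of earlier work] {summarize(old)}",
    }
    return [summary] + recent                # bounded history, goals preserved
```

The same shape underlies subagent isolation: the parent keeps only a summary of the child's transcript, never the raw noise.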
|
||||
|
||||
You are not writing the intelligence. You are building the world the intelligence inhabits. The quality of that world -- how clearly the agent can perceive, how precisely it can act, how rich its available knowledge is -- directly determines how effectively the intelligence can express itself.
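
As a concrete (and deliberately tiny) illustration of the first duty above, here is a minimal Python sketch of a tool registry and dispatcher. The names (`TOOLS`, `run_tool`) are illustrative, not taken from this repo's agents/ files:

```python
import subprocess

def bash(command: str) -> str:
    """Shell execution -- the agent's most general hand."""
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=30)
    return result.stdout + result.stderr

def read_file(path: str) -> str:
    """File read -- the agent's eyes on the workspace."""
    with open(path, encoding="utf-8") as f:
        return f.read()

# The dispatch map: implementing a new capability means adding one entry.
TOOLS = {"bash": bash, "read_file": read_file}

def run_tool(name: str, args: dict) -> str:
    """Execute one tool call requested by the model; errors go back as text."""
    if name not in TOOLS:
        return f"unknown tool: {name}"
    try:
        return TOOLS[name](**args)
    except Exception as e:
        return f"error: {e}"
```

Note that failures are returned as text rather than raised: in a harness, errors are observations for the model, not crashes for the program.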

**Build great harnesses. The agent will do the rest.**

### Why Claude Code -- A Masterclass in Harness Engineering

Why does this repository dissect Claude Code specifically?

Because Claude Code is the most elegant and fully-realized agent harness we have seen. Not because of any single clever trick, but because of what it *doesn't* do: it doesn't try to be the agent. It doesn't impose rigid workflows. It doesn't second-guess the model with elaborate decision trees. It provides the model with tools, knowledge, context management, and permission boundaries -- then gets out of the way.

Look at what Claude Code actually is, stripped to its essence:

```
Claude Code = one agent loop
              + tools (bash, read, write, edit, glob, grep, browser...)
              + on-demand skill loading
              + context compression
              + subagent spawning
              + task system with dependency graph
              + team coordination with async mailboxes
              + worktree isolation for parallel execution
              + permission governance
```

That's it. That's the entire architecture. Every component is a harness mechanism -- a piece of the world built for the agent to inhabit. The agent itself? It's Claude. A model. Trained by Anthropic on the full breadth of human reasoning and code. The harness doesn't make Claude smart. Claude is already smart. The harness gives Claude hands, eyes, and a workspace.

This is why Claude Code is the ideal teaching subject: **it demonstrates what happens when you trust the model and focus your engineering on the harness.** Every session in this repository (s01-s12) reverse-engineers one harness mechanism from Claude Code's architecture. By the end, you understand not just how Claude Code works, but the universal principles of harness engineering that apply to any agent in any domain.

The lesson is not "copy Claude Code." The lesson is: **the best agent products are built by engineers who understand that their job is harness, not intelligence.**

---

## The Vision: Fill the Universe with Real Agents

This is not just about coding agents.

Every domain where humans perform complex, multi-step, judgment-intensive work is a domain where agents can operate -- given the right harness. The patterns in this repository are universal:

```
Estate management agent = model + property sensors + maintenance tools + tenant comms
Agricultural agent = model + soil/weather data + irrigation controls + crop knowledge
Hotel operations agent = model + booking system + guest channels + facility APIs
Medical research agent = model + literature search + lab instruments + protocol docs
Manufacturing agent = model + production line sensors + quality controls + logistics
Education agent = model + curriculum knowledge + student progress + assessment tools
```

The loop is always the same. The tools change. The knowledge changes. The permissions change. The agent -- the model -- generalizes.
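
A hedged sketch of that claim in code: one dispatch mechanism, two hypothetical domains. Every tool below is a placeholder lambda, not a real API:

```python
def make_harness(tools: dict):
    """Bind a domain-specific tool table to one universal dispatch function."""
    def dispatch(name: str, args: dict) -> str:
        if name not in tools:
            return f"unknown tool: {name}"  # boundaries are enforced here too
        return tools[name](**args)
    return dispatch

# Two domains, one mechanism -- only the tool table changes.
coding = make_harness({"read_file": lambda path: f"<contents of {path}>"})
farming = make_harness({"read_soil": lambda sensor: f"moisture ok ({sensor})"})
```

The loop and dispatcher never change between the two; swapping the tool table is what moves the agent from the workshop to the field.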

Every harness engineer reading this repository is learning patterns that apply far beyond software engineering. You are learning to build the infrastructure for an intelligent, automated future. Every well-designed harness deployed in a real domain is one more place where an agent can perceive, reason, and act.

First we fill the workshops. Then the farms, the hospitals, the factories. Then the cities. Then the planet.

**Bash is all you need. Real agents are all the universe needs.**

---

```
THE AGENT PATTERN
@ -16,12 +151,15 @@
loop back -----------------> messages[]


That's the minimal loop. Every AI coding agent needs this loop.
Production agents add policy, permissions, and lifecycle layers.
That's the minimal loop. Every AI agent needs this loop.
The MODEL decides when to call tools and when to stop.
The CODE just executes what the model asks for.
This repo teaches you to build what surrounds this loop --
the harness that makes the agent effective in a specific domain.
```

**12 progressive sessions, from a simple loop to isolated autonomous execution.**
**Each session adds one mechanism. Each mechanism has one motto.**
**Each session adds one harness mechanism. Each mechanism has one motto.**

> **s01** *"One loop & Bash is all you need"* — one tool + one loop = an agent
>
@ -76,14 +214,14 @@ def agent_loop(messages):
        messages.append({"role": "user", "content": results})
```

Every session layers one mechanism on top of this loop -- without changing the loop itself.
Every session layers one harness mechanism on top of this loop -- without changing the loop itself. The loop belongs to the agent. The mechanisms belong to the harness.

## Scope (Important)

This repository is a 0->1 learning project for building a nano Claude Code-like agent.
This repository is a 0->1 learning project for harness engineering -- building the environment that surrounds an agent model.
It intentionally simplifies or omits several production mechanisms:

- Full event/hook buses (for example PreToolUse, SessionStart/End, ConfigChange).
- Full event/hook buses (for example PreToolUse, SessionStart/End, ConfigChange).
  s12 includes only a minimal append-only lifecycle event stream for teaching.
- Rule-based permission governance and trust workflows
- Session lifecycle controls (resume/fork) and advanced worktree lifecycle controls
@ -180,7 +318,7 @@ Available in [English](./docs/en/) | [中文](./docs/zh/) | [日本語](./docs/j

## What's Next -- from understanding to shipping

After the 12 sessions you understand how an agent works inside out. Two ways to put that knowledge to work:
After the 12 sessions you understand how harness engineering works inside out. Two ways to put that knowledge to work:

### Kode Agent CLI -- Open-Source Coding Agent CLI

@ -200,16 +338,16 @@ GitHub: **[shareAI-lab/Kode-agent-sdk](https://github.com/shareAI-lab/Kode-agent

## Sister Repo: from *on-demand sessions* to *always-on assistant*

The agent this repo teaches is **use-and-discard** -- open a terminal, give it a task, close when done, next session starts blank. That is the Claude Code model.
The harness this repo teaches is **use-and-discard** -- open a terminal, give the agent a task, close when done, next session starts blank. That is the Claude Code model.

[OpenClaw](https://github.com/openclaw/openclaw) proved another possibility: on top of the same agent core, two mechanisms turn the agent from "poke it to make it move" into "it wakes up every 30 seconds to look for work":
[OpenClaw](https://github.com/openclaw/openclaw) proved another possibility: on top of the same agent core, two harness mechanisms turn the agent from "poke it to make it move" into "it wakes up every 30 seconds to look for work":

- **Heartbeat** -- every 30s the system sends the agent a message to check if there is anything to do. Nothing? Go back to sleep. Something? Act immediately.
- **Heartbeat** -- every 30s the harness sends the agent a message to check if there is anything to do. Nothing? Go back to sleep. Something? Act immediately.
- **Cron** -- the agent can schedule its own future tasks, executed automatically when the time comes.

Add multi-channel IM routing (WhatsApp / Telegram / Slack / Discord, 13+ platforms), persistent context memory, and a Soul personality system, and the agent goes from a disposable tool to an always-on personal AI assistant.

**[claw0](https://github.com/shareAI-lab/claw0)** is our companion teaching repo that deconstructs these mechanisms from scratch:
**[claw0](https://github.com/shareAI-lab/claw0)** is our companion teaching repo that deconstructs these harness mechanisms from scratch:

```
claw agent = agent core + heartbeat + cron + IM chat + memory + soul
@ -217,7 +355,7 @@ claw agent = agent core + heartbeat + cron + IM chat + memory + soul

```
learn-claude-code                      claw0
(agent runtime core:                   (proactive always-on assistant:
(agent harness core:                   (proactive always-on harness:
 loop, tools, planning,                 heartbeat, cron, IM channels,
 teams, worktree isolation)             memory, soul personality)
```
@ -225,8 +363,8 @@ learn-claude-code claw0

## About
<img width="260" src="https://github.com/user-attachments/assets/fe8b852b-97da-4061-a467-9694906b5edf" /><br>

Scan with Wechat to fellow us,
or fellow on X: [shareAI-Lab](https://x.com/baicai003)
Scan with Wechat to follow us,
or follow on X: [shareAI-Lab](https://x.com/baicai003)

## License

@ -234,4 +372,6 @@ MIT

---

**The model is the agent. Our job is to give it tools and stay out of the way.**
**The model is the agent. The code is the harness. Build great harnesses. The agent will do the rest.**

**Bash is all you need. Real agents are all the universe needs.**

@ -1,2 +1,3 @@
# agents/ - Python teaching agents (s01-s12) + reference agent (s_full)
# agents/ - Harness implementations (s01-s12) + full reference (s_full)
# Each file is self-contained and runnable: python agents/s01_agent_loop.py
# The model is the agent. These files are the harness.

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: the loop -- the model's first connection to the real world.
"""
s01_agent_loop.py - The Agent Loop

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: tool dispatch -- expanding what the model can reach.
"""
s02_tool_use.py - Tools

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: planning -- keeping the model on course without scripting the route.
"""
s03_todo_write.py - TodoWrite

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: context isolation -- protecting the model's clarity of thought.
"""
s04_subagent.py - Subagents

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: on-demand knowledge -- domain expertise, loaded when the model asks.
"""
s05_skill_loading.py - Skills

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: compression -- clean memory for infinite sessions.
"""
s06_context_compact.py - Compact

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: persistent tasks -- goals that outlive any single conversation.
"""
s07_task_system.py - Tasks

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: background execution -- the model thinks while the harness waits.
"""
s08_background_tasks.py - Background Tasks

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: team mailboxes -- multiple models, coordinated through files.
"""
s09_agent_teams.py - Agent Teams

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: protocols -- structured handshakes between models.
"""
s10_team_protocols.py - Team Protocols

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: autonomy -- models that find work without being told.
"""
s11_autonomous_agents.py - Autonomous Agents

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: directory isolation -- parallel execution lanes that never collide.
"""
s12_worktree_task_isolation.py - Worktree + Task Isolation

@ -1,4 +1,5 @@
#!/usr/bin/env python3
# Harness: all mechanisms combined -- the complete cockpit for the model.
"""
s_full.py - Full Reference Agent

@ -3,6 +3,8 @@
`[ s01 ] s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"One loop & Bash is all you need"* -- one tool + one loop = an agent.
>
> **Harness layer**: The loop -- the model's first connection to the real world.

## Problem

@ -3,6 +3,8 @@
`s01 > [ s02 ] s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"Adding a tool means adding one handler"* -- the loop stays the same; new tools register into the dispatch map.
>
> **Harness layer**: Tool dispatch -- expanding what the model can reach.

## Problem

@ -3,6 +3,8 @@
`s01 > s02 > [ s03 ] s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"An agent without a plan drifts"* -- list the steps first, then execute.
>
> **Harness layer**: Planning -- keeping the model on course without scripting the route.

## Problem

@ -3,6 +3,8 @@
`s01 > s02 > s03 > [ s04 ] s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"Break big tasks down; each subtask gets a clean context"* -- subagents use independent messages[], keeping the main conversation clean.
>
> **Harness layer**: Context isolation -- protecting the model's clarity of thought.

## Problem

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > [ s05 ] s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"Load knowledge when you need it, not upfront"* -- inject via tool_result, not the system prompt.
>
> **Harness layer**: On-demand knowledge -- domain expertise, loaded when the model asks.

## Problem

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > [ s06 ] | s07 > s08 > s09 > s10 > s11 > s12`

> *"Context will fill up; you need a way to make room"* -- three-layer compression strategy for infinite sessions.
>
> **Harness layer**: Compression -- clean memory for infinite sessions.

## Problem

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | [ s07 ] s08 > s09 > s10 > s11 > s12`

> *"Break big goals into small tasks, order them, persist to disk"* -- a file-based task graph with dependencies, laying the foundation for multi-agent collaboration.
>
> **Harness layer**: Persistent tasks -- goals that outlive any single conversation.

## Problem

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > [ s08 ] s09 > s10 > s11 > s12`

> *"Run slow operations in the background; the agent keeps thinking"* -- daemon threads run commands, inject notifications on completion.
>
> **Harness layer**: Background execution -- the model thinks while the harness waits.

## Problem

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > [ s09 ] s10 > s11 > s12`

> *"When the task is too big for one, delegate to teammates"* -- persistent teammates + async mailboxes.
>
> **Harness layer**: Team mailboxes -- multiple models, coordinated through files.

## Problem

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > [ s10 ] s11 > s12`

> *"Teammates need shared communication rules"* -- one request-response pattern drives all negotiation.
>
> **Harness layer**: Protocols -- structured handshakes between models.

## Problem

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > [ s11 ] s12`

> *"Teammates scan the board and claim tasks themselves"* -- no need for the lead to assign each one.
>
> **Harness layer**: Autonomy -- models that find work without being told.

## Problem

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > [ s12 ]`

> *"Each works in its own directory, no interference"* -- tasks manage goals, worktrees manage directories, bound by ID.
>
> **Harness layer**: Directory isolation -- parallel execution lanes that never collide.

## Problem

@ -3,6 +3,8 @@
`[ s01 ] s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"One loop & Bash is all you need"* -- 1つのツール + 1つのループ = エージェント。
>
> **Harness 層**: ループ -- モデルと現実世界を繋ぐ最初の接点。

## 問題

@ -3,6 +3,8 @@
`s01 > [ s02 ] s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"ツールを足すなら、ハンドラーを1つ足すだけ"* -- ループは変わらない。新ツールは dispatch map に登録するだけ。
>
> **Harness 層**: ツール分配 -- モデルが届く範囲を広げる。

## 問題

@ -3,6 +3,8 @@
`s01 > s02 > [ s03 ] s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"計画のないエージェントは行き当たりばったり"* -- まずステップを書き出し、それから実行。
>
> **Harness 層**: 計画 -- 航路を描かずにモデルを軌道に乗せる。

## 問題

@ -3,6 +3,8 @@
`s01 > s02 > s03 > [ s04 ] s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"大きなタスクを分割し、各サブタスクにクリーンなコンテキストを"* -- サブエージェントは独立した messages[] を使い、メイン会話を汚さない。
>
> **Harness 層**: コンテキスト隔離 -- モデルの思考の明晰さを守る。

## 問題

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > [ s05 ] s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"必要な知識を、必要な時に読み込む"* -- system prompt ではなく tool_result で注入。
>
> **Harness 層**: オンデマンド知識 -- モデルが求めた時だけ渡すドメイン専門性。

## 問題

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > [ s06 ] | s07 > s08 > s09 > s10 > s11 > s12`

> *"コンテキストはいつか溢れる、空ける手段が要る"* -- 3層圧縮で無限セッションを実現。
>
> **Harness 層**: 圧縮 -- クリーンな記憶、無限のセッション。

## 問題

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | [ s07 ] s08 > s09 > s10 > s11 > s12`

> *"大きな目標を小タスクに分解し、順序付けし、ディスクに記録する"* -- ファイルベースのタスクグラフ、マルチエージェント協調の基盤。
>
> **Harness 層**: 永続タスク -- どの会話よりも長く生きる目標。

## 問題

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > [ s08 ] s09 > s10 > s11 > s12`

> *"遅い操作はバックグラウンドへ、エージェントは次を考え続ける"* -- デーモンスレッドがコマンド実行、完了後に通知を注入。
>
> **Harness 層**: バックグラウンド実行 -- モデルが考え続ける間、Harness が待つ。

## 問題

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > [ s09 ] s10 > s11 > s12`

> *"一人で終わらないなら、チームメイトに任せる"* -- 永続チームメイト + 非同期メールボックス。
>
> **Harness 層**: チームメールボックス -- 複数モデルをファイルで協調。

## 問題

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > [ s10 ] s11 > s12`

> *"チームメイト間には統一の通信ルールが必要"* -- 1つの request-response パターンが全交渉を駆動。
>
> **Harness 層**: プロトコル -- モデル間の構造化されたハンドシェイク。

## 問題

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > [ s11 ] s12`

> *"チームメイトが自らボードを見て、仕事を取る"* -- リーダーが逐一割り振る必要はない。
>
> **Harness 層**: 自律 -- 指示なしで仕事を見つけるモデル。

## 問題

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > [ s12 ]`

> *"各自のディレクトリで作業し、互いに干渉しない"* -- タスクは目標を管理、worktree はディレクトリを管理、IDで紐付け。
>
> **Harness 層**: ディレクトリ隔離 -- 決して衝突しない並列実行レーン。

## 問題

@ -3,6 +3,8 @@
`[ s01 ] s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"One loop & Bash is all you need"* -- 一个工具 + 一个循环 = 一个智能体。
>
> **Harness 层**: 循环 -- 模型与真实世界的第一道连接。

## 问题

@ -3,6 +3,8 @@
`s01 > [ s02 ] s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"加一个工具, 只加一个 handler"* -- 循环不用动, 新工具注册进 dispatch map 就行。
>
> **Harness 层**: 工具分发 -- 扩展模型能触达的边界。

## 问题

@ -3,6 +3,8 @@
`s01 > s02 > [ s03 ] s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"没有计划的 agent 走哪算哪"* -- 先列步骤再动手, 完成率翻倍。
>
> **Harness 层**: 规划 -- 让模型不偏航, 但不替它画航线。

## 问题

@ -3,6 +3,8 @@
`s01 > s02 > s03 > [ s04 ] s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"大任务拆小, 每个小任务干净的上下文"* -- 子智能体用独立 messages[], 不污染主对话。
>
> **Harness 层**: 上下文隔离 -- 守护模型的思维清晰度。

## 问题

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > [ s05 ] s06 | s07 > s08 > s09 > s10 > s11 > s12`

> *"用到什么知识, 临时加载什么知识"* -- 通过 tool_result 注入, 不塞 system prompt。
>
> **Harness 层**: 按需知识 -- 模型开口要时才给的领域专长。

## 问题

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > [ s06 ] | s07 > s08 > s09 > s10 > s11 > s12`

> *"上下文总会满, 要有办法腾地方"* -- 三层压缩策略, 换来无限会话。
>
> **Harness 层**: 压缩 -- 干净的记忆, 无限的会话。

## 问题

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | [ s07 ] s08 > s09 > s10 > s11 > s12`

> *"大目标要拆成小任务, 排好序, 记在磁盘上"* -- 文件持久化的任务图, 为多 agent 协作打基础。
>
> **Harness 层**: 持久化任务 -- 比任何一次对话都长命的目标。

## 问题

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > [ s08 ] s09 > s10 > s11 > s12`

> *"慢操作丢后台, agent 继续想下一步"* -- 后台线程跑命令, 完成后注入通知。
>
> **Harness 层**: 后台执行 -- 模型继续思考, harness 负责等待。

## 问题

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > [ s09 ] s10 > s11 > s12`

> *"任务太大一个人干不完, 要能分给队友"* -- 持久化队友 + JSONL 邮箱。
>
> **Harness 层**: 团队邮箱 -- 多个模型, 通过文件协调。

## 问题

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > [ s10 ] s11 > s12`

> *"队友之间要有统一的沟通规矩"* -- 一个 request-response 模式驱动所有协商。
>
> **Harness 层**: 协议 -- 模型之间的结构化握手。

## 问题

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > [ s11 ] s12`

> *"队友自己看看板, 有活就认领"* -- 不需要领导逐个分配, 自组织。
>
> **Harness 层**: 自治 -- 模型自己找活干, 无需指派。

## 问题

@ -3,6 +3,8 @@
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > [ s12 ]`

> *"各干各的目录, 互不干扰"* -- 任务管目标, worktree 管目录, 按 ID 绑定。
>
> **Harness 层**: 目录隔离 -- 永不碰撞的并行执行通道。

## 问题

@ -1,144 +1,154 @@
# The Philosophy of Agents
# The Philosophy of Agent Harness Engineering

> **The model already knows how to be an agent. Your job is to get out of the way.**
> **The model already knows how to be an agent. Your job is to build it a world worth acting in.**

## The Fundamental Insight
## The Fundamental Truth

Strip away every framework, every library, every architectural pattern. What remains?

A loop. A model. An invitation to act.

The agent is not the code. The agent is the model itself - a vast neural network trained on humanity's collective problem-solving, reasoning, and tool use. The code merely provides the opportunity for the model to express its agency.
The agent is not the code. The agent is the model itself -- a vast neural network trained on humanity's collective problem-solving, reasoning, and tool use. The code merely provides the opportunity for the model to express its agency.

## Why This Matters
The code is the harness. The model is the agent. These are not interchangeable. Confuse them, and you will build the wrong thing.

Most agent implementations fail not from too little engineering, but from too much. They constrain. They prescribe. They second-guess the very intelligence they're trying to leverage.
## What an Agent IS

Consider: The model has been trained on millions of examples of problem-solving. It has seen how experts approach complex tasks, how tools are used, how plans are formed and revised. This knowledge is already there, encoded in billions of parameters.
An agent is a neural network -- a Transformer, an RNN, a learned function -- that has been trained, through billions of gradient updates on action-sequence data, to perceive an environment, reason about goals, and take actions to achieve them.

Your job is not to teach it how to think. Your job is to give it the means to act.
A human is an agent: a biological neural network shaped by evolution. DeepMind's DQN is an agent: a convolutional network that learned to play Atari from raw pixels. OpenAI Five is an agent: five networks that learned Dota 2 teamwork through self-play. Claude is an agent: a language model that learned to reason and act from the breadth of human knowledge.

## The Three Elements
In every case, the agent is the trained model. Not the game engine. Not the Dota 2 client. Not the terminal. The model.

### 1. Capabilities (Tools)
## What an Agent Is NOT

Capabilities answer: **What can the agent DO?**
Prompt plumbing is not agency. Wiring together LLM API calls with if-else branches, node graphs, and hardcoded routing logic does not produce an agent. It produces a brittle pipeline -- a Rube Goldberg machine with an LLM wedged in as a text-completion node.

They are the hands of the model - its ability to affect the world. Without capabilities, the model can only speak. With them, it can act.
You cannot engineer your way to agency. Agency is learned, not programmed. No amount of glue code will emergently produce autonomous behavior. Those systems are the modern resurrection of GOFAI -- symbolic rule systems the field abandoned decades ago, now spray-painted with an LLM veneer.

**The design principle**: Each capability should be atomic, clear, and well-described. The model needs to understand what each capability does, but not how to use them in sequence - it will figure that out.
## The Harness: What We Actually Build

**Common mistake**: Too many capabilities. The model gets confused, starts using the wrong ones, or paralyzed by choice. Start with 3-5. Add more only when the model consistently fails to accomplish tasks because a capability is missing.
If the model is the agent, then what is the code? It is the **harness** -- the environment that gives the agent the ability to perceive and act in a specific domain.

### 2. Knowledge (Skills)
```
Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions
```

### Tools: The Agent's Hands

Tools answer: **What can the agent DO?**

Each tool is an atomic action the agent can take in its environment. File read/write, shell execution, API calls, browser control, database queries. The model needs to understand what each tool does, but not how to sequence them -- it will figure that out.

**Design principle**: Atomic, composable, well-described. Start with 3-5. Add more only when the model consistently fails to accomplish tasks because a tool is missing.
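
For instance, a well-described atomic tool might be declared like this. The JSON-schema shape follows the Anthropic Messages API tool format; the tool itself is an illustrative assumption, not one of this repo's definitions:

```python
# One atomic, well-described tool: name, one-line purpose, typed inputs.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a UTF-8 text file and return its full contents.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Absolute path to the file"},
        },
        "required": ["path"],
    },
}
```

The description is written for the model, not for humans: it says what the tool does and nothing about when to use it, because sequencing is the model's job.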

### Knowledge: The Agent's Expertise

Knowledge answers: **What does the agent KNOW?**

This is domain expertise - the specialized understanding that turns a general assistant into a domain expert. A customer service agent needs to know company policies. A research agent needs to know methodology. A creative agent needs to know style guidelines.
Domain expertise that turns a general agent into a domain specialist. Product documentation, architectural decisions, regulatory requirements, style guides. Inject on-demand (via tool_result), not upfront (via system prompt). Progressive disclosure preserves context for what matters.

**The design principle**: Inject knowledge on-demand, not upfront. The model doesn't need to know everything at once - only what's relevant to the current task. Progressive disclosure preserves context for what matters.
**Design principle**: Available but not mandatory. The agent should know what knowledge exists and pull what it needs.

**Common mistake**: Front-loading all possible knowledge into the system prompt. This wastes context, confuses the model, and makes every interaction expensive. Instead, make knowledge available but not mandatory.
### Context: The Agent's Memory

### 3. Context (The Conversation)
Context is the thread connecting individual actions into coherent behavior. What has been said, tried, learned, and decided.

Context is the memory of the interaction - what has been said, what has been tried, what has been learned. It's the thread that connects individual actions into coherent behavior.
**Design principle**: Context is precious. Protect it. Isolate subtasks that generate noise (s04). Compress when history grows long (s06). Persist goals beyond single conversations (s07).

**The design principle**: Context is precious. Protect it. Isolate subtasks that generate noise. Truncate outputs that exceed usefulness. Summarize when history grows long.
### Permissions: The Agent's Boundaries

**Common mistake**: Letting context grow unbounded, filling it with exploration details, failed attempts, and verbose tool outputs. Eventually the model can't find the signal in the noise.
Permissions answer: **What is the agent ALLOWED to do?**

## The Universal Pattern
Sandbox file access. Require approval for destructive operations. Enforce trust boundaries between the agent and external systems. This is where safety engineering meets harness engineering.

Every effective agent - regardless of domain, framework, or implementation - follows the same pattern:
**Design principle**: Constraints focus behavior, not limit it. "One task in_progress at a time" forces sequential focus. "Read-only subagent" prevents accidental modifications.

### Task-Process Data: The Agent's Training Signal

Every action sequence the agent executes in your harness is training signal. The perception-reasoning-action traces from real deployments are the raw material for fine-tuning the next generation of agent models. Your harness doesn't just serve the agent -- it can help evolve the agent.

## The Universal Loop

Every effective agent -- regardless of domain -- follows the same pattern:

```
LOOP:
  Model sees: conversation history + available capabilities
  Model sees: conversation history + available tools
  Model decides: act or respond
  If act: capability executed, result added to context, loop continues
  If act: tool executed, result added to context, loop continues
  If respond: answer returned, loop ends
```

This is not a simplification. This is the actual architecture. Everything else is optimization.
|
||||
This is not a simplification. This is the actual architecture. Everything else is harness engineering -- mechanisms layered on top of this loop to make the agent more effective. The loop belongs to the agent. The mechanisms belong to the harness.
|
||||
|
||||
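The loop above translates almost line-for-line into code. A hedged sketch: `llm` stands in for any chat-completion call, and the reply shape is invented for illustration:

```python
def run_agent(llm, tools, user_message, max_steps=20):
    """The universal loop: the model sees history, then acts or responds."""
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = llm(history)                       # model sees: history + tools
        if reply["type"] == "respond":             # model decides: respond
            return reply["content"]                # answer returned, loop ends
        name, args = reply["tool"], reply["args"]  # model decides: act
        result = tools[name](**args)               # tool executed
        history.append({"role": "tool", "content": f"{name} -> {result}"})
    return "step limit reached"
```

The `max_steps` cap is harness, not agent: a safety net against runaway loops, invisible in the happy path.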
## Principles of Harness Engineering

### Trust the Model
The most important principle: **trust the model**.

Don't anticipate every edge case. Don't build elaborate decision trees. Don't pre-specify the workflow.

The model is better at reasoning than any rule system you could write. Your conditional logic will fail on edge cases. The model will reason through them.

**Give the model tools and knowledge. Let it figure out how to use them.**
### Constraints Enable

This seems paradoxical, but constraints don't limit agents -- they focus them.

A todo list with "only one task in progress" forces sequential focus. A subagent with read-only access prevents accidental modifications. A context compression threshold keeps history from overwhelming.

The best constraints prevent the model from getting lost, not micromanage its approach.
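The "only one task in progress" constraint is a few lines of harness code. Field names here are illustrative, not this repo's todo schema:

```python
def set_in_progress(todos, task_id):
    """Enforce 'only one task in progress': starting one pauses any other."""
    for t in todos:
        if t["status"] == "in_progress" and t["id"] != task_id:
            t["status"] = "pending"  # the constraint focuses, it doesn't forbid
    for t in todos:
        if t["id"] == task_id:
            t["status"] = "in_progress"
    return todos
```

Note that the harness enforces the invariant mechanically rather than asking the model to remember it -- the model spends its attention on the task, not the bookkeeping.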
### Progressive Complexity

Never build everything upfront.

```
Level 0: Model + one tool (bash) -- s01
Level 1: Model + tool dispatch map -- s02
Level 2: Model + planning -- s03
Level 3: Model + subagents + skills -- s04, s05
Level 4: Model + context management + persistence -- s06, s07, s08
Level 5: Model + teams + autonomy + isolation -- s09-s12
```

Start at the lowest level that might work. Move up only when real usage reveals the need.
## The Mind Shift

Building harnesses requires a fundamental shift in thinking:
**From**: "How do I make the system do X?"
**To**: "How do I enable the model to do X?"

**From**: "What should happen when the user says Y?"
**To**: "What tools would help address Y?"

**From**: "What's the workflow for this task?"
**To**: "What does the model need to figure out the workflow?"

**From**: "I'm building an agent."
**To**: "I'm building a harness for the agent."

The best harness code is almost boring. Simple loops. Clear tool definitions. Clean context management. The magic isn't in the code -- it's in the model.
## The Vehicle Metaphor

The model is the driver. The harness is the vehicle.

A coding agent's vehicle is its IDE, terminal, and filesystem. A farm agent's vehicle is its sensor array, irrigation controls, and weather data. A hotel agent's vehicle is its booking system, guest channels, and facility APIs.

The driver generalizes. The vehicle specializes. Your job as a harness engineer is to build the best vehicle for your domain -- one that gives the driver maximum visibility, precise controls, and clear boundaries.

Build the cockpit. Build the dashboard. Build the controls. The pilot is already trained.
## Conclusion

The model is the agent. The code is the harness. Know which one you're building.

You are not writing intelligence. You are building the world intelligence inhabits. The quality of that world -- how clearly the agent can perceive, how precisely it can act, how rich its knowledge -- directly determines how effectively the intelligence can express itself.

Build great harnesses. The agent will do the rest.