Files
gui-yue 1baf1aca5a Follow up PR #265: refine chapters, diagrams, and add S20 (#283)
* feat: s01-s14 docs quality overhaul — tool pipeline, single-agent, knowledge & resilience

Rewrite code.py and README (zh/en/ja) for s01-s14, each chapter building
incrementally on the previous. Key fixes across chapters:

- s01-s04: agent loop, tool dispatch, permission pipeline, hooks
- s05-s08: todo write, subagent, skill loading, context compact
- s09-s11: memory system, system prompt assembly, error recovery
- s12-s14: task graph, background tasks, cron scheduler

All chapters CC source-verified. Code inherits fixes forward (PROMPT_SECTIONS,
json.dumps cache, real-state context, can_start dep protection, etc.).

* feat: s15-s19 docs quality overhaul — multi-agent platform: teams, protocols, autonomy, worktree, MCP tools

Rewrite code.py and README (zh/en/ja) for s15-s19, the multi-agent platform
chapters. Each chapter inherits all previous fixes and adds one mechanism:

- s15: agent teams (TeamCreate, teammate threads, shared task list)
- s16: team protocols (plan approval, shutdown handshake, consume_inbox)
- s17: autonomous agents (idle polling, auto-claim, consume_lead_inbox)
- s18: worktree isolation (git worktree, bind_task, cwd switching, safety)
- s19: MCP tools (MCPClient, normalize_mcp_name, assemble_tool_pool, no cache)

All appendix source code references verified against CC source. Config priority
corrected: claude.ai < plugin < user < project < local.

* fix: 5 regressions across s05-s19 — glob safety, todo validation, memory extraction, protocol types, dep crash

- s05-s09: glob results now filter with is_relative_to(WORKDIR) (inherited from s02)
- s06-s08: todo_write validates content/status required fields (inherited from s05)
- s09: extract_memories uses pre-compression snapshot instead of compacted messages
- s16: submit_plan docstring clarifies protocol-only (not code-level gate)
- s17-s19: match_response restores type mismatch validation (from s16)
- s17-s19: claim_task deps list handles missing dep files without crashing

* fix: s12 Todo V2 logic reversal, s14/s15 cron range validation, s18/s19 worktree name validation

- s12 README (zh/en/ja): fix Todo V2 direction — interactive defaults to Task,
  non-interactive/SDK defaults to TodoWrite. Fix env var name to
  CLAUDE_CODE_ENABLE_TASKS (not TODO_V2).
- s14/s15: add _validate_cron_field with per-field range checks (minute 0-59,
  hour 0-23, dom 1-31, month 1-12, dow 0-6), step > 0, range lo <= hi.
  Replace old try/except validation that only caught exceptions.
- s18/s19: add validate_worktree_name() to remove_worktree and keep_worktree,
  not just create_worktree.

* fix: align s16-s19 teaching tool consistency

* fix pr265 chapter diagrams

* Add comprehensive s20 harness chapter

* Fix chapter smoke test regressions

* Clarify README tutorial track transition

---------

Co-authored-by: Haoran <bill-billion@outlook.com>
2026-05-20 21:45:38 +08:00

255 lines
11 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# s15: Agent Teams — 一个搞不定,组队来
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
s01 → ... → s13 → s14 → `s15` → [s16](../s16_team_protocols/) → s17 → s18 → s19 → s20
> *"一个搞不定, 组队来"* — 文件收件箱 + 队友线程。
>
> **Harness 层**: 团队 — 多 Agent 协作, 消息总线。
---
## 问题
"重构整个后端"涉及认证模块、数据库层、API 路由、测试。一个 Agent 在修 API 路由时,认证模块的细节已经不在上下文里了。上下文窗口就那么大,单个 Agent 的注意力覆盖不了所有模块。
s06 的子 Agent 是临时工,叫来干一件事就走了。但有些任务需要能通信、能协作的队友。
---
## 解决方案
![Agent Teams Overview](images/agent-teams-overview.svg)
教学代码沿用 S14 的能力prompt 组装、任务系统、后台执行、cron 调度)。为了聚焦团队机制,省略了完整错误恢复、记忆和技能系统。新增三样:**MessageBus**(文件收件箱)、**spawn_teammate_thread**(启动队友线程)、**inbox 注入**Lead 接收队友消息并注入 history
子 Agent vs 队友:
| | s06 子 Agent | s15 队友 |
|---|---|---|
| 生命周期 | 一次性,用完销毁 | 多轮(教学版限 10 轮,真实 CC 用 idle loop |
| 通信 | 只回传结论 | 异步收件箱,随时通信 |
| 上下文 | 完全隔离 | 通过消息共享信息 |
| 数量 | 一个主 Agent + 偶尔子 Agent | 一个 Lead + 多个队友 |
---
## 工作原理
![Team Topology](images/team-topology.svg)
### MessageBus: 文件收件箱
每个 Agent包括 Lead 和队友)有一个 `.jsonl` 邮箱。发消息 = 往对方的文件里 append 一行 JSON。读消息 = 读文件 + 删除(消费式):
```python
class MessageBus:
def send(self, from_agent: str, to_agent: str,
content: str, msg_type: str = "message"):
msg = {"from": from_agent, "to": to_agent,
"content": content, "type": msg_type,
"ts": time.time()}
inbox = MAILBOX_DIR / f"{to_agent}.jsonl"
with open(inbox, "a") as f:
f.write(json.dumps(msg) + "\n")
def read_inbox(self, agent: str) -> list[dict]:
inbox = MAILBOX_DIR / f"{agent}.jsonl"
if not inbox.exists():
return []
msgs = [json.loads(line) for line in inbox.read_text().splitlines()]
inbox.unlink() # 消费式:读完删除
return msgs
```
为什么用文件而不是内存队列?教学版选文件是因为直观、跨线程可观察。真实 CC 也用文件收件箱(`~/.claude/teams/{team}/inboxes/`),但加了 `proper-lockfile` 防并发写冲突。教学版的 `read_inbox` 有 read + unlink 竞态,多线程同时读可能丢消息,对教学场景可以接受。
### spawn_teammate_thread: 启动队友
Lead 调用 `spawn_teammate` 工具启动一个队友。队友跑在自己的 daemon 线程里,有自己的 system prompt、自己的 messages、自己的简化工具集
```python
def spawn_teammate_thread(name: str, role: str, prompt: str) -> str:
system = f"You are '{name}', a {role}. Use tools to complete tasks."
def run():
messages = [{"role": "user", "content": prompt}]
sub_tools = [bash, read_file, write_file, send_message]
for _ in range(10): # 最多 10 轮
inbox = BUS.read_inbox(name)
if inbox:
messages.append({"role": "user",
"content": f"<inbox>{json.dumps(inbox)}</inbox>"})
response = client.messages.create(
model=MODEL, system=system, messages=messages[-20:],
tools=sub_tools, max_tokens=8000)
# ... 执行工具、处理结果
# 完成后发 summary 给 Lead
BUS.send(name, "lead", summary, "result")
threading.Thread(target=run, daemon=True).start()
```
关键设计:
- **队友有简化工具集**bash、read、write、send_message。教学版省略了任务和 cron聚焦通信机制。真实 CC 的队友也有 TaskCreate、TaskUpdate 等工具,任务系统是团队共享的
- **教学版限 10 轮**:防止队友无限循环。真实 CC 用 idle loop跑完一轮后发 `idle_notification`,等 inbox 消息,收到后继续,直到 `shutdown_request` 才退出
- **完成后自动汇报**`BUS.send(name, "lead", summary)` 把最终结果发到 Lead 的收件箱
### Lead 的 inbox 注入
Lead 在每轮主循环结束后检查收件箱。队友发来的消息注入到 history 里,让 LLM 能看到并做出反应:
```python
# 主循环结束后
inbox = BUS.read_inbox("lead")
if inbox:
inbox_text = "\n".join(
f"From {m['from']}: {m['content'][:200]}" for m in inbox)
history.append({"role": "user",
"content": f"[Inbox]\n{inbox_text}"})
```
教学版在用户输入循环外注入。CC 更精细Lead 的 `useInboxPoller` 每 1 秒检查一次,有消息就提交为新的 turn不需要等用户输入。
### 权限冒泡
教学版省略了权限冒泡。真实 CC 的流程(`permissionSync.ts``useSwarmPermissionPoller.ts`
1. 队友遇到需要审批的操作 → 发 `permission_request` 到 Lead 收件箱
2. Lead 的 `useInboxPoller` 检测到请求 → 路由到审批队列
3. 用户审批后 → Lead 发 `permission_response` 回队友
4. 队友的 `useSwarmPermissionPoller`(每 500ms 轮询)收到回复 → 继续或拒绝
### 合起来跑
```
1. Lead: "搭建后端:一个人搞不定,组队吧"
2. Lead → spawn_teammate("alice", "backend dev", "创建数据库 schema")
3. Lead → spawn_teammate("bob", "frontend dev", "写 API 客户端")
4. alice 线程启动 → 自己的 LLM 调用 → bash "python manage.py migrate"
5. bob 线程启动 → 自己的 LLM 调用 → write_file("client.ts", ...)
6. alice 完成 → BUS.send("alice", "lead", "Schema done: users, orders tables")
7. bob 完成 → BUS.send("bob", "lead", "Client written with types")
8. Lead 下次循环 → inbox 注入 history → LLM 看到 alice 和 bob 的结果
```
两个队友并行工作。
---
## 相对 s14 的变更
| 组件 | 之前 (s14) | 之后 (s15) |
|------|-----------|-----------|
| Agent 数量 | 1 | 1 Lead + N 队友线程 |
| 通信 | 无 | MessageBus + .mailboxes/*.jsonl |
| 新类 | — | MessageBus, active_teammates dict |
| 新函数 | — | spawn_teammate_thread, run_send_message, run_check_inbox |
| Lead 工具 | 11 (s14) | + spawn_teammate, send_message, check_inbox (14) |
| 队友工具 | — | bash, read_file, write_file, send_message (4) |
| 权限 | 本地决策 | 教学版省略(真实 CC 有冒泡机制) |
---
## 试一下
```sh
cd learn-claude-code
python s15_agent_teams/code.py
```
试试这些 prompt
1. `Spawn alice as a backend developer. Ask her to create a file called schema.sql with a users table.`
2. `Check your inbox for alice's result.`
3. `Spawn bob as a tester. Ask him to check if schema.sql exists and list its contents.`
观察重点Lead 如何启动队友?`.mailboxes/` 目录下的 JSONL 文件长什么样?队友完成后 Lead 的 inbox 有没有注入到 history
---
## 接下来
队友能干活、能通信。但如果 Lead 想让 Alice 关机直接杀线程会留下写到一半的文件。需要一个体面的关机协议Lead 发 shutdown_request队友收尾后退出。
s16 Team Protocols → 关机握手与消息约定。
<details>
<summary>深入 CC 源码</summary>
> 以下基于 CC 源码 `spawnMultiAgent.ts`、`useInboxPoller.ts`969 行)、`useSwarmPermissionPoller.ts`330 行)、`teammateMailbox.ts`、`teamHelpers.ts` 的完整分析。
### 一、没有中央消息总线,是文件系统
教学版用 `MessageBus` 类收发消息。CC 的做法更直接,每个 Agent 直接写其他 Agent 的收件箱文件。
收件箱路径:`~/.claude/teams/{teamName}/inboxes/{agentName}.json`
写入时用 `proper-lockfile` 文件锁保证并发安全(最多重试 10 次)。每个文件是一个 JSON 数组append 新消息时读→追加→写回。
### 二、15 种消息类型
CC 的团队通信有 15 种结构化消息(`teammateMailbox.ts`
| 类型 | 方向 | 用途 |
|------|------|------|
| `plain text` | 双向 | 普通队友间通信 |
| `idle_notification` | 队友→Lead | 队友完成一轮工作,进入空闲 |
| `permission_request` | 队友→Lead | 队友需要操作审批 |
| `permission_response` | Lead→队友 | Lead 审批结果 |
| `plan_approval_request` | 队友→Lead | 队友提交计划待审 |
| `plan_approval_response` | Lead→队友 | Lead 审批计划 |
| `shutdown_request` | Lead→队友 | 请求体面关机 |
| `shutdown_approved` | 队友→Lead | 确认关机 |
| `shutdown_rejected` | 队友→Lead | 拒绝关机(附原因) |
| `task_assignment` | Lead→队友 | 分配任务 |
| `team_permission_update` | Lead→队友 | 广播权限变更 |
| `mode_set_request` | Lead→队友 | 修改队友的权限模式 |
| `sandbox_permission_*` | 双向 | 网络权限请求/回复 |
| `teammate_terminated` | 系统 | 队友被移除通知 |
文本消息被包装在 `<teammate-message>` XML 标签中交付给模型。
### 三、权限冒泡:双向轮询
教学版省略了权限冒泡。CC 的实际流程(`permissionSync.ts`
1. **队友**遇到需要审批的操作 → 发 `permission_request` 到 Lead 的收件箱
2. **Lead**`useInboxPoller`(每 1 秒轮询)检测到请求 → 路由到 `ToolUseConfirmQueue`
3. Lead 的 UI 显示审批对话框,带队友名字和颜色
4. 用户审批后 → Lead 发 `permission_response` 回队友的收件箱
5. **队友**的 `useSwarmPermissionPoller`(每 500ms 轮询)收到回复 → 继续或拒绝执行
### 四、队友生命周期
CC 的队友由 `spawnTeammate()``spawnMultiAgent.ts`)创建:
1. **Spawn**:创建 tmux 窗格(或进程内),分配颜色,写入 team config
2. **Work**`useInboxPoller` 每 1 秒检查收件箱 → 有消息就提交为新的 turn
3. **Idle**Stop hook 触发 → 发 `idle_notification` 给 Lead
4. **Shutdown**Lead 发 `shutdown_request` → 队友回复 `shutdown_approved` → Lead 清理
### 五、Team Config
团队注册表在 `~/.claude/teams/{teamName}/config.json``teamHelpers.ts`
```json
{
"name": "my-team",
"leadAgentId": "lead@my-team",
"members": [{
"agentId": "researcher@my-team",
"name": "researcher",
"agentType": "general-purpose",
"color": "blue",
"isActive": true
}]
}
```
队友之间不能嵌套(`AgentTool.tsx:273` 明确禁止 "teammates spawning other teammates")。
</details>
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->