feat: build an AI agent from 0 to 1 -- 11 progressive sessions

- 11 sessions from basic agent loop to autonomous teams
- Python MVP implementations for each session
- Mental-model-first docs in en/zh/ja
- Interactive web platform with step-through visualizations
- Incremental architecture: each session adds one mechanism
This commit is contained in:
CrazyBoyM
2026-02-21 14:37:42 +08:00
committed by CrazyBoyM
commit c6a27ef1d7
156 changed files with 28059 additions and 0 deletions

View File

@@ -0,0 +1,143 @@
# s01: The Agent Loop
> The entire secret of AI coding agents is a while loop that feeds tool results back to the model until the model decides to stop.
## The Problem
Why can't a language model just answer a coding question? Because coding
requires _interaction with the real world_. The model needs to read files,
run tests, check errors, and iterate. A single prompt-response pair cannot
do this.
Without the agent loop, you would have to copy-paste outputs back into the
model yourself. The user becomes the loop. The agent loop automates this:
call the model, execute whatever tools it asks for, feed the results back,
repeat until the model says "I'm done."
Consider a simple task: "Create a Python file that prints hello." The model
needs to (1) decide to write a file, (2) write it, (3) verify it works.
That is three tool calls minimum. Without a loop, each one requires manual
human intervention.
## The Solution
```
+----------+ +-------+ +---------+
| User | ---> | LLM | ---> | Tool |
| prompt | | | | execute |
+----------+ +---+---+ +----+----+
^ |
| tool_result |
+---------------+
(loop continues)
The loop terminates when stop_reason != "tool_use".
That single condition is the entire control flow.
```
## How It Works
1. The user provides a prompt. It becomes the first message.
```python
history.append({"role": "user", "content": query})
```
2. The messages array is sent to the LLM along with the tool definitions.
```python
response = client.messages.create(
model=MODEL, system=SYSTEM, messages=messages,
tools=TOOLS, max_tokens=8000,
)
```
3. The assistant response is appended to messages.
```python
messages.append({"role": "assistant", "content": response.content})
```
4. We check the stop reason. If the model did not call a tool, the loop
ends. This is the only exit condition.
```python
if response.stop_reason != "tool_use":
return
```
5. For each tool_use block in the response, execute the tool (bash in this
session) and collect results.
```python
for block in response.content:
if block.type == "tool_use":
output = run_bash(block.input["command"])
results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": output,
})
```
6. The results are appended as a user message, and the loop continues.
```python
messages.append({"role": "user", "content": results})
```
## Key Code
The minimum viable agent -- the entire pattern in under 30 lines
(from `agents/s01_agent_loop.py`, lines 66-86):
```python
def agent_loop(messages: list):
while True:
response = client.messages.create(
model=MODEL, system=SYSTEM, messages=messages,
tools=TOOLS, max_tokens=8000,
)
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason != "tool_use":
return
results = []
for block in response.content:
if block.type == "tool_use":
output = run_bash(block.input["command"])
results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": output,
})
messages.append({"role": "user", "content": results})
```
## What Changed
This is session 1 -- the starting point. There is no prior session.
| Component | Before | After |
|---------------|------------|--------------------------------|
| Agent loop | (none) | `while True` + stop_reason |
| Tools | (none) | `bash` (one tool) |
| Messages | (none) | Accumulating list |
| Control flow | (none) | `stop_reason != "tool_use"` |
## Design Rationale
This loop is the universal foundation of all LLM-based agents. Production implementations add error handling, token counting, streaming, and retry logic, but the fundamental structure is unchanged. The simplicity is the point: one exit condition (`stop_reason != "tool_use"`) controls the entire flow. Everything else in this course -- tools, planning, compression, teams -- layers on top of this loop without modifying it. Understanding this loop means understanding every agent.
## Try It
```sh
cd learn-claude-code
python agents/s01_agent_loop.py
```
Example prompts to try:
1. `Create a file called hello.py that prints "Hello, World!"`
2. `List all Python files in this directory`
3. `What is the current git branch?`
4. `Create a directory called test_output and write 3 files in it`

151
docs/en/s02-tool-use.md Normal file
View File

@@ -0,0 +1,151 @@
# s02: Tools
> A dispatch map routes tool calls to handler functions -- the loop itself does not change at all.
## The Problem
With only `bash`, the agent shells out for everything: reading files,
writing files, editing files. This works but is fragile. `cat` output
gets truncated unpredictably. `sed` replacements fail on special
characters. The model wastes tokens constructing shell pipelines when
a direct function call would be simpler.
More importantly, bash is a security surface. Every bash call can do
anything the shell can do. With dedicated tools like `read_file` and
`write_file`, you can enforce path sandboxing and block dangerous
patterns at the tool level rather than hoping the model avoids them.
The insight is that adding tools does not require changing the loop.
The loop from s01 stays identical. You add entries to the tools array,
add handler functions, and wire them together with a dispatch map.
## The Solution
```
+----------+ +-------+ +------------------+
| User | ---> | LLM | ---> | Tool Dispatch |
| prompt | | | | { |
+----------+ +---+---+ | bash: run_bash |
^ | read: run_read |
| | write: run_wr |
+----------+ edit: run_edit |
tool_result| } |
+------------------+
The dispatch map is a dict: {tool_name: handler_function}
One lookup replaces any if/elif chain.
```
## How It Works
1. Define handler functions for each tool. Each takes keyword arguments
matching the tool's input_schema and returns a string result.
```python
def run_read(path: str, limit: int = None) -> str:
text = safe_path(path).read_text()
lines = text.splitlines()
if limit and limit < len(lines):
lines = lines[:limit]
return "\n".join(lines)[:50000]
```
2. Create the dispatch map linking tool names to handlers.
```python
TOOL_HANDLERS = {
"bash": lambda **kw: run_bash(kw["command"]),
"read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
"write_file": lambda **kw: run_write(kw["path"], kw["content"]),
"edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"],
kw["new_text"]),
}
```
3. In the agent loop, look up the handler by name instead of hardcoding.
```python
for block in response.content:
if block.type == "tool_use":
handler = TOOL_HANDLERS.get(block.name)
output = handler(**block.input)
results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": output,
})
```
4. Path sandboxing prevents the model from escaping the workspace.
```python
def safe_path(p: str) -> Path:
path = (WORKDIR / p).resolve()
if not path.is_relative_to(WORKDIR):
raise ValueError(f"Path escapes workspace: {p}")
return path
```
## Key Code
The dispatch pattern (from `agents/s02_tool_use.py`, lines 93-129):
```python
TOOL_HANDLERS = {
"bash": lambda **kw: run_bash(kw["command"]),
"read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
"write_file": lambda **kw: run_write(kw["path"], kw["content"]),
"edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"],
kw["new_text"]),
}
def agent_loop(messages: list):
while True:
response = client.messages.create(
model=MODEL, system=SYSTEM, messages=messages,
tools=TOOLS, max_tokens=8000,
)
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason != "tool_use":
return
results = []
for block in response.content:
if block.type == "tool_use":
handler = TOOL_HANDLERS.get(block.name)
output = handler(**block.input) if handler \
else f"Unknown tool: {block.name}"
results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": output,
})
messages.append({"role": "user", "content": results})
```
## What Changed From s01
| Component | Before (s01) | After (s02) |
|----------------|--------------------|----------------------------|
| Tools | 1 (bash only) | 4 (bash, read, write, edit)|
| Dispatch | Hardcoded bash call | `TOOL_HANDLERS` dict |
| Path safety | None | `safe_path()` sandbox |
| Agent loop | Unchanged | Unchanged |
## Design Rationale
The dispatch map pattern scales linearly -- adding a tool means adding one handler and one schema entry. The loop never changes. This separation of concerns (loop vs handlers) is why agent frameworks can support dozens of tools without increasing control flow complexity. The pattern also enables independent testing of each handler in isolation, since handlers are pure functions with no coupling to the loop. Any agent that outgrows a dispatch map has a design problem, not a scaling problem.
## Try It
```sh
cd learn-claude-code
python agents/s02_tool_use.py
```
Example prompts to try:
1. `Read the file requirements.txt`
2. `Create a file called greet.py with a greet(name) function`
3. `Edit greet.py to add a docstring to the function`
4. `Read greet.py to verify the edit worked`
5. `Run the greet function with bash: python -c "from greet import greet; greet('World')"`

170
docs/en/s03-todo-write.md Normal file
View File

@@ -0,0 +1,170 @@
# s03: TodoWrite
> A TodoManager lets the agent track its own progress, and a nag reminder injection forces it to keep updating when it forgets.
## The Problem
When an agent works on a multi-step task, it often loses track of what it
has done and what remains. Without explicit planning, the model might repeat
work, skip steps, or wander off on tangents. The user has no visibility
into the agent's internal plan.
This is worse than it sounds. Long conversations cause the model to "drift"
-- the system prompt fades in influence as the context window fills with
tool results. A 10-step refactoring task might complete steps 1-3, then
the model starts improvising because it forgot steps 4-10 existed.
The solution is structured state: a TodoManager that the model writes to
explicitly. The model creates a plan, marks items in_progress as it works,
and marks them completed when done. A nag reminder injects a nudge if the
model goes 3+ rounds without updating its todos.
Teaching simplification: the nag threshold of 3 rounds is set low for
teaching visibility. Production agents typically use a higher threshold
around 10 to avoid excessive prompting.
## The Solution
```
+----------+ +-------+ +---------+
| User | ---> | LLM | ---> | Tools |
| prompt | | | | + todo |
+----------+ +---+---+ +----+----+
^ |
| tool_result |
+---------------+
|
+-----------+-----------+
| TodoManager state |
| [ ] task A |
| [>] task B <- doing |
| [x] task C |
+-----------------------+
|
if rounds_since_todo >= 3:
inject <reminder> into tool_result
```
## How It Works
1. The TodoManager validates and stores a list of items with statuses.
Only one item can be `in_progress` at a time.
```python
class TodoManager:
def __init__(self):
self.items = []
def update(self, items: list) -> str:
validated = []
in_progress_count = 0
for item in items:
status = item.get("status", "pending")
if status == "in_progress":
in_progress_count += 1
validated.append({
"id": item["id"],
"text": item["text"],
"status": status,
})
if in_progress_count > 1:
raise ValueError("Only one task can be in_progress")
self.items = validated
return self.render()
```
2. The `todo` tool is added to the dispatch map like any other tool.
```python
TOOL_HANDLERS = {
"bash": lambda **kw: run_bash(kw["command"]),
# ...other tools...
"todo": lambda **kw: TODO.update(kw["items"]),
}
```
3. The nag reminder injects a `<reminder>` tag into the tool_result
messages when the model goes 3+ rounds without calling `todo`.
```python
def agent_loop(messages: list):
rounds_since_todo = 0
while True:
if rounds_since_todo >= 3 and messages:
last = messages[-1]
if (last["role"] == "user"
and isinstance(last.get("content"), list)):
last["content"].insert(0, {
"type": "text",
"text": "<reminder>Update your todos.</reminder>",
})
# ... rest of loop ...
rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
```
4. The system prompt instructs the model to use todos for planning.
```python
SYSTEM = f"""You are a coding agent at {WORKDIR}.
Use the todo tool to plan multi-step tasks.
Mark in_progress before starting, completed when done.
Prefer tools over prose."""
```
## Key Code
The TodoManager and nag injection (from `agents/s03_todo_write.py`,
lines 51-85 and 158-187):
```python
class TodoManager:
def update(self, items: list) -> str:
validated = []
in_progress_count = 0
for item in items:
status = item.get("status", "pending")
if status == "in_progress":
in_progress_count += 1
validated.append({
"id": item["id"],
"text": item["text"],
"status": status,
})
if in_progress_count > 1:
raise ValueError("Only one in_progress")
self.items = validated
return self.render()
# In agent_loop:
if rounds_since_todo >= 3:
last["content"].insert(0, {
"type": "text",
"text": "<reminder>Update your todos.</reminder>",
})
```
## What Changed From s02
| Component | Before (s02) | After (s03) |
|----------------|------------------|--------------------------|
| Tools | 4 | 5 (+todo) |
| Planning | None | TodoManager with statuses|
| Nag injection | None | `<reminder>` after 3 rounds|
| Agent loop | Simple dispatch | + rounds_since_todo counter|
## Design Rationale
Visible plans improve task completion because the model can self-monitor progress. The nag mechanism creates accountability -- without it, the model may abandon plans mid-execution as conversation context grows and earlier instructions fade. The "one in_progress at a time" constraint enforces sequential focus, preventing context-switching overhead that degrades output quality. This pattern works because it externalizes the model's working memory into structured state that survives attention drift.
## Try It
```sh
cd learn-claude-code
python agents/s03_todo_write.py
```
Example prompts to try:
1. `Refactor the file hello.py: add type hints, docstrings, and a main guard`
2. `Create a Python package with __init__.py, utils.py, and tests/test_utils.py`
3. `Review all Python files and fix any style issues`

157
docs/en/s04-subagent.md Normal file
View File

@@ -0,0 +1,157 @@
# s04: Subagents
> A subagent runs with a fresh messages list, shares the filesystem with the parent, and returns only a summary -- keeping the parent context clean.
## The Problem
As the agent works, its messages array grows. Every tool call, every file
read, every bash output accumulates. After 20-30 tool calls, the context
window is crowded with irrelevant history. Reading a 500-line file to
answer a quick question permanently adds 500 lines to the context.
This is particularly bad for exploratory tasks. "What testing framework
does this project use?" might require reading 5 files, but the parent
agent does not need all 5 file contents in its history -- it just needs
the answer: "pytest with conftest.py configuration."
The solution is process isolation: spawn a child agent with `messages=[]`.
The child explores, reads files, runs commands. When it finishes, only its
final text response returns to the parent. The child's entire message
history is discarded.
## The Solution
```
Parent agent Subagent
+------------------+ +------------------+
| messages=[...] | | messages=[] | <-- fresh
| | dispatch | |
| tool: task | ---------->| while tool_use: |
| prompt="..." | | call tools |
| | summary | append results |
| result = "..." | <--------- | return last text |
+------------------+ +------------------+
|
Parent context stays clean.
Subagent context is discarded.
```
## How It Works
1. The parent agent gets a `task` tool that triggers subagent spawning.
The child gets all base tools except `task` (no recursive spawning).
```python
PARENT_TOOLS = CHILD_TOOLS + [
{"name": "task",
"description": "Spawn a subagent with fresh context.",
"input_schema": {
"type": "object",
"properties": {
"prompt": {"type": "string"},
"description": {"type": "string"},
},
"required": ["prompt"],
}},
]
```
2. The subagent starts with a fresh messages list containing only
the delegated prompt. It shares the same filesystem.
```python
def run_subagent(prompt: str) -> str:
sub_messages = [{"role": "user", "content": prompt}]
for _ in range(30): # safety limit
response = client.messages.create(
model=MODEL, system=SUBAGENT_SYSTEM,
messages=sub_messages,
tools=CHILD_TOOLS, max_tokens=8000,
)
sub_messages.append({
"role": "assistant", "content": response.content
})
if response.stop_reason != "tool_use":
break
# execute tools, append results...
```
3. Only the final text returns to the parent. The child's 30+ tool
call history is discarded.
```python
return "".join(
b.text for b in response.content if hasattr(b, "text")
) or "(no summary)"
```
4. The parent receives this summary as a normal tool_result.
```python
if block.name == "task":
output = run_subagent(block.input["prompt"])
results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": str(output),
})
```
## Key Code
The subagent function (from `agents/s04_subagent.py`,
lines 110-128):
```python
def run_subagent(prompt: str) -> str:
sub_messages = [{"role": "user", "content": prompt}]
for _ in range(30):
response = client.messages.create(
model=MODEL, system=SUBAGENT_SYSTEM,
messages=sub_messages,
tools=CHILD_TOOLS, max_tokens=8000,
)
sub_messages.append({"role": "assistant",
"content": response.content})
if response.stop_reason != "tool_use":
break
results = []
for block in response.content:
if block.type == "tool_use":
handler = TOOL_HANDLERS.get(block.name)
output = handler(**block.input)
results.append({"type": "tool_result",
"tool_use_id": block.id,
"content": str(output)[:50000]})
sub_messages.append({"role": "user", "content": results})
return "".join(
b.text for b in response.content if hasattr(b, "text")
) or "(no summary)"
```
## What Changed From s03
| Component | Before (s03) | After (s04) |
|----------------|------------------|---------------------------|
| Tools | 5 | 5 (base) + task (parent) |
| Context | Single shared | Parent + child isolation |
| Subagent | None | `run_subagent()` function |
| Return value | N/A | Summary text only |
| Todo system | TodoManager | Removed (not needed here) |
## Design Rationale
Process isolation gives context isolation for free. A fresh `messages[]` means the subagent cannot be confused by the parent's conversation history. The tradeoff is communication overhead -- results must be compressed back to the parent, losing detail. This is the same tradeoff as OS process isolation: safety and cleanliness in exchange for serialization cost. Limiting subagent depth (no recursive spawning) prevents unbounded resource consumption, and a max iteration count ensures runaway children terminate.
## Try It
```sh
cd learn-claude-code
python agents/s04_subagent.py
```
Example prompts to try:
1. `Use a subtask to find what testing framework this project uses`
2. `Delegate: read all .py files and summarize what each one does`
3. `Use a task to create a new module, then verify it from here`

View File

@@ -0,0 +1,165 @@
# s05: Skills
> Two-layer skill injection avoids system prompt bloat by putting skill names in the system prompt (cheap) and full skill bodies in tool_result (on demand).
## The Problem
You want the agent to follow specific workflows for different domains:
git conventions, testing patterns, code review checklists. The naive
approach is to put everything in the system prompt. But the system prompt
has limited effective attention -- too much text and the model starts
ignoring parts of it.
If you have 10 skills at 2000 tokens each, that is 20,000 tokens of system
prompt. The model pays attention to the beginning and end but skims the
middle. Worse, most of those skills are irrelevant to any given task. A
file editing task does not need the git workflow instructions.
The two-layer approach solves this: Layer 1 puts short skill descriptions
in the system prompt (~100 tokens per skill). Layer 2 loads the full skill
body into a tool_result only when the model calls `load_skill`. The model
learns what skills exist (cheap) and loads them on demand (only when
relevant).
## The Solution
```
System prompt (Layer 1 -- always present):
+--------------------------------------+
| You are a coding agent. |
| Skills available: |
| - git: Git workflow helpers | ~100 tokens/skill
| - test: Testing best practices |
+--------------------------------------+
When model calls load_skill("git"):
+--------------------------------------+
| tool_result (Layer 2 -- on demand): |
| <skill name="git"> |
| Full git workflow instructions... | ~2000 tokens
| Step 1: ... |
| Step 2: ... |
| </skill> |
+--------------------------------------+
```
## How It Works
1. Skill files live in `.skills/` as Markdown with YAML frontmatter.
```
.skills/
git.md # ---\n description: Git workflow\n ---\n ...
test.md # ---\n description: Testing patterns\n ---\n ...
```
2. The SkillLoader parses frontmatter and separates metadata from body.
```python
class SkillLoader:
def _parse_frontmatter(self, text: str) -> tuple:
match = re.match(
r"^---\n(.*?)\n---\n(.*)", text, re.DOTALL
)
if not match:
return {}, text
meta = {}
for line in match.group(1).strip().splitlines():
if ":" in line:
key, val = line.split(":", 1)
meta[key.strip()] = val.strip()
return meta, match.group(2).strip()
```
3. Layer 1: `get_descriptions()` returns short lines for the system prompt.
```python
def get_descriptions(self) -> str:
lines = []
for name, skill in self.skills.items():
desc = skill["meta"].get("description", "No description")
lines.append(f" - {name}: {desc}")
return "\n".join(lines)
SYSTEM = f"""You are a coding agent at {WORKDIR}.
Skills available:
{SKILL_LOADER.get_descriptions()}"""
```
4. Layer 2: `get_content()` returns the full body wrapped in `<skill>` tags.
```python
def get_content(self, name: str) -> str:
skill = self.skills.get(name)
if not skill:
return f"Error: Unknown skill '{name}'."
return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"
```
5. The `load_skill` tool is just another entry in the dispatch map.
```python
TOOL_HANDLERS = {
# ...base tools...
"load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),
}
```
## Key Code
The SkillLoader class (from `agents/s05_skill_loading.py`,
lines 51-97):
```python
class SkillLoader:
def __init__(self, skills_dir: Path):
self.skills = {}
for f in sorted(skills_dir.glob("*.md")):
text = f.read_text()
meta, body = self._parse_frontmatter(text)
self.skills[f.stem] = {
"meta": meta, "body": body
}
def get_descriptions(self) -> str:
lines = []
for name, skill in self.skills.items():
desc = skill["meta"].get("description", "")
lines.append(f" - {name}: {desc}")
return "\n".join(lines)
def get_content(self, name: str) -> str:
skill = self.skills.get(name)
if not skill:
return f"Error: Unknown skill '{name}'."
return (f"<skill name=\"{name}\">\n"
f"{skill['body']}\n</skill>")
```
## What Changed From s04
| Component | Before (s04) | After (s05) |
|----------------|------------------|----------------------------|
| Tools | 5 (base + task) | 5 (base + load_skill) |
| System prompt | Static string | + skill descriptions |
| Knowledge | None | .skills/*.md files |
| Injection | None | Two-layer (system + result)|
| Subagent | `run_subagent()` | Removed (different focus) |
## Design Rationale
Two-layer injection solves the attention budget problem. Putting all skill content in the system prompt wastes tokens on unused skills. Layer 1 (compact summaries) costs roughly 120 tokens total. Layer 2 (full content) loads on demand via tool_result. This scales to dozens of skills without degrading model attention quality. The key insight is that the model only needs to know what skills exist (cheap) to decide when to load one (expensive). This is the same lazy-loading principle used in software module systems.
## Try It
```sh
cd learn-claude-code
python agents/s05_skill_loading.py
```
Example prompts to try:
1. `What skills are available?`
2. `Load the agent-builder skill and follow its instructions`
3. `I need to do a code review -- load the relevant skill first`
4. `Build an MCP server using the mcp-builder skill`

View File

@@ -0,0 +1,183 @@
# s06: Compact
> A three-layer compression pipeline lets the agent work indefinitely by strategically forgetting old tool results, auto-summarizing when tokens exceed a threshold, and allowing manual compression on demand.
## The Problem
The context window is finite. After enough tool calls, the messages array
exceeds the model's context limit and the API call fails. Even before
hitting the hard limit, performance degrades: the model becomes slower,
less accurate, and starts ignoring earlier messages.
A 200,000 token context window sounds large, but a single `read_file` on
a 1000-line source file consumes ~4000 tokens. After reading 30 files and
running 20 bash commands, you are at 100,000+ tokens. The agent cannot
work on large codebases without some form of compression.
The three-layer pipeline addresses this with increasing aggressiveness:
Layer 1 (micro-compact) silently replaces old tool results every turn.
Layer 2 (auto-compact) triggers a full summarization when tokens exceed
a threshold. Layer 3 (manual compact) lets the model trigger compression
itself.
Teaching simplification: the token estimation here uses a rough
characters/4 heuristic. Production systems use proper tokenizer
libraries for accurate counts.
## The Solution
```
Every turn:
+------------------+
| Tool call result |
+------------------+
|
v
[Layer 1: micro_compact] (silent, every turn)
Replace tool_result > 3 turns old
with "[Previous: used {tool_name}]"
|
v
[Check: tokens > 50000?]
| |
no yes
| |
v v
continue [Layer 2: auto_compact]
Save transcript to .transcripts/
LLM summarizes conversation.
Replace all messages with [summary].
|
v
[Layer 3: compact tool]
Model calls compact explicitly.
Same summarization as auto_compact.
```
## How It Works
1. **Layer 1 -- micro_compact**: Before each LLM call, find all
tool_result entries older than the last 3 and replace their content.
```python
def micro_compact(messages: list) -> list:
tool_results = []
for i, msg in enumerate(messages):
if msg["role"] == "user" and isinstance(msg.get("content"), list):
for j, part in enumerate(msg["content"]):
if isinstance(part, dict) and part.get("type") == "tool_result":
tool_results.append((i, j, part))
if len(tool_results) <= KEEP_RECENT:
return messages
to_clear = tool_results[:-KEEP_RECENT]
for _, _, part in to_clear:
if len(part.get("content", "")) > 100:
tool_id = part.get("tool_use_id", "")
tool_name = tool_name_map.get(tool_id, "unknown")
part["content"] = f"[Previous: used {tool_name}]"
return messages
```
2. **Layer 2 -- auto_compact**: When estimated tokens exceed 50,000,
save the full transcript and ask the LLM to summarize.
```python
def auto_compact(messages: list) -> list:
TRANSCRIPT_DIR.mkdir(exist_ok=True)
transcript_path = TRANSCRIPT_DIR / f"transcript_{int(time.time())}.jsonl"
with open(transcript_path, "w") as f:
for msg in messages:
f.write(json.dumps(msg, default=str) + "\n")
response = client.messages.create(
model=MODEL,
messages=[{"role": "user", "content":
"Summarize this conversation for continuity..."
+ json.dumps(messages, default=str)[:80000]}],
max_tokens=2000,
)
summary = response.content[0].text
return [
{"role": "user", "content": f"[Compressed]\n\n{summary}"},
{"role": "assistant", "content": "Understood. Continuing."},
]
```
3. **Layer 3 -- manual compact**: The `compact` tool triggers the same
summarization on demand.
```python
if manual_compact:
messages[:] = auto_compact(messages)
```
4. The agent loop integrates all three layers.
```python
def agent_loop(messages: list):
while True:
micro_compact(messages)
if estimate_tokens(messages) > THRESHOLD:
messages[:] = auto_compact(messages)
response = client.messages.create(...)
# ... tool execution ...
if manual_compact:
messages[:] = auto_compact(messages)
```
## Key Code
The three-layer pipeline (from `agents/s06_context_compact.py`,
lines 67-93 and 189-223):
```python
THRESHOLD = 50000
KEEP_RECENT = 3
def micro_compact(messages):
# Replace old tool results with placeholders
...
def auto_compact(messages):
# Save transcript, LLM summarize, replace messages
...
def agent_loop(messages):
while True:
micro_compact(messages) # Layer 1
if estimate_tokens(messages) > THRESHOLD:
messages[:] = auto_compact(messages) # Layer 2
response = client.messages.create(...)
# ...
if manual_compact:
messages[:] = auto_compact(messages) # Layer 3
```
## What Changed From s05
| Component | Before (s05) | After (s06) |
|----------------|------------------|----------------------------|
| Tools | 5 | 5 (base + compact) |
| Context mgmt | None | Three-layer compression |
| Micro-compact | None | Old results -> placeholders|
| Auto-compact | None | Token threshold trigger |
| Manual compact | None | `compact` tool |
| Transcripts | None | Saved to .transcripts/ |
| Skills | load_skill | Removed (different focus) |
## Design Rationale
Context windows are finite, but agent sessions can be infinite. Three compression layers solve this at different granularities: micro-compact (replace old tool outputs), auto-compact (LLM summarizes when approaching limit), and manual compact (user-triggered). The key insight is that forgetting is a feature, not a bug -- it enables unbounded sessions. Transcripts preserve the full history on disk so nothing is truly lost, just moved out of the active context. The layered approach lets each layer operate independently at its own granularity, from silent per-turn cleanup to full conversation reset.
## Try It
```sh
cd learn-claude-code
python agents/s06_context_compact.py
```
Example prompts to try:
1. `Read every Python file in the agents/ directory one by one`
(watch micro-compact replace old results)
2. `Keep reading files until compression triggers automatically`
3. `Use the compact tool to manually compress the conversation`

172
docs/en/s07-task-system.md Normal file
View File

@@ -0,0 +1,172 @@
# s07: Tasks
> Tasks persist as JSON files on the filesystem with a dependency graph, so they survive context compression and can be shared across agents.
## The Problem
In-memory state like TodoManager (s03) is lost when the context is
compressed (s06). After auto_compact replaces messages with a summary,
the todo list is gone. The agent has to reconstruct it from the summary
text, which is lossy and error-prone.
This is the critical s06-to-s07 bridge: TodoManager items die with
compression; file-based tasks don't. Moving state to the filesystem
makes it compression-proof.
More fundamentally, in-memory state is invisible to other agents.
When we eventually build teams (s09+), teammates need a shared task
board. In-memory data structures are process-local.
The solution is to persist tasks as JSON files in `.tasks/`. Each task
is a separate file with an ID, subject, status, and dependency graph.
Completing task 1 automatically unblocks task 2 if task 2 has
`blockedBy: [1]`. The file system becomes the source of truth.
## The Solution
```
.tasks/
task_1.json {"id":1, "status":"completed", ...}
task_2.json {"id":2, "blockedBy":[1], "status":"pending"}
task_3.json {"id":3, "blockedBy":[2], "status":"pending"}
Dependency resolution:
+----------+ +----------+ +----------+
| task 1 | --> | task 2 | --> | task 3 |
| complete | | blocked | | blocked |
+----------+ +----------+ +----------+
| ^
+--- completing task 1 removes it from
task 2's blockedBy list
```
## How It Works
1. The TaskManager provides CRUD operations. Each task is a JSON file.
```python
class TaskManager:
def create(self, subject: str, description: str = "") -> str:
task = {
"id": self._next_id,
"subject": subject,
"description": description,
"status": "pending",
"blockedBy": [],
"blocks": [],
"owner": "",
}
self._save(task)
self._next_id += 1
return json.dumps(task, indent=2)
```
2. When a task is marked completed, `_clear_dependency` removes its ID
from all other tasks' `blockedBy` lists.
```python
def _clear_dependency(self, completed_id: int):
for f in self.dir.glob("task_*.json"):
task = json.loads(f.read_text())
if completed_id in task.get("blockedBy", []):
task["blockedBy"].remove(completed_id)
self._save(task)
```
3. The `update` method handles status changes and bidirectional dependency
wiring.
```python
def update(self, task_id, status=None,
add_blocked_by=None, add_blocks=None):
task = self._load(task_id)
if status:
task["status"] = status
if status == "completed":
self._clear_dependency(task_id)
if add_blocks:
task["blocks"] = list(set(task["blocks"] + add_blocks))
for blocked_id in add_blocks:
blocked = self._load(blocked_id)
if task_id not in blocked["blockedBy"]:
blocked["blockedBy"].append(task_id)
self._save(blocked)
self._save(task)
```
4. Four task tools are added to the dispatch map.
```python
TOOL_HANDLERS = {
# ...base tools...
"task_create": lambda **kw: TASKS.create(kw["subject"]),
"task_update": lambda **kw: TASKS.update(kw["task_id"],
kw.get("status")),
"task_list": lambda **kw: TASKS.list_all(),
"task_get": lambda **kw: TASKS.get(kw["task_id"]),
}
```
## Key Code
The TaskManager with dependency graph (from `agents/s07_task_system.py`,
lines 46-123):
```python
class TaskManager:
def __init__(self, tasks_dir: Path):
self.dir = tasks_dir
self.dir.mkdir(exist_ok=True)
self._next_id = self._max_id() + 1
def _load(self, task_id: int) -> dict:
path = self.dir / f"task_{task_id}.json"
return json.loads(path.read_text())
def _save(self, task: dict):
path = self.dir / f"task_{task['id']}.json"
path.write_text(json.dumps(task, indent=2))
def create(self, subject, description=""):
task = {"id": self._next_id, "subject": subject,
"status": "pending", "blockedBy": [],
"blocks": [], "owner": ""}
self._save(task)
self._next_id += 1
return json.dumps(task, indent=2)
def _clear_dependency(self, completed_id):
for f in self.dir.glob("task_*.json"):
task = json.loads(f.read_text())
if completed_id in task.get("blockedBy", []):
task["blockedBy"].remove(completed_id)
self._save(task)
```
## What Changed From s06
| Component | Before (s06) | After (s07) |
|----------------|------------------|----------------------------|
| Tools | 5 | 8 (+task_create/update/list/get)|
| State storage | In-memory only | JSON files in .tasks/ |
| Dependencies | None | blockedBy + blocks graph |
| Compression | Three-layer | Removed (different focus) |
| Persistence | Lost on compact | Survives compression |
## Design Rationale
File-based state survives context compression. When the agent's conversation is compacted, in-memory state is lost, but tasks written to disk persist. The dependency graph ensures correct execution order even after context loss. This is the bridge between ephemeral conversation and persistent work -- the agent can forget conversation details but always has the task board to remind it what needs doing. The filesystem as source of truth also enables future multi-agent sharing, since any process can read the same JSON files.
## Try It
```sh
cd learn-claude-code
python agents/s07_task_system.py
```
Example prompts to try:
1. `Create 3 tasks: "Setup project", "Write code", "Write tests". Make them depend on each other in order.`
2. `List all tasks and show the dependency graph`
3. `Complete task 1 and then list tasks to see task 2 unblocked`
4. `Create a task board for refactoring: parse -> transform -> emit -> test`

View File

@@ -0,0 +1,188 @@
# s08: Background Tasks
> A BackgroundManager runs commands in separate threads and drains a notification queue before each LLM call, so the agent never blocks on long-running operations.
## The Problem
Some commands take minutes: `npm install`, `pytest`, `docker build`. With
a blocking agent loop, the model sits idle waiting for the subprocess to
finish. It cannot do anything else. If the user asked "install dependencies
and while that runs, create the config file," the agent would install
first, _then_ create the config -- sequentially, not in parallel.
The agent needs concurrency. Not full multi-threading of the agent loop
itself, but the ability to fire off a long command and continue working
while it runs. When the command finishes, its result should appear
naturally in the conversation.
The solution is a BackgroundManager that runs commands in daemon threads
and collects results in a notification queue. Before each LLM call, the
queue is drained and results are injected into the messages.
## The Solution
```
Main thread Background thread
+-----------------+ +-----------------+
| agent loop | | task executes |
| ... | | ... |
| [LLM call] <---+------- | enqueue(result) |
| ^drain queue | +-----------------+
+-----------------+
Timeline:
Agent --[spawn A]--[spawn B]--[other work]----
| |
v v
[A runs] [B runs] (parallel)
| |
+-- notification queue --+
|
[results injected before
next LLM call]
```
## How It Works
1. The BackgroundManager tracks tasks and maintains a thread-safe
notification queue.
```python
class BackgroundManager:
def __init__(self):
self.tasks = {}
self._notification_queue = []
self._lock = threading.Lock()
```
2. `run()` starts a daemon thread and returns a task_id immediately.
```python
def run(self, command: str) -> str:
task_id = str(uuid.uuid4())[:8]
self.tasks[task_id] = {
"status": "running",
"result": None,
"command": command,
}
thread = threading.Thread(
target=self._execute,
args=(task_id, command),
daemon=True,
)
thread.start()
return f"Background task {task_id} started"
```
3. The thread target `_execute` runs the subprocess and pushes
results to the notification queue.
```python
def _execute(self, task_id: str, command: str):
try:
r = subprocess.run(command, shell=True, cwd=WORKDIR,
capture_output=True, text=True, timeout=300)
output = (r.stdout + r.stderr).strip()[:50000]
status = "completed"
except subprocess.TimeoutExpired:
output = "Error: Timeout (300s)"
status = "timeout"
self.tasks[task_id]["status"] = status
self.tasks[task_id]["result"] = output
with self._lock:
self._notification_queue.append({
"task_id": task_id,
"status": status,
"result": output[:500],
})
```
4. `drain_notifications()` returns and clears pending results.
```python
def drain_notifications(self) -> list:
with self._lock:
notifs = list(self._notification_queue)
self._notification_queue.clear()
return notifs
```
5. The agent loop drains notifications before each LLM call.
```python
def agent_loop(messages: list):
while True:
notifs = BG.drain_notifications()
if notifs and messages:
notif_text = "\n".join(
f"[bg:{n['task_id']}] {n['status']}: "
f"{n['result']}" for n in notifs
)
messages.append({"role": "user",
"content": f"<background-results>"
f"\n{notif_text}\n"
f"</background-results>"})
messages.append({"role": "assistant",
"content": "Noted background results."})
response = client.messages.create(...)
```
## Key Code
The BackgroundManager (from `agents/s08_background_tasks.py`, lines 49-107):
```python
class BackgroundManager:
def __init__(self):
self.tasks = {}
self._notification_queue = []
self._lock = threading.Lock()
def run(self, command: str) -> str:
task_id = str(uuid.uuid4())[:8]
self.tasks[task_id] = {"status": "running",
"result": None,
"command": command}
thread = threading.Thread(
target=self._execute,
args=(task_id, command), daemon=True)
thread.start()
return f"Background task {task_id} started"
def _execute(self, task_id, command):
# run subprocess, push to queue
...
def drain_notifications(self) -> list:
with self._lock:
notifs = list(self._notification_queue)
self._notification_queue.clear()
return notifs
```
## What Changed From s07
| Component | Before (s07) | After (s08) |
|----------------|------------------|----------------------------|
| Tools | 8 | 6 (base + background_run + check)|
| Execution | Blocking only | Blocking + background threads|
| Notification | None | Queue drained per loop |
| Concurrency | None | Daemon threads |
| Task system | File-based CRUD | Removed (different focus) |
## Design Rationale
The agent loop is inherently single-threaded (one LLM call at a time). Background threads break this constraint for I/O-bound work (tests, builds, installs). The notification queue pattern ("drain before next LLM call") ensures results arrive at natural conversation breakpoints rather than interrupting the model's reasoning mid-thought. This is a minimal concurrency model: the agent loop stays single-threaded and deterministic, while only the I/O-bound subprocess execution is parallelized.
## Try It
```sh
cd learn-claude-code
python agents/s08_background_tasks.py
```
Example prompts to try:
1. `Run "sleep 5 && echo done" in the background, then create a file while it runs`
2. `Start 3 background tasks: "sleep 2", "sleep 4", "sleep 6". Check their status.`
3. `Run pytest in the background and keep working on other things`

233
docs/en/s09-agent-teams.md Normal file
View File

@@ -0,0 +1,233 @@
# s09: Agent Teams
> Persistent teammates with JSONL inboxes turn isolated agents into a communicating team -- spawn, message, broadcast, and drain.
## The Problem
Subagents (s04) are disposable: spawn, work, return summary, die. They
have no identity, no memory between invocations, and no way to receive
follow-up instructions. Background tasks (s08) run shell commands but
cannot make LLM-guided decisions or communicate findings.
For real teamwork you need three things: (1) persistent agents that
survive beyond a single prompt, (2) identity and lifecycle management,
and (3) a communication channel between agents. Without messaging, even
persistent teammates are deaf and mute -- they can work in parallel but
never coordinate.
The solution combines a TeammateManager for spawning persistent named
agents with a MessageBus using JSONL inbox files. Each teammate runs
its own agent loop in a thread, checks its inbox before every LLM call,
and can send messages to any other teammate or the lead.
Note on the s06-to-s07 bridge: TodoManager items from s03 die with
compression (s06). File-based tasks (s07) survive compression because
they live on disk. Teams build on this same principle -- config.json and
inbox files persist outside the context window.
## The Solution
```
Teammate lifecycle:
spawn -> WORKING -> IDLE -> WORKING -> ... -> SHUTDOWN
Communication:
.team/
config.json <- team roster + statuses
inbox/
alice.jsonl <- append-only, drain-on-read
bob.jsonl
lead.jsonl
+--------+ send("alice","bob","...") +--------+
| alice | -----------------------------> | bob |
| loop | bob.jsonl << {json_line} | loop |
+--------+ +--------+
^ |
| BUS.read_inbox("alice") |
+---- alice.jsonl -> read + drain ---------+
5 message types:
+-------------------------+------------------------------+
| message | Normal text between agents |
| broadcast | Sent to all teammates |
| shutdown_request | Request graceful shutdown |
| shutdown_response | Approve/reject shutdown |
| plan_approval_response | Approve/reject plan |
+-------------------------+------------------------------+
```
## How It Works
1. The TeammateManager maintains config.json with the team roster.
Each member has a name, role, and status.
```python
class TeammateManager:
def __init__(self, team_dir: Path):
self.dir = team_dir
self.dir.mkdir(exist_ok=True)
self.config_path = self.dir / "config.json"
self.config = self._load_config()
self.threads = {}
```
2. `spawn()` creates a teammate and starts its agent loop in a thread.
Re-spawning an idle teammate reactivates it.
```python
def spawn(self, name: str, role: str, prompt: str) -> str:
member = self._find_member(name)
if member:
if member["status"] not in ("idle", "shutdown"):
return f"Error: '{name}' is currently {member['status']}"
member["status"] = "working"
else:
member = {"name": name, "role": role, "status": "working"}
self.config["members"].append(member)
self._save_config()
thread = threading.Thread(
target=self._teammate_loop,
args=(name, role, prompt), daemon=True)
self.threads[name] = thread
thread.start()
return f"Spawned teammate '{name}' (role: {role})"
```
3. The MessageBus handles JSONL inbox files. `send()` appends a JSON
line; `read_inbox()` reads all lines and drains the file.
```python
class MessageBus:
def send(self, sender, to, content,
msg_type="message", extra=None):
msg = {"type": msg_type, "from": sender,
"content": content,
"timestamp": time.time()}
if extra:
msg.update(extra)
with open(self.dir / f"{to}.jsonl", "a") as f:
f.write(json.dumps(msg) + "\n")
return f"Sent {msg_type} to {to}"
def read_inbox(self, name):
path = self.dir / f"{name}.jsonl"
if not path.exists():
return "[]"
msgs = [json.loads(l)
for l in path.read_text().strip().splitlines()
if l]
path.write_text("") # drain
return json.dumps(msgs, indent=2)
```
4. Each teammate checks its inbox before every LLM call and injects
received messages into the conversation context.
```python
def _teammate_loop(self, name, role, prompt):
sys_prompt = f"You are '{name}', role: {role}, at {WORKDIR}."
messages = [{"role": "user", "content": prompt}]
for _ in range(50):
inbox = BUS.read_inbox(name)
if inbox != "[]":
messages.append({"role": "user",
"content": f"<inbox>{inbox}</inbox>"})
messages.append({"role": "assistant",
"content": "Noted inbox messages."})
response = client.messages.create(
model=MODEL, system=sys_prompt,
messages=messages, tools=TOOLS)
messages.append({"role": "assistant",
"content": response.content})
if response.stop_reason != "tool_use":
break
# execute tools, append results...
self._find_member(name)["status"] = "idle"
self._save_config()
```
5. `broadcast()` sends the same message to all teammates except the
sender.
```python
def broadcast(self, sender, content, teammates):
count = 0
for name in teammates:
if name != sender:
self.send(sender, name, content, "broadcast")
count += 1
return f"Broadcast to {count} teammates"
```
## Key Code
The TeammateManager + MessageBus core (from `agents/s09_agent_teams.py`):
```python
class TeammateManager:
def spawn(self, name, role, prompt):
member = self._find_member(name) or {
"name": name, "role": role, "status": "working"
}
member["status"] = "working"
self._save_config()
thread = threading.Thread(
target=self._teammate_loop,
args=(name, role, prompt), daemon=True)
thread.start()
return f"Spawned '{name}'"
class MessageBus:
def send(self, sender, to, content,
msg_type="message", extra=None):
msg = {"type": msg_type, "from": sender,
"content": content, "timestamp": time.time()}
if extra: msg.update(extra)
with open(self.dir / f"{to}.jsonl", "a") as f:
f.write(json.dumps(msg) + "\n")
def read_inbox(self, name):
path = self.dir / f"{name}.jsonl"
if not path.exists(): return "[]"
msgs = [json.loads(l)
for l in path.read_text().strip().splitlines()
if l]
path.write_text("")
return json.dumps(msgs, indent=2)
```
## What Changed From s08
| Component | Before (s08) | After (s09) |
|----------------|------------------|----------------------------|
| Tools | 6 | 9 (+spawn/send/read_inbox) |
| Agents | Single | Lead + N teammates |
| Persistence | None | config.json + JSONL inboxes|
| Threads | Background cmds | Full agent loops per thread|
| Lifecycle | Fire-and-forget | idle -> working -> idle |
| Communication | None | 5 message types + broadcast|
Teaching simplification: this implementation does not use lock files
for inbox access. In production, concurrent append from multiple writers
would need file locking or atomic rename. The single-writer-per-inbox
pattern used here is safe for the teaching scenario.
## Design Rationale
File-based mailboxes (append-only JSONL) provide concurrency-safe inter-agent communication. Append is atomic on most filesystems, avoiding lock contention. The "drain on read" pattern (read all, truncate) gives batch delivery. This is simpler and more robust than shared memory or socket-based IPC for agent coordination. The tradeoff is latency -- messages are only seen at the next poll -- but for LLM-driven agents where each turn takes seconds, polling latency is negligible compared to inference time.
## Try It
```sh
cd learn-claude-code
python agents/s09_agent_teams.py
```
Example prompts to try:
1. `Spawn alice (coder) and bob (tester). Have alice send bob a message.`
2. `Broadcast "status update: phase 1 complete" to all teammates`
3. `Check the lead inbox for any messages`
4. Type `/team` to see the team roster with statuses
5. Type `/inbox` to manually check the lead's inbox

View File

@@ -0,0 +1,204 @@
# s10: Team Protocols
> The same request_id handshake pattern powers both shutdown and plan approval -- one FSM, two applications.
## The Problem
In s09, teammates work and communicate but there is no structured
coordination. Two problems arise:
**Shutdown**: How do you stop a teammate cleanly? Killing the thread
leaves files partially written and config.json in a wrong state.
Graceful shutdown requires a handshake: the lead requests, the teammate
decides whether to approve (finish and exit) or reject (keep working).
**Plan approval**: How do you gate execution? When the lead says
"refactor the auth module," the teammate starts immediately. For
high-risk changes, the lead should review the plan before execution
begins. A junior proposes, a senior approves.
Both problems share the same structure: one side sends a request with a
unique ID, the other side responds referencing that ID. A finite state
machine tracks each request through pending -> approved | rejected.
## The Solution
```
Shutdown Protocol Plan Approval Protocol
================== ======================
Lead Teammate Teammate Lead
| | | |
|--shutdown_req-->| |--plan_req------>|
| {req_id:"abc"} | | {req_id:"xyz"} |
| | | |
|<--shutdown_resp-| |<--plan_resp-----|
| {req_id:"abc", | | {req_id:"xyz", |
| approve:true} | | approve:true} |
| | | |
v v v v
tracker["abc"] exits proceeds tracker["xyz"]
= approved = approved
Shared FSM (identical for both protocols):
[pending] --approve--> [approved]
[pending] --reject---> [rejected]
Trackers:
shutdown_requests = {req_id: {target, status}}
plan_requests = {req_id: {from, plan, status}}
```
## How It Works
1. The lead initiates shutdown by generating a request_id and sending
a shutdown_request through the inbox.
```python
shutdown_requests = {}
def handle_shutdown_request(teammate: str) -> str:
req_id = str(uuid.uuid4())[:8]
shutdown_requests[req_id] = {
"target": teammate, "status": "pending",
}
BUS.send("lead", teammate, "Please shut down gracefully.",
"shutdown_request", {"request_id": req_id})
return f"Shutdown request {req_id} sent (status: pending)"
```
2. The teammate receives the request in its inbox and calls the
`shutdown_response` tool to approve or reject.
```python
if tool_name == "shutdown_response":
req_id = args["request_id"]
approve = args["approve"]
if req_id in shutdown_requests:
shutdown_requests[req_id]["status"] = \
"approved" if approve else "rejected"
BUS.send(sender, "lead", args.get("reason", ""),
"shutdown_response",
{"request_id": req_id, "approve": approve})
return f"Shutdown {'approved' if approve else 'rejected'}"
```
3. The teammate loop checks for approved shutdown and exits.
```python
if (block.name == "shutdown_response"
and block.input.get("approve")):
should_exit = True
# ...
member["status"] = "shutdown" if should_exit else "idle"
```
4. Plan approval follows the identical pattern. The teammate submits
a plan, generating a request_id.
```python
plan_requests = {}
if tool_name == "plan_approval":
plan_text = args.get("plan", "")
req_id = str(uuid.uuid4())[:8]
plan_requests[req_id] = {
"from": sender, "plan": plan_text,
"status": "pending",
}
BUS.send(sender, "lead", plan_text,
"plan_approval_request",
{"request_id": req_id, "plan": plan_text})
return f"Plan submitted (request_id={req_id})"
```
5. The lead reviews and responds with the same request_id.
```python
def handle_plan_review(request_id, approve, feedback=""):
req = plan_requests.get(request_id)
if not req:
return f"Error: Unknown request_id '{request_id}'"
req["status"] = "approved" if approve else "rejected"
BUS.send("lead", req["from"], feedback,
"plan_approval_response",
{"request_id": request_id,
"approve": approve,
"feedback": feedback})
return f"Plan {req['status']} for '{req['from']}'"
```
6. Both protocols use the same `plan_approval` tool name with two
modes: teammates submit (no request_id), the lead reviews (with
request_id).
```python
# Lead tool dispatch:
"plan_approval": lambda **kw: handle_plan_review(
kw["request_id"], kw["approve"],
kw.get("feedback", "")),
# Teammate: submit mode (generate request_id)
```
## Key Code
The dual protocol handlers (from `agents/s10_team_protocols.py`):
```python
shutdown_requests = {}
plan_requests = {}
# -- Shutdown --
def handle_shutdown_request(teammate):
req_id = str(uuid.uuid4())[:8]
shutdown_requests[req_id] = {
"target": teammate, "status": "pending"
}
BUS.send("lead", teammate,
"Please shut down gracefully.",
"shutdown_request",
{"request_id": req_id})
# -- Plan Approval --
def handle_plan_review(request_id, approve, feedback=""):
req = plan_requests[request_id]
req["status"] = "approved" if approve else "rejected"
BUS.send("lead", req["from"], feedback,
"plan_approval_response",
{"request_id": request_id,
"approve": approve})
# Both use the same FSM:
# pending -> approved | rejected
# Both correlate by request_id across async inboxes
```
## What Changed From s09
| Component | Before (s09) | After (s10) |
|----------------|------------------|------------------------------|
| Tools | 9 | 12 (+shutdown_req/resp +plan)|
| Shutdown | Natural exit only| Request-response handshake |
| Plan gating | None | Submit/review with approval |
| Request tracking| None | Two tracker dicts |
| Correlation | None | request_id per request |
| FSM | None | pending -> approved/rejected |
## Design Rationale
The request_id correlation pattern turns any async interaction into a trackable finite state machine. The same 3-state machine (pending -> approved/rejected) applies to shutdown, plan approval, or any future protocol. This is why one pattern handles multiple protocols -- the FSM does not care what it is approving. The request_id provides correlation across async inboxes where messages may arrive out of order, making the pattern robust to timing variations between agents.
## Try It
```sh
cd learn-claude-code
python agents/s10_team_protocols.py
```
Example prompts to try:
1. `Spawn alice as a coder. Then request her shutdown.`
2. `List teammates to see alice's status after shutdown approval`
3. `Spawn bob with a risky refactoring task. Review and reject his plan.`
4. `Spawn charlie, have him submit a plan, then approve it.`
5. Type `/team` to monitor statuses

View File

@@ -0,0 +1,232 @@
# s11: Autonomous Agents
> An idle cycle with task board polling lets teammates find and claim work themselves, with identity re-injection after context compression.
## The Problem
In s09-s10, teammates only work when explicitly told to. The lead must
spawn each teammate with a specific prompt. If the task board has 10
unclaimed tasks, the lead must manually assign each one. This does not
scale.
True autonomy means teammates find work themselves. When a teammate
finishes its current task, it should scan the task board for unclaimed
work, claim a task, and start working -- without any instruction from
the lead.
But autonomous agents face a subtlety: after context compression, the
agent might forget who it is. If the messages are summarized, the
original system prompt identity ("you are alice, role: coder") fades.
Identity re-injection solves this by inserting an identity block at the
start of compressed contexts.
Teaching simplification: the token estimation used here is rough
(characters / 4). Production systems use proper tokenizer libraries.
The nag threshold of 3 rounds (from s03) is set low for teaching
visibility; production agents typically use a higher threshold around 10.
## The Solution
```
Teammate lifecycle with idle cycle:
+-------+
| spawn |
+---+---+
|
v
+-------+ tool_use +-------+
| WORK | <------------- | LLM |
+---+---+ +-------+
|
| stop_reason != tool_use
| (or idle tool called)
v
+--------+
| IDLE | poll every 5s for up to 60s
+---+----+
|
+---> check inbox --> message? ----------> WORK
|
+---> scan .tasks/ --> unclaimed? -------> claim -> WORK
|
+---> 60s timeout ----------------------> SHUTDOWN
Identity re-injection after compression:
if len(messages) <= 3:
messages.insert(0, identity_block)
"You are 'alice', role: coder, team: my-team"
```
## How It Works
1. The teammate loop has two phases: WORK and IDLE. WORK runs the
standard agent loop. When the LLM stops calling tools (or calls
the `idle` tool), the teammate enters the IDLE phase.
```python
def _loop(self, name, role, prompt):
while True:
# -- WORK PHASE --
messages = [{"role": "user", "content": prompt}]
for _ in range(50):
inbox = BUS.read_inbox(name)
for msg in inbox:
if msg.get("type") == "shutdown_request":
self._set_status(name, "shutdown")
return
messages.append(...)
response = client.messages.create(...)
if response.stop_reason != "tool_use":
break
# execute tools...
if idle_requested:
break
# -- IDLE PHASE --
self._set_status(name, "idle")
resume = self._idle_poll(name, messages)
if not resume:
self._set_status(name, "shutdown")
return
self._set_status(name, "working")
```
2. The idle phase polls the inbox and task board in a loop.
```python
def _idle_poll(self, name, messages):
polls = IDLE_TIMEOUT // POLL_INTERVAL # 60s / 5s = 12
for _ in range(polls):
time.sleep(POLL_INTERVAL)
# Check inbox for new messages
inbox = BUS.read_inbox(name)
if inbox:
messages.append({"role": "user",
"content": f"<inbox>{inbox}</inbox>"})
return True
# Scan task board for unclaimed tasks
unclaimed = scan_unclaimed_tasks()
if unclaimed:
task = unclaimed[0]
claim_task(task["id"], name)
messages.append({"role": "user",
"content": f"<auto-claimed>Task #{task['id']}: "
f"{task['subject']}</auto-claimed>"})
return True
return False # timeout -> shutdown
```
3. Task board scanning looks for pending, unowned, unblocked tasks.
```python
def scan_unclaimed_tasks() -> list:
TASKS_DIR.mkdir(exist_ok=True)
unclaimed = []
for f in sorted(TASKS_DIR.glob("task_*.json")):
task = json.loads(f.read_text())
if (task.get("status") == "pending"
and not task.get("owner")
and not task.get("blockedBy")):
unclaimed.append(task)
return unclaimed
def claim_task(task_id: int, owner: str):
path = TASKS_DIR / f"task_{task_id}.json"
task = json.loads(path.read_text())
task["status"] = "in_progress"
task["owner"] = owner
path.write_text(json.dumps(task, indent=2))
```
4. Identity re-injection inserts an identity block when the context
is too short, indicating compression has occurred.
```python
def make_identity_block(name, role, team_name):
return {"role": "user",
"content": f"<identity>You are '{name}', "
f"role: {role}, team: {team_name}. "
f"Continue your work.</identity>"}
# Before resuming work after idle:
if len(messages) <= 3:
messages.insert(0, make_identity_block(
name, role, team_name))
messages.insert(1, {"role": "assistant",
"content": f"I am {name}. Continuing."})
```
5. The `idle` tool lets the teammate explicitly signal it has no more
work, entering the idle polling phase early.
```python
{"name": "idle",
"description": "Signal that you have no more work. "
"Enters idle polling phase.",
"input_schema": {"type": "object", "properties": {}}},
```
## Key Code
The autonomous loop (from `agents/s11_autonomous_agents.py`):
```python
def _loop(self, name, role, prompt):
while True:
# WORK PHASE
for _ in range(50):
response = client.messages.create(...)
if response.stop_reason != "tool_use":
break
for block in response.content:
if block.name == "idle":
idle_requested = True
if idle_requested:
break
# IDLE PHASE
self._set_status(name, "idle")
for _ in range(IDLE_TIMEOUT // POLL_INTERVAL):
time.sleep(POLL_INTERVAL)
inbox = BUS.read_inbox(name)
if inbox: resume = True; break
unclaimed = scan_unclaimed_tasks()
if unclaimed:
claim_task(unclaimed[0]["id"], name)
resume = True; break
if not resume:
self._set_status(name, "shutdown")
return
self._set_status(name, "working")
```
## What Changed From s10
| Component | Before (s10) | After (s11) |
|----------------|------------------|----------------------------|
| Tools | 12 | 14 (+idle, +claim_task) |
| Autonomy | Lead-directed | Self-organizing |
| Idle phase | None | Poll inbox + task board |
| Task claiming | Manual only | Auto-claim unclaimed tasks |
| Identity | System prompt | + re-injection after compress|
| Timeout | None | 60s idle -> auto shutdown |
## Design Rationale
Polling + timeout makes agents self-organizing without a central coordinator. Each agent independently polls the task board, claims unclaimed work, and returns to idle when done. The timeout triggers the poll cycle, and if no work appears within the window, the agent shuts itself down. This is the same pattern as work-stealing thread pools -- distributed, no single point of failure. Identity re-injection after compression ensures agents maintain their role even when conversation history is summarized away.
## Try It
```sh
cd learn-claude-code
python agents/s11_autonomous_agents.py
```
Example prompts to try:
1. `Create 3 tasks on the board, then spawn alice and bob. Watch them auto-claim.`
2. `Spawn a coder teammate and let it find work from the task board itself`
3. `Create tasks with dependencies. Watch teammates respect the blocked order.`
4. Type `/tasks` to see the task board with owners
5. Type `/team` to monitor who is working vs idle