better doc

This commit is contained in:
CrazyBoyM
2026-02-27 01:11:57 +08:00
parent aea8844bac
commit 665831c774
46 changed files with 1217 additions and 3505 deletions

View File

# s01: The Agent Loop
> The core of a coding agent is a while loop that feeds tool results back to the model until the model decides to stop.
`[ s01 ] s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`
> *"One loop & Bash is all you need"* -- one tool + one loop = an agent.

## Problem

A language model can reason about code, but it can't *touch* the real world -- can't read files, run tests, or check errors. Without a loop, every tool call requires you to manually copy-paste results back. You become the loop.
Consider a simple task: "Create a Python file that prints hello." The model
needs to (1) decide to write a file, (2) write it, (3) verify it works.
That is three tool calls minimum. Without a loop, each one requires manual
human intervention.
## Solution
```
+--------+      +-------+      +---------+
| User   | ---> | LLM   | ---> | Tool    |
| prompt |      |       |      | execute |
+--------+      +---+---+      +----+----+
                    ^               |
                    |  tool_result  |
                    +---------------+

      (loop until stop_reason != "tool_use")
```
One exit condition controls the entire flow. The loop runs until the model stops calling tools.
## How It Works
1. User prompt becomes the first message.
```python
messages.append({"role": "user", "content": query})
```
2. Send messages + tool definitions to the LLM.
```python
response = client.messages.create(
    model=MODEL, system=SYSTEM, messages=messages,
    tools=TOOLS, max_tokens=8000,
)
```
3. Append the assistant response. Check `stop_reason` -- if the model didn't call a tool, we're done.
```python
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason != "tool_use":
return
```
4. Execute each tool call, collect results, append as a user message. Loop back to step 2.
```python
results = []
for block in response.content:
if block.type == "tool_use":
output = run_bash(block.input["command"])
        results.append({
            "type": "tool_result",
"tool_use_id": block.id,
"content": output,
})
messages.append({"role": "user", "content": results})
```
## Key Code
Assembled into one function (from `agents/s01_agent_loop.py`):
```python
def agent_loop(query):
messages = [{"role": "user", "content": query}]
while True:
response = client.messages.create(
model=MODEL, system=SYSTEM, messages=messages,
tools=TOOLS, max_tokens=8000,
)
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason != "tool_use":
return
results = []
for block in response.content:
if block.type == "tool_use":
                output = run_bash(block.input["command"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
messages.append({"role": "user", "content": results})
```
That's the entire agent in under 30 lines. Everything else in this course layers on top -- without changing the loop.
## What Changed

This is session 1 -- the starting point. There is no prior session.
| Component | Before | After |
|---------------|------------|--------------------------------|
| Messages | (none) | Accumulating list |
| Control flow | (none) | `stop_reason != "tool_use"` |
## Design Rationale
This loop is the foundation of LLM-based agents. Production implementations add error handling, token counting, streaming, retry logic, permission policy, and lifecycle orchestration, but the core interaction pattern starts here. The simplicity is the point: one exit condition (`stop_reason != "tool_use"`) is the entire control flow, and understanding it gives you the base model, not the full production architecture.
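As a taste of those production layers, here is a minimal sketch of just one of them -- retry with backoff around the API call. The helper name, retry count, and exception type are illustrative assumptions, not part of the lesson code:

```python
import time
import anthropic

def create_with_retry(max_retries: int = 3, **kwargs):
    """Retry the LLM call with exponential backoff (sketch, not lesson code)."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.APIStatusError:
            if attempt == max_retries - 1:
                raise                    # give up after the last attempt
            time.sleep(2 ** attempt)     # back off 1s, then 2s
```

Swapping it in changes one line inside `agent_loop`; the loop itself still only cares about `stop_reason`.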
## Try It
```sh
cd learn-claude-code
python agents/s01_agent_loop.py
```
Example prompts to try:
1. `Create a file called hello.py that prints "Hello, World!"`
2. `List all Python files in this directory`
3. `What is the current git branch?`

View File

# s02: Tool Use
> A dispatch map routes tool calls to handler functions. The loop stays identical.
`s01 > [ s02 ] s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`
> *"The loop didn't change"* -- adding tools means adding handlers, not rewriting the loop.

## Problem

With only `bash`, the agent shells out for everything. `cat` truncates unpredictably, `sed` fails on special characters, and every bash call is an unconstrained security surface. Dedicated tools like `read_file` and `write_file` let you enforce path sandboxing at the tool level.
The key insight: adding tools does not require changing the loop.
## Solution
```
+--------+      +-------+      +------------------+
| User   | ---> | LLM   | ---> | Tool Dispatch    |
| prompt |      |       |      | {                |
+--------+      +---+---+      |  bash:  run_bash |
                    ^          |  read:  run_read |
                    |          |  write: run_write|
                    +----------+  edit:  run_edit |
                    tool_result|  }               |
                               +------------------+

The dispatch map is a dict: {tool_name: handler_function}.
One lookup replaces any if/elif chain.
```
## How It Works
1. Each tool gets a handler function. Path sandboxing prevents workspace escape.
```python
def safe_path(p: str) -> Path:
path = (WORKDIR / p).resolve()
if not path.is_relative_to(WORKDIR):
raise ValueError(f"Path escapes workspace: {p}")
return path
def run_read(path: str, limit: int = None) -> str:
text = safe_path(path).read_text()
lines = text.splitlines()
    if limit:
        lines = lines[:limit]
return "\n".join(lines)[:50000]
```
2. The dispatch map links tool names to handlers.
```python
TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"],
                                       kw["new_text"]),
}
```
3. In the loop, look up the handler by name. The loop body itself is unchanged from s01.
```python
for block in response.content:
if block.type == "tool_use":
handler = TOOL_HANDLERS.get(block.name)
output = handler(**block.input) if handler \
else f"Unknown tool: {block.name}"
results.append({
"type": "tool_result",
"tool_use_id": block.id,
            "content": output,
})
```
## Key Code
The dispatch pattern (from `agents/s02_tool_use.py`):
```python
TOOL_HANDLERS = {
"bash": lambda **kw: run_bash(kw["command"]),
"read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
"write_file": lambda **kw: run_write(kw["path"], kw["content"]),
"edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"],
kw["new_text"]),
}
def agent_loop(messages: list):
while True:
response = client.messages.create(
model=MODEL, system=SYSTEM, messages=messages,
tools=TOOLS, max_tokens=8000,
)
messages.append({"role": "assistant", "content": response.content})
if response.stop_reason != "tool_use":
return
results = []
for block in response.content:
if block.type == "tool_use":
handler = TOOL_HANDLERS.get(block.name)
output = handler(**block.input) if handler \
else f"Unknown tool: {block.name}"
results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": output,
})
messages.append({"role": "user", "content": results})
```
Add a tool = add a handler + add a schema entry. The loop never changes.
## What Changed From s01
| Component     | Before (s01)   | After (s02)            |
|---------------|----------------|------------------------|
| Tools         | 1 (bash)       | 4 (+read/write/edit)   |
| Path safety | None | `safe_path()` sandbox |
| Agent loop | Unchanged | Unchanged |
## Design Rationale
The dispatch map scales linearly: add a tool, add a handler, add a schema entry. The loop never changes. Handlers are pure functions, so they test in isolation. Any agent that outgrows a dispatch map has a design problem, not a scaling problem.
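Because handlers are plain functions, they test without any LLM in the loop. A minimal sketch, assuming pytest and that the `safe_path`/`TOOL_HANDLERS` definitions above are importable:

```python
import pytest

def test_safe_path_blocks_escape():
    # Escaping the workspace must raise, per safe_path() above.
    with pytest.raises(ValueError):
        safe_path("../outside.txt")

def test_unknown_tool_has_no_handler():
    # The dispatch lookup degrades to an error string in the loop, never a crash.
    assert TOOL_HANDLERS.get("no_such_tool") is None
```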
## Try It
```sh
cd learn-claude-code
python agents/s02_tool_use.py
```
Example prompts to try:
1. `Read the file requirements.txt`
2. `Create a file called greet.py with a greet(name) function`
3. `Edit greet.py to add a docstring to the function`
4. `Read greet.py to verify the edit worked`
5. `Run the greet function with bash: python -c "from greet import greet; greet('World')"`

View File

# s03: TodoWrite
> A TodoManager lets the agent track its own progress, and a nag reminder injection forces it to keep updating when it forgets.
`s01 > s02 > [ s03 ] s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`
> *"Plan before you act"* -- visible plans improve task completion.

## Problem

On multi-step tasks, the model loses track. It repeats work, skips steps, or wanders off. Long conversations make this worse -- the system prompt fades as tool results fill the context. A 10-step refactoring might complete steps 1-3, then the model starts improvising because it forgot steps 4-10.
Note: the nag threshold of 3 rounds is low for visibility. Production systems tune higher. From s07, this course switches to the Task board for durable multi-step work; TodoWrite remains available for quick checklists.
## Solution
```
+--------+      +-------+      +---------+
| User   | ---> | LLM   | ---> | Tools   |
| prompt |      |       |      | + todo  |
+--------+      +---+---+      +----+----+
                    ^               |
                    |  tool_result  |
                    +---------------+
                            |
                +-----------+-----------+
                | TodoManager state     |
                | [ ] task A            |
                | [>] task B  <- doing  |
                | [x] task C            |
                +-----------------------+
                            |
              if rounds_since_todo >= 3:
                inject <reminder> into tool_result
```
## How It Works
1. TodoManager stores items with statuses. Only one item can be `in_progress` at a time (a sketch of `render()` follows this list).
```python
class TodoManager:
def __init__(self):
self.items = []
def update(self, items: list) -> str:
validated, in_progress_count = [], 0
for item in items:
status = item.get("status", "pending")
if status == "in_progress":
in_progress_count += 1
validated.append({"id": item["id"], "text": item["text"],
"status": status})
if in_progress_count > 1:
raise ValueError("Only one task can be in_progress")
self.items = validated
return self.render()
```
2. The `todo` tool goes into the dispatch map like any other tool.
```python
TOOL_HANDLERS = {
"bash": lambda **kw: run_bash(kw["command"]),
# ...other tools...
"todo": lambda **kw: TODO.update(kw["items"]),
# ...base tools...
"todo": lambda **kw: TODO.update(kw["items"]),
}
```
3. A nag reminder injects a nudge if the model goes 3+ rounds without calling `todo`.
```python
rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
if rounds_since_todo >= 3 and messages:
last = messages[-1]
if last["role"] == "user" and isinstance(last.get("content"), list):
last["content"].insert(0, {
"type": "text",
"text": "<reminder>Update your todos.</reminder>",
})
```
4. The system prompt instructs the model to use todos for planning.
```python
SYSTEM = f"""You are a coding agent at {WORKDIR}.
Use the todo tool to plan multi-step tasks.
Mark in_progress before starting, completed when done.
Prefer tools over prose."""
```
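For reference, a plausible sketch of the `render()` that `update()` returns -- the checkbox markers come from the diagram above, but the exact formatting is an assumption:

```python
STATUS_MARKS = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}

def render(self) -> str:
    # One line per item, e.g. "[>] 2. write tests" (format assumed).
    if not self.items:
        return "(no todos)"
    return "\n".join(
        f"{STATUS_MARKS[item['status']]} {item['id']}. {item['text']}"
        for item in self.items
    )
```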
The "one in_progress at a time" constraint forces sequential focus. The nag reminder creates accountability.
## What Changed From s02
| Component | Before (s02) | After (s03) |
|----------------|------------------|----------------------------|
| Tools | 4 | 5 (+todo) |
| Planning | None | TodoManager with statuses |
| Nag injection | None | `<reminder>` after 3 rounds|
| Agent loop | Simple dispatch | + rounds_since_todo counter|
## Design Rationale
Visible plans improve task completion because the model can self-monitor progress. The nag mechanism creates accountability -- without it, the model may abandon plans mid-execution as conversation context grows and earlier instructions fade. The "one in_progress at a time" constraint enforces sequential focus, preventing context-switching overhead that degrades output quality. This pattern works because it externalizes the model's working memory into structured state that survives attention drift.
## Try It
```sh
cd learn-claude-code
python agents/s03_todo_write.py
```
Example prompts to try:
1. `Refactor the file hello.py: add type hints, docstrings, and a main guard`
2. `Create a Python package with __init__.py, utils.py, and tests/test_utils.py`
3. `Review all Python files and fix any style issues`

View File

# s04: Subagents
> A subagent runs with a fresh messages list, shares the filesystem with the parent, and returns only a summary -- keeping the parent context clean.
`s01 > s02 > s03 > [ s04 ] s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`
> *"Process isolation = context isolation"* -- fresh messages[] per subagent.

## Problem

As the agent works, its messages array grows. Every file read, every bash output stays in context permanently. "What testing framework does this project use?" might require reading 5 files, but the parent only needs the answer: "pytest."
In this course, a practical solution is fresh-context isolation: spawn a child agent with `messages=[]`.
## Solution
```
Parent agent                      Subagent
+------------------+              +-------------------+
| messages=[...]   |              | messages=[]       |  <-- fresh
|                  |   dispatch   |                   |
| tool: task       | -----------> | while tool_use:   |
|   prompt="..."   |              |   call tools      |
|                  |   summary    |   append results  |
| result = "..."   | <----------- | return last text  |
+------------------+              +-------------------+

Parent context stays clean. Subagent context is discarded.
```
## How It Works
1. The parent gets a `task` tool. The child gets all base tools except `task` (no recursive spawning).
```python
PARENT_TOOLS = CHILD_TOOLS + [
    {"name": "task",
"description": "Spawn a subagent with fresh context.",
"input_schema": {
"type": "object",
"properties": {
"prompt": {"type": "string"},
"description": {"type": "string"},
},
"properties": {"prompt": {"type": "string"}},
"required": ["prompt"],
}},
]
```
2. The subagent starts with a fresh messages list holding only the delegated prompt, and runs its own loop. Only the final text returns to the parent.
```python
def run_subagent(prompt: str) -> str:
sub_messages = [{"role": "user", "content": prompt}]
for _ in range(30): # safety limit
response = client.messages.create(
model=MODEL, system=SUBAGENT_SYSTEM,
messages=sub_messages,
tools=CHILD_TOOLS, max_tokens=8000,
)
sub_messages.append({
"role": "assistant", "content": response.content
})
if response.stop_reason != "tool_use":
break
# execute tools, append results...
return "".join(
b.text for b in response.content if hasattr(b, "text")
) or "(no summary)"
```
3. The parent receives the summary as a normal `tool_result`.
```python
if block.name == "task":
output = run_subagent(block.input["prompt"])
results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": str(output),
})
```
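Calling `run_subagent` directly (outside the loop) shows the contract -- a prompt goes in, a short summary comes out. The example answer is taken from the problem statement above:

```python
summary = run_subagent(
    "What testing framework does this project use? Answer in one sentence.")
print(summary)  # e.g. "pytest with conftest.py configuration."
```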
The child's entire message history (possibly 30+ tool calls) is discarded; only the one-paragraph summary survives in the parent's context.
## What Changed From s03
| Component | Before (s03) | After (s04) |
|---------------|------------------|----------------------------|
| Tools         | 5                | 6 (+task)                  |
| Subagent | None | `run_subagent()` function |
| Return value | N/A | Summary text only |
## Design Rationale
Fresh-context isolation is a practical way to approximate context isolation in this session. A fresh `messages[]` means the subagent starts without the parent's conversation history. The tradeoff is communication overhead -- results must be compressed back to the parent, losing detail. This is a message-history isolation strategy, not OS process isolation. Limiting subagent depth (no recursive spawning) prevents unbounded resource consumption, and a max iteration count ensures runaway children terminate.
## Try It
```sh
cd learn-claude-code
python agents/s04_subagent.py
```
Example prompts to try:
1. `Use a subtask to find what testing framework this project uses`
2. `Delegate: read all .py files and summarize what each one does`
3. `Use a task to create a new module, then verify it from here`

View File

# s05: Skills
> Two-layer skill injection avoids system prompt bloat by putting skill names in the system prompt (cheap) and full skill bodies in tool_result (on demand).
`s01 > s02 > s03 > s04 > [ s05 ] s06 | s07 > s08 > s09 > s10 > s11 > s12`
> *"Load on demand, not upfront"* -- inject knowledge via tool_result, not system prompt.

## Problem

You want the agent to follow domain-specific workflows: git conventions, testing patterns, code review checklists. Putting everything in the system prompt wastes tokens on unused skills. 10 skills at 2000 tokens each = 20,000 tokens, most of which are irrelevant to any given task.
## Solution
```
System prompt (Layer 1 -- always present):
+--------------------------------------+
| Skills available:                    |
|   - git: Git workflow ...            |  ~100 tokens per skill
|   - test: Testing patterns           |
+--------------------------------------+

When model calls load_skill("git"):

Tool result (Layer 2 -- on demand):
+--------------------------------------+
| <skill name="git">                   |
| Full git workflow instructions...    |  ~2000 tokens
|   Step 1: ...                        |
|   Step 2: ...                        |
| </skill>                             |
+--------------------------------------+
```
Layer 1: skill *names* in system prompt (cheap). Layer 2: full *body* via tool_result (on demand).
## How It Works
1. Skill files live in `.skills/` as Markdown with YAML frontmatter.
```
.skills/
  git.md    # ---\n description: Git workflow ...\n ---\n ...
test.md # ---\n description: Testing patterns\n ---\n ...
```
2. SkillLoader parses frontmatter and separates metadata from body (the parser itself appears after this list).
```python
class SkillLoader:
    def __init__(self, skills_dir: Path):
        self.skills = {}
for f in sorted(skills_dir.glob("*.md")):
text = f.read_text()
meta, body = self._parse_frontmatter(text)
self.skills[f.stem] = {"meta": meta, "body": body}
def get_descriptions(self) -> str:
lines = []
        for name, skill in self.skills.items():
            desc = skill["meta"].get("description", "No description")
            lines.append(f"  - {name}: {desc}")
        return "\n".join(lines)

    def get_content(self, name: str) -> str:
skill = self.skills.get(name)
if not skill:
return f"Error: Unknown skill '{name}'."
return (f"<skill name=\"{name}\">\n"
f"{skill['body']}\n</skill>")
return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"
```
3. Layer 1 goes into the system prompt. Layer 2 is just another tool handler.
```python
SYSTEM = f"""You are a coding agent at {WORKDIR}.
Skills available:
{SKILL_LOADER.get_descriptions()}"""
TOOL_HANDLERS = {
# ...base tools...
"load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),
}
```
The model learns what skills exist (cheap) and loads them when relevant (expensive).
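For reference, the `_parse_frontmatter` helper that step 2 relies on (from the lesson source):

```python
import re

def _parse_frontmatter(self, text: str) -> tuple:
    # Split "---\n<yaml>\n---\n<body>" into (meta dict, body str).
    match = re.match(r"^---\n(.*?)\n---\n(.*)", text, re.DOTALL)
    if not match:
        return {}, text
    meta = {}
    for line in match.group(1).strip().splitlines():
        if ":" in line:
            key, val = line.split(":", 1)
            meta[key.strip()] = val.strip()
    return meta, match.group(2).strip()
```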
## What Changed From s04
| Component | Before (s04) | After (s05) |
|---------------|------------------|----------------------------|
| Tools         | 6                | 7 (+load_skill)            |
| Knowledge | None | .skills/*.md files |
| Injection | None | Two-layer (system + result)|
## Design Rationale
Two-layer injection solves the attention budget problem. Putting all skill content in the system prompt wastes tokens on unused skills. Layer 1 (compact summaries) costs roughly 120 tokens total. Layer 2 (full content) loads on demand via tool_result. This scales to dozens of skills without degrading model attention quality. The key insight is that the model only needs to know what skills exist (cheap) to decide when to load one (expensive). This is the same lazy-loading principle used in software module systems.
## Try It
```sh
cd learn-claude-code
python agents/s05_skill_loading.py
```
Example prompts to try:
1. `What skills are available?`
2. `Load the agent-builder skill and follow its instructions`
3. `I need to do a code review -- load the relevant skill first`

View File

# s06: Context Compact
> A three-layer compression pipeline lets the agent work indefinitely by strategically forgetting old tool results, auto-summarizing when tokens exceed a threshold, and allowing manual compression on demand.
`s01 > s02 > s03 > s04 > s05 > [ s06 ] | s07 > s08 > s09 > s10 > s11 > s12`
> *"Strategic forgetting"* -- forget old context to enable infinite sessions.

## Problem

The context window is finite. A single `read_file` on a 1000-line file costs ~4000 tokens. After reading 30 files and running 20 bash commands, you hit 100,000+ tokens. The agent cannot work on large codebases without compression.
Teaching simplification: the token estimation here uses a rough
characters/4 heuristic. Production systems use proper tokenizer
libraries for accurate counts.
## Solution
Three layers, increasing in aggressiveness:
```
Every turn:
  micro_compact(messages)             [Layer 1: every turn]
      |
      v
  tokens > THRESHOLD? ---no---> continue
      |
     yes
      v
  auto_compact(messages)              [Layer 2: auto_compact]

On demand:
  model calls `compact` tool  ---->   [Layer 3: manual]
```
## How It Works
1. **Layer 1 -- micro_compact**: Before each LLM call, replace old tool results with placeholders.
```python
def micro_compact(messages: list) -> list:
    # Collection pass: index every tool_result; map tool_use ids to names.
    tool_results, tool_name_map = [], {}
    for i, msg in enumerate(messages):
        if msg["role"] == "assistant":
            for part in msg["content"]:
                if getattr(part, "type", None) == "tool_use":
                    tool_name_map[part.id] = part.name
        if msg["role"] == "user" and isinstance(msg.get("content"), list):
            for j, part in enumerate(msg["content"]):
                if isinstance(part, dict) and part.get("type") == "tool_result":
tool_results.append((i, j, part))
if len(tool_results) <= KEEP_RECENT:
return messages
for _, _, part in tool_results[:-KEEP_RECENT]:
if len(part.get("content", "")) > 100:
tool_id = part.get("tool_use_id", "")
tool_name = tool_name_map.get(tool_id, "unknown")
part["content"] = f"[Previous: used {tool_name}]"
return messages
```
2. **Layer 2 -- auto_compact**: When tokens exceed the threshold, save the full transcript to disk, then ask the LLM to summarize.
```python
def auto_compact(messages: list) -> list:
TRANSCRIPT_DIR.mkdir(exist_ok=True)
# Save transcript for recovery
transcript_path = TRANSCRIPT_DIR / f"transcript_{int(time.time())}.jsonl"
with open(transcript_path, "w") as f:
for msg in messages:
f.write(json.dumps(msg, default=str) + "\n")
# LLM summarizes
response = client.messages.create(
model=MODEL,
messages=[{"role": "user", "content":
            "Summarize this conversation so work can continue:\n\n"  # wording assumed
+ json.dumps(messages, default=str)[:80000]}],
max_tokens=2000,
)
    return [
{"role": "user", "content": f"[Compressed]\n\n{response.content[0].text}"},
{"role": "assistant", "content": "Understood. Continuing."},
]
```
3. **Layer 3 -- manual compact**: The `compact` tool triggers the same summarization on demand.
```python
if manual_compact:
messages[:] = auto_compact(messages)
```
4. The loop integrates all three:
```python
def agent_loop(messages: list):
while True:
        micro_compact(messages)                   # Layer 1
        if estimate_tokens(messages) > THRESHOLD:
            messages[:] = auto_compact(messages)  # Layer 2
        response = client.messages.create(...)
        # ... tool execution ...
        if manual_compact:
            messages[:] = auto_compact(messages)  # Layer 3
```
Transcripts preserve full history on disk. Nothing is truly lost -- just moved out of active context.
## What Changed From s05
| Component      | Before (s05) | After (s06)                |
|----------------|--------------|----------------------------|
| Context mgmt | None | Three-layer compression |
| Micro-compact | None | Old results -> placeholders|
| Auto-compact | None | Token threshold trigger |
| Manual compact | None | `compact` tool |
| Transcripts | None | Saved to .transcripts/ |
## Design Rationale
Context windows are finite, but agent sessions can be infinite. Three compression layers solve this at different granularities: micro-compact (replace old tool outputs), auto-compact (LLM summarizes when approaching limit), and manual compact (user-triggered). The key insight is that forgetting is a feature, not a bug -- it enables unbounded sessions. Transcripts preserve the full history on disk so nothing is truly lost, just moved out of the active context. The layered approach lets each layer operate independently at its own granularity, from silent per-turn cleanup to full conversation reset.
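The token estimate that drives Layer 2 is the chars/4 heuristic from the note above -- a sketch (production would call a real tokenizer):

```python
import json

def estimate_tokens(messages: list) -> int:
    # Teaching heuristic: roughly 4 characters per token.
    return len(json.dumps(messages, default=str)) // 4
```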
## Try It
```sh
cd learn-claude-code
python agents/s06_context_compact.py
```
Example prompts to try:
1. `Read every Python file in the agents/ directory one by one` (watch micro-compact replace old results)
2. `Keep reading files until compression triggers automatically`
3. `Use the compact tool to manually compress the conversation`

View File

# s07: Tasks
> Tasks are persisted as JSON files with a dependency graph, so state survives context compression and can be shared across agents.
`s01 > s02 > s03 > s04 > s05 > s06 | [ s07 ] s08 > s09 > s10 > s11 > s12`
> *"State survives /compact"* -- file-based state outlives context compression.
## Problem
In-memory state (TodoManager from s03) dies when context compresses (s06). After auto_compact replaces messages with a summary, the todo list is gone. The agent can only reconstruct from summary text -- lossy and error-prone.
A second issue is visibility: in-memory structures are process-local, so teammates cannot reliably share that state.

File-based tasks solve both: write state to disk, and it survives compression, process restarts, and eventually multi-agent sharing (s09+).
## When to Use Task vs Todo
From s07 onward, Task is the default. Todo remains for short linear checklists.
| Situation | Prefer | Why |
|---|---|---|
| Short, single-session checklist | Todo | Lowest ceremony, fastest capture |
| Cross-session work, dependencies, or teammates | Task | Durable state, dependency graph, shared visibility |
| Unsure which one to use | Task | Easier to simplify later than migrate mid-run |
## Solution
```
.tasks/
  task_1.json     {"id": 1, "subject": "...", "status": "pending",
  task_2.json      "blockedBy": [], "blocks": [2], "owner": ""}
  task_3.json

Dependency resolution:
  complete(1) --> every task with 1 in blockedBy gets it removed
                  task_2.blockedBy: [1] -> []   (unblocked)
```
## How It Works
1. TaskManager: one JSON file per task, CRUD with a dependency graph.
```python
class TaskManager:
def __init__(self, tasks_dir: Path):
self.dir = tasks_dir
self.dir.mkdir(exist_ok=True)
self._next_id = self._max_id() + 1
def create(self, subject, description=""):
task = {"id": self._next_id, "subject": subject,
"status": "pending", "blockedBy": [],
"blocks": [], "owner": ""}
self._save(task)
self._next_id += 1
return json.dumps(task, indent=2)
```
2. Completing a task clears its ID from every other task's `blockedBy` list.
```python
def _clear_dependency(self, completed_id):
for f in self.dir.glob("task_*.json"):
task = json.loads(f.read_text())
if completed_id in task.get("blockedBy", []):
            task["blockedBy"].remove(completed_id)
            self._save(task)
```

3. `update()` changes status; completing a task triggers the dependency sweep, and `add_blocks` wires both sides of the graph.

```python
def update(self, task_id, status=None, add_blocks=None):
    task = self._load(task_id)
    if status:
task["status"] = status
if status == "completed":
self._clear_dependency(task_id)
if add_blocks:
task["blocks"] = list(set(task["blocks"] + add_blocks))
for blocked_id in add_blocks:
blocked = self._load(blocked_id)
if task_id not in blocked["blockedBy"]:
blocked["blockedBy"].append(task_id)
self._save(blocked)
self._save(task)
```
4. Four task tools go into the dispatch map.
```python
TOOL_HANDLERS = {
# ...base tools...
"task_create": lambda **kw: TASKS.create(kw["subject"]),
"task_update": lambda **kw: TASKS.update(kw["task_id"],
kw.get("status")),
"task_update": lambda **kw: TASKS.update(kw["task_id"], kw.get("status")),
"task_list": lambda **kw: TASKS.list_all(),
"task_get": lambda **kw: TASKS.get(kw["task_id"]),
}
```
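The persistence helpers the class relies on, `_load` and `_save` (from the lesson source), are one-liners:

```python
def _load(self, task_id: int) -> dict:
    return json.loads((self.dir / f"task_{task_id}.json").read_text())

def _save(self, task: dict):
    (self.dir / f"task_{task['id']}.json").write_text(
        json.dumps(task, indent=2))
```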
## What Changed From s06
| Component      | Before (s06)     | After (s07)                |
|----------------|------------------|----------------------------|
| Dependencies | None | `blockedBy + blocks` graph |
| Persistence | Lost on compact | Survives compression |
## Design Rationale
File-based state survives compaction and process restarts. The dependency graph preserves execution order even when conversation details are forgotten. This turns transient chat context into durable work state.
Durability still needs a write discipline: reload task JSON before each write, validate expected `status/blockedBy`, then persist atomically. Otherwise concurrent writers can overwrite each other.
Course-level implication: s07+ defaults to Task because it better matches long-running and collaborative engineering workflows.
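A sketch of that write discipline, assuming the `TaskManager` above -- `os.replace` supplies the atomic step; the helper name is an assumption:

```python
import json, os, tempfile

def _save_atomic(self, task: dict):
    # Write to a temp file in the same directory, then atomically swap it in,
    # so a concurrent reader never observes a half-written task_<id>.json.
    fd, tmp = tempfile.mkstemp(dir=self.dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(task, f, indent=2)
    os.replace(tmp, self.dir / f"task_{task['id']}.json")
```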
## Try It
```sh
cd learn-claude-code
python agents/s07_task_system.py
```
Suggested prompts:
1. `Create 3 tasks: "Setup project", "Write code", "Write tests". Make them depend on each other in order.`
2. `List all tasks and show the dependency graph`
3. `Complete task 1 and then list tasks to see task 2 unblocked`

View File

# s08: Background Tasks
> A BackgroundManager runs commands in separate threads and drains a notification queue before each LLM call, so the agent never blocks on long-running operations.
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > [ s08 ] s09 > s10 > s11 > s12`
> *"Fire and forget"* -- non-blocking threads + notification queue.

## Problem

Some commands take minutes: `npm install`, `pytest`, `docker build`. With a blocking loop, the model sits idle waiting. If the user asks "install dependencies and while that runs, create the config file," the agent does them sequentially, not in parallel.
## Solution
```
Main thread                       Background thread
+------------------+              +------------------+
| agent loop       |              | subprocess runs  |
|   ...            |              |   ...            |
| [LLM call] <-----+------------- | enqueue(result)  |
|   ^ drain queue  |              +------------------+
+------------------+

Agent --[spawn A]--[spawn B]--[other work]---->
            |          |
            v          v
        [A runs]   [B runs]    (parallel)
            |          |
            +----------+
                 |
         notification queue
                 |
  +-- results injected before next LLM call --+
```
## How It Works
1. BackgroundManager tracks tasks with a thread-safe notification queue.
```python
class BackgroundManager:
    def __init__(self):
        self.tasks = {}
        self._notification_queue = []
self._lock = threading.Lock()
```
2. `run()` starts a daemon thread and returns immediately.
```python
def run(self, command: str) -> str:
task_id = str(uuid.uuid4())[:8]
self.tasks[task_id] = {"status": "running", "command": command}
thread = threading.Thread(
target=self._execute, args=(task_id, command), daemon=True)
thread.start()
return f"Background task {task_id} started"
```
3. When the subprocess finishes, its result goes into the notification queue.
```python
def _execute(self, task_id, command):
try:
r = subprocess.run(command, shell=True, cwd=WORKDIR,
capture_output=True, text=True, timeout=300)
output = (r.stdout + r.stderr).strip()[:50000]
status = "completed"
except subprocess.TimeoutExpired:
output = "Error: Timeout (300s)"
status = "timeout"
self.tasks[task_id]["status"] = status
self.tasks[task_id]["result"] = output
with self._lock:
self._notification_queue.append({
"task_id": task_id,
"status": status,
"result": output[:500],
})
"task_id": task_id, "result": output[:500]})
```
4. `drain_notifications()` returns and clears pending results.
```python
def drain_notifications(self) -> list:
with self._lock:
notifs = list(self._notification_queue)
self._notification_queue.clear()
return notifs
```
5. The agent loop drains notifications before each LLM call.
```python
def agent_loop(messages: list):
while True:
notifs = BG.drain_notifications()
if notifs:
notif_text = "\n".join(
f"[bg:{n['task_id']}] {n['status']}: "
f"{n['result']}" for n in notifs
)
f"[bg:{n['task_id']}] {n['result']}" for n in notifs)
messages.append({"role": "user",
"content": f"<background-results>"
f"\n{notif_text}\n"
"content": f"<background-results>\n{notif_text}\n"
f"</background-results>"})
messages.append({"role": "assistant",
"content": "Noted background results."})
response = client.messages.create(...)
```
The loop stays single-threaded. Only subprocess I/O is parallelized.
## What Changed From s07
| Component      | Before (s07) | After (s08)                |
|----------------|--------------|----------------------------|
| Notification | None | Queue drained per loop |
| Concurrency | None | Daemon threads |
## Design Rationale
The agent loop is inherently single-threaded (one LLM call at a time). Background threads break this constraint for I/O-bound work (tests, builds, installs). The notification queue pattern ("drain before next LLM call") ensures results arrive at natural conversation breakpoints rather than interrupting the model's reasoning mid-thought. This is a minimal concurrency model: the agent loop stays single-threaded and deterministic, while only the I/O-bound subprocess execution is parallelized.
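The manager is also testable without the loop. A minimal sketch, assuming the class above plus `import time`:

```python
BG = BackgroundManager()
print(BG.run("sleep 1 && echo done"))  # returns immediately with a task id
time.sleep(1.5)                        # give the daemon thread time to finish
print(BG.drain_notifications())        # [{'task_id': '...', 'result': 'done'}]
print(BG.drain_notifications())        # [] -- the queue was drained above
```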
## Try It
```sh
cd learn-claude-code
python agents/s08_background_tasks.py
```
Example prompts to try:
1. `Run "sleep 5 && echo done" in the background, then create a file while it runs`
2. `Start 3 background tasks: "sleep 2", "sleep 4", "sleep 6". Check their status.`
3. `Run pytest in the background and keep working on other things`

View File

# s09: Agent Teams
> Persistent teammates with JSONL inboxes are one teaching protocol for turning isolated agents into a communicating team -- spawn, message, broadcast, and drain.
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > [ s09 ] s10 > s11 > s12`
> *"Append to send, drain to read"* -- async mailboxes for persistent teammates.

## Problem

Subagents (s04) are disposable: spawn, work, return summary, die. No identity, no memory between invocations. Background tasks (s08) run shell commands but can't make LLM-guided decisions.
Real teamwork needs: (1) persistent agents that outlive a single prompt, (2) identity and lifecycle management, (3) a communication channel between agents.
Note on the s06-to-s07 bridge: TodoManager items from s03 die with
compression (s06). File-based tasks (s07) survive compression because
they live on disk. Teams build on this same principle -- config.json and
inbox files persist outside the context window.
## Solution
```
Teammate lifecycle:
  spawn(name, role, prompt) --> thread runs the teammate's own agent loop
  status: working --> idle (when its loop finishes)

Communication:

  .inbox/
    alice.jsonl
    bob.jsonl
    lead.jsonl

+--------+    send("alice","bob","...")    +--------+
| alice  | ------------------------------> |  bob   |
| loop   |    bob.jsonl << {json_line}     |  loop  |
+--------+                                 +--------+
     ^                                          |
     |          BUS.read_inbox("alice")         |
     +------ alice.jsonl -> read + drain -------+
```
## How It Works
1. TeammateManager maintains config.json with the team roster. Each member has a name, role, and status.
```python
class TeammateManager:
    def __init__(self, team_dir: Path):
        self.dir = team_dir
        self.config = {"members": []}   # persisted to config.json
        self.threads = {}
```
2. `spawn()` creates a teammate and starts its agent loop in a thread.
Re-spawning an idle teammate reactivates it.
```python
def spawn(self, name: str, role: str, prompt: str) -> str:
member = self._find_member(name)
if member:
if member["status"] not in ("idle", "shutdown"):
return f"Error: '{name}' is currently {member['status']}"
member["status"] = "working"
else:
member = {"name": name, "role": role, "status": "working"}
self.config["members"].append(member)
member = {"name": name, "role": role, "status": "working"}
self.config["members"].append(member)
self._save_config()
thread = threading.Thread(
target=self._teammate_loop,
args=(name, role, prompt), daemon=True)
self.threads[name] = thread
thread.start()
return f"Spawned teammate '{name}' (role: {role})"
```
3. MessageBus: append-only JSONL inboxes. `send()` appends a JSON line; `read_inbox()` reads all and drains.
```python
class MessageBus:
def send(self, sender, to, content, msg_type="message", extra=None):
msg = {"type": msg_type, "from": sender,
"content": content,
"timestamp": time.time()}
"content": content, "timestamp": time.time()}
if extra:
msg.update(extra)
with open(self.dir / f"{to}.jsonl", "a") as f:
f.write(json.dumps(msg) + "\n")
return f"Sent {msg_type} to {to}"
def read_inbox(self, name):
path = self.dir / f"{name}.jsonl"
if not path.exists(): return "[]"
msgs = [json.loads(l) for l in path.read_text().strip().splitlines() if l]
path.write_text("") # drain
return json.dumps(msgs, indent=2)
```
4. Each teammate checks its inbox before every LLM call, injecting received messages into context.
```python
def _teammate_loop(self, name, role, prompt):
sys_prompt = f"You are '{name}', role: {role}, at {WORKDIR}."
messages = [{"role": "user", "content": prompt}]
for _ in range(50):
inbox = BUS.read_inbox(name)
        if inbox != "[]":
            messages.append({"role": "user",
"content": f"<inbox>{inbox}</inbox>"})
messages.append({"role": "assistant",
"content": "Noted inbox messages."})
response = client.messages.create(...)
if response.stop_reason != "tool_use":
break
# execute tools, append results...
self._find_member(name)["status"] = "idle"
self._save_config()
```
5. `broadcast()` sends the same message to all teammates except the
sender.
```python
def broadcast(self, sender, content, teammates):
count = 0
for name in teammates:
if name != sender:
self.send(sender, name, content, "broadcast")
count += 1
return f"Broadcast to {count} teammates"
```
## What Changed From s08
| Component      | Before (s08)    | After (s09)                |
|----------------|-----------------|----------------------------|
| Persistence | None | config.json + JSONL inboxes|
| Threads | Background cmds | Full agent loops per thread|
| Lifecycle | Fire-and-forget | idle -> working -> idle |
| Communication  | None            | message + broadcast        |
Teaching simplification: this implementation does not use lock files
for inbox access. In production, concurrent append from multiple writers
would need file locking or atomic rename. The single-writer-per-inbox
pattern used here is safe for the teaching scenario.
## Design Rationale
File-based mailboxes (append-only JSONL) are easy to inspect and reason about in a teaching codebase. The "drain on read" pattern (read all, truncate) gives batch delivery with very little machinery. The tradeoff is latency -- messages are only seen at the next poll -- but for LLM-driven agents where each turn takes seconds, polling latency is acceptable for this course.
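Drain-on-read is easy to see in isolation. A sketch, assuming the `MessageBus` above takes the inbox directory in its constructor (the `.inbox` name is an assumption):

```python
from pathlib import Path

BUS = MessageBus(Path(".inbox"))
BUS.send("alice", "bob", "tests are green")
print(BUS.read_inbox("bob"))  # one JSON message from alice
print(BUS.read_inbox("bob"))  # "[]" -- the first read drained the file
```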
## Try It
```sh
cd learn-claude-code
python agents/s09_agent_teams.py
```
Example prompts to try:
1. `Spawn alice (coder) and bob (tester). Have alice send bob a message.`
2. `Broadcast "status update: phase 1 complete" to all teammates`
3. `Check the lead inbox for any messages`

View File

# s10: Team Protocols
> The same request_id handshake pattern powers both shutdown and plan approval -- one FSM, two applications.
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > [ s10 ] s11 > s12`
> *"Same request_id, two protocols"* -- one FSM pattern powers shutdown + plan approval.

## Problem

In s09, teammates work and communicate but lack structured coordination:
**Shutdown**: Killing a thread leaves files half-written and config.json stale. You need a handshake: the lead requests, the teammate approves (finish and exit) or rejects (keep working).
**Plan approval**: When the lead says "refactor the auth module," the teammate starts immediately. For high-risk changes, the lead should review the plan first -- a junior proposes, a senior approves.
Both share the same structure: one side sends a request with a unique ID, the other responds referencing that ID. A finite state machine tracks each request through pending -> approved | rejected.
## Solution
```
Shutdown Protocol                  Plan Approval Protocol

Lead              Teammate         Teammate            Lead
  |--shutdown_req--->|                |--plan_req------->|
  |  {req_id:"abc"}  |                |  {req_id:"xyz"}  |
  |<--shutdown_resp--|                |<--plan_resp------|
  |  {req_id:"abc",  |                |  {req_id:"xyz",  |
  |   approve:true}  |                |   approve:true}  |
  |                  |                |                  |
  v                  v                v                  v
tracker["abc"]     exits           proceeds       tracker["xyz"]
 = approved                                        = approved

Shared FSM (identical for both protocols):

  [pending] --approve--> [approved]
  [pending] --reject---> [rejected]

Trackers:
  shutdown_requests = {req_id: {"target": ..., "status": ...}}
  plan_requests     = {req_id: {"from": ..., "plan": ..., "status": ...}}
```
## How It Works
1. The lead initiates shutdown by generating a request_id and sending a shutdown_request through the inbox.
```python
shutdown_requests = {}
def handle_shutdown_request(teammate: str) -> str:
req_id = str(uuid.uuid4())[:8]
shutdown_requests[req_id] = {"target": teammate, "status": "pending"}
BUS.send("lead", teammate, "Please shut down gracefully.",
"shutdown_request", {"request_id": req_id})
return f"Shutdown request {req_id} sent (status: pending)"
```
2. The teammate receives the request in its inbox and calls the `shutdown_response` tool to approve or reject.
```python
if tool_name == "shutdown_response":
req_id = args["request_id"]
approve = args["approve"]
if req_id in shutdown_requests:
shutdown_requests[req_id]["status"] = \
"approved" if approve else "rejected"
shutdown_requests[req_id]["status"] = "approved" if approve else "rejected"
BUS.send(sender, "lead", args.get("reason", ""),
"shutdown_response",
{"request_id": req_id, "approve": approve})
return f"Shutdown {'approved' if approve else 'rejected'}"
```
3. The teammate loop checks for approved shutdown and exits.
```python
if (block.name == "shutdown_response"
and block.input.get("approve")):
should_exit = True
# ...
member["status"] = "shutdown" if should_exit else "idle"
```
4. Plan approval follows the identical pattern. The teammate submits a plan (generating a request_id); the lead will review it referencing that same request_id.
```python
plan_requests = {}
if tool_name == "plan_approval":
plan_text = args.get("plan", "")
req_id = str(uuid.uuid4())[:8]
plan_requests[req_id] = {
"from": sender, "plan": plan_text,
"status": "pending",
}
BUS.send(sender, "lead", plan_text,
"plan_approval_request",
{"request_id": req_id, "plan": plan_text})
return f"Plan submitted (request_id={req_id})"
```
5. The lead reviews and responds with the same request_id.
```python
def handle_plan_review(request_id, approve, feedback=""):
req = plan_requests.get(request_id)
if not req:
return f"Error: Unknown request_id '{request_id}'"
req["status"] = "approved" if approve else "rejected"
BUS.send("lead", req["from"], feedback,
"plan_approval_response",
{"request_id": request_id,
"approve": approve,
"feedback": feedback})
return f"Plan {req['status']} for '{req['from']}'"
```
6. Both protocols use the same `plan_approval` tool name with two
modes: teammates submit (no request_id), the lead reviews (with
request_id).
```python
# Lead tool dispatch:
"plan_approval": lambda **kw: handle_plan_review(
kw["request_id"], kw["approve"],
kw.get("feedback", "")),
# Teammate: submit mode (generate request_id)
```
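A sketch of how that two-mode routing could look (illustrative only; `submit_plan` is a hypothetical helper standing in for the submit path shown in step 4):

```python
def plan_approval_dispatch(sender, **kw):
    # The presence of request_id discriminates the two modes:
    # the lead reviews an existing request, a teammate opens a new one.
    if "request_id" in kw:
        return handle_plan_review(kw["request_id"], kw["approve"],
                                  kw.get("feedback", ""))
    return submit_plan(sender, kw.get("plan", ""))  # hypothetical helper
```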
## Key Code
The dual protocol handlers (from `agents/s10_team_protocols.py`):
```python
shutdown_requests = {}
plan_requests = {}
# -- Shutdown --
def handle_shutdown_request(teammate):
req_id = str(uuid.uuid4())[:8]
shutdown_requests[req_id] = {
"target": teammate, "status": "pending"
}
BUS.send("lead", teammate,
"Please shut down gracefully.",
"shutdown_request",
{"request_id": req_id})
# -- Plan Approval --
def handle_plan_review(request_id, approve, feedback=""):
req = plan_requests[request_id]
req["status"] = "approved" if approve else "rejected"
BUS.send("lead", req["from"], feedback,
"plan_approval_response",
{"request_id": request_id,
"approve": approve})
# Both use the same FSM:
# pending -> approved | rejected
# Both correlate by request_id across async inboxes
{"request_id": request_id, "approve": approve})
```
One FSM, two applications. The same `pending -> approved | rejected` state machine handles any request-response protocol.
## What Changed From s09
| Component | Before (s09) | After (s10) |
|-----------|--------------|-------------|
| Tools | 9 | 12 (+shutdown_req/resp +plan)|
| Shutdown | Natural exit only| Request-response handshake |
| Plan gating | None | Submit/review with approval |
| Request tracking| None | Two tracker dicts |
| Correlation | None | request_id per request |
| FSM | None | pending -> approved/rejected |
## Design Rationale
The request_id correlation pattern turns any async interaction into a trackable finite state machine. The same 3-state machine (pending -> approved/rejected) applies to shutdown, plan approval, or any future protocol. This is why one pattern handles multiple protocols -- the FSM does not care what it is approving. The request_id provides correlation across async inboxes where messages may arrive out of order, making the pattern robust to timing variations between agents.
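The generalization is small enough to sketch. A hypothetical `RequestTracker` (illustrative, not from the lesson code) that any new protocol could instantiate:

```python
import uuid

class RequestTracker:
    """Generic pending -> approved | rejected FSM, keyed by request_id."""
    def __init__(self):
        self.requests = {}

    def open(self, **fields) -> str:
        req_id = str(uuid.uuid4())[:8]
        self.requests[req_id] = {"status": "pending", **fields}
        return req_id

    def resolve(self, req_id: str, approve: bool) -> str:
        req = self.requests.get(req_id)
        if req is None:
            return f"Error: unknown request_id '{req_id}'"
        if req["status"] != "pending":
            return f"Error: request already {req['status']}"
        req["status"] = "approved" if approve else "rejected"
        return req["status"]

# Shutdown and plan approval become two instances of the same machine:
shutdown_tracker = RequestTracker()
plan_tracker = RequestTracker()
```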
## Try It
```sh
cd learn-claude-code
python agents/s10_team_protocols.py
```
Example prompts to try:
1. `Spawn alice as a coder. Then request her shutdown.`
2. `List teammates to see alice's status after shutdown approval`
3. `Spawn bob with a risky refactoring task. Review and reject his plan.`
# s11: Autonomous Agents
> An idle cycle with task board polling lets teammates find and claim work themselves, with identity re-injection after context compression.
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > [ s11 ] s12`
> *"Poll, claim, work, repeat"* -- no coordinator needed, agents self-organize.
## Problem
In s09-s10, teammates only work when explicitly told to. The lead must spawn each one with a specific prompt. 10 unclaimed tasks on the board? The lead assigns each one manually. Doesn't scale.
True autonomy: teammates scan the task board themselves, claim unclaimed tasks, work on them, then look for more.
One subtlety: after context compression (s06), the agent might forget who it is -- once the messages are summarized, the original system-prompt identity ("you are alice, role: coder") fades. Identity re-injection fixes this by inserting an identity block at the start of the compressed context.

Note: token estimation here uses characters/4 (rough), and the nag threshold of 3 rounds is deliberately low for teaching visibility.
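A sketch of that estimator, under the characters/4 assumption:

```python
def estimate_tokens(messages) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Good enough to decide when to compress; not exact.
    return sum(len(str(m.get("content", ""))) for m in messages) // 4
```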
## Solution
```
Teammate lifecycle with idle cycle:

  +-------+                 +-------+
  | WORK  | <-------------- |  LLM  |
  +---+---+                 +-------+
      |
      | stop_reason != tool_use (or idle tool called)
      v
  +---+----+
  | IDLE   |  poll every 5s for up to 60s
  +---+----+
      |
      |-- inbox message or unclaimed task --> back to WORK
      |-- 60s timeout with no work --------> shutdown
      v

Identity re-injection after compression:

  if len(messages) <= 3:
      messages.insert(0, identity_block)
      # "You are 'alice', role: coder, team: my-team"
```
## How It Works
1. The teammate loop has two phases: WORK and IDLE. When the LLM stops calling tools (or calls `idle`), the teammate enters IDLE.
```python
def _loop(self, name, role, prompt):
# -- WORK PHASE --
messages = [{"role": "user", "content": prompt}]
for _ in range(50):
inbox = BUS.read_inbox(name)
for msg in inbox:
if msg.get("type") == "shutdown_request":
self._set_status(name, "shutdown")
return
messages.append(...)
response = client.messages.create(...)
if response.stop_reason != "tool_use":
break
        # ... execute tool calls, append results ...
self._set_status(name, "working")
```
2. The idle phase polls the inbox and task board in a loop.
```python
def _idle_poll(self, name, messages):
    for _ in range(IDLE_TIMEOUT // POLL_INTERVAL):  # 60s / 5s = 12 polls
        time.sleep(POLL_INTERVAL)
        # Check inbox for new messages
        inbox = BUS.read_inbox(name)
        if inbox:
            messages.append({"role": "user",
                             "content": f"<inbox>{inbox}</inbox>"})
            return True
        # Scan task board for unclaimed tasks
        unclaimed = scan_unclaimed_tasks()
        if unclaimed:
            task = unclaimed[0]
            claim_task(task["id"], name)
            messages.append({"role": "user",
                             "content": f"<auto-claimed>Task #{task['id']}: "
                                        f"{task['subject']}</auto-claimed>"})
            return True
    return False  # timeout -> shutdown
```
3. Task board scanning looks for pending, unowned, unblocked tasks.
```python
def scan_unclaimed_tasks() -> list:
TASKS_DIR.mkdir(exist_ok=True)
unclaimed = []
for f in sorted(TASKS_DIR.glob("task_*.json")):
task = json.loads(f.read_text())
        if (task.get("status") == "pending"
                and not task.get("owner")
                and not task.get("blockedBy")):
unclaimed.append(task)
return unclaimed
def claim_task(task_id: int, owner: str):
path = TASKS_DIR / f"task_{task_id}.json"
task = json.loads(path.read_text())
task["status"] = "in_progress"
task["owner"] = owner
path.write_text(json.dumps(task, indent=2))
```
4. Identity re-injection: when context is too short (compression happened), insert an identity block.
```python
def make_identity_block(name, role, team_name):
    return {"role": "user",
            "content": f"<identity>You are '{name}', "
                       f"role: {role}, team: {team_name}. "
                       f"Continue your work.</identity>"}

# Before resuming work after idle, a short history means
# compression has discarded the original identity:
if len(messages) <= 3:
    messages.insert(0, make_identity_block(name, role, team_name))
    messages.insert(1, {"role": "assistant",
                        "content": f"I am {name}. Continuing."})
```
5. The `idle` tool lets the teammate explicitly signal it has no more
work, entering the idle polling phase early.
```python
{"name": "idle",
"description": "Signal that you have no more work. "
"Enters idle polling phase.",
"input_schema": {"type": "object", "properties": {}}},
```
## Key Code
The autonomous loop (from `agents/s11_autonomous_agents.py`):
```python
def _loop(self, name, role, prompt):
    while True:
        # WORK PHASE
        idle_requested = False
        for _ in range(50):
            response = client.messages.create(...)
            if response.stop_reason != "tool_use":
                break
            for block in response.content:
                if block.type == "tool_use" and block.name == "idle":
                    idle_requested = True
            if idle_requested:
                break
        # IDLE PHASE
        self._set_status(name, "idle")
        resume = False
        for _ in range(IDLE_TIMEOUT // POLL_INTERVAL):
            time.sleep(POLL_INTERVAL)
            inbox = BUS.read_inbox(name)
            if inbox:
                resume = True
                break
            unclaimed = scan_unclaimed_tasks()
            if unclaimed:
                claim_task(unclaimed[0]["id"], name)
                resume = True
                break
        if not resume:
            self._set_status(name, "shutdown")
            return
        self._set_status(name, "working")
```
## What Changed From s10
| Component | Before (s10) | After (s11) |
|-----------|--------------|-------------|
| Identity | System prompt | + re-injection after compress|
| Timeout | None | 60s idle -> auto shutdown |
## Design Rationale
Polling + timeout makes agents self-organizing without a central coordinator. Each agent independently polls the task board, claims unclaimed work, and returns to idle when done. The timeout triggers the poll cycle, and if no work appears within the window, the agent shuts itself down. This is the same pattern as work-stealing thread pools -- distributed, no single point of failure. Identity re-injection after compression ensures agents maintain their role even when conversation history is summarized away.
## Try It
```sh
cd learn-claude-code
python agents/s11_autonomous_agents.py
```
Example prompts to try:
1. `Create 3 tasks on the board, then spawn alice and bob. Watch them auto-claim.`
2. `Spawn a coder teammate and let it find work from the task board itself`
3. `Create tasks with dependencies. Watch teammates respect the blocked order.`
# s12: Worktree + Task Isolation
> Isolate by directory, coordinate by task ID -- tasks are the control plane, worktrees are the execution plane, and an event stream makes every lifecycle step observable.
`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > [ s12 ]`
> *"Isolate by directory, coordinate by task ID"* -- task board + optional worktree lanes.
## Problem
By s11, agents can claim and complete tasks autonomously. But every task runs in one shared directory. Two agents refactoring different modules at the same time will collide: agent A edits `config.py`, agent B edits `config.py`, unstaged changes mix, and neither can roll back cleanly.
The task board tracks *what to do* but has no opinion about *where to do it*. The fix: give each task its own git worktree directory. Tasks manage goals, worktrees manage execution context. Bind them by task ID.
## Solution
```
Control plane (.tasks/)              Execution plane (.worktrees/)
+------------------------------+     +------------------------------+
| task_1.json                  |     | auth-refactor/               |
|   status: in_progress        <----->   branch: wt/auth-refactor   |
|   worktree: "auth-refactor"  |     |   task_id: 1                 |
+------------------------------+     +------------------------------+
| task_2.json                  |     | ui-login/                    |
|   status: pending            <----->   branch: wt/ui-login        |
|   worktree: "ui-login"       |     |   task_id: 2                 |
+------------------------------+     +------------------------------+
               |
    index.json   (worktree registry)
    events.jsonl (lifecycle log)

State machines:
  Task:     pending -> in_progress -> completed
  Worktree: absent -> active -> removed | kept
```
Three state layers make this work:
1. **Control plane** (`.tasks/task_*.json`) -- what is assigned, in progress, or done. Key fields: `id`, `subject`, `status`, `owner`, `worktree`.
2. **Execution plane** (`.worktrees/index.json`) -- where commands run and whether the workspace is still valid. Key fields: `name`, `path`, `branch`, `task_id`, `status`.
3. **Runtime state** (in-memory) -- per-turn execution continuity: `current_task`, `current_worktree`, `tool_result`, `error`.
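Concretely, the three layers might hold something like this (values illustrative):

```python
# 1. Control plane: .tasks/task_1.json
task = {"id": 1, "subject": "Auth refactor", "status": "in_progress",
        "owner": "alice", "worktree": "auth-refactor"}

# 2. Execution plane: one entry in .worktrees/index.json
worktree = {"name": "auth-refactor", "path": ".worktrees/auth-refactor",
            "branch": "wt/auth-refactor", "task_id": 1, "status": "active"}

# 3. Runtime state: in-memory only, gone after a crash
runtime = {"current_task": 1, "current_worktree": "auth-refactor",
           "tool_result": None, "error": None}
```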
## How It Works
The lifecycle has five steps. Each step is a tool call.
1. **Create a task.** Persist the goal first. The task starts as `pending` with an empty `worktree` field.
```python
task = {
    "id": self._next_id,
    "subject": subject,
    "status": "pending",
    "owner": "",
    "worktree": "",
}
self._save(task)

# Usage:
TASKS.create("Implement auth refactor")
# -> .tasks/task_1.json  status=pending  worktree=""
```
2. **Create a worktree and bind it to the task.** Allocate an isolated directory and branch. Passing `task_id` auto-advances the task to `in_progress` and writes the binding to both sides.
```python
self._run_git(["worktree", "add", "-b", branch, str(path), base_ref])
entry = {
"name": name,
"path": str(path),
"branch": branch,
"task_id": task_id,
"status": "active",
}
idx["worktrees"].append(entry)
self._save_index(idx)
if task_id is not None:
self.tasks.bind_worktree(task_id, name)
WORKTREES.create("auth-refactor", task_id=1)
# -> git worktree add -b wt/auth-refactor .worktrees/auth-refactor HEAD
# -> index.json gets new entry, task_1.json gets worktree="auth-refactor"
```
The binding writes state to both sides: the worktree entry records the `task_id`, and `bind_worktree` stamps the worktree name onto the task (shown under Key Code below).

3. **Run commands in the worktree.** `worktree_run` sets `cwd` to the worktree path. Edits happen in the isolated directory, not the shared workspace.
```python
r = subprocess.run(
command,
shell=True,
cwd=path,
capture_output=True,
text=True,
timeout=300,
)
```
4. **Observe.** `worktree_status` shows git state inside the isolated context. `worktree_events` queries the append-only event stream.
5. **Close out.** Two choices:
- `worktree_keep(name)` -- preserve the directory, mark lifecycle as `kept`.
- `worktree_remove(name, complete_task=True)` -- remove the directory, complete the bound task, unbind, and emit `task.completed`. This is the closeout pattern: one call handles teardown and task completion together.
## State Machines
```
Task:     pending ----------------> in_progress ----------------> completed
                  (worktree_create                (worktree_remove
                   with task_id)                   with complete_task=true)

Worktree: absent ----------------> active ----------------> removed | kept
                 (worktree_create)        (worktree_remove | worktree_keep)
```
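A transition guard for these machines fits in a few lines (a sketch; the lesson code writes status fields directly rather than validating edges):

```python
TASK_EDGES = {"pending": {"in_progress"}, "in_progress": {"completed"}}
WORKTREE_EDGES = {"absent": {"active"}, "active": {"removed", "kept"}}

def advance(edges: dict, current: str, new: str) -> str:
    # Reject any transition not drawn in the diagrams above.
    if new not in edges.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new
```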
## Key Code
The closeout pattern -- teardown + task completion in one operation (from `agents/s12_worktree_task_isolation.py`):
```python
def remove(self, name: str, force: bool = False, complete_task: bool = False) -> str:
wt = self._find(name)
if not wt:
return f"Error: Unknown worktree '{name}'"
self.events.emit(
"worktree.remove.before",
task={"id": wt.get("task_id")} if wt.get("task_id") is not None else {},
worktree={"name": name, "path": wt.get("path")},
)
try:
args = ["worktree", "remove"]
if force:
args.append("--force")
args.append(wt["path"])
self._run_git(args)
if complete_task and wt.get("task_id") is not None:
task_id = wt["task_id"]
self.tasks.update(task_id, status="completed")
self.tasks.unbind_worktree(task_id)
self.events.emit("task.completed", task={
"id": task_id, "status": "completed",
}, worktree={"name": name})
idx = self._load_index()
for item in idx.get("worktrees", []):
if item.get("name") == name:
item["status"] = "removed"
item["removed_at"] = time.time()
self._save_index(idx)
self.events.emit(
"worktree.remove.after",
task={"id": wt.get("task_id")} if wt.get("task_id") is not None else {},
worktree={"name": name, "path": wt.get("path"), "status": "removed"},
)
return f"Removed worktree '{name}'"
except Exception as e:
self.events.emit(
"worktree.remove.failed",
worktree={"name": name},
error=str(e),
)
raise
```
The task-side binding (from `agents/s12_worktree_task_isolation.py`):
```python
def bind_worktree(self, task_id: int, worktree: str, owner: str = "") -> str:
task = self._load(task_id)
task["worktree"] = worktree
if task["status"] == "pending":
task["status"] = "in_progress"
task["updated_at"] = time.time()
self._save(task)
```
The dispatch map wiring all tools together:
```python
TOOL_HANDLERS = {
"bash": lambda **kw: run_bash(kw["command"]),
"read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
"write_file": lambda **kw: run_write(kw["path"], kw["content"]),
"edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"], kw["new_text"]),
"task_create": lambda **kw: TASKS.create(kw["subject"], kw.get("description", "")),
"task_list": lambda **kw: TASKS.list_all(),
"task_get": lambda **kw: TASKS.get(kw["task_id"]),
"task_update": lambda **kw: TASKS.update(kw["task_id"], kw.get("status"), kw.get("owner")),
"task_bind_worktree": lambda **kw: TASKS.bind_worktree(kw["task_id"], kw["worktree"]),
"worktree_create": lambda **kw: WORKTREES.create(kw["name"], kw.get("task_id")),
"worktree_list": lambda **kw: WORKTREES.list_all(),
"worktree_status": lambda **kw: WORKTREES.status(kw["name"]),
"worktree_run": lambda **kw: WORKTREES.run(kw["name"], kw["command"]),
"worktree_keep": lambda **kw: WORKTREES.keep(kw["name"]),
"worktree_remove": lambda **kw: WORKTREES.remove(kw["name"], kw.get("force", False), kw.get("complete_task", False)),
"worktree_events": lambda **kw: EVENTS.list_recent(kw.get("limit", 20)),
}
```
## Event Stream
Every lifecycle transition emits a before/after/failed triplet to `.worktrees/events.jsonl`. This is an append-only log, not a replacement for task/worktree state files.
Events emitted:
- `worktree.create.before` / `worktree.create.after` / `worktree.create.failed`
- `worktree.remove.before` / `worktree.remove.after` / `worktree.remove.failed`
- `worktree.keep`
- `task.completed` (when `complete_task=true` succeeds)
Payload shape:
```json
{
"event": "worktree.remove.after",
"task": {"id": 7, "status": "completed"},
"worktree": {"name": "auth-refactor", "path": "...", "status": "removed"},
"task": {"id": 1, "status": "completed"},
"worktree": {"name": "auth-refactor", "status": "removed"},
"ts": 1730000000
}
```
This gives you three things: policy decoupling (audit and notifications stay outside the core flow), failure compensation (`*.failed` records mark partial transitions), and queryability (`worktree_events` tool reads the log directly).
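Queryability in practice: because the log is plain JSONL, a consumer can filter it without touching the worktree manager. A sketch, assuming the `.worktrees/events.jsonl` layout above:

```python
import json
from pathlib import Path

def events_for_task(task_id, log=Path(".worktrees/events.jsonl")):
    # Read-only consumer: audit or notification logic stays out of
    # the core create/remove flow.
    matches = []
    if not log.exists():
        return matches
    for line in log.read_text().splitlines():
        if not line:
            continue
        event = json.loads(line)
        if event.get("task", {}).get("id") == task_id:
            matches.append(event)
    return matches
```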
## What Changed From s11
| Component | Before (s11) | After (s12) |
|--------------------|----------------------------|----------------------------------------------|
| Coordination state | Task board (`owner/status`) | Task board + explicit `worktree` binding |
| Execution scope | Shared directory | Task-scoped isolated directory |
| Recoverability | Task status only | Task status + worktree index |
| Teardown semantics | Task completion | Task completion + explicit keep/remove |
| Lifecycle visibility | Implicit in logs | Explicit events in `.worktrees/events.jsonl` |
## Design Rationale
Separating control plane from execution plane means you can reason about _what to do_ and _where to do it_ independently. A task can exist without a worktree (planning phase). A worktree can exist without a task (ad-hoc exploration). Binding them is an explicit action that writes state to both sides. This composability is the point -- it keeps the system recoverable after crashes. After an interruption, state reconstructs from `.tasks/` + `.worktrees/index.json` on disk. Volatile in-memory session state downgrades into explicit, durable file state. The event stream adds observability without coupling side effects into the critical path: auditing, notifications, and quota checks consume events rather than intercepting state writes.
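A startup recovery pass could rebuild that picture from disk alone. A sketch under the file layout described above (paths illustrative):

```python
import json
from pathlib import Path

def recover_state(tasks_dir=Path(".tasks"),
                  index_path=Path(".worktrees/index.json")):
    # Durable truth lives in files; conversation memory is gone.
    tasks = [json.loads(p.read_text())
             for p in sorted(tasks_dir.glob("task_*.json"))]
    worktrees = []
    if index_path.exists():
        worktrees = json.loads(index_path.read_text()).get("worktrees", [])
    in_flight = [t for t in tasks if t.get("status") == "in_progress"]
    active = [w for w in worktrees if w.get("status") == "active"]
    return in_flight, active
```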
## Try It
```sh
cd learn-claude-code
python agents/s12_worktree_task_isolation.py
```
Example prompts to try:
1. `Create tasks for backend auth and frontend login page, then list tasks.`
2. `Create worktree "auth-refactor" for task 1, create worktree "ui-login", then bind task 2 to "ui-login".`
2. `Create worktree "auth-refactor" for task 1, then bind task 2 to a new worktree "ui-login".`
3. `Run "git status --short" in worktree "auth-refactor".`
4. `Keep worktree "ui-login", then list worktrees and inspect worktree events.`
4. `Keep worktree "ui-login", then list worktrees and inspect events.`
5. `Remove worktree "auth-refactor" with complete_task=true, then list tasks/worktrees/events.`