mirror of
https://github.com/shareAI-lab/analysis_claude_code.git
synced 2026-05-06 16:26:16 +08:00
better doc
# s01: The Agent Loop

> The core of a coding agent is a while loop that feeds tool results back to the model until the model decides to stop.

`[ s01 ] s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

## Problem

> *"One loop & Bash is all you need"* -- one tool + one loop = an agent.
|
||||
|
||||
Why can't a language model just answer a coding question? Because coding
|
||||
requires _interaction with the real world_. The model needs to read files,
|
||||
run tests, check errors, and iterate. A single prompt-response pair cannot
|
||||
do this.
|
||||
## Problem
|
||||
|
||||
Without the agent loop, you would have to copy-paste outputs back into the
|
||||
model yourself. The user becomes the loop. The agent loop automates this:
|
||||
call the model, execute whatever tools it asks for, feed the results back,
|
||||
repeat until the model says "I'm done."
|
||||
A language model can reason about code, but it can't *touch* the real world -- can't read files, run tests, or check errors. Without a loop, every tool call requires you to manually copy-paste results back. You become the loop.
|
||||
|
||||
Consider a simple task: "Create a Python file that prints hello." The model
|
||||
needs to (1) decide to write a file, (2) write it, (3) verify it works.
|
||||
That is three tool calls minimum. Without a loop, each one requires manual
|
||||
human intervention.

## Solution

```
+--------+      +-------+      +---------+
| User   | ---> |  LLM  | ---> |  Tool   |
| prompt |      |       |      | execute |
+--------+      +---+---+      +----+----+
                    ^               |
                    |  tool_result  |
                    +---------------+
        (loop until stop_reason != "tool_use")
```

One exit condition controls the entire flow. The loop runs until the model stops calling tools.

## How It Works

1. User prompt becomes the first message.

```python
messages.append({"role": "user", "content": query})
```

2. Send messages + tool definitions to the LLM.

```python
response = client.messages.create(
    model=MODEL, system=SYSTEM, messages=messages,
    tools=TOOLS, max_tokens=8000,
)
```

3. Append the assistant response. Check `stop_reason` -- if the model didn't call a tool, we're done.

```python
messages.append({"role": "assistant", "content": response.content})

if response.stop_reason != "tool_use":
    return
```

4. Execute each tool call, collect results, append as a user message. Loop back to step 2.

```python
results = []
for block in response.content:
    if block.type == "tool_use":
        output = run_bash(block.input["command"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": output,
        })
messages.append({"role": "user", "content": results})
```

## Key Code

Assembled into one function:

```python
def agent_loop(query):
    messages = [{"role": "user", "content": query}]
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            return

        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_bash(block.input["command"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})
```

That's the entire agent in under 30 lines. Everything else in this course layers on top -- without changing the loop.

## What Changed

This is session 1 -- the starting point. There is no prior session.

| Component    | Before | After                       |
|--------------|--------|-----------------------------|
| Messages     | (none) | Accumulating list           |
| Control flow | (none) | `stop_reason != "tool_use"` |

## Design Rationale

This loop is the foundation of LLM-based agents. Production implementations add error handling, token counting, streaming, retry logic, permission policy, and lifecycle orchestration, but the core interaction pattern starts here. The simplicity is the point: in this minimal implementation, a single exit condition (`stop_reason != "tool_use"`) controls the flow. Everything else in this course layers on top of this loop. Understanding it gives you the base model, not the full production architecture.
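
The control flow can be exercised offline with a stubbed client. The `Block`, `Response`, and `StubClient` classes below are hypothetical stand-ins for the SDK's response objects, not part of the lesson code; only the loop itself mirrors the pattern above.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    type: str
    text: str = ""
    name: str = ""
    input: dict = field(default_factory=dict)
    id: str = "t1"

@dataclass
class Response:
    stop_reason: str
    content: list

class StubClient:
    """Scripted model: one tool call, then a final answer."""
    def __init__(self):
        self.replies = [
            Response("tool_use", [Block("tool_use", name="bash",
                                        input={"command": "echo hi"})]),
            Response("end_turn", [Block("text", text="done")]),
        ]

    def create(self, messages):
        return self.replies.pop(0)

def agent_loop(query, client):
    messages = [{"role": "user", "content": query}]
    rounds = 0
    while True:
        response = client.create(messages)
        messages.append({"role": "assistant", "content": response.content})
        rounds += 1
        if response.stop_reason != "tool_use":  # the single exit condition
            return messages, rounds
        results = [{"type": "tool_result", "tool_use_id": b.id,
                    "content": f"(ran {b.input['command']})"}
                   for b in response.content if b.type == "tool_use"]
        messages.append({"role": "user", "content": results})

messages, rounds = agent_loop("say hi", StubClient())
```

Two rounds, four messages: user, assistant (tool call), user (tool result), assistant (final answer).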

## Try It

```sh
cd learn-claude-code
python agents/s01_agent_loop.py
```

Example prompts to try:

1. `Create a file called hello.py that prints "Hello, World!"`
2. `List all Python files in this directory`
3. `What is the current git branch?`

# s02: Tool Use

> A dispatch map routes tool calls to handler functions. The loop stays identical.

`s01 > [ s02 ] s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

## Problem

> *"The loop didn't change"* -- adding tools means adding handlers, not rewriting the loop.

With only `bash`, the agent shells out for everything. `cat` truncates unpredictably, `sed` fails on special characters, and every bash call is an unconstrained security surface. Dedicated tools like `read_file` and `write_file` let you enforce path sandboxing at the tool level.

The key insight: adding tools does not require changing the loop. You add entries to the tools array, add handler functions, and wire them together with a dispatch map.

## Solution

```
+--------+      +-------+      +------------------+
| User   | ---> |  LLM  | ---> | Tool Dispatch    |
| prompt |      |       |      | {                |
+--------+      +---+---+      |   bash:  run_bash|
                    ^          |   read:  run_read|
                    |          |   write: run_wr  |
                    +----------+   edit:  run_edit|
          tool_result          | }                |
                               +------------------+

The dispatch map is a dict: {tool_name: handler_function}.
One lookup replaces any if/elif chain.
```

## How It Works

1. Each tool gets a handler function. Path sandboxing prevents workspace escape.

```python
def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path

def run_read(path: str, limit: int = None) -> str:
    text = safe_path(path).read_text()
    lines = text.splitlines()
    # ...apply the optional line limit...
    return "\n".join(lines)[:50000]
```
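
The sandbox can be checked in isolation. `WORKDIR` below is a throwaway temp directory standing in for the agent's workspace; the real agent resolves it from its configuration.

```python
import tempfile
from pathlib import Path

# Stand-in workspace root for demonstration purposes.
WORKDIR = Path(tempfile.mkdtemp()).resolve()

def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path

inside = safe_path("notes/todo.txt")     # resolves under WORKDIR: allowed
try:
    safe_path("../../etc/passwd")        # traversal attempt: rejected
    escaped = True
except ValueError:
    escaped = False
```

`resolve()` normalizes away `..` segments before the containment check, so traversal tricks can't slip past a prefix comparison. (`Path.is_relative_to` requires Python 3.9+.)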

2. The dispatch map links tool names to handlers.

```python
TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"],
                                       kw["new_text"]),
}
```

3. In the loop, look up the handler by name. The loop body itself is unchanged from s01.

```python
for block in response.content:
    if block.type == "tool_use":
        handler = TOOL_HANDLERS.get(block.name)
        output = handler(**block.input) if handler \
            else f"Unknown tool: {block.name}"
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": output,
        })
```
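
The lookup-with-fallback can be sketched standalone. The stub handlers below return canned strings instead of touching the filesystem; only the dispatch shape matches the lesson code.

```python
# Stub handlers: canned strings stand in for the real filesystem helpers.
TOOL_HANDLERS = {
    "bash": lambda **kw: f"ran: {kw['command']}",
    "read_file": lambda **kw: f"contents of {kw['path']}",
}

def dispatch(name: str, **kwargs) -> str:
    handler = TOOL_HANDLERS.get(name)
    return handler(**kwargs) if handler else f"Unknown tool: {name}"

out1 = dispatch("bash", command="ls")
out2 = dispatch("delete_everything")   # no handler: graceful fallback
```

An unknown tool name degrades to an error string the model can read and recover from, rather than crashing the loop.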

## Key Code

The dispatch pattern (from `agents/s02_tool_use.py`, lines 93-129):

```python
TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"],
                                       kw["new_text"]),
}

def agent_loop(messages: list):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return
        results = []
        for block in response.content:
            if block.type == "tool_use":
                handler = TOOL_HANDLERS.get(block.name)
                output = handler(**block.input) if handler \
                    else f"Unknown tool: {block.name}"
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})
```

Add a tool = add a handler + add a schema entry. The loop never changes.

## What Changed From s01

| Component   | Before (s01) | After (s02)           |
|-------------|--------------|-----------------------|
| Path safety | None         | `safe_path()` sandbox |
| Agent loop  | Unchanged    | Unchanged             |

## Design Rationale

The dispatch map scales linearly: add a tool, add a handler, add a schema entry. The loop never changes. Handlers are pure functions, so they can be tested in isolation. Any agent that outgrows a dispatch map has a design problem, not a scaling problem.

## Try It

```sh
cd learn-claude-code
python agents/s02_tool_use.py
```

Example prompts to try:

1. `Read the file requirements.txt`
2. `Create a file called greet.py with a greet(name) function`
3. `Edit greet.py to add a docstring to the function`
4. `Read greet.py to verify the edit worked`
5. `Run the greet function with bash: python -c "from greet import greet; greet('World')"`

# s03: TodoWrite

> A TodoManager lets the agent track its own progress, and a nag reminder injection forces it to keep updating when it forgets.

`s01 > s02 > [ s03 ] s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

## Problem

> *"Plan before you act"* -- visible plans improve task completion.

On multi-step tasks, the model loses track. It repeats work, skips steps, or wanders off. Long conversations make this worse -- the system prompt fades as tool results fill the context. A 10-step refactoring might complete steps 1-3, then the model starts improvising because it forgot steps 4-10.

The solution is structured state: a TodoManager that the model writes to explicitly. The model creates a plan, marks items in_progress as it works, and marks them completed when done. A nag reminder injects a nudge if the model goes 3+ rounds without updating its todos.

Note: the nag threshold of 3 rounds is low for visibility. Production systems tune higher. From s07, this course switches to the Task board for durable multi-step work; TodoWrite remains available for quick checklists.

## Solution

```
+--------+      +-------+      +---------+
| User   | ---> |  LLM  | ---> |  Tools  |
| prompt |      |       |      |  + todo |
+--------+      +---+---+      +----+----+
                    ^               |
                    |  tool_result  |
                    +---------------+
                        |
            +-----------+-----------+
            |  TodoManager state    |
            |  [ ] task A           |
            |  [>] task B  <- doing |
            |  [x] task C           |
            +-----------------------+
                        |
        if rounds_since_todo >= 3:
            inject <reminder> into tool_result
```

## How It Works

1. TodoManager stores items with statuses. Only one item can be `in_progress` at a time.

```python
class TodoManager:
    def __init__(self):
        self.items = []

    def update(self, items: list) -> str:
        validated, in_progress_count = [], 0
        for item in items:
            status = item.get("status", "pending")
            if status == "in_progress":
                in_progress_count += 1
            validated.append({"id": item["id"], "text": item["text"],
                              "status": status})
        if in_progress_count > 1:
            raise ValueError("Only one task can be in_progress")
        self.items = validated
        return self.render()
```
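
The validation rule can be exercised standalone. `render()` isn't shown in this session, so a small pretty-printing stub stands in for it here; its real output format may differ.

```python
class TodoManager:
    def __init__(self):
        self.items = []

    def update(self, items: list) -> str:
        validated, in_progress_count = [], 0
        for item in items:
            status = item.get("status", "pending")
            if status == "in_progress":
                in_progress_count += 1
            validated.append({"id": item["id"], "text": item["text"],
                              "status": status})
        if in_progress_count > 1:
            raise ValueError("Only one task can be in_progress")
        self.items = validated
        return self.render()

    def render(self) -> str:  # stub: checkbox-style board for illustration
        marks = {"pending": " ", "in_progress": ">", "completed": "x"}
        return "\n".join(f"[{marks[i['status']]}] {i['text']}"
                         for i in self.items)

todo = TodoManager()
board = todo.update([
    {"id": "1", "text": "write tests", "status": "in_progress"},
    {"id": "2", "text": "refactor"},          # defaults to pending
])
try:
    todo.update([{"id": "1", "text": "a", "status": "in_progress"},
                 {"id": "2", "text": "b", "status": "in_progress"}])
    rejected = False
except ValueError:
    rejected = True
```

One `in_progress` item renders fine; two are rejected before any state is stored.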

2. The `todo` tool goes into the dispatch map like any other tool.

```python
TOOL_HANDLERS = {
    # ...base tools...
    "todo": lambda **kw: TODO.update(kw["items"]),
}
```

3. A nag reminder injects a nudge if the model goes 3+ rounds without calling `todo`.

```python
def agent_loop(messages: list):
    rounds_since_todo = 0
    while True:
        if rounds_since_todo >= 3 and messages:
            last = messages[-1]
            if last["role"] == "user" and isinstance(last.get("content"), list):
                last["content"].insert(0, {
                    "type": "text",
                    "text": "<reminder>Update your todos.</reminder>",
                })
        # ... rest of loop ...
        rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
```

4. The system prompt instructs the model to use todos for planning.

```python
SYSTEM = f"""You are a coding agent at {WORKDIR}.
Use the todo tool to plan multi-step tasks.
Mark in_progress before starting, completed when done.
Prefer tools over prose."""
```
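
The injection itself is plain list surgery and can be checked without a model in the loop; the message history below is a hand-built stand-in for a real conversation.

```python
# A conversation whose last message is a tool_result list, as after a round.
messages = [
    {"role": "user", "content": "refactor utils.py"},
    {"role": "assistant", "content": [{"type": "tool_use"}]},
    {"role": "user", "content": [{"type": "tool_result", "content": "ok"}]},
]
rounds_since_todo = 3  # threshold reached: time to nag

if rounds_since_todo >= 3 and messages:
    last = messages[-1]
    if last["role"] == "user" and isinstance(last.get("content"), list):
        # Prepend, so the reminder is the first block the model reads.
        last["content"].insert(0, {
            "type": "text",
            "text": "<reminder>Update your todos.</reminder>",
        })
```

The guard matters: the first user message has string content, so only structured tool_result messages ever receive the reminder block.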

## Key Code

The TodoManager and nag injection (from `agents/s03_todo_write.py`, lines 51-85 and 158-187):

```python
class TodoManager:
    def update(self, items: list) -> str:
        validated = []
        in_progress_count = 0
        for item in items:
            status = item.get("status", "pending")
            if status == "in_progress":
                in_progress_count += 1
            validated.append({
                "id": item["id"],
                "text": item["text"],
                "status": status,
            })
        if in_progress_count > 1:
            raise ValueError("Only one in_progress")
        self.items = validated
        return self.render()

# In agent_loop:
if rounds_since_todo >= 3:
    last["content"].insert(0, {
        "type": "text",
        "text": "<reminder>Update your todos.</reminder>",
    })
```

The "one in_progress at a time" constraint forces sequential focus. The nag reminder creates accountability.

## What Changed From s02

| Component     | Before (s02)    | After (s03)                 |
|---------------|-----------------|-----------------------------|
| Tools         | 4               | 5 (+todo)                   |
| Planning      | None            | TodoManager with statuses   |
| Nag injection | None            | `<reminder>` after 3 rounds |
| Agent loop    | Simple dispatch | + rounds_since_todo counter |

## Design Rationale

Visible plans improve task completion because the model can self-monitor progress. The nag mechanism creates accountability -- without it, the model may abandon plans mid-execution as the context grows and earlier instructions fade. The "one in_progress at a time" constraint enforces sequential focus, preventing the context switching that degrades output quality. The pattern works because it externalizes the model's working memory into structured state that survives attention drift.

## Try It

```sh
cd learn-claude-code
python agents/s03_todo_write.py
```

Example prompts to try:

1. `Refactor the file hello.py: add type hints, docstrings, and a main guard`
2. `Create a Python package with __init__.py, utils.py, and tests/test_utils.py`
3. `Review all Python files and fix any style issues`

# s04: Subagents

> A subagent runs with a fresh messages list, shares the filesystem with the parent, and returns only a summary -- keeping the parent context clean.

`s01 > s02 > s03 > [ s04 ] s05 > s06 | s07 > s08 > s09 > s10 > s11 > s12`

## Problem

> *"Process isolation = context isolation"* -- fresh messages[] per subagent.

As the agent works, its messages array grows. Every file read, every bash output stays in context permanently. "What testing framework does this project use?" might require reading 5 files, but the parent only needs the answer: "pytest."

A practical solution is fresh-context isolation: spawn a child agent with `messages=[]`. The child explores, reads files, runs commands. When it finishes, only its final text response returns to the parent. The child's entire message history is discarded.

## Solution

```
Parent agent                       Subagent
+------------------+              +------------------+
| messages=[...]   |              | messages=[]      | <-- fresh
|                  |   dispatch   |                  |
| tool: task       | -----------> | while tool_use:  |
|   prompt="..."   |              |   call tools     |
|                  |   summary    |   append results |
| result = "..."   | <----------- | return last text |
+------------------+              +------------------+

Parent context stays clean. Subagent context is discarded.
```

## How It Works

1. The parent gets a `task` tool. The child gets all base tools except `task` (no recursive spawning).

```python
PARENT_TOOLS = CHILD_TOOLS + [
    {"name": "task",
     "description": "Spawn a subagent with fresh context.",
     "input_schema": {
         "type": "object",
         "properties": {"prompt": {"type": "string"}},
         "required": ["prompt"],
     }},
]
```

2. The subagent starts with a fresh messages list containing only the delegated prompt and runs its own loop. It shares the same filesystem.

```python
def run_subagent(prompt: str) -> str:
    sub_messages = [{"role": "user", "content": prompt}]
    for _ in range(30):  # safety limit
        response = client.messages.create(
            model=MODEL, system=SUBAGENT_SYSTEM,
            messages=sub_messages,
            tools=CHILD_TOOLS, max_tokens=8000,
        )
        sub_messages.append({
            "role": "assistant", "content": response.content
        })
        if response.stop_reason != "tool_use":
            break
        # execute tools, append results...
```

3. Only the final text returns to the parent. The child's entire tool-call history is discarded.

```python
    return "".join(
        b.text for b in response.content if hasattr(b, "text")
    ) or "(no summary)"
```

4. The parent receives this summary as a normal tool_result.

```python
if block.name == "task":
    output = run_subagent(block.input["prompt"])
    results.append({
        "type": "tool_result",
        "tool_use_id": block.id,
        "content": str(output),
    })
```
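
The isolation property can be demonstrated offline. `Block`, `Response`, and `StubClient` below are hypothetical stand-ins for the SDK objects; the point is that the child loop touches only its own `sub_messages` list.

```python
from dataclasses import dataclass

@dataclass
class Block:
    type: str
    text: str = ""

@dataclass
class Response:
    stop_reason: str
    content: list

class StubClient:
    def __init__(self, replies):
        self.replies = list(replies)

    def create(self, messages):
        return self.replies.pop(0)

def run_subagent(prompt, client):
    sub_messages = [{"role": "user", "content": prompt}]   # fresh context
    for _ in range(30):
        response = client.create(sub_messages)
        sub_messages.append({"role": "assistant",
                             "content": response.content})
        if response.stop_reason != "tool_use":
            break
        sub_messages.append({"role": "user", "content": "(tool results)"})
    return "".join(b.text for b in response.content
                   if hasattr(b, "text")) or "(no summary)"

parent_messages = [{"role": "user", "content": "what test framework?"}]
client = StubClient([
    Response("tool_use", [Block("tool_use")]),
    Response("end_turn", [Block("text", text="pytest")]),
])
summary = run_subagent("find the test framework", client)
```

The parent's history never grows while the child works; it sees only the returned summary string.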

## Key Code

The subagent function (from `agents/s04_subagent.py`, lines 110-128):

```python
def run_subagent(prompt: str) -> str:
    sub_messages = [{"role": "user", "content": prompt}]
    for _ in range(30):
        response = client.messages.create(
            model=MODEL, system=SUBAGENT_SYSTEM,
            messages=sub_messages,
            tools=CHILD_TOOLS, max_tokens=8000,
        )
        sub_messages.append({
            "role": "assistant", "content": response.content
        })
        if response.stop_reason != "tool_use":
            break
        # execute tools, append results...
    return "".join(
        b.text for b in response.content if hasattr(b, "text")
    ) or "(no summary)"
```

The child's entire message history (possibly 30+ tool calls) is discarded. The parent receives a one-paragraph summary as a normal `tool_result`.

## What Changed From s03

| Component    | Before (s03) | After (s04)               |
|--------------|--------------|---------------------------|
| Subagent     | None         | `run_subagent()` function |
| Return value | N/A          | Summary text only         |

## Design Rationale

Fresh-context isolation is a practical way to approximate context isolation. A fresh `messages[]` means the subagent starts without the parent's conversation history. The tradeoff is communication overhead -- results must be compressed back to the parent, losing detail. This is message-history isolation, not OS process isolation. Limiting subagent depth (no recursive spawning) prevents unbounded resource consumption, and a max iteration count ensures runaway children terminate.

## Try It

```sh
cd learn-claude-code
python agents/s04_subagent.py
```

Example prompts to try:

1. `Use a subtask to find what testing framework this project uses`
2. `Delegate: read all .py files and summarize what each one does`
3. `Use a task to create a new module, then verify it from here`

# s05: Skills

> Two-layer skill injection avoids system prompt bloat by putting skill names in the system prompt (cheap) and full skill bodies in tool_result (on demand).

`s01 > s02 > s03 > s04 > [ s05 ] s06 | s07 > s08 > s09 > s10 > s11 > s12`

## Problem

> *"Load on demand, not upfront"* -- inject knowledge via tool_result, not system prompt.

You want the agent to follow domain-specific workflows: git conventions, testing patterns, code review checklists. Putting everything in the system prompt wastes tokens on unused skills. 10 skills at 2000 tokens each = 20,000 tokens, most of which are irrelevant to any given task.

The two-layer approach solves this: Layer 1 puts short skill descriptions in the system prompt (~100 tokens per skill). Layer 2 loads the full skill body into a tool_result only when the model calls `load_skill`. The model learns what skills exist (cheap) and loads them on demand (only when relevant).

## Solution

```
System prompt (Layer 1 -- always present):
  ...one line per skill: name + short description (~100 tokens each)...

When model calls load_skill("git"):
  +--------------------------------------+
  | <skill name="git">                   |
  | Full git workflow instructions...    | ~2000 tokens
  |   Step 1: ...                        |
  |   Step 2: ...                        |
  | </skill>                             |
  +--------------------------------------+
```

Layer 1: skill *names* in system prompt (cheap). Layer 2: full *body* via tool_result (on demand).

## How It Works

1. Skill files live in `.skills/` as Markdown with YAML frontmatter.

```
.skills/
  git.md   # ---\n description: ...\n ---\n ...
  test.md  # ---\n description: Testing patterns\n ---\n ...
```
|
||||
|
||||
2. The SkillLoader parses frontmatter and separates metadata from body.

```python
class SkillLoader:
    def _parse_frontmatter(self, text: str) -> tuple:
        match = re.match(
            r"^---\n(.*?)\n---\n(.*)", text, re.DOTALL
        )
        if not match:
            return {}, text
        meta = {}
        for line in match.group(1).strip().splitlines():
            if ":" in line:
                key, val = line.split(":", 1)
                meta[key.strip()] = val.strip()
        return meta, match.group(2).strip()
```

3. Layer 1: `get_descriptions()` returns short lines for the system prompt.

```python
def get_descriptions(self) -> str:
    lines = []
    for name, skill in self.skills.items():
        desc = skill["meta"].get("description", "No description")
        lines.append(f"  - {name}: {desc}")
    return "\n".join(lines)

SYSTEM = f"""You are a coding agent at {WORKDIR}.
Skills available:
{SKILL_LOADER.get_descriptions()}"""
```

4. Layer 2: `get_content()` returns the full body wrapped in `<skill>` tags.

```python
def get_content(self, name: str) -> str:
    skill = self.skills.get(name)
    if not skill:
        return f"Error: Unknown skill '{name}'."
    return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"
```

5. The `load_skill` tool is just another entry in the dispatch map.

```python
TOOL_HANDLERS = {
    # ...base tools...
    "load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),
}
```

## Key Code

The SkillLoader class (from `agents/s05_skill_loading.py`, lines 51-97):

```python
class SkillLoader:
    ...  # (directory scan elided)
        for f in sorted(skills_dir.glob("*.md")):
            text = f.read_text()
            meta, body = self._parse_frontmatter(text)
            self.skills[f.stem] = {"meta": meta, "body": body}

    def get_descriptions(self) -> str:
        lines = []
        ...

    def get_content(self, name: str) -> str:
        skill = self.skills.get(name)
        if not skill:
            return f"Error: Unknown skill '{name}'."
        return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"
```

The model learns what skills exist (cheap) and loads them when relevant (expensive).

## What Changed From s04

| Component | Before (s04) | After (s05) |
|---|---|---|
| Knowledge | None | .skills/*.md files |
| Injection | None | Two-layer (system + result) |

## Design Rationale

Two-layer injection solves the attention budget problem. Putting all skill content in the system prompt wastes tokens on unused skills. Layer 1 (compact summaries) costs roughly 120 tokens total. Layer 2 (full content) loads on demand via tool_result. This scales to dozens of skills without degrading model attention quality. The key insight is that the model only needs to know what skills exist (cheap) to decide when to load one (expensive). This is the same lazy-loading principle used in software module systems.

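To see the two-layer split concretely, here is a self-contained run of the frontmatter parse (the skill text is a hypothetical inline string rather than a file read from `.skills/`):

```python
import re

def parse_frontmatter(text: str) -> tuple:
    # Split "---\nkey: value\n---\nbody" into (meta dict, body),
    # the same regex approach SkillLoader uses above.
    match = re.match(r"^---\n(.*?)\n---\n(.*)", text, re.DOTALL)
    if not match:
        return {}, text
    meta = {}
    for line in match.group(1).strip().splitlines():
        if ":" in line:
            key, val = line.split(":", 1)
            meta[key.strip()] = val.strip()
    return meta, match.group(2).strip()

skill_text = "---\ndescription: Testing patterns\n---\nStep 1: write a failing test."
meta, body = parse_frontmatter(skill_text)
print(meta["description"])  # Layer 1: "Testing patterns" goes into the system prompt
print(body)                 # Layer 2: the full body, returned on load_skill
```

The metadata line is what keeps Layer 1 cheap: only the `description` ever reaches the system prompt.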
## Try It

```sh
cd learn-claude-code
python agents/s05_skill_loading.py
```

Example prompts to try:

1. `What skills are available?`
2. `Load the agent-builder skill and follow its instructions`
3. `I need to do a code review -- load the relevant skill first`

# s06: Context Compact

`s01 > s02 > s03 > s04 > s05 > [ s06 ] | s07 > s08 > s09 > s10 > s11 > s12`

> *"Strategic forgetting"* -- forget old context to enable infinite sessions.

## Problem

The context window is finite. A single `read_file` on a 1000-line file costs ~4000 tokens. After reading 30 files and running 20 bash commands, you hit 100,000+ tokens. Even before the hard limit, performance degrades: the model becomes slower, less accurate, and starts ignoring earlier messages. The agent cannot work on large codebases without compression.

## Solution

Three layers, increasing in aggressiveness:

```
Every turn:
  ...
continue                [Layer 2: auto_compact]
```

(Teaching simplification: token estimation here uses a rough characters/4 heuristic. Production systems use proper tokenizer libraries for accurate counts.)

## How It Works

1. **Layer 1 -- micro_compact**: Before each LLM call, replace tool results older than the last 3 with placeholders.

```python
def micro_compact(messages: list) -> list:
    # ...collect (i, j, part) tuples for every tool_result in messages...
    if len(tool_results) <= KEEP_RECENT:
        return messages
    for _, _, part in tool_results[:-KEEP_RECENT]:
        if len(part.get("content", "")) > 100:
            tool_id = part.get("tool_use_id", "")
            tool_name = tool_name_map.get(tool_id, "unknown")
            part["content"] = f"[Previous: used {tool_name}]"
    return messages
```
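A minimal, runnable sketch of the micro-compact idea (message shapes simplified from the real Anthropic content-block format):

```python
KEEP_RECENT = 3

def micro_compact(messages):
    # Collect every tool_result part, oldest first
    tool_results = []
    for msg in messages:
        for part in msg.get("content", []):
            if isinstance(part, dict) and part.get("type") == "tool_result":
                tool_results.append(part)
    # Keep the newest KEEP_RECENT intact; blank out older, large results
    for part in tool_results[:-KEEP_RECENT]:
        if len(part.get("content", "")) > 100:
            part["content"] = "[Previous: tool result cleared]"
    return messages

msgs = [{"role": "user",
         "content": [{"type": "tool_result", "content": "x" * 500}]}
        for _ in range(5)]
micro_compact(msgs)
cleared = sum(1 for m in msgs
              if m["content"][0]["content"].startswith("[Previous"))
print(cleared)  # 2 -- the oldest 2 of 5 results were cleared in place
```

Note that the mutation happens in place: the `part` dicts inside `messages` are the same objects collected into `tool_results`.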

2. **Layer 2 -- auto_compact**: When estimated tokens exceed the threshold, save the full transcript to disk, then ask the LLM to summarize.

```python
def auto_compact(messages: list) -> list:
    TRANSCRIPT_DIR.mkdir(exist_ok=True)
    # Save transcript for recovery
    transcript_path = TRANSCRIPT_DIR / f"transcript_{int(time.time())}.jsonl"
    with open(transcript_path, "w") as f:
        for msg in messages:
            f.write(json.dumps(msg, default=str) + "\n")
    # LLM summarizes
    response = client.messages.create(
        model=MODEL,
        messages=[{"role": "user", "content":
            ...  # summarization instructions
            + json.dumps(messages, default=str)[:80000]}],
        max_tokens=2000,
    )
    return [
        {"role": "user", "content": f"[Compressed]\n\n{response.content[0].text}"},
        {"role": "assistant", "content": "Understood. Continuing."},
    ]
```


3. **Layer 3 -- manual compact**: The `compact` tool triggers the same summarization on demand.

```python
if manual_compact:
    messages[:] = auto_compact(messages)
```

4. The agent loop integrates all three layers:

```python
def agent_loop(messages: list):
    while True:
        micro_compact(messages)                   # Layer 1
        if estimate_tokens(messages) > THRESHOLD:
            messages[:] = auto_compact(messages)  # Layer 2
        response = client.messages.create(...)
        # ... tool execution ...
        if manual_compact:
            messages[:] = auto_compact(messages)  # Layer 3
```

## Key Code

The three-layer pipeline (from `agents/s06_context_compact.py`, lines 67-93 and 189-223):

```python
THRESHOLD = 50000
KEEP_RECENT = 3

def micro_compact(messages):
    # Replace old tool results with placeholders
    ...

def auto_compact(messages):
    # Save transcript, LLM summarize, replace messages
    ...

def agent_loop(messages):
    while True:
        micro_compact(messages)                   # Layer 1
        if estimate_tokens(messages) > THRESHOLD:
            messages[:] = auto_compact(messages)  # Layer 2
        response = client.messages.create(...)
        # ...
        if manual_compact:
            messages[:] = auto_compact(messages)  # Layer 3
```

Transcripts preserve full history on disk. Nothing is truly lost -- just moved out of active context.

## What Changed From s05

| Component | Before (s05) | After (s06) |
|---|---|---|
| Context mgmt | None | Three-layer compression |
| Micro-compact | None | Old results -> placeholders |
| Auto-compact | None | Token threshold trigger |
| Manual compact | None | `compact` tool |
| Transcripts | None | Saved to .transcripts/ |

## Design Rationale

Context windows are finite, but agent sessions can be infinite. Three compression layers solve this at different granularities: micro-compact (replace old tool outputs), auto-compact (LLM summarizes when approaching the limit), and manual compact (user-triggered). The key insight is that forgetting is a feature, not a bug -- it enables unbounded sessions. Transcripts preserve the full history on disk so nothing is truly lost, just moved out of the active context. The layered approach lets each layer operate independently at its own granularity, from silent per-turn cleanup to full conversation reset.

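The characters/4 heuristic mentioned above fits in a few lines (a teaching approximation, not a real tokenizer; the constant name mirrors the section's `THRESHOLD`):

```python
import json

def estimate_tokens(messages) -> int:
    # Rough heuristic: ~4 characters per token for English-ish text
    return len(json.dumps(messages, default=str)) // 4

THRESHOLD = 50_000

msgs = [{"role": "user", "content": "hello world"}]
print(estimate_tokens(msgs) < THRESHOLD)  # True: far below the compact trigger
```

The error of this heuristic does not matter much here: it only decides *when* to compact, and the threshold sits well below the hard context limit.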
## Try It

```sh
cd learn-claude-code
python agents/s06_context_compact.py
```

Example prompts to try:

1. `Read every Python file in the agents/ directory one by one` (watch micro-compact replace old results)
2. `Keep reading files until compression triggers automatically`
3. `Use the compact tool to manually compress the conversation`

# s07: Tasks

`s01 > s02 > s03 > s04 > s05 > s06 | [ s07 ] s08 > s09 > s10 > s11 > s12`

> *"State survives /compact"* -- file-based state outlives context compression.

## Problem

In-memory state (the TodoManager from s03) dies when context compresses (s06). After auto_compact replaces messages with a summary, the todo list is gone. The agent can only reconstruct it from summary text -- lossy and error-prone. A second issue is visibility: in-memory structures are process-local, so teammates cannot reliably share that state.

File-based tasks solve this: write state to disk, and it survives compression, process restarts, and eventually multi-agent sharing (s09+).

## When to Use Task vs Todo

From s07 onward, Task is the default. Todo remains for short linear checklists.

| Situation | Prefer | Why |
|---|---|---|
| Short, single-session checklist | Todo | Lowest ceremony, fastest capture |
| Cross-session work, dependencies, or teammates | Task | Durable state, dependency graph, shared visibility |
| Unsure which one to use | Task | Easier to simplify later than migrate mid-run |

## Solution

Dependency resolution: completing a task clears its ID from every other task's `blockedBy` list, unblocking dependents.

## How It Works

1. TaskManager: one JSON file per task, CRUD with a dependency graph.

```python
class TaskManager:
    def __init__(self, tasks_dir: Path):
        self.dir = tasks_dir
        self.dir.mkdir(exist_ok=True)
        self._next_id = self._max_id() + 1

    def create(self, subject, description=""):
        task = {"id": self._next_id, "subject": subject,
                "status": "pending", "blockedBy": [],
                "blocks": [], "owner": ""}
        self._save(task)
        self._next_id += 1
        return json.dumps(task, indent=2)
```

2. Completing a task clears its ID from every other task's `blockedBy` list.

```python
def _clear_dependency(self, completed_id):
    for f in self.dir.glob("task_*.json"):
        task = json.loads(f.read_text())
        if completed_id in task.get("blockedBy", []):
            task["blockedBy"].remove(completed_id)
            self._save(task)
```

3. `update()` changes status and keeps both sides of the dependency graph in sync.

```python
def update(self, task_id, status=None, add_blocks=None):
    ...
    task["status"] = status
    if status == "completed":
        self._clear_dependency(task_id)
    if add_blocks:
        task["blocks"] = list(set(task["blocks"] + add_blocks))
        for blocked_id in add_blocks:
            blocked = self._load(blocked_id)
            if task_id not in blocked["blockedBy"]:
                blocked["blockedBy"].append(task_id)
                self._save(blocked)
    self._save(task)
```

4. Four task tools go into the dispatch map.

```python
TOOL_HANDLERS = {
    # ...base tools...
    "task_create": lambda **kw: TASKS.create(kw["subject"]),
    "task_update": lambda **kw: TASKS.update(kw["task_id"], kw.get("status")),
    "task_list": lambda **kw: TASKS.list_all(),
    "task_get": lambda **kw: TASKS.get(kw["task_id"]),
}
```

## Key Code

TaskManager with dependency graph (from `agents/s07_task_system.py`, lines 46-123):

```python
class TaskManager:
    def __init__(self, tasks_dir: Path):
        self.dir = tasks_dir
        self.dir.mkdir(exist_ok=True)
        self._next_id = self._max_id() + 1

    def _load(self, task_id: int) -> dict:
        path = self.dir / f"task_{task_id}.json"
        return json.loads(path.read_text())

    def _save(self, task: dict):
        path = self.dir / f"task_{task['id']}.json"
        path.write_text(json.dumps(task, indent=2))

    def create(self, subject, description=""):
        task = {"id": self._next_id, "subject": subject,
                "status": "pending", "blockedBy": [],
                "blocks": [], "owner": ""}
        self._save(task)
        self._next_id += 1
        return json.dumps(task, indent=2)

    def _clear_dependency(self, completed_id):
        for f in self.dir.glob("task_*.json"):
            task = json.loads(f.read_text())
            if completed_id in task.get("blockedBy", []):
                task["blockedBy"].remove(completed_id)
                self._save(task)
```

From s07 onward, Task is the default for multi-step work. Todo remains for quick checklists.

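The unblock-on-complete behavior can be exercised with a trimmed, file-backed stand-in for TaskManager (the `MiniTasks` class below is hypothetical demo code; a temp directory stands in for the real tasks dir):

```python
import json
import tempfile
from pathlib import Path

class MiniTasks:
    def __init__(self, tasks_dir: Path):
        self.dir = tasks_dir
        self.dir.mkdir(exist_ok=True)

    def _path(self, tid):
        return self.dir / f"task_{tid}.json"

    def create(self, tid, subject, blocked_by=()):
        task = {"id": tid, "subject": subject, "status": "pending",
                "blockedBy": list(blocked_by)}
        self._path(tid).write_text(json.dumps(task))

    def complete(self, tid):
        # Mark done, then clear this id from every other task's blockedBy
        task = json.loads(self._path(tid).read_text())
        task["status"] = "completed"
        self._path(tid).write_text(json.dumps(task))
        for f in self.dir.glob("task_*.json"):
            other = json.loads(f.read_text())
            if tid in other.get("blockedBy", []):
                other["blockedBy"].remove(tid)
                f.write_text(json.dumps(other))

    def get(self, tid):
        return json.loads(self._path(tid).read_text())

tasks = MiniTasks(Path(tempfile.mkdtemp()))
tasks.create(1, "Setup project")
tasks.create(2, "Write code", blocked_by=[1])
tasks.complete(1)
print(tasks.get(2)["blockedBy"])  # [] -- task 2 is unblocked
```

Because every read goes back to disk, the same check works after a process restart or a context compact.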
## What Changed From s06

| Component | Before (s06) | After (s07) |
|---|---|---|
| Dependencies | None | `blockedBy` + `blocks` graph |
| Persistence | Lost on compact | Survives compression |

## Design Rationale

File-based state survives compaction and process restarts. The dependency graph preserves execution order even when conversation details are forgotten. This turns transient chat context into durable work state.

Durability still needs a write discipline: reload task JSON before each write, validate the expected `status`/`blockedBy`, then persist atomically. Otherwise concurrent writers can overwrite each other.

Course-level implication: s07+ defaults to Task because it better matches long-running and collaborative engineering workflows.

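One possible shape for the atomic-persist step (a sketch assuming POSIX `os.replace` semantics; the helper name `save_atomic` is illustrative, not from the course code):

```python
import json
import os
import tempfile
from pathlib import Path

def save_atomic(path: Path, task: dict):
    # Write the full payload to a sibling temp file, then atomically
    # swap it in, so readers never observe a half-written task file.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(json.dumps(task, indent=2))
    os.replace(tmp, path)

d = Path(tempfile.mkdtemp())
save_atomic(d / "task_1.json", {"id": 1, "status": "pending"})
print(json.loads((d / "task_1.json").read_text())["status"])  # pending
```

The temp file must live in the same directory as the target: `os.replace` is only atomic within a single filesystem.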
## Try It

```sh
cd learn-claude-code
python agents/s07_task_system.py
```

Suggested prompts:

1. `Create 3 tasks: "Setup project", "Write code", "Write tests". Make them depend on each other in order.`
2. `List all tasks and show the dependency graph`
3. `Complete task 1 and then list tasks to see task 2 unblocked`

# s08: Background Tasks

`s01 > s02 > s03 > s04 > s05 > s06 | s07 > [ s08 ] s09 > s10 > s11 > s12`

> *"Fire and forget"* -- non-blocking threads + notification queue.

## Problem

Some commands take minutes: `npm install`, `pytest`, `docker build`. With a blocking loop, the model sits idle waiting. If the user asks "install dependencies and while that runs, create the config file," the agent does them sequentially, not in parallel.

The fix is a BackgroundManager that runs commands in daemon threads and collects results in a notification queue, drained into the messages before each LLM call.

## Solution

```
Main thread                         Background thread
+-----------------+                 +-----------------+
| agent loop      |                 | subprocess runs |
|   ...           |                 |   ...           |
| [LLM call] <----+---------------- | enqueue(result) |
|  ^drain queue   |                 +-----------------+
+-----------------+

Agent --[spawn A]--[spawn B]--[other work]----
             v          v
         [A runs]   [B runs]   (parallel)
             |          |
             +-- notification queue --+
                                      |
  +-- results injected before next LLM call --+
```

## How It Works

1. BackgroundManager tracks tasks with a thread-safe notification queue.

```python
class BackgroundManager:
    def __init__(self):
        self.tasks = {}
        self._notification_queue = []
        self._lock = threading.Lock()
```

2. `run()` starts a daemon thread and returns a task_id immediately.

```python
def run(self, command: str) -> str:
    task_id = str(uuid.uuid4())[:8]
    self.tasks[task_id] = {"status": "running", "command": command}
    thread = threading.Thread(
        target=self._execute, args=(task_id, command), daemon=True)
    thread.start()
    return f"Background task {task_id} started"
```

3. The thread target `_execute` runs the subprocess; when it finishes, the result goes into the notification queue.

```python
def _execute(self, task_id, command):
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR,
                           capture_output=True, text=True, timeout=300)
        output = (r.stdout + r.stderr).strip()[:50000]
        status = "completed"
    except subprocess.TimeoutExpired:
        output = "Error: Timeout (300s)"
        status = "timeout"
    self.tasks[task_id]["status"] = status
    self.tasks[task_id]["result"] = output
    with self._lock:
        self._notification_queue.append({
            "task_id": task_id, "result": output[:500]})
```

4. `drain_notifications()` returns and clears pending results; the agent loop drains them before each LLM call.

```python
def drain_notifications(self) -> list:
    with self._lock:
        notifs = list(self._notification_queue)
        self._notification_queue.clear()
    return notifs

def agent_loop(messages: list):
    while True:
        notifs = BG.drain_notifications()
        if notifs:
            notif_text = "\n".join(
                f"[bg:{n['task_id']}] {n['result']}" for n in notifs)
            messages.append({"role": "user",
                "content": f"<background-results>\n{notif_text}\n"
                           f"</background-results>"})
            messages.append({"role": "assistant",
                "content": "Noted background results."})
        response = client.messages.create(...)
```

## Key Code

The BackgroundManager (from `agents/s08_background_tasks.py`, lines 49-107):

```python
class BackgroundManager:
    def __init__(self):
        self.tasks = {}
        self._notification_queue = []
        self._lock = threading.Lock()

    def run(self, command: str) -> str:
        task_id = str(uuid.uuid4())[:8]
        self.tasks[task_id] = {"status": "running",
                               "result": None,
                               "command": command}
        thread = threading.Thread(
            target=self._execute,
            args=(task_id, command), daemon=True)
        thread.start()
        return f"Background task {task_id} started"

    def _execute(self, task_id, command):
        # run subprocess, push to queue
        ...

    def drain_notifications(self) -> list:
        with self._lock:
            notifs = list(self._notification_queue)
            self._notification_queue.clear()
        return notifs
```

The loop stays single-threaded. Only subprocess I/O is parallelized.

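The whole fire-and-forget round trip fits in a few lines (a condensed, hypothetical `MiniBackground` stand-in that runs a plain function instead of a subprocess):

```python
import threading
import time

class MiniBackground:
    def __init__(self):
        self._queue = []
        self._lock = threading.Lock()

    def run(self, task_id, fn):
        # Daemon thread: do the work, then enqueue the result
        def worker():
            result = fn()
            with self._lock:
                self._queue.append({"task_id": task_id, "result": result})
        threading.Thread(target=worker, daemon=True).start()

    def drain_notifications(self):
        # Atomically take everything and leave an empty queue behind
        with self._lock:
            notifs, self._queue = self._queue, []
        return notifs

bg = MiniBackground()
bg.run("a1", lambda: "done")
time.sleep(0.1)                  # give the worker time to finish
print(bg.drain_notifications())  # e.g. [{'task_id': 'a1', 'result': 'done'}]
print(bg.drain_notifications())  # [] -- the queue was drained
```

The swap inside the lock is the whole thread-safety story: producers only ever append under the lock, and the consumer replaces the list wholesale.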
## What Changed From s07

| Component | Before (s07) | After (s08) |
|---|---|---|
| Notification | None | Queue drained per loop |
| Concurrency | None | Daemon threads |

## Design Rationale

The agent loop is inherently single-threaded (one LLM call at a time). Background threads break this constraint for I/O-bound work (tests, builds, installs). The notification queue pattern ("drain before the next LLM call") ensures results arrive at natural conversation breakpoints rather than interrupting the model's reasoning mid-thought. This is a minimal concurrency model: the agent loop stays single-threaded and deterministic, while only the I/O-bound subprocess execution is parallelized.

## Try It

```sh
cd learn-claude-code
python agents/s08_background_tasks.py
```

Example prompts to try:

1. `Run "sleep 5 && echo done" in the background, then create a file while it runs`
2. `Start 3 background tasks: "sleep 2", "sleep 4", "sleep 6". Check their status.`
3. `Run pytest in the background and keep working on other things`

# s09: Agent Teams

`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > [ s09 ] s10 > s11 > s12`

> *"Append to send, drain to read"* -- async mailboxes for persistent teammates.

## Problem

Subagents (s04) are disposable: spawn, work, return summary, die. No identity, no memory between invocations, no way to receive follow-up instructions. Background tasks (s08) run shell commands but can't make LLM-guided decisions.

Real teamwork needs: (1) persistent agents that outlive a single prompt, (2) identity and lifecycle management, (3) a communication channel between agents. Without messaging, even persistent teammates are deaf and mute -- they can work in parallel but never coordinate.

## Solution

```
Teammate lifecycle:
  ...

Communication:
  .teammates/inboxes/
      alice.jsonl
      bob.jsonl
      lead.jsonl

  +--------+  send("alice","bob","...")     +--------+
  | alice  | -----------------------------> |  bob   |
  |  loop  |   bob.jsonl << {json_line}     |  loop  |
  +--------+                                +--------+
       ^                                        |
       |        BUS.read_inbox("alice")         |
       +---- alice.jsonl -> read + drain -------+

5 message types:
+-------------------------+------------------------------+
| message                 | Normal text between agents   |
| broadcast               | Sent to all teammates        |
| shutdown_request        | Request graceful shutdown    |
| shutdown_response       | Approve/reject shutdown      |
| plan_approval_response  | Approve/reject plan          |
+-------------------------+------------------------------+
```

## How It Works

1. TeammateManager maintains config.json with the team roster. Each member has a name, role, and status.

```python
class TeammateManager:
    ...  # roster load/save elided
```

2. `spawn()` creates a teammate and starts its agent loop in a thread. Re-spawning an idle teammate reactivates it.

```python
def spawn(self, name: str, role: str, prompt: str) -> str:
    member = self._find_member(name)
    if member:
        if member["status"] not in ("idle", "shutdown"):
            return f"Error: '{name}' is currently {member['status']}"
        member["status"] = "working"
    else:
        member = {"name": name, "role": role, "status": "working"}
        self.config["members"].append(member)
    self._save_config()
    thread = threading.Thread(
        target=self._teammate_loop,
        args=(name, role, prompt), daemon=True)
    self.threads[name] = thread
    thread.start()
    return f"Spawned teammate '{name}' (role: {role})"
```

3. MessageBus: append-only JSONL inboxes. `send()` appends a JSON line; `read_inbox()` reads all messages and drains the file.

```python
class MessageBus:
    def send(self, sender, to, content, msg_type="message", extra=None):
        msg = {"type": msg_type, "from": sender,
               "content": content, "timestamp": time.time()}
        if extra:
            msg.update(extra)
        with open(self.dir / f"{to}.jsonl", "a") as f:
            f.write(json.dumps(msg) + "\n")
        return f"Sent {msg_type} to {to}"

    def read_inbox(self, name):
        path = self.dir / f"{name}.jsonl"
        if not path.exists():
            return "[]"
        msgs = [json.loads(l)
                for l in path.read_text().strip().splitlines() if l]
        path.write_text("")  # drain
        return json.dumps(msgs, indent=2)
```

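Append-to-send, drain-to-read in miniature (hypothetical free functions over a temp directory standing in for `.teammates/inboxes/`; unlike the real MessageBus, these return lists rather than JSON strings):

```python
import json
import tempfile
from pathlib import Path

inboxes = Path(tempfile.mkdtemp())

def send(sender, to, content):
    # Append one JSON line to the recipient's inbox file
    msg = {"from": sender, "content": content}
    with open(inboxes / f"{to}.jsonl", "a") as f:
        f.write(json.dumps(msg) + "\n")

def read_inbox(name):
    # Read every line, then truncate the file (drain)
    path = inboxes / f"{name}.jsonl"
    if not path.exists():
        return []
    msgs = [json.loads(l) for l in path.read_text().splitlines() if l]
    path.write_text("")
    return msgs

send("alice", "bob", "ready for review?")
send("lead", "bob", "ship it")
print(len(read_inbox("bob")))  # 2
print(read_inbox("bob"))       # [] -- inbox drained
```

Appends in "a" mode make sends from different threads safe enough for a teaching demo; the drain step is where a production bus would need file locking.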
4. Each teammate checks its inbox before every LLM call and injects
|
||||
received messages into the conversation context.
|
||||
4. Each teammate checks its inbox before every LLM call, injecting received messages into context.
|
||||
|
||||
```python
def _teammate_loop(self, name, role, prompt):
    sys_prompt = f"You are '{name}', role: {role}, at {WORKDIR}."
    messages = [{"role": "user", "content": prompt}]
    for _ in range(50):
        inbox = BUS.read_inbox(name)
        messages.append({"role": "user",
                         "content": f"<inbox>{inbox}</inbox>"})
        messages.append({"role": "assistant",
                         "content": "Noted inbox messages."})
        response = client.messages.create(...)
        if response.stop_reason != "tool_use":
            break
        # execute tools, append results...
    self._find_member(name)["status"] = "idle"
    self._save_config()
```

5. `broadcast()` sends the same message to all teammates except the sender.

```python
def broadcast(self, sender, content, teammates):
    count = 0
    for name in teammates:
        if name != sender:
            self.send(sender, name, content, "broadcast")
            count += 1
    return f"Broadcast to {count} teammates"
```
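To see the bus semantics end to end, here is a self-contained toy version (a standalone rewrite of the excerpted methods, standard library only; `ToyBus` is a name local to this sketch):

```python
import json, time
from pathlib import Path
from tempfile import mkdtemp

class ToyBus:
    # Minimal stand-in for MessageBus: one JSONL inbox file per teammate.
    def __init__(self):
        self.dir = Path(mkdtemp())

    def send(self, sender, to, content, msg_type="message"):
        msg = {"type": msg_type, "from": sender,
               "content": content, "timestamp": time.time()}
        with open(self.dir / f"{to}.jsonl", "a") as f:
            f.write(json.dumps(msg) + "\n")

    def read_inbox(self, name):
        path = self.dir / f"{name}.jsonl"
        if not path.exists(): return []
        msgs = [json.loads(l) for l in path.read_text().splitlines() if l]
        path.write_text("")  # drain: a second read returns []
        return msgs

bus = ToyBus()
bus.send("alice", "bob", "ping")
bus.send("lead", "bob", "status?", "broadcast")
```

The first `read_inbox("bob")` returns both messages in send order; the next read is empty because the drain truncated the file.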

## Key Code

The TeammateManager + MessageBus core (from `agents/s09_agent_teams.py`):

```python
class TeammateManager:
    def spawn(self, name, role, prompt):
        member = self._find_member(name) or {
            "name": name, "role": role, "status": "working"
        }
        member["status"] = "working"
        self._save_config()
        thread = threading.Thread(
            target=self._teammate_loop,
            args=(name, role, prompt), daemon=True)
        thread.start()
        return f"Spawned '{name}'"

class MessageBus:
    def send(self, sender, to, content,
             msg_type="message", extra=None):
        msg = {"type": msg_type, "from": sender,
               "content": content, "timestamp": time.time()}
        if extra: msg.update(extra)
        with open(self.dir / f"{to}.jsonl", "a") as f:
            f.write(json.dumps(msg) + "\n")

    def read_inbox(self, name):
        path = self.dir / f"{name}.jsonl"
        if not path.exists(): return "[]"
        msgs = [json.loads(l)
                for l in path.read_text().strip().splitlines()
                if l]
        path.write_text("")
        return json.dumps(msgs, indent=2)
```

## What Changed From s08

| Component     | Before (s08)     | After (s09)                 |
|---------------|------------------|-----------------------------|
| Persistence   | None             | config.json + JSONL inboxes |
| Threads       | Background cmds  | Full agent loops per thread |
| Lifecycle     | Fire-and-forget  | idle -> working -> idle     |
| Communication | None             | message + broadcast         |

Teaching simplification: this implementation does not use lock files for inbox access. In production, concurrent append from multiple writers would need file locking or atomic rename. The single-writer-per-inbox pattern used here is safe for the teaching scenario.

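If you do need multi-writer safety, the smallest upgrade is an advisory lock around the append. A sketch (not part of s09's code, POSIX-only) using `fcntl.flock`:

```python
import fcntl, json

def locked_append(path, msg):
    # LOCK_EX serializes concurrent appenders; the lock is released
    # automatically when the file handle is closed.
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.write(json.dumps(msg) + "\n")
```

Atomic rename (write a temp file, `os.rename` into place) is the alternative when whole-file replacement is acceptable.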
## Design Rationale

File-based mailboxes (append-only JSONL) are easy to inspect and reason about in a teaching codebase. The "drain on read" pattern (read all, truncate) gives batch delivery with very little machinery. The tradeoff is latency -- messages are only seen at the next poll -- but for LLM-driven agents where each turn takes seconds, polling latency is acceptable for this course.

## Try It

```sh
cd learn-claude-code
python agents/s09_agent_teams.py
```

Example prompts to try:

1. `Spawn alice (coder) and bob (tester). Have alice send bob a message.`
2. `Broadcast "status update: phase 1 complete" to all teammates`
3. `Check the lead inbox for any messages`

# s10: Team Protocols

`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > [ s10 ] s11 > s12`

> *"Same request_id, two protocols"* -- one FSM pattern powers shutdown + plan approval.

## Problem

In s09, teammates work and communicate but lack structured coordination:

**Shutdown**: Killing a thread leaves files half-written and config.json stale. You need a handshake: the lead requests, the teammate approves (finish and exit) or rejects (keep working).

**Plan approval**: When the lead says "refactor the auth module," the teammate starts immediately. For high-risk changes, the lead should review the plan first.

Both share the same structure: one side sends a request with a unique ID, the other responds referencing that ID. A finite state machine tracks each request through pending -> approved | rejected.

## Solution

```
Shutdown Protocol                    Plan Approval Protocol

Lead              Teammate           Teammate          Lead
 |--shutdown_req-->|                  |--plan_req------>|
 | {req_id:"abc"}  |                  | {req_id:"xyz"}  |
 |<--shutdown_resp-|                  |<--plan_resp-----|
 | {req_id:"abc",  |                  | {req_id:"xyz",  |
 |  approve:true}  |                  |  approve:true}  |
 |                 |                  |                 |
 v                 v                  v                 v
tracker["abc"]    exits             proceeds     tracker["xyz"]
 = approved                                       = approved

Shared FSM:
  [pending] --approve--> [approved]
  [pending] --reject---> [rejected]

Trackers:
  shutdown_requests[req_id] -> {"target", "status"}
  plan_requests[req_id]     -> {"from", "plan", "status"}
```

## How It Works

1. The lead initiates shutdown by generating a request_id and sending a shutdown_request through the inbox.

```python
shutdown_requests = {}

def handle_shutdown_request(teammate: str) -> str:
    req_id = str(uuid.uuid4())[:8]
    shutdown_requests[req_id] = {"target": teammate, "status": "pending"}
    BUS.send("lead", teammate, "Please shut down gracefully.",
             "shutdown_request", {"request_id": req_id})
    return f"Shutdown request {req_id} sent (status: pending)"
```

2. The teammate receives the request and responds with approve/reject.

```python
if tool_name == "shutdown_response":
    req_id = args["request_id"]
    approve = args["approve"]
    if req_id in shutdown_requests:
        shutdown_requests[req_id]["status"] = "approved" if approve else "rejected"
    BUS.send(sender, "lead", args.get("reason", ""),
             "shutdown_response",
             {"request_id": req_id, "approve": approve})
    return f"Shutdown {'approved' if approve else 'rejected'}"
```

3. The teammate loop checks for approved shutdown and exits.

```python
if (block.name == "shutdown_response"
        and block.input.get("approve")):
    should_exit = True
# ...
member["status"] = "shutdown" if should_exit else "idle"
```

4. Plan approval follows the identical pattern. The teammate submits a plan, generating a request_id.

```python
plan_requests = {}

if tool_name == "plan_approval":
    plan_text = args.get("plan", "")
    req_id = str(uuid.uuid4())[:8]
    plan_requests[req_id] = {
        "from": sender, "plan": plan_text,
        "status": "pending",
    }
    BUS.send(sender, "lead", plan_text,
             "plan_approval_request",
             {"request_id": req_id, "plan": plan_text})
    return f"Plan submitted (request_id={req_id})"
```

5. The lead reviews and responds with the same request_id.

```python
def handle_plan_review(request_id, approve, feedback=""):
    req = plan_requests.get(request_id)
    if not req:
        return f"Error: Unknown request_id '{request_id}'"
    req["status"] = "approved" if approve else "rejected"
    BUS.send("lead", req["from"], feedback,
             "plan_approval_response",
             {"request_id": request_id,
              "approve": approve,
              "feedback": feedback})
    return f"Plan {req['status']} for '{req['from']}'"
```

6. Both protocols use the same `plan_approval` tool name with two modes: teammates submit (no request_id), the lead reviews (with request_id).

```python
# Lead tool dispatch:
"plan_approval": lambda **kw: handle_plan_review(
    kw["request_id"], kw["approve"],
    kw.get("feedback", "")),
# Teammate: submit mode (generate request_id)
```

## Key Code

The dual protocol handlers (from `agents/s10_team_protocols.py`):

```python
shutdown_requests = {}
plan_requests = {}

# -- Shutdown --
def handle_shutdown_request(teammate):
    req_id = str(uuid.uuid4())[:8]
    shutdown_requests[req_id] = {
        "target": teammate, "status": "pending"
    }
    BUS.send("lead", teammate,
             "Please shut down gracefully.",
             "shutdown_request",
             {"request_id": req_id})

# -- Plan Approval --
def handle_plan_review(request_id, approve, feedback=""):
    req = plan_requests[request_id]
    req["status"] = "approved" if approve else "rejected"
    BUS.send("lead", req["from"], feedback,
             "plan_approval_response",
             {"request_id": request_id, "approve": approve})
```

One FSM, two applications. The same `pending -> approved | rejected` state machine handles any request-response protocol, and the request_id correlates request and response across async inboxes.

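The pattern generalizes. A sketch of the protocol-agnostic core (a hypothetical `RequestTracker`, not in the course code) that both trackers could instantiate:

```python
import uuid

class RequestTracker:
    # pending -> approved | rejected, keyed by a short request_id.
    def __init__(self):
        self.requests = {}

    def open(self, **fields):
        req_id = str(uuid.uuid4())[:8]
        self.requests[req_id] = {"status": "pending", **fields}
        return req_id

    def resolve(self, req_id, approve):
        req = self.requests.get(req_id)
        if req is None or req["status"] != "pending":
            return None  # unknown id, or already resolved
        req["status"] = "approved" if approve else "rejected"
        return req["status"]
```

Shutdown and plan approval would differ only in the fields stored alongside `status` (`target` vs `from`/`plan`).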

## What Changed From s09

| Component       | Before (s09)      | After (s10)                   |
|-----------------|-------------------|-------------------------------|
| Tools           | 9                 | 12 (+shutdown_req/resp +plan) |
| Shutdown        | Natural exit only | Request-response handshake    |
| Plan gating     | None              | Submit/review with approval   |
| Request tracking| None              | Two tracker dicts             |
| Correlation     | None              | request_id per request        |
| FSM             | None              | pending -> approved/rejected  |

## Design Rationale

The request_id correlation pattern turns any async interaction into a trackable finite state machine. The same 3-state machine (pending -> approved/rejected) applies to shutdown, plan approval, or any future protocol. This is why one pattern handles multiple protocols -- the FSM does not care what it is approving. The request_id provides correlation across async inboxes where messages may arrive out of order, making the pattern robust to timing variations between agents.

## Try It

```sh
cd learn-claude-code
python agents/s10_team_protocols.py
```

Example prompts to try:

1. `Spawn alice as a coder. Then request her shutdown.`
2. `List teammates to see alice's status after shutdown approval`
3. `Spawn bob with a risky refactoring task. Review and reject his plan.`

# s11: Autonomous Agents

`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > [ s11 ] s12`

> *"Poll, claim, work, repeat"* -- no coordinator needed, agents self-organize.

## Problem

In s09-s10, teammates only work when explicitly told to. The lead must spawn each one with a specific prompt. 10 unclaimed tasks on the board? The lead assigns each one manually. Doesn't scale.

True autonomy: teammates scan the task board themselves, claim unclaimed tasks, work on them, then look for more.

One subtlety: after context compression (s06), the agent might forget who it is. If the messages are summarized, the original identity ("you are alice, role: coder") fades. Identity re-injection fixes this.

## Solution

```
Teammate lifecycle with idle cycle:

   +-------+                +-------+
   | WORK  | <------------- |  LLM  |
   +---+---+                +-------+
       |
       | stop_reason != tool_use (or idle tool called)
       v
   +--------+
   |  IDLE  |  poll every 5s for up to 60s
   +---+----+
       | inbox message or unclaimed task -> back to WORK
       | 60s timeout -> shutdown

Identity re-injection after compression:
  if len(messages) <= 3:
      messages.insert(0, identity_block)
      "You are 'alice', role: coder, team: my-team"
```

## How It Works

1. The teammate loop has two phases: WORK and IDLE. When the LLM stops calling tools (or calls `idle`), the teammate enters IDLE.

```python
def _loop(self, name, role, prompt):
    while True:
        # -- WORK PHASE --
        messages = [{"role": "user", "content": prompt}]
        for _ in range(50):
            inbox = BUS.read_inbox(name)
            for msg in inbox:
                if msg.get("type") == "shutdown_request":
                    self._set_status(name, "shutdown")
                    return
            messages.append(...)
            response = client.messages.create(...)
            if response.stop_reason != "tool_use":
                break
        # -- IDLE PHASE -- (poll inbox + task board; see _idle_poll)
        self._set_status(name, "working")
```

2. The idle phase polls inbox and task board in a loop.

```python
def _idle_poll(self, name, messages):
    for _ in range(IDLE_TIMEOUT // POLL_INTERVAL):  # 60s / 5s = 12
        time.sleep(POLL_INTERVAL)
        # Check inbox for new messages
        inbox = BUS.read_inbox(name)
        if inbox:
            messages.append({"role": "user",
                             "content": f"<inbox>{inbox}</inbox>"})
            return True
        # Scan task board for unclaimed tasks
        unclaimed = scan_unclaimed_tasks()
        if unclaimed:
            claim_task(unclaimed[0]["id"], name)
            messages.append({"role": "user",
                             "content": f"<auto-claimed>Task #{unclaimed[0]['id']}: "
                                        f"{unclaimed[0]['subject']}</auto-claimed>"})
            return True
    return False  # timeout -> shutdown
```

3. Task board scanning: find pending, unowned, unblocked tasks.

```python
def scan_unclaimed_tasks() -> list:
    TASKS_DIR.mkdir(exist_ok=True)
    unclaimed = []
    for f in sorted(TASKS_DIR.glob("task_*.json")):
        task = json.loads(f.read_text())
        if (task.get("status") == "pending"
                and not task.get("owner")
                and not task.get("blockedBy")):
            unclaimed.append(task)
    return unclaimed

def claim_task(task_id: int, owner: str):
    path = TASKS_DIR / f"task_{task_id}.json"
    task = json.loads(path.read_text())
    task["status"] = "in_progress"
    task["owner"] = owner
    path.write_text(json.dumps(task, indent=2))
```
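A runnable mirror of the scan/claim pair against a throwaway board (self-contained sketch; the directory and `make_task` helper are local to this example, not course code):

```python
import json
from pathlib import Path
from tempfile import mkdtemp

TASKS_DIR = Path(mkdtemp())  # throwaway board for the demo

def make_task(task_id, subject, **extra):
    task = {"id": task_id, "subject": subject, "status": "pending",
            "owner": "", **extra}
    (TASKS_DIR / f"task_{task_id}.json").write_text(json.dumps(task))

def scan_unclaimed_tasks():
    # Same filter as above: pending, unowned, unblocked.
    tasks = [json.loads(f.read_text())
             for f in sorted(TASKS_DIR.glob("task_*.json"))]
    return [t for t in tasks
            if t["status"] == "pending" and not t.get("owner")
            and not t.get("blockedBy")]

def claim_task(task_id, owner):
    path = TASKS_DIR / f"task_{task_id}.json"
    task = json.loads(path.read_text())
    task.update(status="in_progress", owner=owner)
    path.write_text(json.dumps(task, indent=2))

make_task(1, "write tests")
make_task(2, "blocked refactor", blockedBy=[1])
make_task(3, "claimed already", owner="bob")
```

Only task 1 is claimable: task 2 is blocked, task 3 already has an owner; after `claim_task(1, "alice")` the scan comes back empty.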

4. Identity re-injection: when context is too short (compression happened), insert an identity block.

```python
# Before resuming work after idle:
if len(messages) <= 3:
    messages.insert(0, {"role": "user",
        "content": f"<identity>You are '{name}', role: {role}, "
                   f"team: {team_name}. Continue your work.</identity>"})
    messages.insert(1, {"role": "assistant",
                        "content": f"I am {name}. Continuing."})
```
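The guard is easy to check in isolation -- a minimal sketch wrapping the fix-up in a function (the function name is an assumption of this sketch):

```python
def reinject_identity(messages, name, role, team_name):
    # After compression the history shrinks to a summary; if the
    # context is that short, pin identity back at the front.
    if len(messages) <= 3:
        messages.insert(0, {"role": "user",
            "content": f"<identity>You are '{name}', role: {role}, "
                       f"team: {team_name}. Continue your work.</identity>"})
        messages.insert(1, {"role": "assistant",
                            "content": f"I am {name}. Continuing."})
    return messages

compressed = [{"role": "user", "content": "<summary>...</summary>"}]
restored = reinject_identity(compressed, "alice", "coder", "my-team")
```

A long, uncompressed history passes through untouched; only the post-compression stub gets the identity pair prepended.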

5. The `idle` tool lets the teammate explicitly signal it has no more work, entering the idle polling phase early.

```python
{"name": "idle",
 "description": "Signal that you have no more work. "
                "Enters idle polling phase.",
 "input_schema": {"type": "object", "properties": {}}},
```

## Key Code

The autonomous loop (from `agents/s11_autonomous_agents.py`):

```python
def _loop(self, name, role, prompt):
    while True:
        # WORK PHASE
        for _ in range(50):
            response = client.messages.create(...)
            if response.stop_reason != "tool_use":
                break
            for block in response.content:
                if block.name == "idle":
                    idle_requested = True
            if idle_requested:
                break

        # IDLE PHASE
        self._set_status(name, "idle")
        resume = False
        for _ in range(IDLE_TIMEOUT // POLL_INTERVAL):
            time.sleep(POLL_INTERVAL)
            inbox = BUS.read_inbox(name)
            if inbox: resume = True; break
            unclaimed = scan_unclaimed_tasks()
            if unclaimed:
                claim_task(unclaimed[0]["id"], name)
                resume = True; break
        if not resume:
            self._set_status(name, "shutdown")
            return
        self._set_status(name, "working")
```

## What Changed From s10

| Component | Before (s10)  | After (s11)                   |
|-----------|---------------|-------------------------------|
| Identity  | System prompt | + re-injection after compress |
| Timeout   | None          | 60s idle -> auto shutdown     |

## Design Rationale

Polling + timeout makes agents self-organizing without a central coordinator. Each agent independently polls the task board, claims unclaimed work, and returns to idle when done. The timeout triggers the poll cycle, and if no work appears within the window, the agent shuts itself down. This is the same pattern as work-stealing thread pools -- distributed, no single point of failure. Identity re-injection after compression ensures agents maintain their role even when conversation history is summarized away.

## Try It

```sh
cd learn-claude-code
python agents/s11_autonomous_agents.py
```

Example prompts to try:

1. `Create 3 tasks on the board, then spawn alice and bob. Watch them auto-claim.`
2. `Spawn a coder teammate and let it find work from the task board itself`
3. `Create tasks with dependencies. Watch teammates respect the blocked order.`

# s12: Worktree + Task Isolation

`s01 > s02 > s03 > s04 > s05 > s06 | s07 > s08 > s09 > s10 > s11 > [ s12 ]`

> *"Isolate by directory, coordinate by task ID"* -- task board + optional worktree lanes.

## Problem

By s11, agents can claim and complete tasks autonomously. But every task runs in one shared directory. Two agents refactoring different modules at the same time will collide: agent A edits `config.py`, agent B edits `config.py`, unstaged changes mix, and neither can roll back cleanly.

The task board tracks *what to do* but has no opinion about *where to do it*. The fix: give each task its own git worktree directory. Tasks manage goals, worktrees manage execution context. Bind them by task ID.

## Solution

```
Control plane (.tasks/)              Execution plane (.worktrees/)
+----------------------------+       +----------------------------+
| task_1.json                |       | auth-refactor/             |
|   status: in_progress      <------->  branch: wt/auth-refactor  |
|   worktree: "auth-refactor"|       |  task_id: 1                |
+----------------------------+       +----------------------------+
| task_2.json                |       | ui-login/                  |
|   status: pending          <------->  branch: wt/ui-login       |
|   worktree: "ui-login"     |       |  task_id: 2                |
+----------------------------+       +----------------------------+
              |
   index.json  (worktree registry)
   events.jsonl (lifecycle log)

State machines:
  Task:     pending -> in_progress -> completed
  Worktree: absent -> active -> removed | kept
```

Three state layers make this work:

1. **Control plane** (`.tasks/task_*.json`) -- what is assigned, in progress, or done. Key fields: `id`, `subject`, `status`, `owner`, `worktree`.
2. **Execution plane** (`.worktrees/index.json`) -- where commands run and whether the workspace is still valid. Key fields: `name`, `path`, `branch`, `task_id`, `status`.
3. **Runtime state** (in-memory) -- per-turn execution continuity: `current_task`, `current_worktree`, `tool_result`, `error`.

## How It Works

The lifecycle has five steps. Each step is a tool call.

1. **Create a task.** Persist the goal first.

```python
TASKS.create("Implement auth refactor")
# -> .tasks/task_1.json  status=pending  worktree=""
```

2. **Create a worktree and bind to the task.** Passing `task_id` auto-advances the task to `in_progress`.

```python
WORKTREES.create("auth-refactor", task_id=1)
# -> git worktree add -b wt/auth-refactor .worktrees/auth-refactor HEAD
# -> index.json gets new entry, task_1.json gets worktree="auth-refactor"
```

The binding writes state to both sides:

```python
def bind_worktree(self, task_id, worktree):
    task = self._load(task_id)
    task["worktree"] = worktree
    if task["status"] == "pending":
        task["status"] = "in_progress"
    task["updated_at"] = time.time()
    self._save(task)
```

3. **Run commands in the worktree.** `cwd` points to the isolated directory.

```python
subprocess.run(command, shell=True, cwd=worktree_path,
               capture_output=True, text=True, timeout=300)
```
|
||||
|
||||
4. **Close out.** Two choices:
   - `worktree_keep(name)` -- preserve the directory for later.
   - `worktree_remove(name, complete_task=True)` -- remove the directory, complete the bound task, and emit an event. One call handles teardown and completion.

## Event Stream

Every lifecycle transition emits a before/after/failed triplet to `.worktrees/events.jsonl`. This is an append-only log, not a replacement for the task and worktree state files.
```python
def remove(self, name, force=False, complete_task=False):
    self._run_git(["worktree", "remove", wt["path"]])
    if complete_task and wt.get("task_id") is not None:
        self.tasks.update(wt["task_id"], status="completed")
        self.tasks.unbind_worktree(wt["task_id"])
        self.events.emit("task.completed", ...)
```
Events emitted:

- `worktree.create.before` / `worktree.create.after` / `worktree.create.failed`
- `worktree.remove.before` / `worktree.remove.after` / `worktree.remove.failed`
- `worktree.keep`
- `task.completed` (when `complete_task=true` succeeds)
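The before/after/failed triplet can be sketched as a wrapper that brackets any lifecycle operation. This is a sketch, not the tutorial's actual code: `with_events` and `EventRecorder` are assumed names, and the in-memory recorder stands in for the real log.

```python
class EventRecorder:
    """Stand-in for the real event log; collects events in memory."""
    def __init__(self):
        self.log = []

    def emit(self, event, **payload):
        self.log.append({"event": event, **payload})

def with_events(events, prefix, fn, **payload):
    # Bracket an operation with the before/after/failed triplet.
    events.emit(f"{prefix}.before", **payload)
    try:
        result = fn()
    except Exception as exc:
        # A *.failed record marks the partial transition for compensation.
        events.emit(f"{prefix}.failed", error=str(exc), **payload)
        raise
    events.emit(f"{prefix}.after", **payload)
    return result

events = EventRecorder()
with_events(events, "worktree.create", lambda: "ok", name="auth-refactor")
print([e["event"] for e in events.log])
# → ['worktree.create.before', 'worktree.create.after']
```

Because the `.before` record lands before the operation runs, a crash mid-operation leaves a dangling `.before` with no matching `.after` or `.failed` -- itself a detectable signal during recovery.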
Payload shape:

```json
{
  "event": "worktree.remove.after",
  "task": {"id": 1, "status": "completed"},
  "worktree": {"name": "auth-refactor", "status": "removed"},
  "ts": 1730000000
}
```
This gives you three things:

- **Policy decoupling** -- audit and notifications stay outside the core flow.
- **Failure compensation** -- `*.failed` records mark partial transitions.
- **Queryability** -- the `worktree_events` tool reads the log directly.
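An append-only JSONL log like `.worktrees/events.jsonl` takes only a few lines. This is a sketch under assumed names (`JsonlEventLog` is not the tutorial's class); the shape of each record matches the payload shown above.

```python
import json
import time

class JsonlEventLog:
    """Append-only event log; each line is one JSON record."""
    def __init__(self, path):
        self.path = path

    def emit(self, event, **payload):
        record = {"event": event, **payload, "ts": int(time.time())}
        # Append-only: never rewrite history, only add lines.
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")

    def list_recent(self, limit=20):
        # What a worktree_events-style tool might do: read the log's tail.
        with open(self.path) as f:
            lines = f.read().splitlines()
        return [json.loads(line) for line in lines[-limit:]]
```

Appending one line per event means a crash can at worst truncate the final record; everything before it stays readable.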
After a crash, state reconstructs from `.tasks/` + `.worktrees/index.json` on disk. Conversation memory is volatile; file state is durable.
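Recovery can be sketched as re-reading those two locations. The paths match the ones named above, but the per-task file layout and record shapes here are assumptions, not the tutorial's exact format.

```python
import json
import os

def recover_state(root="."):
    # Rebuild in-memory state from durable files after a crash.
    tasks = {}
    tasks_dir = os.path.join(root, ".tasks")
    if os.path.isdir(tasks_dir):
        for fn in sorted(os.listdir(tasks_dir)):
            if fn.endswith(".json"):
                with open(os.path.join(tasks_dir, fn)) as f:
                    task = json.load(f)
                tasks[task["id"]] = task
    worktrees = {}
    index_path = os.path.join(root, ".worktrees", "index.json")
    if os.path.exists(index_path):
        with open(index_path) as f:
            worktrees = json.load(f)
    return tasks, worktrees
```

Nothing in memory needs to survive: a fresh process calls this once at startup and continues where the last session stopped.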
## What Changed From s11

| Component            | Before (s11)                | After (s12)                                  |
|----------------------|-----------------------------|----------------------------------------------|
| Coordination state   | Task board (`owner/status`) | Task board + explicit `worktree` binding     |
| Execution scope      | Shared directory            | Task-scoped isolated directory               |
| Recoverability       | Task status only            | Task status + worktree index                 |
| Teardown semantics   | Task completion             | Task completion + explicit keep/remove       |
| Lifecycle visibility | Implicit in logs            | Explicit events in `.worktrees/events.jsonl` |
## Design Rationale

Separating the control plane from the execution plane means you can reason about _what to do_ and _where to do it_ independently. A task can exist without a worktree (planning phase). A worktree can exist without a task (ad-hoc exploration). Binding them is an explicit action that writes state to both sides. This composability is the point -- it keeps the system recoverable after crashes: volatile in-memory session state downgrades into explicit, durable file state. The event stream adds observability without coupling side effects into the critical path: auditing, notifications, and quota checks consume events rather than intercepting state writes.
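The two-sided bind can be sketched as a single action that writes the link into both records. Dict-backed state stands in for the real task and worktree files here; the function name mirrors the `task_bind_worktree` tool but is otherwise an assumption.

```python
def bind_worktree(tasks, worktrees, task_id, name):
    # Write the link on both sides, so either record alone is enough
    # to rediscover the pairing after a crash.
    if task_id not in tasks:
        raise KeyError(f"unknown task: {task_id}")
    if name not in worktrees:
        raise KeyError(f"unknown worktree: {name}")
    tasks[task_id]["worktree"] = name
    worktrees[name]["task_id"] = task_id

tasks = {2: {"subject": "frontend login page"}}
worktrees = {"ui-login": {"path": ".worktrees/ui-login"}}
bind_worktree(tasks, worktrees, 2, "ui-login")
```

Validating both sides before writing keeps the bind atomic in spirit: it either links two existing records or changes nothing.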
## Try It

```bash
cd learn-claude-code
python agents/s12_worktree_task_isolation.py
```
Example prompts to try:

1. `Create tasks for backend auth and frontend login page, then list tasks.`
2. `Create worktree "auth-refactor" for task 1, then bind task 2 to a new worktree "ui-login".`
3. `Run "git status --short" in worktree "auth-refactor".`
4. `Keep worktree "ui-login", then list worktrees and inspect events.`
5. `Remove worktree "auth-refactor" with complete_task=true, then list tasks/worktrees/events.`