Mirror of https://github.com/shareAI-lab/analysis_claude_code.git (synced 2026-05-06 16:26:16 +08:00)
feat: build an AI agent from 0 to 1 -- 11 progressive sessions
- 11 sessions from basic agent loop to autonomous teams
- Python MVP implementations for each session
- Mental-model-first docs in en/zh/ja
- Interactive web platform with step-through visualizations
- Incremental architecture: each session adds one mechanism
docs/en/s01-the-agent-loop.md (new file, 143 lines)
# s01: The Agent Loop

> The entire secret of AI coding agents is a while loop that feeds tool results back to the model until the model decides to stop.

## The Problem

Why can't a language model just answer a coding question? Because coding
requires _interaction with the real world_. The model needs to read files,
run tests, check errors, and iterate. A single prompt-response pair cannot
do this.

Without the agent loop, you would have to copy-paste outputs back into the
model yourself. The user becomes the loop. The agent loop automates this:
call the model, execute whatever tools it asks for, feed the results back,
repeat until the model says "I'm done."

Consider a simple task: "Create a Python file that prints hello." The model
needs to (1) decide to write a file, (2) write it, (3) verify it works.
That is three tool calls minimum. Without a loop, each one requires manual
human intervention.

## The Solution

```
+----------+      +-------+      +---------+
|  User    | ---> |  LLM  | ---> |  Tool   |
|  prompt  |      |       |      | execute |
+----------+      +---+---+      +----+----+
                      ^               |
                      |  tool_result  |
                      +---------------+
                       (loop continues)

The loop terminates when stop_reason != "tool_use".
That single condition is the entire control flow.
```
## How It Works

1. The user provides a prompt. It becomes the first message.

```python
messages.append({"role": "user", "content": query})
```

2. The messages array is sent to the LLM along with the tool definitions.

```python
response = client.messages.create(
    model=MODEL, system=SYSTEM, messages=messages,
    tools=TOOLS, max_tokens=8000,
)
```

3. The assistant response is appended to messages.

```python
messages.append({"role": "assistant", "content": response.content})
```

4. We check the stop reason. If the model did not call a tool, the loop
ends. This is the only exit condition.

```python
if response.stop_reason != "tool_use":
    return
```

5. For each tool_use block in the response, execute the tool (bash in this
session) and collect results.

```python
results = []
for block in response.content:
    if block.type == "tool_use":
        output = run_bash(block.input["command"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": output,
        })
```

6. The results are appended as a user message, and the loop continues.

```python
messages.append({"role": "user", "content": results})
```
## Key Code

The minimum viable agent -- the entire pattern in under 30 lines
(from `agents/s01_agent_loop.py`, lines 66-86):

```python
def agent_loop(messages: list):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_bash(block.input["command"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})
```
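The control flow is easy to exercise without an API key: swap in a stub client that emits one tool_use turn and then stops. The `Block`, `Response`, and `StubClient` types below are hypothetical stand-ins for the SDK's response objects, not its real classes:

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    type: str
    text: str = ""
    id: str = ""
    input: dict = field(default_factory=dict)

@dataclass
class Response:
    stop_reason: str
    content: list

class StubClient:
    """Returns one tool_use turn, then a final text turn."""
    def __init__(self):
        self.calls = 0
    def create(self, **kwargs):
        self.calls += 1
        if self.calls == 1:
            return Response("tool_use", [Block("tool_use", id="t1",
                                               input={"command": "echo hi"})])
        return Response("end_turn", [Block("text", text="done")])

def run_bash(command: str) -> str:
    return f"(ran: {command})"

client = StubClient()

def agent_loop(messages: list):
    while True:
        response = client.create(messages=messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_bash(block.input["command"])
                results.append({"type": "tool_result",
                                "tool_use_id": block.id,
                                "content": output})
        messages.append({"role": "user", "content": results})

messages = [{"role": "user", "content": "say hi"}]
agent_loop(messages)
# messages now holds four entries: user prompt, assistant tool_use,
# user tool_result, assistant final text.
```

Two model calls, one tool execution, clean exit -- the whole lifecycle in miniature.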
## What Changed

This is session 1 -- the starting point. There is no prior session.

| Component    | Before | After                       |
|--------------|--------|-----------------------------|
| Agent loop   | (none) | `while True` + stop_reason  |
| Tools        | (none) | `bash` (one tool)           |
| Messages     | (none) | Accumulating list           |
| Control flow | (none) | `stop_reason != "tool_use"` |

## Design Rationale

This loop is the universal foundation of all LLM-based agents. Production implementations add error handling, token counting, streaming, and retry logic, but the fundamental structure is unchanged. The simplicity is the point: one exit condition (`stop_reason != "tool_use"`) controls the entire flow. Everything else in this course -- tools, planning, compression, teams -- layers on top of this loop without modifying it. Understanding this loop means understanding every agent.
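One of those production additions, retry logic, can be sketched as a thin wrapper around the API call. This is a hedged sketch: the bare `except Exception` should be narrowed to the SDK's actual transient error types (rate limit, overloaded), which this excerpt does not name:

```python
import time

def create_with_retry(create_fn, max_retries: int = 3,
                      base_delay: float = 1.0, **kwargs):
    """Call create_fn(**kwargs), retrying with exponential backoff.
    Re-raises the last error once max_retries attempts are exhausted."""
    for attempt in range(max_retries):
        try:
            return create_fn(**kwargs)
        except Exception:  # narrow this to your SDK's transient error types
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Inside the loop, `client.messages.create(...)` becomes `create_with_retry(client.messages.create, model=MODEL, ...)` with no other changes -- the loop structure itself is untouched.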
## Try It

```sh
cd learn-claude-code
python agents/s01_agent_loop.py
```

Example prompts to try:

1. `Create a file called hello.py that prints "Hello, World!"`
2. `List all Python files in this directory`
3. `What is the current git branch?`
4. `Create a directory called test_output and write 3 files in it`
docs/en/s02-tool-use.md (new file, 151 lines)
# s02: Tools

> A dispatch map routes tool calls to handler functions -- the loop itself does not change at all.

## The Problem

With only `bash`, the agent shells out for everything: reading files,
writing files, editing files. This works but is fragile. `cat` output
gets truncated unpredictably. `sed` replacements fail on special
characters. The model wastes tokens constructing shell pipelines when
a direct function call would be simpler.

More importantly, bash is a security surface. Every bash call can do
anything the shell can do. With dedicated tools like `read_file` and
`write_file`, you can enforce path sandboxing and block dangerous
patterns at the tool level rather than hoping the model avoids them.

The insight is that adding tools does not require changing the loop.
The loop from s01 stays identical. You add entries to the tools array,
add handler functions, and wire them together with a dispatch map.

## The Solution

```
+----------+      +-------+      +------------------+
|  User    | ---> |  LLM  | ---> |  Tool Dispatch   |
|  prompt  |      |       |      |  {               |
+----------+      +---+---+      |   bash: run_bash |
                      ^          |   read: run_read |
                      |          |   write: run_wr  |
                      +----------+   edit: run_edit |
                      tool_result|  }               |
                                 +------------------+

The dispatch map is a dict: {tool_name: handler_function}
One lookup replaces any if/elif chain.
```

## How It Works

1. Define handler functions for each tool. Each takes keyword arguments
matching the tool's input_schema and returns a string result.

```python
def run_read(path: str, limit: int | None = None) -> str:
    text = safe_path(path).read_text()
    lines = text.splitlines()
    if limit and limit < len(lines):
        lines = lines[:limit]
    return "\n".join(lines)[:50000]
```
2. Create the dispatch map linking tool names to handlers.

```python
TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"],
                                       kw["new_text"]),
}
```

3. In the agent loop, look up the handler by name instead of hardcoding.

```python
for block in response.content:
    if block.type == "tool_use":
        handler = TOOL_HANDLERS.get(block.name)
        output = handler(**block.input)
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": output,
        })
```

4. Path sandboxing prevents the model from escaping the workspace.

```python
def safe_path(p: str) -> Path:
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR):
        raise ValueError(f"Path escapes workspace: {p}")
    return path
```
## Key Code

The dispatch pattern (from `agents/s02_tool_use.py`, lines 93-129):

```python
TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"],
                                       kw["new_text"]),
}

def agent_loop(messages: list):
    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM, messages=messages,
            tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return
        results = []
        for block in response.content:
            if block.type == "tool_use":
                handler = TOOL_HANDLERS.get(block.name)
                output = handler(**block.input) if handler \
                    else f"Unknown tool: {block.name}"
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})
```

## What Changed From s01

| Component   | Before (s01)        | After (s02)                 |
|-------------|---------------------|-----------------------------|
| Tools       | 1 (bash only)       | 4 (bash, read, write, edit) |
| Dispatch    | Hardcoded bash call | `TOOL_HANDLERS` dict        |
| Path safety | None                | `safe_path()` sandbox       |
| Agent loop  | Unchanged           | Unchanged                   |

## Design Rationale

The dispatch map pattern scales linearly -- adding a tool means adding one handler and one schema entry. The loop never changes. This separation of concerns (loop vs handlers) is why agent frameworks can support dozens of tools without increasing control flow complexity. The pattern also enables independent testing of each handler in isolation, since handlers are pure functions with no coupling to the loop. Any agent that outgrows a dispatch map has a design problem, not a scaling problem.
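That testability claim is concrete: the dispatch layer can be exercised with toy handlers, no loop and no API client involved (the `echo` tool below is invented for the demo):

```python
def run_echo(text: str) -> str:
    # Toy handler: same shape as the real ones (kwargs in, string out).
    return text.upper()

TOOL_HANDLERS = {
    "echo": lambda **kw: run_echo(kw["text"]),
}

def dispatch(name: str, args: dict) -> str:
    # Mirrors the loop's lookup, including the unknown-tool fallback.
    handler = TOOL_HANDLERS.get(name)
    return handler(**args) if handler else f"Unknown tool: {name}"

print(dispatch("echo", {"text": "hi"}))  # HI
print(dispatch("grep", {}))              # Unknown tool: grep
```

Note that an unknown tool name produces an error string rather than an exception -- the model gets the error as a tool_result and can recover.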
## Try It

```sh
cd learn-claude-code
python agents/s02_tool_use.py
```

Example prompts to try:

1. `Read the file requirements.txt`
2. `Create a file called greet.py with a greet(name) function`
3. `Edit greet.py to add a docstring to the function`
4. `Read greet.py to verify the edit worked`
5. `Run the greet function with bash: python -c "from greet import greet; greet('World')"`
docs/en/s03-todo-write.md (new file, 170 lines)
# s03: TodoWrite

> A TodoManager lets the agent track its own progress, and a nag reminder injection forces it to keep updating when it forgets.

## The Problem

When an agent works on a multi-step task, it often loses track of what it
has done and what remains. Without explicit planning, the model might repeat
work, skip steps, or wander off on tangents. The user has no visibility
into the agent's internal plan.

This is worse than it sounds. Long conversations cause the model to "drift"
-- the system prompt fades in influence as the context window fills with
tool results. A 10-step refactoring task might complete steps 1-3, then
the model starts improvising because it forgot steps 4-10 existed.

The solution is structured state: a TodoManager that the model writes to
explicitly. The model creates a plan, marks items in_progress as it works,
and marks them completed when done. A nag reminder injects a nudge if the
model goes 3+ rounds without updating its todos.

Teaching simplification: the nag threshold of 3 rounds is set low for
teaching visibility. Production agents typically use a higher threshold
around 10 to avoid excessive prompting.

## The Solution

```
+----------+      +-------+      +---------+
|  User    | ---> |  LLM  | ---> |  Tools  |
|  prompt  |      |       |      |  + todo |
+----------+      +---+---+      +----+----+
                      ^               |
                      |  tool_result  |
                      +---------------+
                              |
                  +-----------+-----------+
                  |  TodoManager state    |
                  |  [ ] task A           |
                  |  [>] task B  <- doing |
                  |  [x] task C           |
                  +-----------------------+
                              |
          if rounds_since_todo >= 3:
              inject <reminder> into tool_result
```

## How It Works

1. The TodoManager validates and stores a list of items with statuses.
Only one item can be `in_progress` at a time.

```python
class TodoManager:
    def __init__(self):
        self.items = []

    def update(self, items: list) -> str:
        validated = []
        in_progress_count = 0
        for item in items:
            status = item.get("status", "pending")
            if status == "in_progress":
                in_progress_count += 1
            validated.append({
                "id": item["id"],
                "text": item["text"],
                "status": status,
            })
        if in_progress_count > 1:
            raise ValueError("Only one task can be in_progress")
        self.items = validated
        return self.render()
```
2. The `todo` tool is added to the dispatch map like any other tool.

```python
TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    # ...other tools...
    "todo": lambda **kw: TODO.update(kw["items"]),
}
```

3. The nag reminder injects a `<reminder>` tag into the tool_result
messages when the model goes 3+ rounds without calling `todo`.

```python
def agent_loop(messages: list):
    rounds_since_todo = 0
    while True:
        if rounds_since_todo >= 3 and messages:
            last = messages[-1]
            if (last["role"] == "user"
                    and isinstance(last.get("content"), list)):
                last["content"].insert(0, {
                    "type": "text",
                    "text": "<reminder>Update your todos.</reminder>",
                })
        # ... rest of loop ...
        rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
```

4. The system prompt instructs the model to use todos for planning.

```python
SYSTEM = f"""You are a coding agent at {WORKDIR}.
Use the todo tool to plan multi-step tasks.
Mark in_progress before starting, completed when done.
Prefer tools over prose."""
```
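Step 1's `update()` ends by calling `render()`, which the excerpt does not show. A plausible sketch (assumed, not the repository's exact code) that matches the `[ ]` / `[>]` / `[x]` markers in the diagram:

```python
class TodoManager:
    MARKS = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}

    def __init__(self):
        self.items = []

    def update(self, items: list) -> str:
        validated = []
        in_progress_count = 0
        for item in items:
            status = item.get("status", "pending")
            if status == "in_progress":
                in_progress_count += 1
            validated.append({"id": item["id"], "text": item["text"],
                              "status": status})
        if in_progress_count > 1:
            raise ValueError("Only one task can be in_progress")
        self.items = validated
        return self.render()

    def render(self) -> str:
        # One line per item, prefixed with its status marker.
        if not self.items:
            return "(no todos)"
        return "\n".join(
            f"{self.MARKS.get(i['status'], '[?]')} {i['text']}"
            for i in self.items
        )
```

Returning the rendered list from `update()` means the model sees its own current plan in every `todo` tool_result, with no extra read call.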
## Key Code

The TodoManager and nag injection (from `agents/s03_todo_write.py`,
lines 51-85 and 158-187):

```python
class TodoManager:
    def update(self, items: list) -> str:
        validated = []
        in_progress_count = 0
        for item in items:
            status = item.get("status", "pending")
            if status == "in_progress":
                in_progress_count += 1
            validated.append({
                "id": item["id"],
                "text": item["text"],
                "status": status,
            })
        if in_progress_count > 1:
            raise ValueError("Only one in_progress")
        self.items = validated
        return self.render()

# In agent_loop:
if rounds_since_todo >= 3:
    last["content"].insert(0, {
        "type": "text",
        "text": "<reminder>Update your todos.</reminder>",
    })
```

## What Changed From s02

| Component     | Before (s02)    | After (s03)                 |
|---------------|-----------------|-----------------------------|
| Tools         | 4               | 5 (+todo)                   |
| Planning      | None            | TodoManager with statuses   |
| Nag injection | None            | `<reminder>` after 3 rounds |
| Agent loop    | Simple dispatch | + rounds_since_todo counter |

## Design Rationale

Visible plans improve task completion because the model can self-monitor progress. The nag mechanism creates accountability -- without it, the model may abandon plans mid-execution as conversation context grows and earlier instructions fade. The "one in_progress at a time" constraint enforces sequential focus, preventing context-switching overhead that degrades output quality. This pattern works because it externalizes the model's working memory into structured state that survives attention drift.

## Try It

```sh
cd learn-claude-code
python agents/s03_todo_write.py
```

Example prompts to try:

1. `Refactor the file hello.py: add type hints, docstrings, and a main guard`
2. `Create a Python package with __init__.py, utils.py, and tests/test_utils.py`
3. `Review all Python files and fix any style issues`
docs/en/s04-subagent.md (new file, 157 lines)
# s04: Subagents

> A subagent runs with a fresh messages list, shares the filesystem with the parent, and returns only a summary -- keeping the parent context clean.

## The Problem

As the agent works, its messages array grows. Every tool call, every file
read, every bash output accumulates. After 20-30 tool calls, the context
window is crowded with irrelevant history. Reading a 500-line file to
answer a quick question permanently adds 500 lines to the context.

This is particularly bad for exploratory tasks. "What testing framework
does this project use?" might require reading 5 files, but the parent
agent does not need all 5 file contents in its history -- it just needs
the answer: "pytest with conftest.py configuration."

The solution is process isolation: spawn a child agent with `messages=[]`.
The child explores, reads files, runs commands. When it finishes, only its
final text response returns to the parent. The child's entire message
history is discarded.

## The Solution

```
Parent agent                    Subagent
+------------------+            +------------------+
| messages=[...]   |            | messages=[]      | <-- fresh
|                  |  dispatch  |                  |
| tool: task       | ---------> | while tool_use:  |
|   prompt="..."   |            |   call tools     |
|                  |  summary   |   append results |
| result = "..."   | <--------- | return last text |
+------------------+            +------------------+

Parent context stays clean.
Subagent context is discarded.
```
## How It Works

1. The parent agent gets a `task` tool that triggers subagent spawning.
The child gets all base tools except `task` (no recursive spawning).

```python
PARENT_TOOLS = CHILD_TOOLS + [
    {"name": "task",
     "description": "Spawn a subagent with fresh context.",
     "input_schema": {
         "type": "object",
         "properties": {
             "prompt": {"type": "string"},
             "description": {"type": "string"},
         },
         "required": ["prompt"],
     }},
]
```

2. The subagent starts with a fresh messages list containing only
the delegated prompt. It shares the same filesystem.

```python
def run_subagent(prompt: str) -> str:
    sub_messages = [{"role": "user", "content": prompt}]
    for _ in range(30):  # safety limit
        response = client.messages.create(
            model=MODEL, system=SUBAGENT_SYSTEM,
            messages=sub_messages,
            tools=CHILD_TOOLS, max_tokens=8000,
        )
        sub_messages.append({
            "role": "assistant", "content": response.content
        })
        if response.stop_reason != "tool_use":
            break
        # execute tools, append results...
```

3. Only the final text returns to the parent. The child's tool-call
history (up to 30 rounds of it) is discarded.

```python
return "".join(
    b.text for b in response.content if hasattr(b, "text")
) or "(no summary)"
```

4. The parent receives this summary as a normal tool_result.

```python
if block.name == "task":
    output = run_subagent(block.input["prompt"])
    results.append({
        "type": "tool_result",
        "tool_use_id": block.id,
        "content": str(output),
    })
```
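The summary extraction in step 3 is easy to verify in isolation: it keeps text blocks and skips everything else. The `TextBlock` and `ToolUseBlock` classes below are hypothetical stand-ins for the SDK's content block types:

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    text: str

@dataclass
class ToolUseBlock:
    name: str

def summarize(content: list) -> str:
    # Same expression as in run_subagent: join the text blocks, fall back
    # to a placeholder when the child produced no text at all.
    return "".join(
        b.text for b in content if hasattr(b, "text")
    ) or "(no summary)"

print(summarize([TextBlock("pytest with "), TextBlock("conftest.py")]))
print(summarize([ToolUseBlock("bash")]))  # (no summary)
```

The `or "(no summary)"` fallback matters: a child that ends on a tool call (for example, by hitting the iteration cap) would otherwise hand the parent an empty string.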
## Key Code

The subagent function (from `agents/s04_subagent.py`,
lines 110-128):

```python
def run_subagent(prompt: str) -> str:
    sub_messages = [{"role": "user", "content": prompt}]
    for _ in range(30):
        response = client.messages.create(
            model=MODEL, system=SUBAGENT_SYSTEM,
            messages=sub_messages,
            tools=CHILD_TOOLS, max_tokens=8000,
        )
        sub_messages.append({"role": "assistant",
                             "content": response.content})
        if response.stop_reason != "tool_use":
            break
        results = []
        for block in response.content:
            if block.type == "tool_use":
                handler = TOOL_HANDLERS.get(block.name)
                output = handler(**block.input)
                results.append({"type": "tool_result",
                                "tool_use_id": block.id,
                                "content": str(output)[:50000]})
        sub_messages.append({"role": "user", "content": results})
    return "".join(
        b.text for b in response.content if hasattr(b, "text")
    ) or "(no summary)"
```
## What Changed From s03

| Component    | Before (s03)  | After (s04)               |
|--------------|---------------|---------------------------|
| Tools        | 5             | 5 (base) + task (parent)  |
| Context      | Single shared | Parent + child isolation  |
| Subagent     | None          | `run_subagent()` function |
| Return value | N/A           | Summary text only         |
| Todo system  | TodoManager   | Removed (not needed here) |

## Design Rationale

Process isolation gives context isolation for free. A fresh `messages=[]` means the subagent cannot be confused by the parent's conversation history. The tradeoff is communication overhead -- results must be compressed back to the parent, losing detail. This is the same tradeoff as OS process isolation: safety and cleanliness in exchange for serialization cost. Limiting subagent depth (no recursive spawning) prevents unbounded resource consumption, and a max iteration count ensures runaway children terminate.

## Try It

```sh
cd learn-claude-code
python agents/s04_subagent.py
```

Example prompts to try:

1. `Use a subtask to find what testing framework this project uses`
2. `Delegate: read all .py files and summarize what each one does`
3. `Use a task to create a new module, then verify it from here`
docs/en/s05-skill-loading.md (new file, 165 lines)
# s05: Skills

> Two-layer skill injection avoids system prompt bloat by putting skill names in the system prompt (cheap) and full skill bodies in tool_result (on demand).

## The Problem

You want the agent to follow specific workflows for different domains:
git conventions, testing patterns, code review checklists. The naive
approach is to put everything in the system prompt. But the system prompt
has limited effective attention -- too much text and the model starts
ignoring parts of it.

If you have 10 skills at 2000 tokens each, that is 20,000 tokens of system
prompt. The model pays attention to the beginning and end but skims the
middle. Worse, most of those skills are irrelevant to any given task. A
file editing task does not need the git workflow instructions.

The two-layer approach solves this: Layer 1 puts short skill descriptions
in the system prompt (~100 tokens per skill). Layer 2 loads the full skill
body into a tool_result only when the model calls `load_skill`. The model
learns what skills exist (cheap) and loads them on demand (only when
relevant).

## The Solution

```
System prompt (Layer 1 -- always present):
+--------------------------------------+
| You are a coding agent.              |
| Skills available:                    |
|  - git: Git workflow helpers         |  ~100 tokens/skill
|  - test: Testing best practices      |
+--------------------------------------+

When model calls load_skill("git"):
+--------------------------------------+
| tool_result (Layer 2 -- on demand):  |
| <skill name="git">                   |
| Full git workflow instructions...    |  ~2000 tokens
| Step 1: ...                          |
| Step 2: ...                          |
| </skill>                             |
+--------------------------------------+
```
## How It Works

1. Skill files live in `.skills/` as Markdown with YAML frontmatter.

```
.skills/
  git.md     # ---\n description: Git workflow\n ---\n ...
  test.md    # ---\n description: Testing patterns\n ---\n ...
```

2. The SkillLoader parses frontmatter and separates metadata from body.

```python
class SkillLoader:
    def _parse_frontmatter(self, text: str) -> tuple:
        match = re.match(
            r"^---\n(.*?)\n---\n(.*)", text, re.DOTALL
        )
        if not match:
            return {}, text
        meta = {}
        for line in match.group(1).strip().splitlines():
            if ":" in line:
                key, val = line.split(":", 1)
                meta[key.strip()] = val.strip()
        return meta, match.group(2).strip()
```

3. Layer 1: `get_descriptions()` returns short lines for the system prompt.

```python
def get_descriptions(self) -> str:
    lines = []
    for name, skill in self.skills.items():
        desc = skill["meta"].get("description", "No description")
        lines.append(f"  - {name}: {desc}")
    return "\n".join(lines)

SYSTEM = f"""You are a coding agent at {WORKDIR}.
Skills available:
{SKILL_LOADER.get_descriptions()}"""
```

4. Layer 2: `get_content()` returns the full body wrapped in `<skill>` tags.

```python
def get_content(self, name: str) -> str:
    skill = self.skills.get(name)
    if not skill:
        return f"Error: Unknown skill '{name}'."
    return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"
```

5. The `load_skill` tool is just another entry in the dispatch map.

```python
TOOL_HANDLERS = {
    # ...base tools...
    "load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),
}
```
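The frontmatter split from step 2 can be exercised on a literal skill file; this is a standalone restatement of the same regex logic, outside the class:

```python
import re

def parse_frontmatter(text: str) -> tuple[dict, str]:
    # Split "---\n<meta>\n---\n<body>"; no frontmatter means no metadata.
    match = re.match(r"^---\n(.*?)\n---\n(.*)", text, re.DOTALL)
    if not match:
        return {}, text
    meta = {}
    for line in match.group(1).strip().splitlines():
        if ":" in line:
            key, val = line.split(":", 1)
            meta[key.strip()] = val.strip()
    return meta, match.group(2).strip()

sample = """---
description: Git workflow
---
Step 1: branch from main.
Step 2: commit small.
"""
meta, body = parse_frontmatter(sample)
print(meta)  # {'description': 'Git workflow'}
print(body)  # the two "Step ..." lines
```

Note the non-greedy `(.*?)` on the metadata group: a greedy match would swallow any `---` horizontal rules that appear later in the skill body.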
## Key Code

The SkillLoader class (from `agents/s05_skill_loading.py`,
lines 51-97):

```python
class SkillLoader:
    def __init__(self, skills_dir: Path):
        self.skills = {}
        for f in sorted(skills_dir.glob("*.md")):
            text = f.read_text()
            meta, body = self._parse_frontmatter(text)
            self.skills[f.stem] = {
                "meta": meta, "body": body
            }

    def get_descriptions(self) -> str:
        lines = []
        for name, skill in self.skills.items():
            desc = skill["meta"].get("description", "")
            lines.append(f"  - {name}: {desc}")
        return "\n".join(lines)

    def get_content(self, name: str) -> str:
        skill = self.skills.get(name)
        if not skill:
            return f"Error: Unknown skill '{name}'."
        return (f"<skill name=\"{name}\">\n"
                f"{skill['body']}\n</skill>")
```
## What Changed From s04

| Component     | Before (s04)     | After (s05)                 |
|---------------|------------------|-----------------------------|
| Tools         | 5 (base + task)  | 5 (base + load_skill)       |
| System prompt | Static string    | + skill descriptions        |
| Knowledge     | None             | .skills/*.md files          |
| Injection     | None             | Two-layer (system + result) |
| Subagent      | `run_subagent()` | Removed (different focus)   |

## Design Rationale

Two-layer injection solves the attention budget problem. Putting all skill content in the system prompt wastes tokens on unused skills. Layer 1 (compact summaries) costs roughly 120 tokens total. Layer 2 (full content) loads on demand via tool_result. This scales to dozens of skills without degrading model attention quality. The key insight is that the model only needs to know what skills exist (cheap) to decide when to load one (expensive). This is the same lazy-loading principle used in software module systems.

## Try It

```sh
cd learn-claude-code
python agents/s05_skill_loading.py
```

Example prompts to try:

1. `What skills are available?`
2. `Load the agent-builder skill and follow its instructions`
3. `I need to do a code review -- load the relevant skill first`
4. `Build an MCP server using the mcp-builder skill`
docs/en/s06-context-compact.md (new file, 183 lines)
# s06: Compact

> A three-layer compression pipeline lets the agent work indefinitely by strategically forgetting old tool results, auto-summarizing when tokens exceed a threshold, and allowing manual compression on demand.

## The Problem

The context window is finite. After enough tool calls, the messages array
exceeds the model's context limit and the API call fails. Even before
hitting the hard limit, performance degrades: the model becomes slower,
less accurate, and starts ignoring earlier messages.

A 200,000 token context window sounds large, but a single `read_file` on
a 1000-line source file consumes ~4000 tokens. After reading 30 files and
running 20 bash commands, you are at 100,000+ tokens. The agent cannot
work on large codebases without some form of compression.

The three-layer pipeline addresses this with increasing aggressiveness:
Layer 1 (micro-compact) silently replaces old tool results every turn.
Layer 2 (auto-compact) triggers a full summarization when tokens exceed
a threshold. Layer 3 (manual compact) lets the model trigger compression
itself.

Teaching simplification: the token estimation here uses a rough
characters/4 heuristic. Production systems use proper tokenizer
libraries for accurate counts.
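That heuristic fits in a few lines. A hedged sketch of such an estimator over a messages list follows -- the real file may count differently, and the constant is the assumption, not a measurement:

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic; use a real tokenizer in production

def estimate_tokens(messages: list) -> int:
    # Serialize the whole conversation and divide by the assumed average
    # characters-per-token ratio for English text and code.
    text = json.dumps(messages, default=str)
    return len(text) // CHARS_PER_TOKEN
```

The estimate only needs to be accurate enough to trip the 50,000-token threshold in roughly the right place; compaction itself corrects any drift.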
## The Solution

```
Every turn:
  +------------------+
  | Tool call result |
  +------------------+
           |
           v
  [Layer 1: micro_compact]   (silent, every turn)
      Replace tool_result > 3 turns old
      with "[Previous: used {tool_name}]"
           |
           v
  [Check: tokens > 50000?]
      |            |
      no           yes
      |            |
      v            v
  continue    [Layer 2: auto_compact]
                  Save transcript to .transcripts/
                  LLM summarizes conversation.
                  Replace all messages with [summary].
           |
           v
  [Layer 3: compact tool]
      Model calls compact explicitly.
      Same summarization as auto_compact.
```

## How It Works

1. **Layer 1 -- micro_compact**: Before each LLM call, find all
tool_result entries older than the last 3 and replace their content.

```python
def micro_compact(messages: list) -> list:
    tool_results = []
    for i, msg in enumerate(messages):
        if msg["role"] == "user" and isinstance(msg.get("content"), list):
            for j, part in enumerate(msg["content"]):
                if isinstance(part, dict) and part.get("type") == "tool_result":
                    tool_results.append((i, j, part))
    if len(tool_results) <= KEEP_RECENT:
        return messages
    to_clear = tool_results[:-KEEP_RECENT]
    for _, _, part in to_clear:
        if len(part.get("content", "")) > 100:
            tool_id = part.get("tool_use_id", "")
            # tool_name_map: tool_use_id -> tool name, recorded when
            # each tool call is executed
            tool_name = tool_name_map.get(tool_id, "unknown")
            part["content"] = f"[Previous: used {tool_name}]"
    return messages
```

2. **Layer 2 -- auto_compact**: When estimated tokens exceed 50,000,
save the full transcript and ask the LLM to summarize.

```python
def auto_compact(messages: list) -> list:
    TRANSCRIPT_DIR.mkdir(exist_ok=True)
    transcript_path = TRANSCRIPT_DIR / f"transcript_{int(time.time())}.jsonl"
    with open(transcript_path, "w") as f:
        for msg in messages:
            f.write(json.dumps(msg, default=str) + "\n")
    response = client.messages.create(
        model=MODEL,
        messages=[{"role": "user", "content":
            "Summarize this conversation for continuity..."
            + json.dumps(messages, default=str)[:80000]}],
        max_tokens=2000,
    )
    summary = response.content[0].text
    return [
        {"role": "user", "content": f"[Compressed]\n\n{summary}"},
        {"role": "assistant", "content": "Understood. Continuing."},
    ]
```

3. **Layer 3 -- manual compact**: The `compact` tool triggers the same
summarization on demand.

```python
if manual_compact:
    messages[:] = auto_compact(messages)
```

4. The agent loop integrates all three layers.

```python
def agent_loop(messages: list):
    while True:
        micro_compact(messages)
        if estimate_tokens(messages) > THRESHOLD:
            messages[:] = auto_compact(messages)
        response = client.messages.create(...)
        # ... tool execution ...
        if manual_compact:
            messages[:] = auto_compact(messages)
```

## Key Code

The three-layer pipeline (from `agents/s06_context_compact.py`,
lines 67-93 and 189-223):

```python
THRESHOLD = 50000
KEEP_RECENT = 3

def micro_compact(messages):
    # Replace old tool results with placeholders
    ...

def auto_compact(messages):
    # Save transcript, LLM summarize, replace messages
    ...

def agent_loop(messages):
    while True:
        micro_compact(messages)  # Layer 1
        if estimate_tokens(messages) > THRESHOLD:
            messages[:] = auto_compact(messages)  # Layer 2
        response = client.messages.create(...)
        # ...
        if manual_compact:
            messages[:] = auto_compact(messages)  # Layer 3
```

## What Changed From s05

| Component      | Before (s05) | After (s06)                 |
|----------------|--------------|-----------------------------|
| Tools          | 5            | 5 (base + compact)          |
| Context mgmt   | None         | Three-layer compression     |
| Micro-compact  | None         | Old results -> placeholders |
| Auto-compact   | None         | Token threshold trigger     |
| Manual compact | None         | `compact` tool              |
| Transcripts    | None         | Saved to .transcripts/      |
| Skills         | load_skill   | Removed (different focus)   |

## Design Rationale

Context windows are finite, but agent sessions can be infinite. Three compression layers solve this at different granularities: micro-compact (replace old tool outputs), auto-compact (LLM summarizes when approaching the limit), and manual compact (user-triggered). The key insight is that forgetting is a feature, not a bug -- it enables unbounded sessions. Transcripts preserve the full history on disk, so nothing is truly lost, just moved out of the active context. The layered approach lets each layer operate independently at its own granularity, from silent per-turn cleanup to full conversation reset.

## Try It

```sh
cd learn-claude-code
python agents/s06_context_compact.py
```

Example prompts to try:

1. `Read every Python file in the agents/ directory one by one`
   (watch micro-compact replace old results)
2. `Keep reading files until compression triggers automatically`
3. `Use the compact tool to manually compress the conversation`
172
docs/en/s07-task-system.md
Normal file
@@ -0,0 +1,172 @@

# s07: Tasks

> Tasks persist as JSON files on the filesystem with a dependency graph, so they survive context compression and can be shared across agents.

## The Problem

In-memory state like TodoManager (s03) is lost when the context is
compressed (s06). After auto_compact replaces messages with a summary,
the todo list is gone. The agent has to reconstruct it from the summary
text, which is lossy and error-prone.

This is the critical s06-to-s07 bridge: TodoManager items die with
compression; file-based tasks don't. Moving state to the filesystem
makes it compression-proof.

More fundamentally, in-memory state is invisible to other agents.
When we eventually build teams (s09+), teammates need a shared task
board. In-memory data structures are process-local.

The solution is to persist tasks as JSON files in `.tasks/`. Each task
is a separate file with an ID, subject, status, and dependency graph.
Completing task 1 automatically unblocks task 2 if task 2 has
`blockedBy: [1]`. The file system becomes the source of truth.

## The Solution

```
.tasks/
  task_1.json   {"id":1, "status":"completed", ...}
  task_2.json   {"id":2, "blockedBy":[1], "status":"pending"}
  task_3.json   {"id":3, "blockedBy":[2], "status":"pending"}

Dependency resolution:
  +----------+     +----------+     +----------+
  | task 1   | --> | task 2   | --> | task 3   |
  | complete |     | blocked  |     | blocked  |
  +----------+     +----------+     +----------+
       |                ^
       +--- completing task 1 removes it from
            task 2's blockedBy list
```

## How It Works

1. The TaskManager provides CRUD operations. Each task is a JSON file.

```python
class TaskManager:
    def create(self, subject: str, description: str = "") -> str:
        task = {
            "id": self._next_id,
            "subject": subject,
            "description": description,
            "status": "pending",
            "blockedBy": [],
            "blocks": [],
            "owner": "",
        }
        self._save(task)
        self._next_id += 1
        return json.dumps(task, indent=2)
```

2. When a task is marked completed, `_clear_dependency` removes its ID
from all other tasks' `blockedBy` lists.

```python
def _clear_dependency(self, completed_id: int):
    for f in self.dir.glob("task_*.json"):
        task = json.loads(f.read_text())
        if completed_id in task.get("blockedBy", []):
            task["blockedBy"].remove(completed_id)
            self._save(task)
```

3. The `update` method handles status changes and bidirectional dependency
wiring.

```python
def update(self, task_id, status=None,
           add_blocked_by=None, add_blocks=None):
    task = self._load(task_id)
    if status:
        task["status"] = status
        if status == "completed":
            self._clear_dependency(task_id)
    if add_blocks:
        task["blocks"] = list(set(task["blocks"] + add_blocks))
        for blocked_id in add_blocks:
            blocked = self._load(blocked_id)
            if task_id not in blocked["blockedBy"]:
                blocked["blockedBy"].append(task_id)
                self._save(blocked)
    self._save(task)
```

4. Four task tools are added to the dispatch map.

```python
TOOL_HANDLERS = {
    # ...base tools...
    "task_create": lambda **kw: TASKS.create(kw["subject"]),
    "task_update": lambda **kw: TASKS.update(kw["task_id"],
                                             kw.get("status")),
    "task_list": lambda **kw: TASKS.list_all(),
    "task_get": lambda **kw: TASKS.get(kw["task_id"]),
}
```

## Key Code

The TaskManager with dependency graph (from `agents/s07_task_system.py`,
lines 46-123):

```python
class TaskManager:
    def __init__(self, tasks_dir: Path):
        self.dir = tasks_dir
        self.dir.mkdir(exist_ok=True)
        self._next_id = self._max_id() + 1

    def _load(self, task_id: int) -> dict:
        path = self.dir / f"task_{task_id}.json"
        return json.loads(path.read_text())

    def _save(self, task: dict):
        path = self.dir / f"task_{task['id']}.json"
        path.write_text(json.dumps(task, indent=2))

    def create(self, subject, description=""):
        task = {"id": self._next_id, "subject": subject,
                "status": "pending", "blockedBy": [],
                "blocks": [], "owner": ""}
        self._save(task)
        self._next_id += 1
        return json.dumps(task, indent=2)

    def _clear_dependency(self, completed_id):
        for f in self.dir.glob("task_*.json"):
            task = json.loads(f.read_text())
            if completed_id in task.get("blockedBy", []):
                task["blockedBy"].remove(completed_id)
                self._save(task)
```
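Because the task store is pure filesystem code, the unblock flow can be exercised end to end without an LLM. This is a condensed, self-contained sketch of the manager above; `_max_id` and the `complete` helper are filled in here for illustration since the excerpt elides them:

```python
import json
import tempfile
from pathlib import Path

# Condensed TaskManager sketch: enough to demonstrate that completing
# task 1 removes it from task 2's blockedBy list on disk.
class TaskManager:
    def __init__(self, tasks_dir: Path):
        self.dir = tasks_dir
        self.dir.mkdir(parents=True, exist_ok=True)
        self._next_id = self._max_id() + 1

    def _max_id(self) -> int:
        ids = [int(f.stem.split("_")[1]) for f in self.dir.glob("task_*.json")]
        return max(ids, default=0)

    def _load(self, task_id: int) -> dict:
        return json.loads((self.dir / f"task_{task_id}.json").read_text())

    def _save(self, task: dict):
        (self.dir / f"task_{task['id']}.json").write_text(
            json.dumps(task, indent=2))

    def create(self, subject: str) -> dict:
        task = {"id": self._next_id, "subject": subject,
                "status": "pending", "blockedBy": [], "blocks": [],
                "owner": ""}
        self._save(task)
        self._next_id += 1
        return task

    def _clear_dependency(self, completed_id: int):
        for f in self.dir.glob("task_*.json"):
            task = json.loads(f.read_text())
            if completed_id in task.get("blockedBy", []):
                task["blockedBy"].remove(completed_id)
                self._save(task)

    def complete(self, task_id: int):
        task = self._load(task_id)
        task["status"] = "completed"
        self._save(task)
        self._clear_dependency(task_id)

mgr = TaskManager(Path(tempfile.mkdtemp()) / ".tasks")
t1 = mgr.create("Setup project")
t2 = mgr.create("Write code")
t2["blockedBy"] = [t1["id"]]
mgr._save(t2)

mgr.complete(t1["id"])           # completing task 1...
unblocked = mgr._load(t2["id"])  # ...clears task 2's blockedBy on disk
```

Note that the unblock happens entirely through the files: any other process reading `.tasks/` sees the same updated dependency state.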

## What Changed From s06

| Component     | Before (s06)    | After (s07)                      |
|---------------|-----------------|----------------------------------|
| Tools         | 5               | 8 (+task_create/update/list/get) |
| State storage | In-memory only  | JSON files in .tasks/            |
| Dependencies  | None            | blockedBy + blocks graph         |
| Compression   | Three-layer     | Removed (different focus)        |
| Persistence   | Lost on compact | Survives compression             |

## Design Rationale

File-based state survives context compression. When the agent's conversation is compacted, in-memory state is lost, but tasks written to disk persist. The dependency graph ensures correct execution order even after context loss. This is the bridge between ephemeral conversation and persistent work -- the agent can forget conversation details but always has the task board to remind it what needs doing. The filesystem as source of truth also enables future multi-agent sharing, since any process can read the same JSON files.

## Try It

```sh
cd learn-claude-code
python agents/s07_task_system.py
```

Example prompts to try:

1. `Create 3 tasks: "Setup project", "Write code", "Write tests". Make them depend on each other in order.`
2. `List all tasks and show the dependency graph`
3. `Complete task 1 and then list tasks to see task 2 unblocked`
4. `Create a task board for refactoring: parse -> transform -> emit -> test`
188
docs/en/s08-background-tasks.md
Normal file
@@ -0,0 +1,188 @@

# s08: Background Tasks

> A BackgroundManager runs commands in separate threads and drains a notification queue before each LLM call, so the agent never blocks on long-running operations.

## The Problem

Some commands take minutes: `npm install`, `pytest`, `docker build`. With
a blocking agent loop, the model sits idle waiting for the subprocess to
finish. It cannot do anything else. If the user asked "install dependencies
and while that runs, create the config file," the agent would install
first, _then_ create the config -- sequentially, not in parallel.

The agent needs concurrency. Not full multi-threading of the agent loop
itself, but the ability to fire off a long command and continue working
while it runs. When the command finishes, its result should appear
naturally in the conversation.

The solution is a BackgroundManager that runs commands in daemon threads
and collects results in a notification queue. Before each LLM call, the
queue is drained and results are injected into the messages.

## The Solution

```
Main thread               Background thread
+-----------------+       +-----------------+
| agent loop      |       | task executes   |
|   ...           |       |   ...           |
| [LLM call] <----+------ | enqueue(result) |
|  ^drain queue   |       +-----------------+
+-----------------+

Timeline:
Agent --[spawn A]--[spawn B]--[other work]----
            |          |
            v          v
         [A runs]   [B runs]   (parallel)
            |          |
            +-- notification queue --+
                                     |
                         [results injected before
                          next LLM call]
```

## How It Works

1. The BackgroundManager tracks tasks and maintains a thread-safe
notification queue.

```python
class BackgroundManager:
    def __init__(self):
        self.tasks = {}
        self._notification_queue = []
        self._lock = threading.Lock()
```

2. `run()` starts a daemon thread and returns a task_id immediately.

```python
def run(self, command: str) -> str:
    task_id = str(uuid.uuid4())[:8]
    self.tasks[task_id] = {
        "status": "running",
        "result": None,
        "command": command,
    }
    thread = threading.Thread(
        target=self._execute,
        args=(task_id, command),
        daemon=True,
    )
    thread.start()
    return f"Background task {task_id} started"
```

3. The thread target `_execute` runs the subprocess and pushes
results to the notification queue.

```python
def _execute(self, task_id: str, command: str):
    try:
        r = subprocess.run(command, shell=True, cwd=WORKDIR,
                           capture_output=True, text=True, timeout=300)
        output = (r.stdout + r.stderr).strip()[:50000]
        status = "completed"
    except subprocess.TimeoutExpired:
        output = "Error: Timeout (300s)"
        status = "timeout"
    self.tasks[task_id]["status"] = status
    self.tasks[task_id]["result"] = output
    with self._lock:
        self._notification_queue.append({
            "task_id": task_id,
            "status": status,
            "result": output[:500],
        })
```

4. `drain_notifications()` returns and clears pending results.

```python
def drain_notifications(self) -> list:
    with self._lock:
        notifs = list(self._notification_queue)
        self._notification_queue.clear()
    return notifs
```

5. The agent loop drains notifications before each LLM call.

```python
def agent_loop(messages: list):
    while True:
        notifs = BG.drain_notifications()
        if notifs and messages:
            notif_text = "\n".join(
                f"[bg:{n['task_id']}] {n['status']}: "
                f"{n['result']}" for n in notifs
            )
            messages.append({"role": "user",
                             "content": f"<background-results>"
                                        f"\n{notif_text}\n"
                                        f"</background-results>"})
            messages.append({"role": "assistant",
                             "content": "Noted background results."})
        response = client.messages.create(...)
```

## Key Code

The BackgroundManager (from `agents/s08_background_tasks.py`, lines 49-107):

```python
class BackgroundManager:
    def __init__(self):
        self.tasks = {}
        self._notification_queue = []
        self._lock = threading.Lock()

    def run(self, command: str) -> str:
        task_id = str(uuid.uuid4())[:8]
        self.tasks[task_id] = {"status": "running",
                               "result": None,
                               "command": command}
        thread = threading.Thread(
            target=self._execute,
            args=(task_id, command), daemon=True)
        thread.start()
        return f"Background task {task_id} started"

    def _execute(self, task_id, command):
        # run subprocess, push to queue
        ...

    def drain_notifications(self) -> list:
        with self._lock:
            notifs = list(self._notification_queue)
            self._notification_queue.clear()
        return notifs
```
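The enqueue/drain handshake can be demonstrated without an LLM or a real subprocess. A minimal sketch with a stand-in worker (the `sleep` and poll intervals are arbitrary illustration values):

```python
import threading
import time

# Minimal sketch of the notification-queue pattern: a worker thread
# enqueues its result; the main loop drains at each of its "turns".
queue, lock = [], threading.Lock()

def worker(task_id: str):
    time.sleep(0.1)  # stand-in for a long-running command
    with lock:
        queue.append({"task_id": task_id, "status": "completed"})

threading.Thread(target=worker, args=("abc123",), daemon=True).start()

# The main loop keeps going, draining the queue at each iteration --
# the analogue of "drain before the next LLM call".
drained = []
for _ in range(20):
    with lock:
        drained.extend(queue)
        queue.clear()
    if drained:
        break
    time.sleep(0.05)
```

The lock guards only the append and the drain; the worker never touches the conversation directly, which is what keeps the agent loop single-threaded.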

## What Changed From s07

| Component    | Before (s07)    | After (s08)                       |
|--------------|-----------------|-----------------------------------|
| Tools        | 8               | 6 (base + background_run + check) |
| Execution    | Blocking only   | Blocking + background threads     |
| Notification | None            | Queue drained per loop            |
| Concurrency  | None            | Daemon threads                    |
| Task system  | File-based CRUD | Removed (different focus)         |

## Design Rationale

The agent loop is inherently single-threaded (one LLM call at a time). Background threads break this constraint for I/O-bound work (tests, builds, installs). The notification queue pattern ("drain before next LLM call") ensures results arrive at natural conversation breakpoints rather than interrupting the model's reasoning mid-thought. This is a minimal concurrency model: the agent loop stays single-threaded and deterministic, while only the I/O-bound subprocess execution is parallelized.

## Try It

```sh
cd learn-claude-code
python agents/s08_background_tasks.py
```

Example prompts to try:

1. `Run "sleep 5 && echo done" in the background, then create a file while it runs`
2. `Start 3 background tasks: "sleep 2", "sleep 4", "sleep 6". Check their status.`
3. `Run pytest in the background and keep working on other things`
233
docs/en/s09-agent-teams.md
Normal file
@@ -0,0 +1,233 @@

# s09: Agent Teams

> Persistent teammates with JSONL inboxes turn isolated agents into a communicating team -- spawn, message, broadcast, and drain.

## The Problem

Subagents (s04) are disposable: spawn, work, return summary, die. They
have no identity, no memory between invocations, and no way to receive
follow-up instructions. Background tasks (s08) run shell commands but
cannot make LLM-guided decisions or communicate findings.

For real teamwork you need three things: (1) persistent agents that
survive beyond a single prompt, (2) identity and lifecycle management,
and (3) a communication channel between agents. Without messaging, even
persistent teammates are deaf and mute -- they can work in parallel but
never coordinate.

The solution combines a TeammateManager for spawning persistent named
agents with a MessageBus using JSONL inbox files. Each teammate runs
its own agent loop in a thread, checks its inbox before every LLM call,
and can send messages to any other teammate or the lead.

Note on the s06-to-s07 bridge: TodoManager items from s03 die with
compression (s06). File-based tasks (s07) survive compression because
they live on disk. Teams build on this same principle -- config.json and
inbox files persist outside the context window.

## The Solution

```
Teammate lifecycle:
  spawn -> WORKING -> IDLE -> WORKING -> ... -> SHUTDOWN

Communication:
  .team/
    config.json      <- team roster + statuses
    inbox/
      alice.jsonl    <- append-only, drain-on-read
      bob.jsonl
      lead.jsonl

+--------+   send("alice","bob","...")   +--------+
| alice  | ----------------------------> | bob    |
| loop   |   bob.jsonl << {json_line}    | loop   |
+--------+                               +--------+
    ^                                        |
    |   BUS.read_inbox("alice")              |
    +---- alice.jsonl -> read + drain -------+

5 message types:
+-------------------------+------------------------------+
| message                 | Normal text between agents   |
| broadcast               | Sent to all teammates        |
| shutdown_request        | Request graceful shutdown    |
| shutdown_response       | Approve/reject shutdown      |
| plan_approval_response  | Approve/reject plan          |
+-------------------------+------------------------------+
```

## How It Works

1. The TeammateManager maintains config.json with the team roster.
Each member has a name, role, and status.

```python
class TeammateManager:
    def __init__(self, team_dir: Path):
        self.dir = team_dir
        self.dir.mkdir(exist_ok=True)
        self.config_path = self.dir / "config.json"
        self.config = self._load_config()
        self.threads = {}
```

2. `spawn()` creates a teammate and starts its agent loop in a thread.
Re-spawning an idle teammate reactivates it.

```python
def spawn(self, name: str, role: str, prompt: str) -> str:
    member = self._find_member(name)
    if member:
        if member["status"] not in ("idle", "shutdown"):
            return f"Error: '{name}' is currently {member['status']}"
        member["status"] = "working"
    else:
        member = {"name": name, "role": role, "status": "working"}
        self.config["members"].append(member)
    self._save_config()
    thread = threading.Thread(
        target=self._teammate_loop,
        args=(name, role, prompt), daemon=True)
    self.threads[name] = thread
    thread.start()
    return f"Spawned teammate '{name}' (role: {role})"
```

3. The MessageBus handles JSONL inbox files. `send()` appends a JSON
line; `read_inbox()` reads all lines and drains the file.

```python
class MessageBus:
    def send(self, sender, to, content,
             msg_type="message", extra=None):
        msg = {"type": msg_type, "from": sender,
               "content": content,
               "timestamp": time.time()}
        if extra:
            msg.update(extra)
        with open(self.dir / f"{to}.jsonl", "a") as f:
            f.write(json.dumps(msg) + "\n")
        return f"Sent {msg_type} to {to}"

    def read_inbox(self, name):
        path = self.dir / f"{name}.jsonl"
        if not path.exists():
            return "[]"
        msgs = [json.loads(l)
                for l in path.read_text().strip().splitlines()
                if l]
        path.write_text("")  # drain
        return json.dumps(msgs, indent=2)
```

4. Each teammate checks its inbox before every LLM call and injects
received messages into the conversation context.

```python
def _teammate_loop(self, name, role, prompt):
    sys_prompt = f"You are '{name}', role: {role}, at {WORKDIR}."
    messages = [{"role": "user", "content": prompt}]
    for _ in range(50):
        inbox = BUS.read_inbox(name)
        if inbox != "[]":
            messages.append({"role": "user",
                             "content": f"<inbox>{inbox}</inbox>"})
            messages.append({"role": "assistant",
                             "content": "Noted inbox messages."})
        response = client.messages.create(
            model=MODEL, system=sys_prompt,
            messages=messages, tools=TOOLS)
        messages.append({"role": "assistant",
                         "content": response.content})
        if response.stop_reason != "tool_use":
            break
        # execute tools, append results...
    self._find_member(name)["status"] = "idle"
    self._save_config()
```

5. `broadcast()` sends the same message to all teammates except the
sender.

```python
def broadcast(self, sender, content, teammates):
    count = 0
    for name in teammates:
        if name != sender:
            self.send(sender, name, content, "broadcast")
            count += 1
    return f"Broadcast to {count} teammates"
```

## Key Code

The TeammateManager + MessageBus core (from `agents/s09_agent_teams.py`):

```python
class TeammateManager:
    def spawn(self, name, role, prompt):
        member = self._find_member(name) or {
            "name": name, "role": role, "status": "working"
        }
        member["status"] = "working"
        self._save_config()
        thread = threading.Thread(
            target=self._teammate_loop,
            args=(name, role, prompt), daemon=True)
        thread.start()
        return f"Spawned '{name}'"

class MessageBus:
    def send(self, sender, to, content,
             msg_type="message", extra=None):
        msg = {"type": msg_type, "from": sender,
               "content": content, "timestamp": time.time()}
        if extra: msg.update(extra)
        with open(self.dir / f"{to}.jsonl", "a") as f:
            f.write(json.dumps(msg) + "\n")

    def read_inbox(self, name):
        path = self.dir / f"{name}.jsonl"
        if not path.exists(): return "[]"
        msgs = [json.loads(l)
                for l in path.read_text().strip().splitlines()
                if l]
        path.write_text("")
        return json.dumps(msgs, indent=2)
```
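The send / drain-on-read round trip is pure filesystem code, so it can be exercised directly. A self-contained sketch of the MessageBus above (the `__init__` is filled in here for illustration since the excerpt elides it):

```python
import json
import tempfile
import time
from pathlib import Path

# Minimal sketch of the JSONL inbox round trip.
class MessageBus:
    def __init__(self, inbox_dir: Path):
        self.dir = inbox_dir
        self.dir.mkdir(parents=True, exist_ok=True)

    def send(self, sender, to, content, msg_type="message"):
        msg = {"type": msg_type, "from": sender,
               "content": content, "timestamp": time.time()}
        with open(self.dir / f"{to}.jsonl", "a") as f:
            f.write(json.dumps(msg) + "\n")

    def read_inbox(self, name):
        path = self.dir / f"{name}.jsonl"
        if not path.exists():
            return "[]"
        msgs = [json.loads(l)
                for l in path.read_text().strip().splitlines() if l]
        path.write_text("")  # drain: second read returns "[]"
        return json.dumps(msgs, indent=2)

bus = MessageBus(Path(tempfile.mkdtemp()) / "inbox")
bus.send("alice", "bob", "tests are green")
bus.send("lead", "bob", "status?", "broadcast")

first = json.loads(bus.read_inbox("bob"))  # both messages, in order
second = bus.read_inbox("bob")             # already drained
```

Appends preserve arrival order, and the drain makes delivery exactly-once from the reader's point of view, which is why the teammate loop can inject the inbox verbatim each turn.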

## What Changed From s08

| Component     | Before (s08)    | After (s09)                  |
|---------------|-----------------|------------------------------|
| Tools         | 6               | 9 (+spawn/send/read_inbox)   |
| Agents        | Single          | Lead + N teammates           |
| Persistence   | None            | config.json + JSONL inboxes  |
| Threads       | Background cmds | Full agent loops per thread  |
| Lifecycle     | Fire-and-forget | idle -> working -> idle      |
| Communication | None            | 5 message types + broadcast  |

Teaching simplification: this implementation does not use lock files
for inbox access. In production, concurrent appends from multiple writers
would need file locking or atomic rename. The single-writer-per-inbox
pattern used here is safe for the teaching scenario.

## Design Rationale

File-based mailboxes (append-only JSONL) provide concurrency-safe inter-agent communication. Append is atomic on most filesystems, avoiding lock contention. The "drain on read" pattern (read all, truncate) gives batch delivery. This is simpler and more robust than shared memory or socket-based IPC for agent coordination. The tradeoff is latency -- messages are only seen at the next poll -- but for LLM-driven agents where each turn takes seconds, polling latency is negligible compared to inference time.

## Try It

```sh
cd learn-claude-code
python agents/s09_agent_teams.py
```

Example prompts to try:

1. `Spawn alice (coder) and bob (tester). Have alice send bob a message.`
2. `Broadcast "status update: phase 1 complete" to all teammates`
3. `Check the lead inbox for any messages`
4. Type `/team` to see the team roster with statuses
5. Type `/inbox` to manually check the lead's inbox
204
docs/en/s10-team-protocols.md
Normal file
@@ -0,0 +1,204 @@

# s10: Team Protocols

> The same request_id handshake pattern powers both shutdown and plan approval -- one FSM, two applications.

## The Problem

In s09, teammates work and communicate but there is no structured
coordination. Two problems arise:

**Shutdown**: How do you stop a teammate cleanly? Killing the thread
leaves files partially written and config.json in an inconsistent state.
Graceful shutdown requires a handshake: the lead requests, the teammate
decides whether to approve (finish and exit) or reject (keep working).

**Plan approval**: How do you gate execution? When the lead says
"refactor the auth module," the teammate starts immediately. For
high-risk changes, the lead should review the plan before execution
begins. A junior proposes, a senior approves.

Both problems share the same structure: one side sends a request with a
unique ID, the other side responds referencing that ID. A finite state
machine tracks each request through pending -> approved | rejected.

## The Solution

```
Shutdown Protocol                 Plan Approval Protocol
==================                ======================

Lead              Teammate        Teammate            Lead
  |                  |               |                  |
  |--shutdown_req--->|               |--plan_req------->|
  |  {req_id:"abc"}  |               |  {req_id:"xyz"}  |
  |                  |               |                  |
  |<--shutdown_resp--|               |<--plan_resp------|
  |  {req_id:"abc",  |               |  {req_id:"xyz",  |
  |   approve:true}  |               |   approve:true}  |
  |                  |               |                  |
  v                  v               v                  v
tracker["abc"]     exits          proceeds       tracker["xyz"]
  = approved                                       = approved

Shared FSM (identical for both protocols):
  [pending] --approve--> [approved]
  [pending] --reject---> [rejected]

Trackers:
  shutdown_requests = {req_id: {target, status}}
  plan_requests     = {req_id: {from, plan, status}}
```
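Both trackers instantiate the same tiny FSM. A minimal sketch of the shared pattern (the `open_request` / `resolve_request` names are illustrative, not from the source):

```python
import uuid

# Minimal sketch of the shared request_id FSM. Shutdown and plan
# approval are both instances of this with different payloads.
def open_request(tracker: dict, **payload) -> str:
    req_id = str(uuid.uuid4())[:8]
    tracker[req_id] = {**payload, "status": "pending"}
    return req_id

def resolve_request(tracker: dict, req_id: str, approve: bool) -> str:
    req = tracker.get(req_id)
    if req is None:
        return f"Error: Unknown request_id '{req_id}'"
    if req["status"] != "pending":
        return f"Error: request already {req['status']}"
    req["status"] = "approved" if approve else "rejected"
    return req["status"]

shutdown_requests = {}
rid = open_request(shutdown_requests, target="alice")
result = resolve_request(shutdown_requests, rid, approve=True)
```

The request_id is what makes the handshake safe over an asynchronous inbox: a late or duplicate response can always be matched to exactly one pending request, and a resolved request refuses further transitions.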
|
||||
|
||||
## How It Works
|
||||
|
||||
1. The lead initiates shutdown by generating a request_id and sending
|
||||
a shutdown_request through the inbox.
|
||||
|
||||
```python
|
||||
shutdown_requests = {}
|
||||
|
||||
def handle_shutdown_request(teammate: str) -> str:
|
||||
req_id = str(uuid.uuid4())[:8]
|
||||
shutdown_requests[req_id] = {
|
||||
"target": teammate, "status": "pending",
|
||||
}
|
||||
BUS.send("lead", teammate, "Please shut down gracefully.",
|
||||
"shutdown_request", {"request_id": req_id})
|
||||
return f"Shutdown request {req_id} sent (status: pending)"
|
||||
```
|
||||
|
||||
2. The teammate receives the request in its inbox and calls the
|
||||
`shutdown_response` tool to approve or reject.
|
||||
|
||||
```python
|
||||
if tool_name == "shutdown_response":
|
||||
req_id = args["request_id"]
|
||||
approve = args["approve"]
|
||||
if req_id in shutdown_requests:
|
||||
shutdown_requests[req_id]["status"] = \
|
||||
"approved" if approve else "rejected"
|
||||
BUS.send(sender, "lead", args.get("reason", ""),
|
||||
"shutdown_response",
|
||||
{"request_id": req_id, "approve": approve})
|
||||
return f"Shutdown {'approved' if approve else 'rejected'}"
|
||||
```
|
||||
|
||||
3. The teammate loop checks for approved shutdown and exits.
|
||||
|
||||
```python
|
||||
if (block.name == "shutdown_response"
|
||||
and block.input.get("approve")):
|
||||
should_exit = True
|
||||
# ...
|
||||
member["status"] = "shutdown" if should_exit else "idle"
|
||||
```
|
||||
|
||||
4. Plan approval follows the identical pattern. The teammate submits
|
||||
a plan, generating a request_id.
|
||||
|
||||
```python
plan_requests = {}

if tool_name == "plan_approval":
    plan_text = args.get("plan", "")
    req_id = str(uuid.uuid4())[:8]
    plan_requests[req_id] = {
        "from": sender, "plan": plan_text,
        "status": "pending",
    }
    BUS.send(sender, "lead", plan_text,
             "plan_approval_request",
             {"request_id": req_id, "plan": plan_text})
    return f"Plan submitted (request_id={req_id})"
```

5. The lead reviews and responds with the same request_id.

```python
def handle_plan_review(request_id, approve, feedback=""):
    req = plan_requests.get(request_id)
    if not req:
        return f"Error: Unknown request_id '{request_id}'"
    req["status"] = "approved" if approve else "rejected"
    BUS.send("lead", req["from"], feedback,
             "plan_approval_response",
             {"request_id": request_id,
              "approve": approve,
              "feedback": feedback})
    return f"Plan {req['status']} for '{req['from']}'"
```

6. Both protocols use the same `plan_approval` tool name with two
   modes: teammates submit (no request_id), the lead reviews (with
   request_id).

```python
# Lead tool dispatch:
"plan_approval": lambda **kw: handle_plan_review(
    kw["request_id"], kw["approve"],
    kw.get("feedback", "")),
# Teammate: submit mode (generate request_id)
```

## Key Code

The dual protocol handlers (from `agents/s10_team_protocols.py`):

```python
shutdown_requests = {}
plan_requests = {}

# -- Shutdown --
def handle_shutdown_request(teammate):
    req_id = str(uuid.uuid4())[:8]
    shutdown_requests[req_id] = {
        "target": teammate, "status": "pending"
    }
    BUS.send("lead", teammate,
             "Please shut down gracefully.",
             "shutdown_request",
             {"request_id": req_id})

# -- Plan Approval --
def handle_plan_review(request_id, approve, feedback=""):
    req = plan_requests[request_id]
    req["status"] = "approved" if approve else "rejected"
    BUS.send("lead", req["from"], feedback,
             "plan_approval_response",
             {"request_id": request_id,
              "approve": approve})

# Both use the same FSM:
#   pending -> approved | rejected
# Both correlate by request_id across async inboxes
```

## What Changed From s09

| Component        | Before (s09)      | After (s10)                    |
|------------------|-------------------|--------------------------------|
| Tools            | 9                 | 12 (+shutdown_req/resp +plan)  |
| Shutdown         | Natural exit only | Request-response handshake     |
| Plan gating      | None              | Submit/review with approval    |
| Request tracking | None              | Two tracker dicts              |
| Correlation      | None              | request_id per request         |
| FSM              | None              | pending -> approved/rejected   |

## Design Rationale
The request_id correlation pattern turns any async interaction into a trackable finite state machine. The same 3-state machine (pending -> approved/rejected) applies to shutdown, plan approval, or any future protocol. This is why one pattern handles multiple protocols -- the FSM does not care what it is approving. The request_id provides correlation across async inboxes where messages may arrive out of order, making the pattern robust to timing variations between agents.
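As a minimal sketch of that protocol-agnostic core (the helper names below are illustrative, not taken from the session code):

```python
import uuid

def new_request(tracker: dict, payload: dict) -> str:
    """Create a pending request and return its correlation id."""
    req_id = str(uuid.uuid4())[:8]
    tracker[req_id] = {**payload, "status": "pending"}
    return req_id

def resolve_request(tracker: dict, req_id: str, approve: bool) -> str:
    """Drive the pending -> approved/rejected transition exactly once."""
    req = tracker.get(req_id)
    if req is None or req["status"] != "pending":
        return "error: unknown or already-resolved request"
    req["status"] = "approved" if approve else "rejected"
    return req["status"]
```

The same two helpers could back shutdown, plan approval, or any future request-response protocol; only the payload differs.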
## Try It
```sh
cd learn-claude-code
python agents/s10_team_protocols.py
```

Example prompts to try:

1. `Spawn alice as a coder. Then request her shutdown.`
2. `List teammates to see alice's status after shutdown approval`
3. `Spawn bob with a risky refactoring task. Review and reject his plan.`
4. `Spawn charlie, have him submit a plan, then approve it.`
5. Type `/team` to monitor statuses

232
docs/en/s11-autonomous-agents.md
Normal file
@@ -0,0 +1,232 @@
# s11: Autonomous Agents
> An idle cycle with task board polling lets teammates find and claim work themselves, with identity re-injection after context compression.
## The Problem
In s09-s10, teammates only work when explicitly told to. The lead must
spawn each teammate with a specific prompt. If the task board has 10
unclaimed tasks, the lead must manually assign each one. This does not
scale.

True autonomy means teammates find work themselves. When a teammate
finishes its current task, it should scan the task board for unclaimed
work, claim a task, and start working -- without any instruction from
the lead.

But autonomous agents face a subtlety: after context compression, the
agent might forget who it is. If the messages are summarized, the
original system prompt identity ("you are alice, role: coder") fades.
Identity re-injection solves this by inserting an identity block at the
start of compressed contexts.

Teaching simplification: the token estimation used here is rough
(characters / 4). Production systems use proper tokenizer libraries.
The nag threshold of 3 rounds (from s03) is set low for teaching
visibility; production agents typically use a higher threshold around 10.

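That rough heuristic is a one-liner; a sketch for reference:

```python
def estimate_tokens(text: str) -> int:
    # Teaching approximation: assume ~4 characters per token.
    # Production code should use a real tokenizer library instead.
    return len(text) // 4
```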
## The Solution
```
Teammate lifecycle with idle cycle:

    +-------+
    | spawn |
    +---+---+
        |
        v
    +-------+   tool_use    +-------+
    | WORK  | <------------ |  LLM  |
    +---+---+               +-------+
        |
        | stop_reason != tool_use
        | (or idle tool called)
        v
    +--------+
    |  IDLE  |  poll every 5s for up to 60s
    +---+----+
        |
        +---> check inbox --> message? ----------> WORK
        |
        +---> scan .tasks/ --> unclaimed? -------> claim -> WORK
        |
        +---> 60s timeout ----------------------> SHUTDOWN

Identity re-injection after compression:
    if len(messages) <= 3:
        messages.insert(0, identity_block)
        "You are 'alice', role: coder, team: my-team"
```
## How It Works
1. The teammate loop has two phases: WORK and IDLE. WORK runs the
   standard agent loop. When the LLM stops calling tools (or calls
   the `idle` tool), the teammate enters the IDLE phase.

```python
def _loop(self, name, role, prompt):
    while True:
        # -- WORK PHASE --
        messages = [{"role": "user", "content": prompt}]
        for _ in range(50):
            inbox = BUS.read_inbox(name)
            for msg in inbox:
                if msg.get("type") == "shutdown_request":
                    self._set_status(name, "shutdown")
                    return
                messages.append(...)
            response = client.messages.create(...)
            if response.stop_reason != "tool_use":
                break
            # execute tools...
            if idle_requested:
                break

        # -- IDLE PHASE --
        self._set_status(name, "idle")
        resume = self._idle_poll(name, messages)
        if not resume:
            self._set_status(name, "shutdown")
            return
        self._set_status(name, "working")
```

2. The idle phase polls the inbox and task board in a loop.

```python
def _idle_poll(self, name, messages):
    polls = IDLE_TIMEOUT // POLL_INTERVAL  # 60s / 5s = 12
    for _ in range(polls):
        time.sleep(POLL_INTERVAL)
        # Check inbox for new messages
        inbox = BUS.read_inbox(name)
        if inbox:
            messages.append({"role": "user",
                             "content": f"<inbox>{inbox}</inbox>"})
            return True
        # Scan task board for unclaimed tasks
        unclaimed = scan_unclaimed_tasks()
        if unclaimed:
            task = unclaimed[0]
            claim_task(task["id"], name)
            messages.append({"role": "user",
                "content": f"<auto-claimed>Task #{task['id']}: "
                           f"{task['subject']}</auto-claimed>"})
            return True
    return False  # timeout -> shutdown
```

3. Task board scanning looks for pending, unowned, unblocked tasks.

```python
def scan_unclaimed_tasks() -> list:
    TASKS_DIR.mkdir(exist_ok=True)
    unclaimed = []
    for f in sorted(TASKS_DIR.glob("task_*.json")):
        task = json.loads(f.read_text())
        if (task.get("status") == "pending"
                and not task.get("owner")
                and not task.get("blockedBy")):
            unclaimed.append(task)
    return unclaimed

def claim_task(task_id: int, owner: str):
    path = TASKS_DIR / f"task_{task_id}.json"
    task = json.loads(path.read_text())
    task["status"] = "in_progress"
    task["owner"] = owner
    path.write_text(json.dumps(task, indent=2))
```

4. Identity re-injection inserts an identity block when the context
   is too short, indicating compression has occurred.

```python
def make_identity_block(name, role, team_name):
    return {"role": "user",
            "content": f"<identity>You are '{name}', "
                       f"role: {role}, team: {team_name}. "
                       f"Continue your work.</identity>"}

# Before resuming work after idle:
if len(messages) <= 3:
    messages.insert(0, make_identity_block(
        name, role, team_name))
    messages.insert(1, {"role": "assistant",
                        "content": f"I am {name}. Continuing."})
```

5. The `idle` tool lets the teammate explicitly signal it has no more
   work, entering the idle polling phase early.

```python
{"name": "idle",
 "description": "Signal that you have no more work. "
                "Enters idle polling phase.",
 "input_schema": {"type": "object", "properties": {}}},
```

## Key Code

The autonomous loop (from `agents/s11_autonomous_agents.py`):

```python
def _loop(self, name, role, prompt):
    while True:
        # WORK PHASE
        for _ in range(50):
            response = client.messages.create(...)
            if response.stop_reason != "tool_use":
                break
            for block in response.content:
                if block.name == "idle":
                    idle_requested = True
            if idle_requested:
                break

        # IDLE PHASE
        self._set_status(name, "idle")
        for _ in range(IDLE_TIMEOUT // POLL_INTERVAL):
            time.sleep(POLL_INTERVAL)
            inbox = BUS.read_inbox(name)
            if inbox: resume = True; break
            unclaimed = scan_unclaimed_tasks()
            if unclaimed:
                claim_task(unclaimed[0]["id"], name)
                resume = True; break
        if not resume:
            self._set_status(name, "shutdown")
            return
        self._set_status(name, "working")
```

## What Changed From s10

| Component     | Before (s10)  | After (s11)                      |
|---------------|---------------|----------------------------------|
| Tools         | 12            | 14 (+idle, +claim_task)          |
| Autonomy      | Lead-directed | Self-organizing                  |
| Idle phase    | None          | Poll inbox + task board          |
| Task claiming | Manual only   | Auto-claim unclaimed tasks       |
| Identity      | System prompt | + re-injection after compression |
| Timeout       | None          | 60s idle -> auto shutdown        |

## Design Rationale
Polling + timeout makes agents self-organizing without a central coordinator. Each agent independently polls the task board, claims unclaimed work, and returns to idle when done. The timeout triggers the poll cycle, and if no work appears within the window, the agent shuts itself down. This is the same pattern as work-stealing thread pools -- distributed, no single point of failure. Identity re-injection after compression ensures agents maintain their role even when conversation history is summarized away.
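A stripped-down sketch of the poll-until-work-or-timeout core (the function name and the injectable `sleep` parameter are illustrative, added so the loop can be exercised without real delays):

```python
def poll_for_work(find_work, idle_timeout=60, poll_interval=5,
                  sleep=None):
    """Return a claimed task, or None if the idle window expires."""
    import time
    sleep = sleep or time.sleep
    for _ in range(idle_timeout // poll_interval):  # 60 / 5 = 12 polls
        sleep(poll_interval)
        task = find_work()
        if task is not None:
            return task   # resume the WORK phase with this task
    return None           # timeout -> agent shuts itself down
```

With `find_work` wired to the inbox check and task-board scan, a None result maps to the 60s auto-shutdown behaviour.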
## Try It
```sh
cd learn-claude-code
python agents/s11_autonomous_agents.py
```

Example prompts to try:

1. `Create 3 tasks on the board, then spawn alice and bob. Watch them auto-claim.`
2. `Spawn a coder teammate and let it find work from the task board itself`
3. `Create tasks with dependencies. Watch teammates respect the blocked order.`
4. Type `/tasks` to see the task board with owners
5. Type `/team` to monitor who is working vs idle