Follow up PR #265: refine chapters, diagrams, and add S20 (#283)

* feat: s01-s14 docs quality overhaul — tool pipeline, single-agent, knowledge & resilience

Rewrite code.py and README (zh/en/ja) for s01-s14, each chapter building
incrementally on the previous. Key fixes across chapters:

- s01-s04: agent loop, tool dispatch, permission pipeline, hooks
- s05-s08: todo write, subagent, skill loading, context compact
- s09-s11: memory system, system prompt assembly, error recovery
- s12-s14: task graph, background tasks, cron scheduler

All chapters CC source-verified. Code inherits fixes forward (PROMPT_SECTIONS,
json.dumps cache, real-state context, can_start dep protection, etc.).

* feat: s15-s19 docs quality overhaul — multi-agent platform: teams, protocols, autonomy, worktree, MCP tools

Rewrite code.py and README (zh/en/ja) for s15-s19, the multi-agent platform
chapters. Each chapter inherits all previous fixes and adds one mechanism:

- s15: agent teams (TeamCreate, teammate threads, shared task list)
- s16: team protocols (plan approval, shutdown handshake, consume_inbox)
- s17: autonomous agents (idle polling, auto-claim, consume_lead_inbox)
- s18: worktree isolation (git worktree, bind_task, cwd switching, safety)
- s19: MCP tools (MCPClient, normalize_mcp_name, assemble_tool_pool, no cache)

All appendix source code references verified against CC source. Config priority
corrected: claude.ai < plugin < user < project < local.

* fix: 5 regressions across s05-s19 — glob safety, todo validation, memory extraction, protocol types, dep crash

- s05-s09: glob results now filter with is_relative_to(WORKDIR) (inherited from s02)
- s06-s08: todo_write validates content/status required fields (inherited from s05)
- s09: extract_memories uses pre-compression snapshot instead of compacted messages
- s16: submit_plan docstring clarifies protocol-only (not code-level gate)
- s17-s19: match_response restores type mismatch validation (from s16)
- s17-s19: claim_task deps list handles missing dep files without crashing

* fix: s12 Todo V2 logic reversal, s14/s15 cron range validation, s18/s19 worktree name validation

- s12 README (zh/en/ja): fix Todo V2 direction — interactive defaults to Task,
  non-interactive/SDK defaults to TodoWrite. Fix env var name to
  CLAUDE_CODE_ENABLE_TASKS (not TODO_V2).
- s14/s15: add _validate_cron_field with per-field range checks (minute 0-59,
  hour 0-23, dom 1-31, month 1-12, dow 0-6), step > 0, range lo <= hi.
  Replace old try/except validation that only caught exceptions.
- s18/s19: add validate_worktree_name() to remove_worktree and keep_worktree,
  not just create_worktree.

* fix: align s16-s19 teaching tool consistency

* fix pr265 chapter diagrams

* Add comprehensive s20 harness chapter

* Fix chapter smoke test regressions

* Clarify README tutorial track transition

---------

Co-authored-by: Haoran <bill-billion@outlook.com>
gui-yue
2026-05-20 21:45:38 +08:00
committed by GitHub
parent c354cf7721
commit 1baf1aca5a
174 changed files with 35833 additions and 353 deletions


@@ -0,0 +1,254 @@
# s10: System Prompt — Assembled at Runtime, Never Hardcoded
[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
s01 → ... → s08 → s09 → `s10` → [s11](../s11_error_recovery/) → s12 → ... → s20
> *"prompt is assembled, not hardcoded"* — Sections + on-demand assembly + caching.
>
> **Harness Layer**: Prompt — assembled at runtime, never hardcoded.
---
## The Problem
From s01 to s09, the system prompt was always one hardcoded line:
```python
SYSTEM = f"You are a coding agent at {WORKDIR}. Use tools to solve tasks."
```
That worked for s01 — only bash, read, write. But by s09, the agent has memory, compression, skill loading. The prompt needs to describe more and more capabilities:
```python
SYSTEM = (
    f"You are a coding agent at {WORKDIR}. "
    "Use tools to solve tasks. Act, don't explain. "
    "Before starting any multi-step task, use todo_write. "
    "Skills are available via list_skills and load_skill. "
    "Relevant memories are injected below when available. "
    # ... add a capability, add a line
)
```
Three problems:
1. **Switching projects requires rewriting the entire prompt** — no way to know what to change and what to keep
2. **One change can break others** — adding a tool description might conflict with earlier instructions
3. **Every request carries everything** — even when the current conversation doesn't need certain sections, they waste tokens
The system prompt should be a configuration assembled at runtime based on current state: which tools are enabled, which context is visible, which memories are relevant, and which content must remain stable to hit prompt cache.
---
## The Solution
![System Prompt Overview](images/system-prompt-overview.en.svg)
s10 focuses on prompt assembly. It builds on the s08-s09 capabilities but doesn't re-implement compression or memory. The core change: split the hardcoded `SYSTEM` into independent sections, assemble them at runtime based on real state, and cache the result.
Four sections, two loading strategies:
| Section | Strategy | Content | Condition |
|---------|----------|---------|-----------|
| identity | always | who you are, how to work | always present |
| tools | always | available tool list | `enabled_tools` |
| workspace | always | working directory | always present |
| memory | on-demand | relevant memory content | `.memory/MEMORY.md` exists and is non-empty |
Key design: whether a section loads depends on real state (tools exist, files exist), not keywords in messages.
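One way to make the table concrete is to pair each section with a predicate over the runtime context. This is a minimal sketch rather than the chapter's code: `SectionSpec`, `SECTION_TABLE`, and `sections_to_load` are hypothetical names, and the context keys `enabled_tools` and `memories` are assumed to match the ones used later in this chapter.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SectionSpec:
    """Hypothetical record: a section name, its strategy, and its load condition."""
    name: str
    strategy: str                      # "always" or "on-demand"
    condition: Callable[[dict], bool]  # decides from real state, never from keywords


# Mirrors the table above; the predicates are assumptions for illustration.
SECTION_TABLE = [
    SectionSpec("identity", "always", lambda ctx: True),
    SectionSpec("tools", "always", lambda ctx: bool(ctx.get("enabled_tools"))),
    SectionSpec("workspace", "always", lambda ctx: True),
    SectionSpec("memory", "on-demand", lambda ctx: bool(ctx.get("memories"))),
]


def sections_to_load(context: dict) -> list[str]:
    """Return the names of the sections whose condition holds for this context."""
    return [spec.name for spec in SECTION_TABLE if spec.condition(context)]


without_memory = sections_to_load({"enabled_tools": ["bash"], "memories": ""})
with_memory = sections_to_load({"enabled_tools": ["bash"], "memories": "- [test](test.md)"})
print(without_memory)  # ['identity', 'tools', 'workspace']
print(with_memory)     # ['identity', 'tools', 'workspace', 'memory']
```

The predicates read only the context dictionary, so the decision never depends on what the user happened to type.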
---
## How It Works
### PROMPT_SECTIONS: Topic-Keyed Fragments
Split the monolithic string into a dictionary in which each key is a topic:
```python
PROMPT_SECTIONS = {
    "identity": "You are a coding agent. Act, don't explain.",
    "tools": "Available tools: bash, read_file, write_file.",
    "workspace": f"Working directory: {WORKDIR}",
    "memory": "Relevant memories are injected below when available.",
}
```
Each section is maintained independently. Changing `tools` doesn't affect `identity`; adding `memory` doesn't touch `workspace`.
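Because the fragments are keyed by topic, a project-specific variant can override one key and inherit the rest. The following is a small sketch under assumptions: `WORKDIR` is set to an illustrative path, the extra `grep` tool is invented for the example, and `project_sections` is not part of the chapter code.

```python
from pathlib import Path

WORKDIR = Path("/tmp/demo-project")  # illustrative path, not from the chapter

PROMPT_SECTIONS = {
    "identity": "You are a coding agent. Act, don't explain.",
    "tools": "Available tools: bash, read_file, write_file.",
    "workspace": f"Working directory: {WORKDIR}",
    "memory": "Relevant memories are injected below when available.",
}

# Hypothetical variant: override only the "tools" fragment, inherit everything else.
project_sections = {
    **PROMPT_SECTIONS,
    "tools": "Available tools: bash, read_file, write_file, grep.",
}

changed = [k for k in PROMPT_SECTIONS if PROMPT_SECTIONS[k] != project_sections[k]]
print(changed)  # ['tools']
```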
### assemble_system_prompt: On-Demand Assembly
Not every section is needed every turn. No memory files? Loading the memory section just wastes tokens. Assembly is based on real state in context:
```python
def assemble_system_prompt(context: dict) -> str:
    sections = []
    # Always loaded
    sections.append(PROMPT_SECTIONS["identity"])
    sections.append(PROMPT_SECTIONS["tools"])
    sections.append(PROMPT_SECTIONS["workspace"])
    # On-demand: based on real state, not keywords
    memories = context.get("memories", "")
    if memories:
        sections.append(f"Relevant memories:\n{memories}")
    return "\n\n".join(sections)
```
"Always loaded" sections are needed every turn: identity, tools, workspace. "On-demand" sections are only useful under specific conditions.
Why not load everything? Tokens have cost (system prompt is billed every turn), and fewer instructions means more focused output (irrelevant instructions are noise).
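To see the effect of on-demand loading, the assembly function can be called with and without memory content. This is a self-contained sketch: the section texts are copied from above, while the workspace path and the sample memory line are illustrative assumptions.

```python
PROMPT_SECTIONS = {
    "identity": "You are a coding agent. Act, don't explain.",
    "tools": "Available tools: bash, read_file, write_file.",
    "workspace": "Working directory: /tmp/demo-project",  # illustrative path
}


def assemble_system_prompt(context: dict) -> str:
    # Always-loaded sections first, in a fixed order.
    sections = [
        PROMPT_SECTIONS["identity"],
        PROMPT_SECTIONS["tools"],
        PROMPT_SECTIONS["workspace"],
    ]
    # On-demand: the memory section appears only when there is memory content.
    memories = context.get("memories", "")
    if memories:
        sections.append(f"Relevant memories:\n{memories}")
    return "\n\n".join(sections)


bare = assemble_system_prompt({"memories": ""})
rich = assemble_system_prompt({"memories": "- [test](test.md)"})

# Sections are joined by blank lines, so separators + 1 gives the section count.
print(bare.count("\n\n") + 1)  # 3
print(rich.count("\n\n") + 1)  # 4
```

The difference in length between `rich` and `bare` is the token cost that on-demand loading avoids when no memory is available.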
### get_system_prompt: Cache to Avoid Re-Assembly
When the context hasn't changed (multiple LLM calls in the same turn with the same context), re-assembling is wasteful. Use deterministic serialization to detect changes and return the cached result:
```python
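import json

_last_context_key = None  # serialized context from the previous call
_last_prompt = None       # prompt assembled for that context


def get_system_prompt(context: dict) -> str:
    global _last_context_key, _last_prompt
    # Deterministic serialization: equal contexts always produce the same key
    key = json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)
    if key == _last_context_key and _last_prompt:
        return _last_prompt
    _last_context_key = key
    _last_prompt = assemble_system_prompt(context)
    return _last_prompt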
```
Why `json.dumps` instead of `hash()`: Python's built-in `hash()` is randomized per process for strings, so its values are not stable across runs, and it raises `TypeError: unhashable type` on dicts and lists, which is exactly what a nested context contains.
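Both properties can be checked directly. In this short sketch, `context_key` is a hypothetical helper that isolates the serialization step, and the two sample contexts are made up for the demonstration.

```python
import json

# Same content, different insertion order, with a nested list.
ctx_a = {"workspace": "/repo", "enabled_tools": ["bash", "read_file"], "memories": ""}
ctx_b = {"memories": "", "enabled_tools": ["bash", "read_file"], "workspace": "/repo"}


def context_key(context: dict) -> str:
    """Deterministic cache key: sorted keys make insertion order irrelevant."""
    return json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)


same_key = context_key(ctx_a) == context_key(ctx_b)

try:
    hash(ctx_a)
    hash_failed = False
except TypeError:
    # dicts (and lists) are unhashable, so hash() cannot key a nested context
    hash_failed = True

print(same_key, hash_failed)  # True True
```

Note that `sort_keys` only normalizes dictionary keys; the order of items inside a list such as `enabled_tools` still affects the key.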
Note: this cache only avoids redundant string assembly within a process. It is not the same as CC's API prompt cache, which uses `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` to separate static and dynamic parts, so that the static part hits the global cache and is not invalidated when dynamic content changes.
### context: Real State, Not Keyword Guessing
Context reflects the actual runtime state:
```python
def update_context(context: dict, messages: list) -> dict:
    memories = ""
    if MEMORY_INDEX.exists():
        content = MEMORY_INDEX.read_text().strip()
        if content:
            memories = content
    return {
        "enabled_tools": list(TOOL_HANDLERS.keys()),
        "workspace": str(WORKDIR),
        "memories": memories,
    }
```
`enabled_tools` lists the tools that are actually registered. `memories` holds the contents of `.memory/MEMORY.md` when that file exists and is non-empty, and is an empty string otherwise. Section loading is driven by this real state, not by searching messages for keywords.
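The state-driven behaviour can be exercised in a temporary directory. The sketch below is a variant for illustration, not the chapter's function: it takes the working directory as a parameter instead of reading the `WORKDIR` and `MEMORY_INDEX` globals, and `TOOL_HANDLERS` holds stand-in values.

```python
import tempfile
from pathlib import Path

TOOL_HANDLERS = {"bash": None, "read_file": None, "write_file": None}  # stand-in handlers


def update_context(workdir: Path) -> dict:
    """Build the context from what is actually on disk under workdir."""
    memory_index = workdir / ".memory" / "MEMORY.md"
    memories = ""
    if memory_index.exists():
        memories = memory_index.read_text(encoding="utf-8").strip()
    return {
        "enabled_tools": list(TOOL_HANDLERS.keys()),
        "workspace": str(workdir),
        "memories": memories,
    }


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    before = update_context(workdir)  # no memory index yet

    (workdir / ".memory").mkdir()
    (workdir / ".memory" / "MEMORY.md").write_text("- [test](test.md)\n", encoding="utf-8")
    after = update_context(workdir)   # memory index now exists

print(repr(before["memories"]))  # ''
print(repr(after["memories"]))   # '- [test](test.md)'
```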
### Putting It Together
```python
def agent_loop(messages: list, context: dict):
    system = get_system_prompt(context)
    while True:
        response = client.messages.create(
            model=MODEL, system=system, messages=messages,
            tools=TOOLS, max_tokens=8000)
        # ... tool execution ...
        context = update_context(context, messages)
        system = get_system_prompt(context)
```
The prompt is fetched once before the loop and refreshed after each round of tool execution. If the context changed, it is re-assembled; if not, the cached version is returned.
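The interaction between the loop and the cache can be simulated without calling any API. In this sketch the list `turn_contexts` stands in for the context produced after each turn, and the `assemble_calls` counter exists only for the demonstration; the assembly function is reduced to two sections to keep the example short.

```python
import json

_last_context_key = None
_last_prompt = None
assemble_calls = 0  # counter added for this demonstration only


def assemble_system_prompt(context: dict) -> str:
    global assemble_calls
    assemble_calls += 1
    parts = ["You are a coding agent."]
    if context.get("memories"):
        parts.append(f"Relevant memories:\n{context['memories']}")
    return "\n\n".join(parts)


def get_system_prompt(context: dict) -> str:
    global _last_context_key, _last_prompt
    key = json.dumps(context, sort_keys=True, ensure_ascii=False, default=str)
    if key == _last_context_key and _last_prompt:
        return _last_prompt  # cache hit: context unchanged
    _last_context_key = key
    _last_prompt = assemble_system_prompt(context)
    return _last_prompt


# Simulated turns: the context only changes on the third one.
turn_contexts = [
    {"memories": ""},
    {"memories": ""},
    {"memories": "- [test](test.md)"},
    {"memories": "- [test](test.md)"},
]
prompts = [get_system_prompt(ctx) for ctx in turn_contexts]

print(assemble_calls)  # 2
```

Four calls result in two assemblies: one for the initial context and one when the memory content appears.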
---
## Changes From s09
| Component | Before (s09) | After (s10) |
|-----------|-------------|-------------|
| prompt | Hardcoded SYSTEM string | PROMPT_SECTIONS + assemble_system_prompt |
| caching | None | get_system_prompt (json.dumps detection + cache) |
| new functions | — | assemble_system_prompt, get_system_prompt, update_context |
| tools | bash, read_file, write_file (3) | bash, read_file, write_file (3) — unchanged |
| loop | Uses fixed SYSTEM | Uses get_system_prompt(context) |
---
## Try It
```sh
cd learn-claude-code
python s10_system_prompt/code.py
```
What to watch for:
1. Output shows which sections were loaded (`[assembled] sections: ...` label)
2. Cache hits show `[cache hit]` during continued conversation
3. Creating `.memory/MEMORY.md` makes the memory section appear on the next turn
Try these prompts:
1. `Read the file README.md` (observe the three always-loaded sections)
2. `Create a file called .memory/MEMORY.md with content "- [test](test.md) — test memory"` (write a memory index)
3. `Read the file code.py` (observe whether the memory section appears)
---
## What's Next
System prompts can now be assembled at runtime. But the agent still crashes on errors. Network hiccups, API rate limits, truncated output, context overflow — these aren't bugs, they're normal.
s11 Error Recovery → four recovery paths. Upgrade tokens, compress context, exponential backoff, switch models.
<details>
<summary>Deep Dive Into CC Source Code</summary>
> The following is based on analysis of CC source code `constants/prompts.ts` (914 lines), `constants/systemPromptSections.ts` (68 lines), `context.ts` (189 lines), `utils/api.ts` (718 lines), `utils/systemPrompt.ts` (123 lines), and `bootstrap/state.ts`.
### How many sections does CC's system prompt have?
The count varies based on feature flags, output style, KAIROS/Proactive mode, user type, token budget, etc. The sections fall into roughly two categories:
**Static sections** (always loaded): identity, system, doing_tasks, actions, using_tools, tone_style, output_efficiency, etc.
**Dynamic sections** (loaded by state): session_guidance, memory, ant_model_override, env_info_simple, language, output_style, mcp_instructions, scratchpad, frc, summarize_tool_results, numeric_length_anchors, token_budget, brief, etc.
`mcp_instructions` is the only volatile section (created via `DANGEROUS_uncachedSystemPromptSection()`), because MCP servers can connect and disconnect between turns.
### Assembly Function
```typescript
getSystemPrompt(tools, model, additionalWorkingDirs?, mcpClients?): Promise<string[]>
```
Resolves to a `string[]` in which each element is a section; a `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` marker separates the static part from the dynamic part.
### cache scope
When the global cache boundary is enabled, static sections are merged into one global cache block, and dynamic sections don't use the global cache (`cacheScope: null`). Only code paths that have no boundary, or that skip the global cache, fall back to org scope.
The teaching version's cache only avoids redundant string assembly. CC, by contrast, has three cache layers:
1. **lodash memoize**: `getSystemContext` and `getUserContext` cached per session (`context.ts`)
2. **Section registry cache**: `STATE.systemPromptSectionCache` caches dynamic section results, cleared on `/clear` or `/compact`
3. **API-level cache**: `splitSysPromptPrefix()` (`api.ts`) splits prompt into blocks with different cache scopes via boundary
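The boundary-based split in the third layer can be illustrated in a few lines of Python. This is a simplified sketch of the idea only, not a port of `splitSysPromptPrefix()`: the marker value, the function name `to_cache_blocks`, and the block dictionary shape are all assumptions made for the example.

```python
BOUNDARY = "<<DYNAMIC_BOUNDARY>>"  # placeholder; the real marker string is not shown here


def to_cache_blocks(sections: list[str]) -> list[dict]:
    """Group sections into blocks with a cache scope, split at the boundary marker."""
    if BOUNDARY not in sections:
        # No boundary: per the description above, fall back to org scope.
        return [{"text": "\n\n".join(sections), "cache_scope": "org"}]

    i = sections.index(BOUNDARY)
    static, dynamic = sections[:i], sections[i + 1:]

    blocks = []
    if static:
        # Static sections merge into one block that can hit the global cache.
        blocks.append({"text": "\n\n".join(static), "cache_scope": "global"})
    if dynamic:
        # Dynamic sections change between turns, so they get no global cache scope.
        blocks.append({"text": "\n\n".join(dynamic), "cache_scope": None})
    return blocks


blocks = to_cache_blocks(["identity", "using_tools", BOUNDARY, "memory", "env_info"])
scopes = [b["cache_scope"] for b in blocks]
print(scopes)  # ['global', None]
```

Keeping the static text in its own block is what lets a change in the dynamic part leave the cached prefix untouched.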
### getUserContext vs getSystemContext
| | getSystemContext | getUserContext |
|---|---|---|
| Content | gitStatus, cacheBreaker | CLAUDE.md content, currentDate |
| Injection | appended to system prompt array | prepended as `<system-reminder>` user message |
| When skipped | custom system prompt | always runs |
### How modes change the prompt
- **CLAUDE_CODE_SIMPLE**: entire prompt is 2 lines
- **Proactive/KAIROS**: compact prompt replaces all standard sections
- **Coordinator**: coordinator-specific prompt fully replaces default
- **Agent mode**: agent-defined prompt replaces or appends to default
### Total size
In standard interactive mode, the core system prompt is roughly 20-30 KB of text. In CLAUDE_CODE_SIMPLE mode it is about 150 characters. User context (CLAUDE.md) and system context (git status) are added on top.
</details>
<!-- translation-sync: zh@v1, en@v1, ja@v1 -->