# s05: Skills

> Two-layer skill injection avoids system prompt bloat by putting skill names in the system prompt (cheap) and full skill bodies in tool_result (on demand).

`s01 > s02 > s03 > s04 > [ s05 ] s06 | s07 > s08 > s09 > s10 > s11 > s12`

## The Problem

> *"Load on demand, not upfront"* -- inject knowledge via tool_result, not system prompt.

You want the agent to follow domain-specific workflows: git conventions, testing patterns, code review checklists. Putting everything in the system prompt wastes tokens on unused skills: 10 skills at 2000 tokens each is 20,000 tokens of system prompt, most of it irrelevant to any given task. Worse, a long system prompt dilutes attention -- the model tracks the beginning and the end but skims the middle.

The two-layer approach solves this: Layer 1 puts short skill descriptions in the system prompt (~100 tokens per skill). Layer 2 loads the full skill body into a tool_result only when the model calls `load_skill`. The model learns what skills exist (cheap) and loads them on demand (only when relevant).

## The Solution

```
System prompt (Layer 1 -- always present):
+--------------------------------------+
| Skills available:                    |
|  - git: Git workflow conventions     | ~100 tokens
|  - test: Testing patterns            |   per skill
+--------------------------------------+

When model calls load_skill("git"):
+--------------------------------------+
| <skill name="git">                   |
| Full git workflow instructions...    | ~2000 tokens
| Step 1: ...                          |
| Step 2: ...                          |
| </skill>                             |
+--------------------------------------+
```

Layer 1: skill *names* in system prompt (cheap). Layer 2: full *body* via tool_result (on demand).

## How It Works

1. Skill files live in `.skills/` as Markdown with YAML frontmatter.

```
.skills/
  git.md     # ---\n description: Git workflow\n ---\n ...
  test.md    # ---\n description: Testing patterns\n ---\n ...
```
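
For example, a `.skills/test.md` might look like this (the frontmatter format is real; the body text here is hypothetical):

```markdown
---
description: Testing patterns
---
When writing or changing tests:

1. Run the existing suite first to establish a baseline.
2. Test one behavior per test, and name the test after that behavior.
3. Exercise the public API rather than private helpers.
```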
2. The SkillLoader parses frontmatter and separates metadata from body.

```python
import re  # used by the frontmatter regex below

class SkillLoader:
    def _parse_frontmatter(self, text: str) -> tuple:
        match = re.match(r"^---\n(.*?)\n---\n(.*)", text, re.DOTALL)
        if not match:
            return {}, text  # no frontmatter: empty meta, whole text is body
        meta = {}
        for line in match.group(1).strip().splitlines():
            if ":" in line:  # naive YAML: flat "key: value" pairs only
                key, val = line.split(":", 1)
                meta[key.strip()] = val.strip()
        return meta, match.group(2).strip()
```
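
A quick sanity check of that regex, written as a standalone function (same parsing logic, lifted out of the class for illustration):

```python
import re

def parse_frontmatter(text: str) -> tuple:
    # Same regex as SkillLoader._parse_frontmatter: metadata between --- fences.
    match = re.match(r"^---\n(.*?)\n---\n(.*)", text, re.DOTALL)
    if not match:
        return {}, text
    meta = {}
    for line in match.group(1).strip().splitlines():
        if ":" in line:
            key, val = line.split(":", 1)
            meta[key.strip()] = val.strip()
    return meta, match.group(2).strip()

sample = "---\ndescription: Testing patterns\n---\nBody text here."
meta, body = parse_frontmatter(sample)
print(meta)  # {'description': 'Testing patterns'}
print(body)  # Body text here.
```

A file with no frontmatter falls through to `({}, text)`, so a plain Markdown skill still loads; it just gets the "No description" placeholder in Layer 1.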
3. Layer 1: `get_descriptions()` returns short lines for the system prompt.

```python
def get_descriptions(self) -> str:
    lines = []
    for name, skill in self.skills.items():
        desc = skill["meta"].get("description", "No description")
        lines.append(f" - {name}: {desc}")
    return "\n".join(lines)

SYSTEM = f"""You are a coding agent at {WORKDIR}.
Skills available:
{SKILL_LOADER.get_descriptions()}"""
```
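
What the rendered Layer 1 text looks like, sketched over a hand-built `skills` dict (two hypothetical entries standing in for what the loader reads from `.skills/*.md`):

```python
# Hand-built stand-in for SkillLoader.skills.
skills = {
    "git": {"meta": {"description": "Git workflow conventions"}, "body": "..."},
    "test": {"meta": {}, "body": "..."},  # no description key in its frontmatter
}

lines = []
for name, skill in skills.items():
    desc = skill["meta"].get("description", "No description")
    lines.append(f" - {name}: {desc}")

print("\n".join(lines))
# - git: Git workflow conventions
# - test: No description
```

Two short lines in the system prompt, regardless of how large the skill bodies are.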
4. Layer 2: `get_content()` returns the full body wrapped in `<skill>` tags.

```python
def get_content(self, name: str) -> str:
    skill = self.skills.get(name)
    if not skill:
        return f"Error: Unknown skill '{name}'."
    return f"<skill name=\"{name}\">\n{skill['body']}\n</skill>"
```
5. The `load_skill` tool is just another entry in the dispatch map.

```python
TOOL_HANDLERS = {
    # ...base tools...
    "load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),
}
```
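
Dispatch in the agent loop is then a plain dict lookup. A minimal self-contained sketch (the lambda here fakes `SKILL_LOADER.get_content`, and `run_tool` is a hypothetical name for the dispatch site; tool-call parsing is elided):

```python
# Hypothetical minimal dispatch map; the real one also carries the base tools.
TOOL_HANDLERS = {
    "load_skill": lambda **kw: f'<skill name="{kw["name"]}">...</skill>',
}

def run_tool(name: str, args: dict) -> str:
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Error: Unknown tool '{name}'."
    # Whatever the handler returns becomes the tool_result content --
    # this is where a Layer 2 skill body enters the conversation.
    return handler(**args)

print(run_tool("load_skill", {"name": "git"}))  # <skill name="git">...</skill>
```

Because `load_skill` is just another tool, no special casing is needed in the loop: the skill body rides back to the model the same way a file read or shell output would.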

## Key Code

The rest of the SkillLoader class (from `agents/s05_skill_loading.py`, lines 51-97) is the startup loading loop -- each `.skills/*.md` file is parsed once and cached:

```python
class SkillLoader:
    def __init__(self, skills_dir):
        self.skills = {}
        for f in sorted(skills_dir.glob("*.md")):
            text = f.read_text()
            meta, body = self._parse_frontmatter(text)
            self.skills[f.stem] = {"meta": meta, "body": body}
```

`get_descriptions()` (Layer 1) and `get_content()` (Layer 2) read from this cache. The model learns what skills exist (cheap) and loads them when relevant (expensive).

## What Changed From s04
| Component | Before (s04) | After (s05) |
|-----------|--------------|-----------------------------|
| Knowledge | None | .skills/*.md files |
| Injection | None | Two-layer (system + result) |
## Design Rationale
Two-layer injection solves the attention budget problem. Putting all skill content in the system prompt wastes tokens on unused skills. Layer 1 (compact summaries) costs roughly 120 tokens total. Layer 2 (full content) loads on demand via tool_result. This scales to dozens of skills without degrading model attention quality. The key insight is that the model only needs to know what skills exist (cheap) to decide when to load one (expensive). This is the same lazy-loading principle used in software module systems.
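
The arithmetic behind that claim, for illustration (assuming 10 skills with ~2000-token bodies and ~100-token summaries, as in the Problem section):

```python
n_skills = 10
body_tokens = 2000     # full skill body (Layer 2)
summary_tokens = 100   # one-line description (Layer 1)

upfront = n_skills * body_tokens                      # everything in the system prompt
on_demand = n_skills * summary_tokens + body_tokens   # all summaries + one loaded body

print(upfront, on_demand)  # 20000 3000
```

Adding an eleventh skill costs ~100 always-present tokens instead of ~2000, so the catalog can grow without crowding out the rest of the system prompt.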
## Try It
```sh
cd learn-claude-code
python agents/s05_skill_loading.py
```
Example prompts to try:
1. `What skills are available?`
2. `Load the agent-builder skill and follow its instructions`
3. `I need to do a code review -- load the relevant skill first`