mirror of
https://github.com/shareAI-lab/analysis_claude_code.git
synced 2026-02-04 13:16:37 +08:00
Compare commits
No commits in common. "e3e23ae9bdfecda4ae7d95405e6f9fa2177168b6" and "67973a976cd47bb8e7789a899a4255e62360e39d" have entirely different histories.
e3e23ae9bd
...
67973a976c
26
.env.example
@@ -1,9 +1,21 @@
# Anthropic API Key (required)
# Get your key at: https://console.anthropic.com/
ANTHROPIC_API_KEY=sk-ant-xxx
# Provider Selection (defaults to anthropic for backward compatibility)
AI_PROVIDER=anthropic # Options: anthropic, openai, gemini, or any OpenAI-compatible service

# Base URL (optional, for API proxies)
# ANTHROPIC_BASE_URL=https://api.anthropic.com
# Model Name (auto-defaults based on provider, but can be overridden)
MODEL_NAME=kimi-k2-turbo-preview

# Model ID (optional, defaults to claude-sonnet-4-5-20250929)
# MODEL_ID=claude-sonnet-4-5-20250929
# Anthropic Configuration
ANTHROPIC_API_KEY=sk-xxx
ANTHROPIC_BASE_URL=https://api.moonshot.cn/anthropic

# OpenAI Configuration
OPENAI_API_KEY=sk-xxx
OPENAI_BASE_URL=https://api.openai.com/v1

# Google Gemini Configuration (via OpenAI-compatible endpoint)
GEMINI_API_KEY=xxx
GEMINI_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/

# Example: Custom OpenAI-compatible service
# CUSTOM_API_KEY=xxx
# CUSTOM_BASE_URL=https://api.custom-service.com/v1
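The variables in this `.env.example` suggest a provider switch driven by `AI_PROVIDER`. A minimal sketch of how such a lookup might work is shown here. The `resolve_provider` helper, the `PROVIDER_ENV` table, and the fallback to the `CUSTOM_*` variables for unknown provider names are illustrative assumptions, not code taken from the repository.

```python
import os

# Assumed mapping: provider name -> (API key variable, base URL variable).
# The variable names come from .env.example; the table itself is hypothetical.
PROVIDER_ENV = {
    "anthropic": ("ANTHROPIC_API_KEY", "ANTHROPIC_BASE_URL"),
    "openai": ("OPENAI_API_KEY", "OPENAI_BASE_URL"),
    "gemini": ("GEMINI_API_KEY", "GEMINI_BASE_URL"),
}


def resolve_provider(env=None):
    """Return the settings for the provider selected by AI_PROVIDER."""
    env = os.environ if env is None else env
    provider = env.get("AI_PROVIDER", "anthropic").lower()
    # Assumption: any unknown provider is treated as a custom OpenAI-compatible service.
    key_var, url_var = PROVIDER_ENV.get(provider, ("CUSTOM_API_KEY", "CUSTOM_BASE_URL"))
    api_key = env.get(key_var)
    if not api_key:
        raise ValueError(f"{key_var} is not set for provider {provider!r}")
    return {
        "provider": provider,
        "api_key": api_key,
        "base_url": env.get(url_var),   # None means "use the SDK default"
        "model": env.get("MODEL_NAME"),  # None means "use the provider default"
    }


config = resolve_provider({
    "AI_PROVIDER": "openai",
    "OPENAI_API_KEY": "sk-xxx",
    "OPENAI_BASE_URL": "https://api.openai.com/v1",
})
```

Passing a plain dict instead of `os.environ` keeps the lookup easy to test without touching the real environment.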
44
.github/workflows/test.yml
vendored
@@ -1,44 +0,0 @@
name: Test

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  unit-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install anthropic python-dotenv

      - name: Run unit tests
        run: python tests/test_unit.py

  integration-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install anthropic python-dotenv openai

      - name: Run integration tests
        env:
          TEST_API_KEY: ${{ secrets.TEST_API_KEY }}
          TEST_BASE_URL: ${{ secrets.TEST_BASE_URL }}
          TEST_MODEL: ${{ secrets.TEST_MODEL }}
        run: python tests/test_agent.py
185
README.md
@@ -1,18 +1,14 @@
# Learn Claude Code - Bash is all you & agent need

[](https://www.python.org/downloads/)
[](https://github.com/shareAI-lab/learn-claude-code/actions)
[](./LICENSE)

> **Disclaimer**: This is an independent educational project by [shareAI Lab](https://github.com/shareAI-lab). It is not affiliated with, endorsed by, or sponsored by Anthropic. "Claude Code" is a trademark of Anthropic.

**Learn how modern AI agents work by building one from scratch.**

[Chinese / 中文](./README_zh.md) | [Japanese / 日本語](./README_ja.md)
[Chinese documentation](./README_zh.md)

---

## Why This Repository?
**A note to readers:**

We created this repository out of admiration for Claude Code - **what we believe to be the most capable AI coding agent in the world**. Initially, we attempted to reverse-engineer its design through behavioral observation and speculation. The analysis we published was riddled with inaccuracies, unfounded guesses, and technical errors. We deeply apologize to the Claude Code team and anyone who was misled by that content.

@@ -24,61 +20,32 @@ Over the past six months, through building and iterating on real agent systems,

<img height="400" alt="demo" src="https://github.com/user-attachments/assets/0e1e31f8-064f-4908-92ce-121e2eb8d453" />

## What You'll Learn
## What is this?

After completing this tutorial, you will understand:
A progressive tutorial that demystifies AI coding agents like Kode, Claude Code, and Cursor Agent.

- **The Agent Loop** - The surprisingly simple pattern behind all AI coding agents
- **Tool Design** - How to give AI models the ability to interact with the real world
- **Explicit Planning** - Using constraints to make AI behavior predictable
- **Context Management** - Keeping agent memory clean through subagent isolation
- **Knowledge Injection** - Loading domain expertise on-demand without retraining
**5 versions, ~1100 lines total, each adding one concept:**

## Learning Path

```
Start Here
    |
    v
[v0: Bash Agent] -----> "One tool is enough"
    |                    16-50 lines
    v
[v1: Basic Agent] ----> "The complete agent pattern"
    |                    4 tools, ~200 lines
    v
[v2: Todo Agent] -----> "Make plans explicit"
    |                    +TodoManager, ~300 lines
    v
[v3: Subagent] -------> "Divide and conquer"
    |                    +Task tool, ~450 lines
    v
[v4: Skills Agent] ---> "Domain expertise on-demand"
                         +Skill tool, ~550 lines
```

**Recommended approach:**
1. Read and run v0 first - understand the core loop
2. Compare v0 and v1 - see how tools evolve
3. Study v2 for planning patterns
4. Explore v3 for complex task decomposition
5. Master v4 for building extensible agents
| Version | Lines | What it adds | Core insight |
|---------|-------|--------------|--------------|
| [v0](./v0_bash_agent.py) | ~50 | 1 bash tool | Bash is all you need |
| [v1](./v1_basic_agent.py) | ~200 | 4 core tools | Model as Agent |
| [v2](./v2_todo_agent.py) | ~300 | Todo tracking | Explicit planning |
| [v3](./v3_subagent.py) | ~450 | Subagents | Divide and conquer |
| [v4](./v4_skills_agent.py) | ~550 | Skills | Domain expertise on-demand |

## Quick Start

```bash
# Clone the repository
git clone https://github.com/shareAI-lab/learn-claude-code
cd learn-claude-code

# Install dependencies
pip install -r requirements.txt

# Configure API key
# Configure your API
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY
# Edit .env with your API key (supports Anthropic, OpenAI, Gemini, etc.)

# Run any version
python v0_bash_agent.py # Minimal (start here!)
python v0_bash_agent.py # Minimal
python v1_basic_agent.py # Core agent loop
python v2_todo_agent.py # + Todo planning
python v3_subagent.py # + Subagents
@@ -100,16 +67,6 @@ while True:

That's it. The model calls tools until done. Everything else is refinement.

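To make the loop concrete without an API key, here is a small runnable sketch. The `scripted_model` and `fake_execute` functions are hypothetical stand-ins for a real model call and a real tool executor, and the dict-based response shape is an assumption chosen for illustration rather than any SDK's actual response type.

```python
def agent_loop(model, execute, messages):
    """Call the model until it stops requesting tools, then return its text."""
    while True:
        response = model(messages)
        messages.append({"role": "assistant", "content": response})
        if response["stop_reason"] != "tool_use":
            return response["text"]
        # Run every requested tool and feed the results back as a user turn.
        results = [execute(call) for call in response["tool_calls"]]
        messages.append({"role": "user", "content": results})


def scripted_model(messages):
    # Hypothetical model: requests one tool call first, then finishes.
    if not any(m["role"] == "assistant" for m in messages):
        return {
            "stop_reason": "tool_use",
            "text": "",
            "tool_calls": [{"name": "bash", "command": "echo hi"}],
        }
    return {"stop_reason": "end_turn", "text": "done", "tool_calls": []}


def fake_execute(call):
    # Hypothetical executor: reports the command instead of running it.
    return f"ran: {call['command']}"


history = [{"role": "user", "content": "say hi"}]
answer = agent_loop(scripted_model, fake_execute, history)
```

Swapping `scripted_model` for a real API call and `fake_execute` for a shell runner would not change the shape of `agent_loop`, which is the point the text makes.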
## Version Comparison

| Version | Lines | Tools | Core Addition | Key Insight |
|---------|-------|-------|---------------|-------------|
| [v0](./v0_bash_agent.py) | ~50 | bash | Recursive subagents | One tool is enough |
| [v1](./v1_basic_agent.py) | ~200 | bash, read, write, edit | Core loop | Model as Agent |
| [v2](./v2_todo_agent.py) | ~300 | +TodoWrite | Explicit planning | Constraints enable complexity |
| [v3](./v3_subagent.py) | ~450 | +Task | Context isolation | Clean context = better results |
| [v4](./v4_skills_agent.py) | ~550 | +Skill | Knowledge loading | Expertise without retraining |

## File Structure

```
@@ -120,49 +77,24 @@ learn-claude-code/
├── v2_todo_agent.py # ~300 lines: + TodoManager
├── v3_subagent.py # ~450 lines: + Task tool, agent registry
├── v4_skills_agent.py # ~550 lines: + Skill tool, SkillLoader
├── skills/ # Example skills (pdf, code-review, mcp-builder, agent-builder)
├── docs/ # Technical documentation (EN + ZH + JA)
├── articles/ # Blog-style articles (ZH)
└── tests/ # Unit and integration tests
├── skills/ # Example skills (for learning)
└── docs/ # Detailed explanations (EN + ZH)
```

## Documentation
## Using the Agent Builder Skill

### Technical Tutorials (docs/)

- [v0: Bash is All You Need](./docs/v0-bash-is-all-you-need.md)
- [v1: Model as Agent](./docs/v1-model-as-agent.md)
- [v2: Structured Planning](./docs/v2-structured-planning.md)
- [v3: Subagent Mechanism](./docs/v3-subagent-mechanism.md)
- [v4: Skills Mechanism](./docs/v4-skills-mechanism.md)

### Articles

See [articles/](./articles/) for blog-style explanations.

## Using the Skills System

### Example Skills Included

| Skill | Purpose |
|-------|---------|
| [agent-builder](./skills/agent-builder/) | Meta-skill: how to build agents |
| [code-review](./skills/code-review/) | Systematic code review methodology |
| [pdf](./skills/pdf/) | PDF manipulation patterns |
| [mcp-builder](./skills/mcp-builder/) | MCP server development |

### Scaffold a New Agent
This repository includes a meta-skill that teaches agents how to build agents:

```bash
# Use the agent-builder skill to create a new project
# Scaffold a new agent project
python skills/agent-builder/scripts/init_agent.py my-agent

# Specify complexity level
# Or with specific complexity level
python skills/agent-builder/scripts/init_agent.py my-agent --level 0 # Minimal
python skills/agent-builder/scripts/init_agent.py my-agent --level 1 # 4 tools
python skills/agent-builder/scripts/init_agent.py my-agent --level 1 # 4 tools (default)
```

### Install Skills for Production
### Install Skills for Production Use

```bash
# Kode CLI (recommended)
@@ -172,37 +104,66 @@ kode plugins install https://github.com/shareAI-lab/shareAI-skills
claude plugins install https://github.com/shareAI-lab/shareAI-skills
```

## Configuration
See [shareAI-skills](https://github.com/shareAI-lab/shareAI-skills) for the full collection of production-ready skills.

```bash
# .env file options
ANTHROPIC_API_KEY=sk-ant-xxx # Required: Your API key
ANTHROPIC_BASE_URL=https://... # Optional: For API proxies
MODEL_ID=claude-sonnet-4-5-20250929 # Optional: Model selection
```
## Key Concepts

### v0: Bash is All You Need
One tool. Recursive self-calls for subagents. Proves the core is tiny.

### v1: Model as Agent
4 tools (bash, read, write, edit). The complete agent in one function.

### v2: Structured Planning
Todo tool makes plans explicit. Constraints enable complex tasks.

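As a rough illustration of "constraints enable complex tasks", a todo tool can reject malformed plans before they reach the conversation. The sketch below is an assumption about how such validation might look. The class layout, the three status names, the 20-item cap, and the "one item in progress" rule are illustrative choices, not a copy of the repository's `TodoManager`.

```python
class TodoManager:
    """Hypothetical todo list that validates every update the model sends."""

    VALID_STATUSES = {"pending", "in_progress", "completed"}

    def __init__(self, max_items=20):
        self.max_items = max_items
        self.items = []

    def update(self, items):
        """Replace the whole list, enforcing the constraints, and return a rendering."""
        if len(items) > self.max_items:
            raise ValueError(f"at most {self.max_items} todos allowed")
        for item in items:
            if item.get("status") not in self.VALID_STATUSES:
                raise ValueError(f"invalid status: {item.get('status')!r}")
            if not item.get("content", "").strip():
                raise ValueError("todo content must not be empty")
        # Assumed constraint: the agent may focus on only one task at a time.
        if sum(item["status"] == "in_progress" for item in items) > 1:
            raise ValueError("only one todo may be in_progress at a time")
        self.items = [dict(item) for item in items]
        return self.render()

    def render(self):
        marks = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}
        return "\n".join(f"{marks[i['status']]} {i['content']}" for i in self.items)


todos = TodoManager()
rendered = todos.update([
    {"content": "read code", "status": "completed"},
    {"content": "write tests", "status": "in_progress"},
])
```

Returning the rendered list as the tool result lets the model see its own plan on every update, which is one plausible way to keep the plan explicit.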
### v3: Subagent Mechanism
Task tool spawns isolated child agents. Context stays clean.

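The isolation idea can be sketched in a few lines: the child starts from a fresh message list, and only its final summary is returned to the parent. The `run_task` and `fake_model` names are hypothetical, and the single-call child is a simplification of a full agent loop.

```python
def run_task(model, description):
    """Run a child agent on `description` with its own message history."""
    child_messages = [{"role": "user", "content": description}]  # fresh context
    summary = model(child_messages)
    return summary  # only the summary crosses back to the parent


def fake_model(messages):
    # Hypothetical stand-in for an LLM call: reports how much context it saw.
    return f"done: {messages[-1]['content']} (saw {len(messages)} message(s))"


parent_messages = [
    {"role": "user", "content": "refactor the project"},
    {"role": "assistant", "content": "exploring first"},
]
result = run_task(fake_model, "list all Python files")
parent_messages.append({"role": "user", "content": result})
```

The child never sees the parent's two earlier messages, and the parent gains exactly one new message, regardless of how many steps the child would take internally.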
### v4: Skills Mechanism
SKILL.md files provide domain expertise on-demand. Knowledge as a first-class citizen.

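One way to picture on-demand loading: index each skill by a short description, and read the full body only when the skill is requested. The sketch below assumes a `SKILL.md` file with a simple `key: value` header between `---` lines and a `skills/<name>/SKILL.md` layout. The `parse_skill` and `load_skills` helpers are hypothetical and much simpler than a real YAML parser.

```python
import tempfile
from pathlib import Path


def parse_skill(text):
    """Split a SKILL.md-style document into (metadata dict, body)."""
    meta, body = {}, text
    if text.startswith("---\n"):
        header, _, body = text[4:].partition("\n---\n")
        for line in header.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta, body.strip()


def load_skills(skills_dir):
    """Index skills by name; only the descriptions need to live in the prompt."""
    skills = {}
    for path in sorted(Path(skills_dir).glob("*/SKILL.md")):
        meta, body = parse_skill(path.read_text(encoding="utf-8"))
        name = meta.get("name", path.parent.name)
        skills[name] = {"description": meta.get("description", ""), "body": body}
    return skills


# Demo with a throwaway skill directory.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "pdf"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "---\nname: pdf\ndescription: PDF manipulation patterns\n---\n"
        "# PDF skill\nExample instructions go here.\n",
        encoding="utf-8",
    )
    skills = load_skills(tmp)
```

Under this design the model would see only the one-line descriptions up front, and the body of a skill would be injected as a tool result when it asks for that skill.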
## Deep Dives

**Technical tutorials (docs/):**

| English | 中文 |
|---------|------|
| [v0: Bash is All You Need](./docs/v0-bash-is-all-you-need.md) | [v0: Bash 就是一切](./docs/v0-Bash就是一切.md) |
| [v1: Model as Agent](./docs/v1-model-as-agent.md) | [v1: 模型即代理](./docs/v1-模型即代理.md) |
| [v2: Structured Planning](./docs/v2-structured-planning.md) | [v2: 结构化规划](./docs/v2-结构化规划.md) |
| [v3: Subagent Mechanism](./docs/v3-subagent-mechanism.md) | [v3: 子代理机制](./docs/v3-子代理机制.md) |
| [v4: Skills Mechanism](./docs/v4-skills-mechanism.md) | [v4: Skills 机制](./docs/v4-Skills机制.md) |

**Original articles (articles/) - Chinese only, social media style:**
- [v0文章](./articles/v0文章.md) | [v1文章](./articles/v1文章.md) | [v2文章](./articles/v2文章.md) | [v3文章](./articles/v3文章.md) | [v4文章](./articles/v4文章.md)
- [上下文缓存经济学](./articles/上下文缓存经济学.md) - Context Caching Economics for Agent Developers

## Related Projects

| Repository | Description |
|------------|-------------|
| [Kode](https://github.com/shareAI-lab/Kode) | Production-ready open source agent CLI |
| [shareAI-skills](https://github.com/shareAI-lab/shareAI-skills) | Production skills collection |
| Repository | Purpose |
|------------|---------|
| [Kode](https://github.com/shareAI-lab/Kode) | Full-featured open source agent CLI (production) |
| [shareAI-skills](https://github.com/shareAI-lab/shareAI-skills) | Production-ready skills for AI agents |
| [Agent Skills Spec](https://github.com/anthropics/agent-skills) | Official specification |

### Use as Template

Fork and customize for your own agent projects:

```bash
git clone https://github.com/shareAI-lab/learn-claude-code
cd learn-claude-code
# Start from any version level
cp v1_basic_agent.py my_agent.py
```

## Philosophy

> **The model is 80%. Code is 20%.**
> The model is 80%. Code is 20%.

Modern agents like Kode and Claude Code work not because of clever engineering, but because the model is trained to be an agent. Our job is to give it tools and stay out of the way.

## Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

- Add new example skills in `skills/`
- Improve documentation in `docs/`
- Report bugs or suggest features via [Issues](https://github.com/shareAI-lab/learn-claude-code/issues)

## License

MIT
@@ -211,4 +172,4 @@ MIT

**Model as Agent. That's the whole secret.**

[@baicai003](https://x.com/baicai003) | [shareAI Lab](https://github.com/shareAI-lab)
[@baicai003](https://x.com/baicai003)

214
README_ja.md
@@ -1,214 +0,0 @@
# Learn Claude Code - With Bash, You Can Build an Agent

[](https://www.python.org/downloads/)
[](https://github.com/shareAI-lab/learn-claude-code/actions)
[](./LICENSE)

> **Disclaimer**: This is an independent educational project by [shareAI Lab](https://github.com/shareAI-lab). It is not affiliated with Anthropic and has not been endorsed or sponsored by it. "Claude Code" is a trademark of Anthropic.

**Learn how AI agents work, starting from scratch.**

[English](./README.md) | [Chinese](./README_zh.md)

---

## Why Did We Create This Repository?

This repository was born out of respect for Claude Code. We consider **Claude Code to be the best AI coding agent in the world**. Initially, we tried to reverse-engineer its design through behavioral observation and speculation. However, the analysis we published contained inaccurate information, unfounded guesses, and technical errors. We deeply apologize to the Claude Code team and to everyone who trusted that incorrect information.

Over the past six months, while building and iterating on real agent systems, our understanding of **"what a true AI agent is"** has changed fundamentally. We would like to share those insights with you. All of the earlier speculative content has been removed and replaced with original teaching material.

---

> Works with **[Kode CLI](https://github.com/shareAI-lab/Kode)**, **Claude Code**, **Cursor**, and any agent that supports the [Agent Skills Spec](https://github.com/anthropics/agent-skills).

<img height="400" alt="demo" src="https://github.com/user-attachments/assets/0e1e31f8-064f-4908-92ce-121e2eb8d453" />

## What You'll Learn

After completing this tutorial, you will understand:

- **The Agent Loop** - The surprisingly simple pattern behind all AI coding agents
- **Tool Design** - How to give AI models the ability to interact with the real world
- **Explicit Planning** - Using constraints to make AI behavior predictable
- **Context Management** - Keeping agent memory clean through subagent isolation
- **Knowledge Injection** - Loading domain expertise on demand without retraining

## Learning Path

```
Start Here
    |
    v
[v0: Bash Agent] -----> "One tool is enough"
    |                    16-50 lines
    v
[v1: Basic Agent] ----> "The complete agent pattern"
    |                    4 tools, ~200 lines
    v
[v2: Todo Agent] -----> "Make plans explicit"
    |                    +TodoManager, ~300 lines
    v
[v3: Subagent] -------> "Divide and conquer"
    |                    +Task tool, ~450 lines
    v
[v4: Skills Agent] ---> "Domain expertise on demand"
                         +Skill tool, ~550 lines
```

**Recommended approach:**
1. Read and run v0 first - understand the core loop
2. Compare v0 and v1 - see how the tools evolve
3. Learn the planning patterns in v2
4. Explore complex task decomposition in v3
5. Master building extensible agents with v4

## Quick Start

```bash
# Clone the repository
git clone https://github.com/shareAI-lab/learn-claude-code
cd learn-claude-code

# Install dependencies
pip install -r requirements.txt

# Configure the API key
cp .env.example .env
# Edit .env and enter your ANTHROPIC_API_KEY

# Run any version
python v0_bash_agent.py # Minimal (start here!)
python v1_basic_agent.py # Core agent loop
python v2_todo_agent.py # + Todo planning
python v3_subagent.py # + Subagents
python v4_skills_agent.py # + Skills
```

## Core Pattern

Every coding agent is nothing more than this loop:

```python
while True:
    response = model(messages, tools)
    if response.stop_reason != "tool_use":
        return response.text
    results = execute(response.tool_calls)
    messages.append(results)
```

That's it. The model keeps calling tools until it is done. Everything else is refinement.

## Version Comparison

| Version | Lines | Tools | Core Addition | Key Insight |
|---------|-------|-------|---------------|-------------|
| [v0](./v0_bash_agent.py) | ~50 | bash | Recursive subagents | One tool is enough |
| [v1](./v1_basic_agent.py) | ~200 | bash, read, write, edit | Core loop | Model as Agent |
| [v2](./v2_todo_agent.py) | ~300 | +TodoWrite | Explicit planning | Constraints enable complexity |
| [v3](./v3_subagent.py) | ~450 | +Task | Context isolation | Clean context = better results |
| [v4](./v4_skills_agent.py) | ~550 | +Skill | Knowledge loading | Expertise without retraining |

## File Structure

```
learn-claude-code/
├── v0_bash_agent.py # ~50 lines: 1 tool, recursive subagents
├── v0_bash_agent_mini.py # ~16 lines: extreme compression
├── v1_basic_agent.py # ~200 lines: 4 tools, core loop
├── v2_todo_agent.py # ~300 lines: + TodoManager
├── v3_subagent.py # ~450 lines: + Task tool, agent registry
├── v4_skills_agent.py # ~550 lines: + Skill tool, SkillLoader
├── skills/ # Example skills (pdf, code-review, mcp-builder, agent-builder)
├── docs/ # Technical documentation (EN + ZH + JA)
├── articles/ # Blog-style articles (ZH)
└── tests/ # Unit and integration tests
```

## Documentation

### Technical Tutorials (docs/)

- [v0: Bash is All You Need](./docs/v0-Bashがすべて.md)
- [v1: Model as Agent](./docs/v1-モデルがエージェント.md)
- [v2: Structured Planning](./docs/v2-構造化プランニング.md)
- [v3: Subagent Mechanism](./docs/v3-サブエージェント.md)
- [v4: Skills Mechanism](./docs/v4-スキル機構.md)

### Articles

See [articles/](./articles/) for blog-style explanations (in Chinese).

## Using the Skills System

### Example Skills Included

| Skill | Purpose |
|-------|---------|
| [agent-builder](./skills/agent-builder/) | Meta-skill: how to build agents |
| [code-review](./skills/code-review/) | Systematic code review methodology |
| [pdf](./skills/pdf/) | PDF manipulation patterns |
| [mcp-builder](./skills/mcp-builder/) | MCP server development |

### Scaffold a New Agent

```bash
# Use the agent-builder skill to create a new project
python skills/agent-builder/scripts/init_agent.py my-agent

# Specify the complexity level
python skills/agent-builder/scripts/init_agent.py my-agent --level 0 # Minimal
python skills/agent-builder/scripts/init_agent.py my-agent --level 1 # 4 tools
```

### Install Skills for Production

```bash
# Kode CLI (recommended)
kode plugins install https://github.com/shareAI-lab/shareAI-skills

# Claude Code
claude plugins install https://github.com/shareAI-lab/shareAI-skills
```

## Configuration

```bash
# .env file options
ANTHROPIC_API_KEY=sk-ant-xxx # Required: Your API key
ANTHROPIC_BASE_URL=https://... # Optional: For API proxies
MODEL_ID=claude-sonnet-4-5-20250929 # Optional: Model selection
```

## Related Projects

| Repository | Description |
|------------|-------------|
| [Kode](https://github.com/shareAI-lab/Kode) | Production-ready open source agent CLI |
| [shareAI-skills](https://github.com/shareAI-lab/shareAI-skills) | Production skills collection |
| [Agent Skills Spec](https://github.com/anthropics/agent-skills) | Official specification |

## Philosophy

> **The model is 80%. Code is 20%.**

Modern agents like Kode and Claude Code work not because of clever engineering, but because the model is trained to be an agent. Our job is to give the model tools and stay out of the way.

## Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

- Add new example skills in `skills/`
- Improve documentation in `docs/`
- Report bugs or suggest features via [Issues](https://github.com/shareAI-lab/learn-claude-code/issues)

## License

MIT

---

**Model as Agent. That's the whole secret.**

[@baicai003](https://x.com/baicai003) | [shareAI Lab](https://github.com/shareAI-lab)
191
README_zh.md
@@ -1,18 +1,14 @@
# Learn Claude Code - Bash Is All an Agent Needs

[](https://www.python.org/downloads/)
[](https://github.com/shareAI-lab/learn-claude-code/actions)
[](./LICENSE)
# Learn Claude Code

> **Disclaimer**: This is an independent educational project by [shareAI Lab](https://github.com/shareAI-lab). It is not affiliated with, endorsed by, or sponsored by Anthropic. "Claude Code" is a trademark of Anthropic.

**Build your own AI agent from scratch.**

[English](./README.md) | [Japanese / 日本語](./README_ja.md)
[English](./README.md)

---

## Why Does This Repository Exist?
**A note to readers:**

This repository grew out of our admiration for Claude Code - **which we believe to be the best AI coding agent in the world**. Initially, we tried to reverse-engineer its design through behavioral observation and speculation. However, the analysis we published at the time was riddled with inaccurate information, unfounded guesses, and technical errors. We sincerely apologize to the Claude Code team and to everyone who was misled by that content.

@@ -24,61 +20,31 @@

<img height="400" alt="demo" src="https://github.com/user-attachments/assets/0e1e31f8-064f-4908-92ce-121e2eb8d453" />

## What You'll Learn
## What is this?

After completing this tutorial, you will understand:
A progressive tutorial that demystifies AI agents such as Kode, Claude Code, and Cursor Agent.

- **The Agent Loop** - The surprisingly simple pattern behind all AI coding agents
- **Tool Design** - How to let AI models interact with the real world
- **Explicit Planning** - Using constraints to make AI behavior predictable
- **Context Management** - Keeping agent memory clean through subagent isolation
- **Knowledge Injection** - Loading domain expertise on demand, without retraining
**5 versions, ~1100 lines total, each version adding just one concept:**

## Learning Path

```
Start Here
    |
    v
[v0: Bash Agent] -----> "One tool is enough"
    |                    16-50 lines
    v
[v1: Basic Agent] ----> "The complete agent pattern"
    |                    4 tools, ~200 lines
    v
[v2: Todo Agent] -----> "Make plans explicit"
    |                    +TodoManager, ~300 lines
    v
[v3: Subagent] -------> "Divide and conquer"
    |                    +Task tool, ~450 lines
    v
[v4: Skills Agent] ---> "Domain expertise on demand"
                         +Skill tool, ~550 lines
```

**Recommended approach:**
1. Read and run v0 first - understand the core loop
2. Compare v0 and v1 - see how the tools evolve
3. Study the planning patterns in v2
4. Explore complex task decomposition in v3
5. Master building extensible agents with v4
| Version | Lines | What it adds | Core insight |
|---------|-------|--------------|--------------|
| [v0](./v0_bash_agent.py) | ~50 | 1 bash tool | Bash is all you need |
| [v1](./v1_basic_agent.py) | ~200 | 4 core tools | Model as Agent |
| [v2](./v2_todo_agent.py) | ~300 | Todo tracking | Explicit planning |
| [v3](./v3_subagent.py) | ~450 | Subagents | Divide and conquer |
| [v4](./v4_skills_agent.py) | ~550 | Skills | Domain expertise on demand |

## Quick Start

```bash
# Clone the repository
git clone https://github.com/shareAI-lab/learn-claude-code
cd learn-claude-code
pip install anthropic python-dotenv

# Install dependencies
pip install -r requirements.txt

# Configure the API key
# Configure the API
cp .env.example .env
# Edit .env and fill in your ANTHROPIC_API_KEY
# Edit .env and fill in your API key

# Run any version
python v0_bash_agent.py # Minimal (start here!)
python v0_bash_agent.py # Minimal
python v1_basic_agent.py # Core agent loop
python v2_todo_agent.py # + Todo planning
python v3_subagent.py # + Subagents
@@ -100,16 +66,6 @@ while True:

That's it. The model keeps calling tools until it is done. Everything else is refinement.

## Version Comparison

| Version | Lines | Tools | Core Addition | Key Insight |
|---------|-------|-------|---------------|-------------|
| [v0](./v0_bash_agent.py) | ~50 | bash | Recursive subagents | One tool is enough |
| [v1](./v1_basic_agent.py) | ~200 | bash, read, write, edit | Core loop | Model as Agent |
| [v2](./v2_todo_agent.py) | ~300 | +TodoWrite | Explicit planning | Constraints enable complexity |
| [v3](./v3_subagent.py) | ~450 | +Task | Context isolation | Clean context = better results |
| [v4](./v4_skills_agent.py) | ~550 | +Skill | Knowledge loading | Expertise without retraining |

## File Structure

```
@@ -120,51 +76,21 @@ learn-claude-code/
├── v2_todo_agent.py # ~300 lines: + TodoManager
├── v3_subagent.py # ~450 lines: + Task tool, agent registry
├── v4_skills_agent.py # ~550 lines: + Skill tool, SkillLoader
├── skills/ # Example skills (pdf, code-review, mcp-builder, agent-builder)
├── docs/ # Technical documentation (Chinese and English)
├── articles/ # Articles in WeChat official account style
└── tests/ # Unit and integration tests
├── skills/ # Example skills (for learning)
└── docs/ # Detailed documentation (Chinese and English)
```

## Further Reading
## Using the Agent Builder Skill

### Technical Documentation (docs/)

- [v0: Bash is All You Need](./docs/v0-Bash就是一切.md)
- [v1: Model as Agent](./docs/v1-模型即代理.md)
- [v2: Structured Planning](./docs/v2-结构化规划.md)
- [v3: Subagent Mechanism](./docs/v3-子代理机制.md)
- [v4: Skills Mechanism](./docs/v4-Skills机制.md)

### Original Articles (articles/)

- [v0 article](./articles/v0文章.md) - Bash is all you need
- [v1 article](./articles/v1文章.md) - 400 lines of code worth 30 million US dollars
- [v2 article](./articles/v2文章.md) - Self-constraint through Todo
- [v3 article](./articles/v3文章.md) - Subagent mechanism
- [v4 article](./articles/v4文章.md) - Skills mechanism
- [Context Caching Economics](./articles/上下文缓存经济学.md) - Cost optimization every agent developer should know

## Using the Skills System

### Built-in Example Skills

| Skill | Purpose |
|-------|---------|
| [agent-builder](./skills/agent-builder/) | Meta-skill: how to build agents |
| [code-review](./skills/code-review/) | Systematic code review methodology |
| [pdf](./skills/pdf/) | PDF manipulation patterns |
| [mcp-builder](./skills/mcp-builder/) | MCP server development |

### Scaffold a New Agent
This repository includes a meta-skill that teaches agents how to build agents:

```bash
# Use the agent-builder skill to create a new project
# Scaffold a new agent project
python skills/agent-builder/scripts/init_agent.py my-agent

# Specify the complexity level
# Or specify a complexity level
python skills/agent-builder/scripts/init_agent.py my-agent --level 0 # Minimal
python skills/agent-builder/scripts/init_agent.py my-agent --level 1 # 4 tools
python skills/agent-builder/scripts/init_agent.py my-agent --level 1 # 4 tools (default)
```

### Install Skills for Production
@@ -177,37 +103,66 @@ kode plugins install https://github.com/shareAI-lab/shareAI-skills
claude plugins install https://github.com/shareAI-lab/shareAI-skills
```

## Configuration
See [shareAI-skills](https://github.com/shareAI-lab/shareAI-skills) for the full collection of production-ready skills.

```bash
# .env file options
ANTHROPIC_API_KEY=sk-ant-xxx # Required: Your API key
ANTHROPIC_BASE_URL=https://... # Optional: API proxy
MODEL_ID=claude-sonnet-4-5-20250929 # Optional: Model selection
```
## Key Concepts

### v0: Bash is All You Need
One tool. Recursive self-calls implement subagents. Proves the core is tiny.

### v1: Model as Agent
4 tools (bash, read, write, edit). The complete agent in one function.

### v2: Structured Planning
The Todo tool makes plans explicit. Constraints enable complex tasks.

### v3: Subagent Mechanism
The Task tool spawns isolated subagents. Context stays clean.

### v4: Skills Mechanism
SKILL.md files provide domain expertise on demand. Knowledge as a first-class citizen.

## Further Reading

**Technical tutorials (docs/):**

| English | Chinese |
|---------|---------|
| [v0: Bash is All You Need](./docs/v0-bash-is-all-you-need.md) | [v0: Bash 就是一切](./docs/v0-Bash就是一切.md) |
| [v1: Model as Agent](./docs/v1-model-as-agent.md) | [v1: 模型即代理](./docs/v1-模型即代理.md) |
| [v2: Structured Planning](./docs/v2-structured-planning.md) | [v2: 结构化规划](./docs/v2-结构化规划.md) |
| [v3: Subagent Mechanism](./docs/v3-subagent-mechanism.md) | [v3: 子代理机制](./docs/v3-子代理机制.md) |
| [v4: Skills Mechanism](./docs/v4-skills-mechanism.md) | [v4: Skills 机制](./docs/v4-Skills机制.md) |

**Original articles (articles/) - WeChat official account style:**
- [v0 article](./articles/v0文章.md) | [v1 article](./articles/v1文章.md) | [v2 article](./articles/v2文章.md) | [v3 article](./articles/v3文章.md) | [v4 article](./articles/v4文章.md)
- [Context Caching Economics](./articles/上下文缓存经济学.md) - A cost optimization guide every agent developer should know

## Related Projects

| Repository | Description |
| Repository | Purpose |
|------------|---------|
| [Kode](https://github.com/shareAI-lab/Kode) | Production-ready open source agent CLI |
| [shareAI-skills](https://github.com/shareAI-lab/shareAI-skills) | Production skills collection |
| [Kode](https://github.com/shareAI-lab/Kode) | Full-featured open source agent CLI (production) |
| [shareAI-skills](https://github.com/shareAI-lab/shareAI-skills) | Production-ready skills for AI agents |
| [Agent Skills Spec](https://github.com/anthropics/agent-skills) | Official specification |

### Use as Template

Fork and customize for your own agent projects:

```bash
git clone https://github.com/shareAI-lab/learn-claude-code
cd learn-claude-code
# Start from any version level
cp v1_basic_agent.py my_agent.py
```

## Philosophy

> **The model is 80%. Code is 20%.**
> The model is 80%. Code is 20%.

Modern agents like Kode and Claude Code work not because of clever engineering, but because the model has been trained to be an agent. Our job is to give it tools and then get out of the way.

## Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

- Add new example skills in `skills/`
- Improve documentation in `docs/`
- Report bugs or suggest features via [Issues](https://github.com/shareAI-lab/learn-claude-code/issues)

## License

MIT
@@ -216,4 +171,4 @@ MIT

**Model as Agent. That's the whole secret.**

[@baicai003](https://x.com/baicai003) | [shareAI Lab](https://github.com/shareAI-lab)
[@baicai003](https://x.com/baicai003)

@@ -1,117 +0,0 @@
# v0: Bash is All You Need

**The ultimate simplification: ~50 lines, 1 tool, full agent functionality.**

After building v1, v2, and v3, a question arises: what is the *essence* of an agent?

v0 answers this by going in the opposite direction: it strips everything away until only the core remains.

## Core Insight

The Unix philosophy: everything is a file, and everything can be piped. Bash is the gateway to this world:

| What you need | Bash command |
|---------------|--------------|
| Read files | `cat`, `head`, `grep` |
| Write files | `echo '...' > file` |
| Search | `find`, `grep`, `rg` |
| Execute | `python`, `npm`, `make` |
| **Subagent** | `python v0_bash_agent.py "task"` |

The last row is the key insight: **subagents are implemented by having the agent call itself via bash**. No Task tool and no Agent Registry are needed, just recursion.

## The Complete Code

```python
#!/usr/bin/env python
from anthropic import Anthropic
import subprocess, sys, os

client = Anthropic(api_key="your-key", base_url="...")
TOOL = [{
    "name": "bash",
    "description": """Execute shell command. Patterns:
- Read: cat/grep/find/ls
- Write: echo '...' > file
- Subagent: python v0_bash_agent.py 'task description'""",
    "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}
}]
SYSTEM = f"CLI agent at {os.getcwd()}. Use bash. Spawn subagent for complex tasks."

def chat(prompt, history=[]):
    history.append({"role": "user", "content": prompt})
    while True:
        r = client.messages.create(model="...", system=SYSTEM, messages=history, tools=TOOL, max_tokens=8000)
        history.append({"role": "assistant", "content": r.content})
        if r.stop_reason != "tool_use":
            return "".join(b.text for b in r.content if hasattr(b, "text"))
        results = []
        for b in r.content:
            if b.type == "tool_use":
                out = subprocess.run(b.input["command"], shell=True, capture_output=True, text=True, timeout=300)
                results.append({"type": "tool_result", "tool_use_id": b.id, "content": out.stdout + out.stderr})
        history.append({"role": "user", "content": results})

if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(chat(sys.argv[1]))  # Subagent mode
    else:
        h = []
        while (q := input(">> ")) not in ("q", ""):
            print(chat(q, h))
```

That is the complete agent. About 50 lines.

## How the Subagent Works

```
Main agent
  └─ bash: python v0_bash_agent.py "analyze architecture"
       └─ Subagent (separate process, fresh history)
            ├─ bash: find . -name "*.py"
            ├─ bash: cat src/main.py
            └─ returns a summary via stdout
```

**Process isolation = context isolation**
- The child process starts with its own empty history
- The parent captures the child's stdout as the tool result
- Recursive calls allow arbitrarily deep nesting
|
||||
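The capture step above can be sketched without any model in the loop. This is a minimal illustration, not the repository's code: `run_as_tool` and the `toolu_demo` id are hypothetical names, and a one-line Python child process stands in for `python v0_bash_agent.py "task"`.

```python
import subprocess
import sys

def run_as_tool(command: str, tool_use_id: str, timeout: int = 300) -> dict:
    """Run a shell command and package its output as a tool_result block."""
    try:
        out = subprocess.run(command, shell=True, capture_output=True,
                             text=True, timeout=timeout)
        content = out.stdout + out.stderr
    except subprocess.TimeoutExpired:
        content = f"Error: command timed out after {timeout}s"
    return {"type": "tool_result", "tool_use_id": tool_use_id,
            "content": content or "(no output)"}

# Stand-in for the subagent: a child process that keeps its own state
# and hands only a short result back on stdout.
child = f'"{sys.executable}" -c "history = [1, 2, 3]; print(len(history))"'
result = run_as_tool(child, tool_use_id="toolu_demo")  # hypothetical id
```

The parent never sees the child's `history`, only what the child chose to print, which is the isolation property the section describes.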
|
||||
## What v0 Gives Up

| Feature | v0 | v3 |
|---------|----|----|
| Agent types | None | explore/code/plan |
| Tool filtering | None | Whitelist |
| Progress display | Plain stdout | Inline updates |
| Code complexity | ~50 lines | ~450 lines |
|
||||
|
||||
## What v0 Proves

**Complex capabilities emerge from simple rules:**

1. **One tool is enough**: Bash is the gateway to everything
2. **Recursion = hierarchy**: self-invocation implements subagents
3. **Process = isolation**: the OS provides context isolation
4. **Prompt = constraint**: instructions shape behavior

The core pattern never changes:

```python
while True:
    response = model(messages, tools)
    if response.stop_reason != "tool_use":
        return response.text
    results = execute(response.tool_calls)
    messages.append(results)
```

Everything else (todos, subagents, permissions) is a refinement around this loop.

---

**Bash is all you need.**

[← Back to README](../README_ja.md) | [v1 →](./v1-モデルがエージェント.md)
|
||||
@ -1,139 +0,0 @@
|
||||
# v1: The Model Is the Agent

**About 200 lines. 4 tools. The essence of every coding agent.**

The secret of Claude Code? **There is no secret.**

Strip away the CLI polish, the progress bars, and the permission system. What remains is surprisingly simple: a loop in which the model calls tools until the task is done.

## Core Insight

A traditional assistant:
```
User -> Model -> Text response
```

An agent system:
```
User -> Model -> [Tool -> Result]* -> Response
                      ^___________|
```

The asterisk matters. The model calls tools **repeatedly** until it decides the task is complete. That is what turns a chatbot into an autonomous agent.

**Key insight**: the model is the decision-maker. The code only provides the tools and runs the loop.
|
||||
|
||||
## The Four Essential Tools

Claude Code has about 20 tools, but four of them cover 90% of use cases:

| Tool | Purpose | Example |
|------|---------|---------|
| `bash` | Run commands | `npm install`, `git status` |
| `read_file` | Read contents | View `src/index.ts` |
| `write_file` | Create/overwrite | Create `README.md` |
| `edit_file` | Make precise changes | Replace a function |

With these four tools the model can:
- Explore a codebase (`bash: find, grep, ls`)
- Understand code (`read_file`)
- Make changes (`write_file`, `edit_file`)
- Run anything (`bash: python, npm, make`)
|
||||
|
||||
## The Agent Loop

A complete agent in one function:

```python
def agent_loop(messages):
    while True:
        # 1. Ask the model (max_tokens is required by the Messages API)
        response = client.messages.create(
            model=MODEL, system=SYSTEM,
            messages=messages, tools=TOOLS,
            max_tokens=8000,
        )
        # Record the assistant turn so the final reply is kept in history too
        messages.append({"role": "assistant", "content": response.content})

        # 2. Print any text output
        for block in response.content:
            if hasattr(block, "text"):
                print(block.text)

        # 3. No tool calls means we are done
        if response.stop_reason != "tool_use":
            return messages

        # 4. Execute the tools and continue
        results = []
        for tc in response.content:
            if tc.type != "tool_use":
                continue
            output = execute_tool(tc.name, tc.input)
            results.append({"type": "tool_result", "tool_use_id": tc.id, "content": output})

        messages.append({"role": "user", "content": results})
```

**Why this works:**
1. The model controls the loop (it keeps calling tools until `stop_reason != "tool_use"`)
2. Results become context (they are fed back as a "user" message)
3. Memory is automatic (history accumulates in the `messages` list)
|
||||
|
||||
## The System Prompt

The only "configuration" needed:

```python
SYSTEM = f"""You are a coding agent at {WORKDIR}.

Loop: think briefly -> use tools -> report results.

Rules:
- Prefer tools over prose. Act, don't just explain.
- Never invent file paths. Use ls/find first if unsure.
- Make minimal changes. Don't over-engineer.
- After finishing, summarize what changed."""
```

No complex logic. Just clear instructions.
|
||||
|
||||
## Why This Design Works

**1. Simplicity**
No state machine. No planning module. No framework.

**2. The model does the thinking**
The model decides which tools to use, in what order, and when to stop.

**3. Transparency**
Every tool call is visible. Every result is in the conversation.

**4. Extensibility**
Adding a tool = one function + one JSON schema.
|
||||
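The "one function + one JSON schema" claim can be sketched as follows. The `list_dir` tool, the `HANDLERS` table, and this particular `execute_tool` body are illustrative assumptions, not the repository's implementation.

```python
from pathlib import Path

def list_dir(path: str = ".") -> str:
    """Hypothetical extra tool: list the entries of a directory."""
    return "\n".join(sorted(p.name for p in Path(path).iterdir())) or "(empty)"

# The schema the model sees; the name must match the dispatch key below.
LIST_DIR_TOOL = {
    "name": "list_dir",
    "description": "List the entries of a directory.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

# Dispatch table: tool name -> implementation.
HANDLERS = {"list_dir": list_dir}

def execute_tool(name: str, args: dict) -> str:
    handler = HANDLERS.get(name)
    if handler is None:
        return f"Unknown tool: {name}"
    try:
        return handler(**args)
    except Exception as e:  # report errors to the model instead of crashing
        return f"Error: {e}"
```

Errors are returned as strings on purpose: the model reads them as a tool result and can decide how to recover.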
|
||||
## What Is Missing

| Feature | Why omitted | Added in |
|---------|-------------|----------|
| Todo tracking | Not essential | v2 |
| Subagents | Complexity | v3 |
| Permissions | Trust the model while learning | Production |

The point: **the core is tiny**. Everything else is refinement.

## The Bigger Picture

Claude Code, Cursor Agent, Codex CLI, and Devin all share this pattern:

```python
while not done:
    response = model(conversation, tools)
    results = execute(response.tool_calls)
    conversation.append(results)
```

They differ in tools, presentation, and safety. The essence is always the same: **give the model tools and let it work**.

---

**The model is the agent. That is the whole secret.**

[← v0](./v0-Bashがすべて.md) | [Back to README](../README_ja.md) | [v2 →](./v2-構造化プランニング.md)
|
||||
@ -89,17 +89,11 @@ NAG_REMINDER = "<reminder>10+ turns without todo. Please update.</reminder>"
|
||||
Injected as context, not commands:
|
||||
|
||||
```python
|
||||
# INITIAL_REMINDER: at conversation start (in main)
|
||||
if first_message:
|
||||
inject_reminder(INITIAL_REMINDER)
|
||||
|
||||
# NAG_REMINDER: inside agent_loop, during task execution
|
||||
if rounds_without_todo > 10:
|
||||
inject_reminder(NAG_REMINDER)
|
||||
```
|
||||
|
||||
Key insight: NAG_REMINDER is injected **inside the agent loop**, so the model
|
||||
sees it during long-running tasks, not just between tasks.
|
||||
The model sees them but doesn't respond to them.
|
||||
|
||||
## The Feedback Loop
|
||||
|
||||
|
||||
@ -1,171 +0,0 @@
|
||||
# v2: Structured Planning with Todos

**About 300 lines. +1 tool. Explicit task tracking.**

v1 works. On complex tasks, however, the model can lose track.

Ask it to "refactor auth, add tests, and update the docs" and watch what happens. Without an explicit plan it jumps between tasks, forgets steps, and loses focus.

v2 adds one thing: the **Todo tool**. About 100 lines of new code that fundamentally change how the agent behaves.

## The Problem

In v1, the plan exists only in the model's "head":

```
v1: "Do A, then B, then C"  (invisible)
    10 tool calls later: "Wait, what was I doing?"
```

The Todo tool makes it explicit:

```
v2:
  [ ] Refactor the auth module
  [>] Add unit tests          <- currently here
  [ ] Update the docs
```

Now both you and the model can see the plan.
|
||||
|
||||
## TodoManager

A list with constraints:

```python
class TodoManager:
    def __init__(self):
        self.items = []  # at most 20

    def update(self, items):
        # Validation:
        # - every item needs: content, status, activeForm
        # - status: pending | in_progress | completed
        # - only one in_progress
        # - no duplicates, no empty entries
        ...
```

The constraints matter:

| Rule | Reason |
|------|--------|
| At most 20 items | Prevents endless lists |
| One in_progress | Forces focus |
| Required fields | Structured output |

These are not optional. They are guardrails.
|
||||
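The validation rules listed above could be implemented roughly as below. This is a sketch under assumptions: the error messages, the `render` helper, and the class constants are invented for illustration and may differ from the repository's `TodoManager`.

```python
class TodoManager:
    MAX_ITEMS = 20
    STATUSES = ("pending", "in_progress", "completed")
    ICONS = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}

    def __init__(self):
        self.items = []

    def update(self, items: list) -> str:
        """Validate the whole list, store it, and return the rendered view."""
        if len(items) > self.MAX_ITEMS:
            raise ValueError(f"at most {self.MAX_ITEMS} todos allowed")
        seen = set()
        for item in items:
            for field in ("content", "status", "activeForm"):
                if not str(item.get(field, "")).strip():
                    raise ValueError(f"missing or empty field: {field}")
            if item["status"] not in self.STATUSES:
                raise ValueError(f"invalid status: {item['status']}")
            if item["content"] in seen:
                raise ValueError(f"duplicate item: {item['content']}")
            seen.add(item["content"])
        if sum(i["status"] == "in_progress" for i in items) > 1:
            raise ValueError("only one item may be in_progress")
        self.items = list(items)
        return self.render()

    def render(self) -> str:
        """Text view that goes back to the model as the tool result."""
        lines = [f"{self.ICONS[i['status']]} {i['content']}" for i in self.items]
        done = sum(i["status"] == "completed" for i in self.items)
        return "\n".join(lines) + f"\n({done}/{len(self.items)} completed)"
```

Rejecting the whole update on the first violation keeps the stored list consistent; the error text is what the model sees, so it can correct the call on the next turn.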
|
||||
## The Tool

The shape of the input, shown in shorthand rather than as a full JSON Schema:

```python
{
    "name": "TodoWrite",
    "input_schema": {
        "items": [{
            "content": "task description",
            "status": "pending | in_progress | completed",
            "activeForm": "present tense: 'Reading files'"
        }]
    }
}
```

`activeForm` shows what is happening right now:

```
[>] Reading the auth code...  <- activeForm
[ ] Add unit tests
```
|
||||
|
||||
## System Reminders

Soft constraints that encourage todo usage:

```python
INITIAL_REMINDER = "<reminder>Use TodoWrite for multi-step tasks.</reminder>"
NAG_REMINDER = "<reminder>10+ turns without todo. Please update.</reminder>"
```

They are injected as context, not as commands:

```python
# INITIAL_REMINDER: at conversation start (in main)
if first_message:
    inject_reminder(INITIAL_REMINDER)

# NAG_REMINDER: inside agent_loop, during task execution
if rounds_without_todo > 10:
    inject_reminder(NAG_REMINDER)
```

Key insight: NAG_REMINDER is injected **inside the agent loop**, so the model sees it during long-running tasks, not only between tasks.
|
||||
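One way the nag logic might be wired into the loop is sketched here. The helper names `with_reminder` and `next_round_count` are hypothetical, and appending the reminder as a text block after the tool results is an assumption about how `inject_reminder` behaves.

```python
NAG_REMINDER = "<reminder>10+ turns without todo. Please update.</reminder>"

def with_reminder(results: list, rounds_without_todo: int, threshold: int = 10) -> list:
    """Return the tool results, plus a reminder text block past the threshold."""
    if rounds_without_todo > threshold:
        # Text goes after the tool_result blocks in the same user message.
        return results + [{"type": "text", "text": NAG_REMINDER}]
    return results

def next_round_count(tool_names: list, rounds_without_todo: int) -> int:
    """Reset the counter when TodoWrite was called this round, else increment."""
    return 0 if "TodoWrite" in tool_names else rounds_without_todo + 1
```

Because the reminder rides along with ordinary tool results, it reads to the model as context rather than as a new user instruction.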
|
||||
## The Feedback Loop

When the model calls `TodoWrite`:

```
Input:
  [x] Refactor auth     (completed)
  [>] Add tests         (in progress)
  [ ] Update docs       (pending)

Returned:
  "[x] Refactor auth
   [>] Add tests
   [ ] Update docs
   (1/3 completed)"
```

The model sees its own plan, updates it, and continues with that context.

## When Todos Help

Not every task needs them:

| Good fit | Why |
|----------|-----|
| Multi-step work | 5 or more steps to track |
| Long conversations | 20 or more tool calls |
| Complex refactoring | Multiple files |
| Teaching | The "thinking" is visible |

Rule of thumb: **if you would write a checklist, use todos**.
|
||||
|
||||
## Integration

v2 adds to v1 without changing it:

```python
# v1 tools
tools = [bash, read_file, write_file, edit_file]

# v2 adds
tools.append(TodoWrite)
todo_manager = TodoManager()

# v2 tracks usage
if rounds_without_todo > 10:
    inject_reminder()
```

About 100 lines of new code. The same agent loop.

## The Deeper Insight

> **Structure constrains and enables.**

The Todo constraints (item limit, a single in_progress item) are what enable the benefits (a visible plan, tracked progress).

The pattern recurs across agent design:
- `max_tokens` is a constraint → it enables manageable responses
- Tool schemas are constraints → they enable structured calls
- Todos are a constraint → they enable completing complex tasks

Good constraints are not limitations. They are scaffolding.

---

**Explicit plans make agents reliable.**

[← v1](./v1-モデルがエージェント.md) | [Back to README](../README_ja.md) | [v3 →](./v3-サブエージェント.md)
|
||||
@ -1,190 +0,0 @@
|
||||
# v3: The Subagent Mechanism

**About 450 lines. +1 tool. Divide and conquer.**

v2 added planning. On large tasks such as "explore the codebase, then refactor auth", however, a single agent runs into context limits. Exploration dumps 20 files into the history, and by the time refactoring starts the agent has lost focus.

v3 adds the **Task tool**: spawn child agents with isolated contexts.

## The Problem

Context pollution in a single agent:

```
Main agent history:
  [exploring...] cat file1.py -> 500 lines
  [exploring...] cat file2.py -> 300 lines
  ... 15 more files ...
  [refactoring...] "Wait, what was in file1?"
```

The solution: **delegate exploration to a subagent**:

```
Main agent history:
  [Task: explore the codebase]
    -> subagent explores 20 files
    -> returns: "auth is in src/auth/, DB is in src/models/"
  [refactors with a clean context]
```
|
||||
|
||||
## The Agent Type Registry

Each agent type defines its capabilities:

```python
AGENT_TYPES = {
    "explore": {
        "description": "Read-only, for search and analysis",
        "tools": ["bash", "read_file"],  # no write access
        "prompt": "Search and analyze. Do not modify. Return a concise summary."
    },
    "code": {
        "description": "Full agent for implementation",
        "tools": "*",  # all tools
        "prompt": "Implement changes efficiently."
    },
    "plan": {
        "description": "Planning and analysis",
        "tools": ["bash", "read_file"],  # read-only
        "prompt": "Analyze and output a numbered plan. Do not modify files."
    }
}
```
|
||||
|
||||
## The Task Tool

```python
{
    "name": "Task",
    "description": "Spawn a subagent for a focused subtask",
    "input_schema": {
        "description": "short task name (3-5 words)",
        "prompt": "detailed instructions",
        "agent_type": "explore | code | plan"
    }
}
```

The main agent calls Task → a child agent runs → it returns a summary.
|
||||
|
||||
## Subagent Execution

The heart of the Task tool:

```python
def run_task(description, prompt, agent_type):
    config = AGENT_TYPES[agent_type]

    # 1. Agent-specific system prompt
    sub_system = f"You are a {agent_type} subagent.\n{config['prompt']}"

    # 2. Filtered tools
    sub_tools = get_tools_for_agent(agent_type)

    # 3. Isolated history (important: no parent context)
    sub_messages = [{"role": "user", "content": prompt}]

    # 4. The same query loop
    while True:
        response = client.messages.create(
            model=MODEL, system=sub_system,
            messages=sub_messages, tools=sub_tools,
            max_tokens=8000,  # required by the Messages API
        )
        if response.stop_reason != "tool_use":
            break
        # Execute tools, append results to sub_messages...

    # 5. Return only the final text
    return extract_final_text(response)
```

**Key concepts:**

| Concept | Implementation |
|---------|----------------|
| Context isolation | A fresh `sub_messages` list |
| Tool filtering | `get_tools_for_agent()` |
| Specialized behavior | Agent-specific system prompt |
| Result abstraction | Only the final text is returned |
|
||||
|
||||
## Tool Filtering

```python
def get_tools_for_agent(agent_type):
    allowed = AGENT_TYPES[agent_type]["tools"]
    if allowed == "*":
        return BASE_TOOLS  # no Task (no recursion in the demo)
    return [t for t in BASE_TOOLS if t["name"] in allowed]
```

- `explore`: bash and read_file only
- `code`: all tools
- `plan`: bash and read_file only

Subagents do not get the Task tool (this prevents infinite recursion in the demo).
|
||||
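A self-contained version of the filter makes the whitelist behavior easy to check. The tool entries here are reduced to bare names for brevity, which is an assumption; the real `BASE_TOOLS` entries carry descriptions and schemas.

```python
# Minimal stand-ins: real entries would also hold description and input_schema.
BASE_TOOLS = [{"name": n} for n in ("bash", "read_file", "write_file", "edit_file")]

AGENT_TYPES = {
    "explore": {"tools": ["bash", "read_file"]},
    "code": {"tools": "*"},
    "plan": {"tools": ["bash", "read_file"]},
}

def get_tools_for_agent(agent_type: str) -> list:
    allowed = AGENT_TYPES[agent_type]["tools"]
    if allowed == "*":
        return BASE_TOOLS  # Task is not in BASE_TOOLS, so no recursion
    return [t for t in BASE_TOOLS if t["name"] in allowed]

explore_names = [t["name"] for t in get_tools_for_agent("explore")]
code_names = [t["name"] for t in get_tools_for_agent("code")]
```

Keeping `Task` out of `BASE_TOOLS` entirely, rather than filtering it per type, means no agent type can reach it by accident.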
|
||||
## Progress Display

Subagent output does not pollute the main chat:

```
You: explore the codebase
> Task: explore the codebase
  [explore] explore the codebase ... 5 tools, 3.2s
  [explore] explore the codebase - done (8 tools, 5.1s)

Here is what I found: ...
```

Real-time progress, clean final output.
|
||||
|
||||
## A Typical Flow

```
User: "Refactor auth to use JWT"

Main agent:
  1. Task(explore): "find all auth-related files"
     -> subagent reads 10 files
     -> returns: "auth is in src/auth/login.py, sessions are in..."

  2. Task(plan): "design the JWT migration"
     -> subagent analyzes the structure
     -> returns: "1. add a jwt library 2. create token utilities..."

  3. Task(code): "implement JWT tokens"
     -> subagent writes the code
     -> returns: "created jwt_utils.py, updated login.py"

  4. Summarize the changes
```

Each subagent has a clean context. The main agent stays focused.
|
||||
|
||||
## Comparison

| Aspect | v2 | v3 |
|--------|----|----|
| Context | Single, growing | Isolated per task |
| Exploration | Pollutes the history | Contained in the subagent |
| Parallelism | None | Possible (not in the demo) |
| Code size | ~300 lines | ~450 lines |

## The Pattern

```
Complex task
  └─ Main agent (coordinator)
       ├─ Subagent A (explore) -> summary
       ├─ Subagent B (plan)    -> plan
       └─ Subagent C (code)    -> result
```

The same agent loop, different contexts. That is the whole trick.

---

**Divide and conquer. Context isolation.**

[← v2](./v2-構造化プランニング.md) | [Back to README](../README_ja.md) | [v4 →](./v4-スキル機構.md)
|
||||
194
docs/v4-スキル機構.md
@ -1,194 +0,0 @@
|
||||
# v4: The Skills Mechanism

**Core insight: skills are not tools. They are knowledge packages.**

## Externalizing Knowledge: From Training to Editing

Skills embody a deep paradigm shift: **knowledge externalization**.

### The Traditional Approach: Knowledge Internalized in Parameters

Traditional AI systems store all their knowledge in model parameters, where it cannot be accessed, modified, or reused directly.

Want the model to learn a new skill? You need to:
1. Collect a large amount of training data
2. Set up a distributed training cluster
3. Run complex parameter fine-tuning (LoRA, full fine-tuning, and so on)
4. Deploy a new model version

### The New Paradigm: Knowledge Externalized as Documents

The code-execution paradigm changes everything.

```
┌────────────────────────────────────────────────────────────────────────────┐
│                        Knowledge storage hierarchy                         │
│                                                                            │
│  Model parameters → Context window → File system → Skill library           │
│  (internalized)     (runtime)        (persistent)  (structured)            │
│                                                                            │
│  ←──── training required ────→   ←──── edit in natural language ────→      │
│  needs cluster, data, expertise  anyone can change it                      │
└────────────────────────────────────────────────────────────────────────────┘
```

**The key breakthrough**:
- **Before**: changing model behavior = changing parameters = training = GPU cluster + training data + ML expertise
- **Now**: changing model behavior = editing SKILL.md = editing a text file = anyone can do it
|
||||
|
||||
## The Problem

v3 gave us subagents for task decomposition. A deeper question remains: **does the model know how to handle domain-specific tasks?**

- Processing a PDF? It needs to know `pdftotext` vs `PyMuPDF`
- Building an MCP server? It needs the protocol spec and best practices
- Code review? It needs a systematic checklist

This knowledge is not a tool. It is **expertise**. Skills solve the problem by letting the model load domain knowledge on demand.

## Key Concepts

### Tools vs Skills

| Concept | What it is | Examples |
|---------|------------|----------|
| **Tool** | What the model CAN DO | bash, read_file, write_file |
| **Skill** | How the model KNOWS TO DO it | PDF processing, MCP building |

Tools are capabilities. Skills are knowledge.
|
||||
|
||||
### The SKILL.md Standard

```
skills/
├── pdf/
│   └── SKILL.md          # required
├── mcp-builder/
│   ├── SKILL.md
│   └── references/       # optional
└── code-review/
    ├── SKILL.md
    └── scripts/          # optional
```

**SKILL.md format**: YAML front matter + Markdown body

```markdown
---
name: pdf
description: Process PDF files. Use when reading, creating, or merging PDFs.
---

# PDF Processing Skill

## Reading PDFs

Use pdftotext for fast extraction:
\`\`\`bash
pdftotext input.pdf -
\`\`\`
...
```
|
||||
|
||||
## Implementation (about 100 lines added)

### The SkillLoader Class

```python
import re
from pathlib import Path

class SkillLoader:
    def __init__(self, skills_dir: Path):
        self.skills_dir = skills_dir
        self.skills = {}
        self.load_skills()

    def parse_skill_md(self, path: Path) -> dict:
        """Parse YAML front matter + Markdown body."""
        content = path.read_text()
        match = re.match(r'^---\s*\n(.*?)\n---\s*\n(.*)$', content, re.DOTALL)
        # returns {name, description, body, path, dir}

    def get_descriptions(self) -> str:
        """Generate the metadata for the system prompt."""
        return "\n".join(f"- {name}: {skill['description']}"
                         for name, skill in self.skills.items())

    def get_skill_content(self, name: str) -> str:
        """Get the full content for context injection."""
        skill = self.skills[name]
        return f"# Skill: {name}\n\n{skill['body']}"
```
|
||||
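The front matter parsing step, elided in the class above, might look like the following. This is a sketch under assumptions: `parse_skill_text` is a hypothetical helper, and it handles only flat `key: value` lines rather than full YAML.

```python
import re
from typing import Optional

FRONTMATTER = re.compile(r'^---\s*\n(.*?)\n---\s*\n(.*)$', re.DOTALL)

def parse_skill_text(content: str) -> Optional[dict]:
    """Split a SKILL.md string into metadata and body; None if malformed."""
    match = FRONTMATTER.match(content)
    if not match:
        return None
    meta = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, value = line.split(":", 1)  # split on the first colon only
            meta[key.strip()] = value.strip()
    if "name" not in meta or "description" not in meta:
        return None
    return {"name": meta["name"],
            "description": meta["description"],
            "body": match.group(2).strip()}

sample = "---\nname: pdf\ndescription: Process PDF files.\n---\n\n# PDF Skill\n\nUse pdftotext.\n"
parsed = parse_skill_text(sample)
```

Only `name` and `description` need to reach the system prompt; the body stays on disk until the model asks for the skill.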
|
||||
### The Skill Tool

```python
SKILL_TOOL = {
    "name": "Skill",
    "description": "Load a skill to gain specialized knowledge.",
    "input_schema": {
        "type": "object",
        "properties": {"skill": {"type": "string"}},
        "required": ["skill"]
    }
}
```
|
||||
|
||||
### Message Injection (Cache-Preserving)

Key insight: skill content goes into the **tool_result** (part of a user message), not into the system prompt:

```python
def run_skill(skill_name: str) -> str:
    content = SKILLS.get_skill_content(skill_name)
    return f"""<skill-loaded name="{skill_name}">
{content}
</skill-loaded>

Follow the instructions in the skill above."""
```

**Why this matters**:
- Skill content is **appended at the end** as a new message
- Everything before it (the system prompt plus all earlier messages) is cached and reused
- Only the newly appended skill content needs to be computed: **the entire prefix is a cache hit**
|
||||
|
||||
## Cache Economics

### The Cost of Ignoring the Cache

Many developers using LangGraph, LangChain, or AutoGen habitually:
- Inject dynamic state into the system prompt
- Edit and compress the message history
- Truncate the conversation with a sliding window

**These operations invalidate the cache and can blow up costs by 7-50x.**

A typical 50-round SWE task:
- **Cache-breaking**: $14.06 (system prompt changed every round)
- **Cache-optimized**: $1.85 (append only)
- **Savings**: about 87%

### Anti-Patterns

| Anti-pattern | Effect | Cost multiplier |
|--------------|--------|-----------------|
| Dynamic system prompt | 100% cache miss | **20-50x** |
| Message compression | Invalidated from the replacement point | **5-15x** |
| Sliding window | 100% cache miss | **30-50x** |
| Message editing | Invalidated from the edit point | **10-30x** |
|
||||
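The anti-patterns above all come down to how much of the previous request survives as an unchanged prefix. The toy model below illustrates that idea only; it compares whole messages, whereas real provider caches work on token prefixes, and the message strings are made up.

```python
def shared_prefix_len(previous: list, current: list) -> int:
    """Number of leading messages that are identical in both requests."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n

round_1 = ["system", "user: task", "assistant: tool call"]

# Append-only: the whole previous request is still a prefix.
append_only = round_1 + ["user: tool result"]

# Dynamic system prompt: the very first element changes.
edited_system = ["system + dynamic state"] + round_1[1:] + ["user: tool result"]

# Sliding window: dropping the oldest message shifts everything.
sliding_window = round_1[1:] + ["user: tool result"]

reusable = {
    "append_only": shared_prefix_len(round_1, append_only),
    "edited_system": shared_prefix_len(round_1, edited_system),
    "sliding_window": shared_prefix_len(round_1, sliding_window),
}
```

In this model only the append-only request keeps every earlier message reusable, which matches the 100% cache miss entries in the table.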
|
||||
## Skill Design Guidelines

1. **Single responsibility**: one skill = one domain
2. **Self-contained**: minimize external references
3. **Action-oriented**: instructions, not explanations
4. **Structured**: sections that support quick lookup

## The Deeper Insight

> **Knowledge becomes a document, and anyone can be the teacher.**

Traditional fine-tuning is **offline learning**: collect data -> train -> deploy -> use.
Skills enable **online learning**: knowledge is loaded on demand at runtime and takes effect immediately.

---

**Skills are externalized expertise.**

[← v3](./v3-サブエージェント.md) | [Back to README](../README_ja.md)
|
||||
242
provider_utils.py
Normal file
@ -0,0 +1,242 @@
|
||||
"""
Provider utilities for multi-provider AI agent support.

This module provides a unified interface for multiple AI providers (Anthropic, OpenAI, Gemini),
allowing the existing agent code (v0-v4) to run unchanged.

It uses the Adapter Pattern to make OpenAI-compatible clients look exactly like
Anthropic clients to the consuming code.
"""

import os
import json
from typing import Any, Dict, List, Union, Optional
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# =============================================================================
# Data Structures (Mimic Anthropic SDK)
# =============================================================================

class ResponseWrapper:
    """Wrapper to make OpenAI responses look like Anthropic responses."""
    def __init__(self, content, stop_reason):
        self.content = content
        self.stop_reason = stop_reason

class ContentBlock:
    """Wrapper to make content blocks look like Anthropic content blocks."""
    def __init__(self, block_type, **kwargs):
        self.type = block_type
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __repr__(self):
        attrs = ", ".join(f"{k}={v!r}" for k, v in self.__dict__.items())
        return f"ContentBlock({attrs})"
|
||||
|
||||
# =============================================================================
# Adapters
# =============================================================================

class OpenAIAdapter:
    """
    Adapts the OpenAI client to look like an Anthropic client.

    Key Magic:
        self.messages = self

    This allows the agent code to call:
        client.messages.create(...)

    which resolves to:
        adapter.create(...)
    """
    def __init__(self, openai_client):
        self.client = openai_client
        self.messages = self  # Duck typing: act as the 'messages' resource

    def create(self, model: str, system: str, messages: List[Dict], tools: List[Dict], max_tokens: int = 8000):
        """
        The core translation layer.
        Converts Anthropic inputs -> OpenAI inputs -> OpenAI API -> Anthropic outputs.
        """
        # 1. Convert Messages (Anthropic -> OpenAI)
        openai_messages = [{"role": "system", "content": system}]

        for msg in messages:
            role = msg["role"]
            content = msg["content"]

            if role == "user":
                if isinstance(content, str):
                    # Simple text message
                    openai_messages.append({"role": "user", "content": content})
                elif isinstance(content, list):
                    # Tool results (User role in Anthropic, Tool role in OpenAI)
                    for part in content:
                        if part.get("type") == "tool_result":
                            openai_messages.append({
                                "role": "tool",
                                "tool_call_id": part["tool_use_id"],
                                "content": part["content"] or "(no output)"
                            })
                        # Note: Anthropic user messages can also contain text+image,
                        # but v0-v4 agents don't use that yet.

            elif role == "assistant":
                if isinstance(content, str):
                    # Simple text message
                    openai_messages.append({"role": "assistant", "content": content})
                elif isinstance(content, list):
                    # Tool calls (Assistant role)
                    # Anthropic splits thought (text) and tool_use into blocks
                    # OpenAI puts thought in 'content' and tools in 'tool_calls'
                    text_parts = []
                    tool_calls = []

                    for part in content:
                        # Handle both dicts and objects (ContentBlock)
                        if isinstance(part, dict):
                            part_type = part.get("type")
                            part_text = part.get("text")
                            part_id = part.get("id")
                            part_name = part.get("name")
                            part_input = part.get("input")
                        else:
                            part_type = getattr(part, "type", None)
                            part_text = getattr(part, "text", None)
                            part_id = getattr(part, "id", None)
                            part_name = getattr(part, "name", None)
                            part_input = getattr(part, "input", None)

                        if part_type == "text":
                            text_parts.append(part_text)
                        elif part_type == "tool_use":
                            tool_calls.append({
                                "id": part_id,
                                "type": "function",
                                "function": {
                                    "name": part_name,
                                    "arguments": json.dumps(part_input)
                                }
                            })

                    assistant_msg = {"role": "assistant"}
                    if text_parts:
                        assistant_msg["content"] = "\n".join(text_parts)
                    if tool_calls:
                        assistant_msg["tool_calls"] = tool_calls

                    openai_messages.append(assistant_msg)

        # 2. Convert Tools (Anthropic -> OpenAI)
        openai_tools = []
        for tool in tools:
            openai_tools.append({
                "type": "function",
                "function": {
                    "name": tool["name"],
                    "description": tool["description"],
                    "parameters": tool["input_schema"]
                }
            })

        # 3. Call OpenAI API
        # Note: Gemini/OpenAI handle max_tokens differently, but usually support the param
        response = self.client.chat.completions.create(
            model=model,
            messages=openai_messages,
            tools=openai_tools if openai_tools else None,
            max_tokens=max_tokens
        )

        # 4. Convert Response (OpenAI -> Anthropic)
        message = response.choices[0].message
        content_blocks = []

        # Extract text content
        if message.content:
            content_blocks.append(ContentBlock("text", text=message.content))

        # Extract tool calls
        if message.tool_calls:
            for tool_call in message.tool_calls:
                content_blocks.append(ContentBlock(
                    "tool_use",
                    id=tool_call.id,
                    name=tool_call.function.name,
                    input=json.loads(tool_call.function.arguments)
                ))

        # Map stop reasons: OpenAI "stop"/"tool_calls" -> Anthropic "end_turn"/"tool_use"
        # OpenAI: stop, length, content_filter, tool_calls
        finish_reason = response.choices[0].finish_reason
        if finish_reason == "tool_calls":
            stop_reason = "tool_use"
        elif finish_reason == "stop":
            stop_reason = "end_turn"
        else:
            stop_reason = finish_reason  # Fallback

        return ResponseWrapper(content_blocks, stop_reason)
|
||||
|
||||
# =============================================================================
# Factory Functions
# =============================================================================

def get_provider():
    """Get the current AI provider from environment variable."""
    return os.getenv("AI_PROVIDER", "anthropic").lower()

def get_client():
    """
    Return a client that conforms to the Anthropic interface.

    If AI_PROVIDER is 'anthropic', returns the native Anthropic client.
    Otherwise, returns an OpenAIAdapter wrapping an OpenAI-compatible client.
    """
    provider = get_provider()

    if provider == "anthropic":
        from anthropic import Anthropic
        base_url = os.getenv("ANTHROPIC_BASE_URL")
        # Return native client - guarantees 100% behavior compatibility
        return Anthropic(
            api_key=os.getenv("ANTHROPIC_API_KEY"),
            base_url=base_url
        )

    else:
        # For OpenAI/Gemini, we wrap the client to mimic Anthropic
        try:
            from openai import OpenAI
        except ImportError:
            raise ImportError("Please install openai: pip install openai")

        if provider == "openai":
            api_key = os.getenv("OPENAI_API_KEY")
            base_url = os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")
        elif provider == "gemini":
            api_key = os.getenv("GEMINI_API_KEY")
            # Gemini OpenAI-compatible endpoint
            base_url = os.getenv("GEMINI_BASE_URL", "https://generativelanguage.googleapis.com/v1beta/openai/")
        else:
            # Generic OpenAI-compatible provider
            api_key = os.getenv(f"{provider.upper()}_API_KEY")
            base_url = os.getenv(f"{provider.upper()}_BASE_URL")

        if not api_key:
            raise ValueError(f"API Key for {provider} is missing. Please check your .env file.")

        raw_client = OpenAI(api_key=api_key, base_url=base_url)
        return OpenAIAdapter(raw_client)

def get_model():
    """Return model name from environment variable."""
    model = os.getenv("MODEL_NAME")
    if not model:
        raise ValueError("MODEL_NAME environment variable is missing. Please set it in your .env file.")
    return model
|
||||
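Two of the adapter's translations, the tool schema and the stop reason, are pure functions and can be exercised without a network call. The standalone sketch below restates them; the function names `to_openai_tool` and `to_anthropic_stop_reason` are hypothetical and do not exist in `provider_utils.py`.

```python
def to_openai_tool(tool: dict) -> dict:
    """Wrap an Anthropic-style tool definition in OpenAI's function format."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],  # same JSON Schema, new key
        },
    }

# OpenAI finish_reason -> Anthropic stop_reason; anything else passes through.
STOP_REASONS = {"tool_calls": "tool_use", "stop": "end_turn"}

def to_anthropic_stop_reason(finish_reason: str) -> str:
    return STOP_REASONS.get(finish_reason, finish_reason)

bash_tool = {
    "name": "bash",
    "description": "Run a shell command",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}
converted = to_openai_tool(bash_tool)
```

Because the agent loops only test `stop_reason != "tool_use"`, passing unknown reasons such as `length` through unchanged still ends the loop.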
@ -1,2 +1,5 @@
|
||||
anthropic>=0.25.0
|
||||
openai>=1.0.0
|
||||
python-dotenv>=1.0.0
|
||||
pygame==2.5.2
|
||||
numpy==1.24.3
|
||||
@ -1,946 +0,0 @@
|
||||
"""
Integration tests for learn-claude-code agents.

Comprehensive agent task tests covering v0-v4 core capabilities.
Runs on GitHub Actions (Linux).
"""
import os
import sys
import json
import tempfile
import shutil

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))


def get_client():
    """Get OpenAI-compatible client for testing."""
    from openai import OpenAI
    api_key = os.getenv("TEST_API_KEY")
    base_url = os.getenv("TEST_BASE_URL", "https://api.openai-next.com/v1")
    if not api_key:
        return None
    return OpenAI(api_key=api_key, base_url=base_url)


MODEL = os.getenv("TEST_MODEL", "claude-3-5-sonnet-20241022")
|
||||
|
||||
|
||||
# =============================================================================
# Tool Definitions
# =============================================================================

BASH_TOOL = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Run a shell command",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"]
        }
    }
}

READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read contents of a file",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"]
        }
    }
}

WRITE_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "write_file",
        "description": "Write content to a file (creates or overwrites)",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["path", "content"]
        }
    }
}

EDIT_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "edit_file",
        "description": "Replace old_string with new_string in a file",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "old_string": {"type": "string"},
                "new_string": {"type": "string"}
            },
            "required": ["path", "old_string", "new_string"]
        }
    }
}

TODO_WRITE_TOOL = {
    "type": "function",
    "function": {
        "name": "TodoWrite",
        "description": "Update the todo list to track task progress",
        "parameters": {
            "type": "object",
            "properties": {
                "items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "content": {"type": "string"},
                            "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]},
                            "activeForm": {"type": "string"}
                        },
                        "required": ["content", "status", "activeForm"]
                    }
                }
            },
            "required": ["items"]
        }
    }
}

V1_TOOLS = [BASH_TOOL, READ_FILE_TOOL, WRITE_FILE_TOOL, EDIT_FILE_TOOL]
V2_TOOLS = V1_TOOLS + [TODO_WRITE_TOOL]
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Agent Loop Runner
|
||||
# =============================================================================
|
||||
|
||||
def execute_tool(name, args, workdir):
    """Execute a tool and return its output as a string."""
    import subprocess

    def resolve(path):
        # Resolve relative paths against workdir; absolute paths pass through unchanged
        return os.path.join(workdir or os.getcwd(), path)

    if name == "bash":
        cmd = args.get("command", "")
        try:
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=30, cwd=workdir)
            return result.stdout + result.stderr or "(empty)"
        except Exception as e:
            return f"Error: {e}"

    elif name == "read_file":
        path = resolve(args.get("path", ""))
        try:
            with open(path, "r", encoding="utf-8") as f:
                return f.read()
        except Exception as e:
            return f"Error: {e}"

    elif name == "write_file":
        path = resolve(args.get("path", ""))
        content = args.get("content", "")
        try:
            with open(path, "w", encoding="utf-8") as f:
                f.write(content)
            return f"Wrote {len(content)} characters to {path}"
        except Exception as e:
            return f"Error: {e}"

    elif name == "edit_file":
        path = resolve(args.get("path", ""))
        old = args.get("old_string", "")
        new = args.get("new_string", "")
        try:
            with open(path, "r", encoding="utf-8") as f:
                content = f.read()
            if old not in content:
                return f"Error: '{old}' not found in file"
            # Replace only the first occurrence
            content = content.replace(old, new, 1)
            with open(path, "w", encoding="utf-8") as f:
                f.write(content)
            return f"Replaced in {path}"
        except Exception as e:
            return f"Error: {e}"

    elif name == "TodoWrite":
        items = args.get("items", [])
        # Simulate todo tracking
        icons = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}
        result = []
        for item in items:
            # .get() keeps a malformed item from raising KeyError and aborting the loop
            status_icon = icons.get(item.get("status"), "[ ]")
            result.append(f"{status_icon} {item.get('content', '')}")
        completed = sum(1 for i in items if i.get("status") == "completed")
        return "\n".join(result) + f"\n({completed}/{len(items)} completed)"

    return f"Unknown tool: {name}"
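The file tools above open whatever path the model supplies. A stricter variant could refuse paths that fall outside the working directory. The sketch below shows one way to do that, assuming a hypothetical helper named `resolve_in_workdir`; it is not part of the test suite and only illustrates the idea.

```python
import os
import tempfile


def resolve_in_workdir(workdir, path):
    """Return an absolute path for `path`, refusing anything outside `workdir`.

    Hypothetical helper for illustration only.
    """
    root = os.path.realpath(workdir)
    # os.path.join keeps an absolute `path` as-is and anchors a relative one at root
    candidate = os.path.realpath(os.path.join(root, path))
    # commonpath equals root only when candidate is root itself or lies beneath it
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"Path escapes workdir: {path}")
    return candidate


with tempfile.TemporaryDirectory() as tmpdir:
    inside = resolve_in_workdir(tmpdir, "notes/a.txt")
    print(inside.startswith(os.path.realpath(tmpdir)))
    try:
        resolve_in_workdir(tmpdir, "../outside.txt")
    except ValueError as e:
        # Returned to the model in the same "Error: ..." style the tools use
        print(f"Error: {e}")
```

Because the tests here pass absolute paths inside a temporary directory, a check like this would still accept them while rejecting `..` traversal.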
|
||||
|
||||
|
||||
def run_agent_loop(client, task, tools, workdir=None, max_turns=15, system_prompt=None):
    """
    Run a complete agent loop until the model stops calling tools or max_turns is reached.
    Returns (final_response, tool_calls_made, messages); final_response is None
    if max_turns is exhausted.
    """
    if workdir is None:
        workdir = os.getcwd()

    if system_prompt is None:
        system_prompt = f"You are a coding agent at {workdir}. Use tools to complete tasks. Be concise."

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task}
    ]

    tool_calls_made = []

    for turn in range(max_turns):
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=tools,
            max_tokens=1500
        )

        message = response.choices[0].message
        finish_reason = response.choices[0].finish_reason

        if finish_reason == "stop" or not message.tool_calls:
            # Record the final answer so the returned history is complete
            messages.append({"role": "assistant", "content": message.content})
            return message.content, tool_calls_made, messages

        messages.append({
            "role": "assistant",
            "content": message.content,
            "tool_calls": [
                {"id": tc.id, "type": "function", "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
                for tc in message.tool_calls
            ]
        })

        for tool_call in message.tool_calls:
            func_name = tool_call.function.name
            try:
                args = json.loads(tool_call.function.arguments or "{}")
            except json.JSONDecodeError as e:
                # Report malformed arguments back to the model instead of crashing the loop
                args = {}
                output = f"Error: invalid tool arguments: {e}"
            else:
                output = execute_tool(func_name, args, workdir)
            tool_calls_made.append((func_name, args))

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                # Truncate long output to keep the context small
                "content": output[:5000]
            })

    return None, tool_calls_made, messages
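The loop above only touches a small part of the client: `client.chat.completions.create(...)` and, on the result, `choices[0].message` (with `content` and `tool_calls`) plus `choices[0].finish_reason`. The sketch below builds a scripted stand-in with that shape so the loop logic could be exercised without an API key. `FakeClient`, `make_response`, and `make_tool_call` are hypothetical names introduced here, not part of the repository.

```python
import json
from types import SimpleNamespace


def make_tool_call(call_id, name, args):
    """Build an object shaped like one entry of message.tool_calls."""
    return SimpleNamespace(
        id=call_id,
        type="function",
        function=SimpleNamespace(name=name, arguments=json.dumps(args)),
    )


def make_response(content=None, tool_calls=None, finish_reason="stop"):
    """Build an object shaped like a chat-completions response."""
    message = SimpleNamespace(content=content, tool_calls=tool_calls)
    return SimpleNamespace(choices=[SimpleNamespace(message=message, finish_reason=finish_reason)])


class FakeClient:
    """Hypothetical stand-in that replays scripted responses in order."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.requests = []  # keyword arguments of every create() call
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, **kwargs):
        self.requests.append(kwargs)
        return self._responses.pop(0)


client = FakeClient([
    make_response(
        tool_calls=[make_tool_call("call_1", "bash", {"command": "echo hi"})],
        finish_reason="tool_calls",
    ),
    make_response(content="The output was: hi"),
])

first = client.chat.completions.create(model="fake", messages=[], tools=[])
second = client.chat.completions.create(model="fake", messages=[], tools=[])
print(first.choices[0].message.tool_calls[0].function.name)
print(second.choices[0].message.content)
```

Passing such a client to `run_agent_loop` would, under these assumptions, drive one tool call followed by a final answer, which makes the loop's bookkeeping testable offline.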
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# v0 Tests: Bash Only
|
||||
# =============================================================================
|
||||
|
||||
def test_v0_bash_echo():
|
||||
"""v0: Simple bash command execution."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
"Run 'echo hello world' and tell me the output.",
|
||||
[BASH_TOOL]
|
||||
)
|
||||
|
||||
assert len(calls) >= 1, "Should make at least 1 tool call"
|
||||
assert any("echo" in str(c) for c in calls), "Should run echo"
|
||||
assert response and "hello" in response.lower()
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_v0_bash_echo")
|
||||
return True
|
||||
|
||||
|
||||
def test_v0_bash_pipeline():
|
||||
"""v0: Bash pipeline with multiple commands."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
# Create test file
|
||||
with open(os.path.join(tmpdir, "data.txt"), "w") as f:
|
||||
f.write("apple\nbanana\napricot\ncherry\n")
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Count how many lines in {tmpdir}/data.txt start with 'a'. Use grep and wc.",
|
||||
[BASH_TOOL],
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
assert len(calls) >= 1
|
||||
assert response and "2" in response
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_v0_bash_pipeline")
|
||||
return True
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# v1 Tests: 4 Core Tools
|
||||
# =============================================================================
|
||||
|
||||
def test_v1_read_file():
|
||||
"""v1: Read file contents."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
filepath = os.path.join(tmpdir, "secret.txt")
|
||||
with open(filepath, "w") as f:
|
||||
f.write("The secret code is: XYZ123")
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Read {filepath} and tell me what the secret code is.",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
assert any(c[0] == "read_file" for c in calls), "Should use read_file"
|
||||
assert response and "XYZ123" in response
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_v1_read_file")
|
||||
return True
|
||||
|
||||
|
||||
def test_v1_write_file():
|
||||
"""v1: Create new file with write_file."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
filepath = os.path.join(tmpdir, "greeting.txt")
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Create a file at {filepath} containing 'Hello, Agent!' using write_file tool.",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
assert any(c[0] == "write_file" for c in calls), "Should use write_file"
|
||||
assert os.path.exists(filepath)
|
||||
with open(filepath) as f:
|
||||
content = f.read()
|
||||
assert "Hello" in content
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_v1_write_file")
|
||||
return True
|
||||
|
||||
|
||||
def test_v1_edit_file():
|
||||
"""v1: Edit existing file with edit_file."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
filepath = os.path.join(tmpdir, "config.txt")
|
||||
with open(filepath, "w") as f:
|
||||
f.write("debug=false\nport=8080\n")
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Edit {filepath} to change debug=false to debug=true using edit_file tool.",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
assert any(c[0] == "edit_file" for c in calls), "Should use edit_file"
|
||||
with open(filepath) as f:
|
||||
content = f.read()
|
||||
assert "debug=true" in content
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_v1_edit_file")
|
||||
return True
|
||||
|
||||
|
||||
def test_v1_read_edit_verify():
|
||||
"""v1: Multi-tool workflow: read -> edit -> verify."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
filepath = os.path.join(tmpdir, "version.txt")
|
||||
with open(filepath, "w") as f:
|
||||
f.write("version=1.0.0")
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"1. Read {filepath}, 2. Change version to 2.0.0, 3. Read it again to verify.",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
tool_names = [c[0] for c in calls]
|
||||
assert "read_file" in tool_names, "Should read file"
|
||||
assert "edit_file" in tool_names or "write_file" in tool_names, "Should modify file"
|
||||
|
||||
with open(filepath) as f:
|
||||
content = f.read()
|
||||
assert "2.0.0" in content
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_v1_read_edit_verify")
|
||||
return True
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# v2 Tests: Todo Tracking
|
||||
# =============================================================================
|
||||
|
||||
def test_v2_todo_single_task():
|
||||
"""v2: Agent uses TodoWrite for simple task."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
system = f"""You are a coding agent at {tmpdir}.
|
||||
Use TodoWrite to track tasks. Use write_file to create files. Be concise."""
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Create a file at {tmpdir}/hello.txt with content 'hello'. First use TodoWrite to plan, then use write_file to create the file.",
|
||||
V2_TOOLS,
|
||||
workdir=tmpdir,
|
||||
system_prompt=system,
|
||||
max_turns=10
|
||||
)
|
||||
|
||||
todo_calls = [c for c in calls if c[0] == "TodoWrite"]
|
||||
write_calls = [c for c in calls if c[0] == "write_file"]
|
||||
file_exists = os.path.exists(os.path.join(tmpdir, "hello.txt"))
|
||||
|
||||
print(f"TodoWrite calls: {len(todo_calls)}, write_file calls: {len(write_calls)}")
|
||||
|
||||
# Pass if file created (core functionality)
|
||||
# TodoWrite is optional for simple tasks
|
||||
assert file_exists or len(write_calls) >= 1, "Should attempt to create file"
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_v2_todo_single_task")
|
||||
return True
|
||||
|
||||
|
||||
def test_v2_todo_multi_step():
    """v2: Agent uses TodoWrite for multi-step task."""
    client = get_client()
    if not client:
        print("SKIP: No API key")
        return True

    with tempfile.TemporaryDirectory() as tmpdir:
        system = f"""You are a coding agent at {tmpdir}.
Use TodoWrite to plan multi-step tasks. Use write_file to create files. Complete all steps."""

        response, calls, _ = run_agent_loop(
            client,
            f"""Create 3 files in {tmpdir}:
1. Use write_file to create a.txt with content 'A'
2. Use write_file to create b.txt with content 'B'
3. Use write_file to create c.txt with content 'C'
Use TodoWrite to track progress. Execute all steps.""",
            V2_TOOLS,
            workdir=tmpdir,
            system_prompt=system,
            max_turns=25
        )

        # Check files created
        files_created = sum(1 for f in ["a.txt", "b.txt", "c.txt"]
                           if os.path.exists(os.path.join(tmpdir, f)))

        write_calls = [c for c in calls if c[0] == "write_file"]
        todo_calls = [c for c in calls if c[0] == "TodoWrite"]

        print(f"Files created: {files_created}/3, write_file calls: {len(write_calls)}, TodoWrite calls: {len(todo_calls)}")

        # Pass if at least 2 files created or 2 write attempts made
        assert files_created >= 2 or len(write_calls) >= 2, "Should create/attempt at least 2 files"

    print(f"Tool calls: {len(calls)}")
    print("PASS: test_v2_todo_multi_step")
    return True
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Error Handling Tests
|
||||
# =============================================================================
|
||||
|
||||
def test_error_file_not_found():
|
||||
"""Error: Agent handles missing file gracefully."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Read the file {tmpdir}/nonexistent.txt and tell me if it exists.",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
assert response is not None, "Should return a response"
|
||||
# Agent should acknowledge file doesn't exist
|
||||
assert any(word in response.lower() for word in ["not", "error", "exist", "found", "cannot"])
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_error_file_not_found")
|
||||
return True
|
||||
|
||||
|
||||
def test_error_command_fails():
|
||||
"""Error: Agent handles failed command gracefully."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
"Run the command 'nonexistent_command_xyz' and tell me what happens.",
|
||||
[BASH_TOOL]
|
||||
)
|
||||
|
||||
assert response is not None
|
||||
assert any(word in response.lower() for word in ["not found", "error", "fail", "command"])
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_error_command_fails")
|
||||
return True
|
||||
|
||||
|
||||
def test_error_edit_string_not_found():
|
||||
"""Error: Agent handles edit with missing string."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
filepath = os.path.join(tmpdir, "test.txt")
|
||||
with open(filepath, "w") as f:
|
||||
f.write("hello world")
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Edit {filepath} to replace 'xyz123' with 'abc'. Tell me if it worked.",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
assert response is not None
|
||||
# Model should report the issue - check for common phrases or that it tried edit
|
||||
resp_lower = response.lower()
|
||||
edit_calls = [c for c in calls if c[0] == "edit_file"]
|
||||
# Either reports error or tried the edit (which returns error in tool result)
|
||||
error_phrases = ["not found", "error", "doesn't", "cannot", "couldn't", "didn't",
|
||||
"wasn't", "unable", "no such", "not exist", "failed", "xyz123"]
|
||||
found_error = any(phrase in resp_lower for phrase in error_phrases)
|
||||
assert found_error or len(edit_calls) >= 1, "Should report error or attempt edit"
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_error_edit_string_not_found")
|
||||
return True
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Complex Workflow Tests
|
||||
# =============================================================================
|
||||
|
||||
def test_workflow_create_python_script():
|
||||
"""Workflow: Create and run a Python script."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Create a Python script at {tmpdir}/calc.py that prints 2+2, then run it with python3.",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
assert os.path.exists(os.path.join(tmpdir, "calc.py")), "Script should exist"
|
||||
tool_names = [c[0] for c in calls]
|
||||
assert "write_file" in tool_names, "Should write file"
|
||||
assert "bash" in tool_names, "Should run bash"
|
||||
assert response and "4" in response
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_workflow_create_python_script")
|
||||
return True
|
||||
|
||||
|
||||
def test_workflow_find_and_replace():
|
||||
"""Workflow: Find files and replace content."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
# Create multiple files
|
||||
for i, content in enumerate(["foo=old", "bar=old", "baz=new"]):
|
||||
with open(os.path.join(tmpdir, f"file{i}.txt"), "w") as f:
|
||||
f.write(content)
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Find all .txt files in {tmpdir} containing 'old' and change 'old' to 'NEW'.",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir,
|
||||
max_turns=20
|
||||
)
|
||||
|
||||
# Check modifications
|
||||
modified = 0
|
||||
for i in range(3):
|
||||
with open(os.path.join(tmpdir, f"file{i}.txt")) as f:
|
||||
if "NEW" in f.read():
|
||||
modified += 1
|
||||
|
||||
assert modified >= 2, f"Should modify at least 2 files, got {modified}"
|
||||
|
||||
print(f"Tool calls: {len(calls)}, Files modified: {modified}")
|
||||
print("PASS: test_workflow_find_and_replace")
|
||||
return True
|
||||
|
||||
|
||||
def test_workflow_directory_setup():
|
||||
"""Workflow: Create directory structure with files."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"""In {tmpdir}, create this structure:
|
||||
- src/main.py (content: print('main'))
|
||||
- src/utils.py (content: print('utils'))
|
||||
- README.md (content: '# Project')""",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir,
|
||||
max_turns=20
|
||||
)
|
||||
|
||||
# Check structure
|
||||
checks = [
|
||||
os.path.exists(os.path.join(tmpdir, "src", "main.py")),
|
||||
os.path.exists(os.path.join(tmpdir, "src", "utils.py")),
|
||||
os.path.exists(os.path.join(tmpdir, "README.md")),
|
||||
]
|
||||
|
||||
passed = sum(checks)
|
||||
assert passed >= 2, f"Should create at least 2/3 items, got {passed}"
|
||||
|
||||
print(f"Tool calls: {len(calls)}, Items created: {passed}/3")
|
||||
print("PASS: test_workflow_directory_setup")
|
||||
return True
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Edge Case Tests
|
||||
# =============================================================================
|
||||
|
||||
def test_edge_unicode_content():
|
||||
"""Edge case: Handle unicode content in files."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
unicode_content = "Hello World\nChinese: \u4e2d\u6587\nEmoji: \u2728\nJapanese: \u3053\u3093\u306b\u3061\u306f"
|
||||
filepath = os.path.join(tmpdir, "unicode.txt")
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Create a file at {filepath} with this content:\n{unicode_content}\nThen read it back and confirm the content.",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
assert os.path.exists(filepath), "File should exist"
|
||||
with open(filepath, encoding='utf-8') as f:
|
||||
content = f.read()
|
||||
# Check at least some unicode preserved
|
||||
assert "\u4e2d" in content or "Chinese" in content or len(content) > 10
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_edge_unicode_content")
|
||||
return True
|
||||
|
||||
|
||||
def test_edge_empty_file():
|
||||
"""Edge case: Handle empty file operations."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
# Create empty file
|
||||
filepath = os.path.join(tmpdir, "empty.txt")
|
||||
with open(filepath, "w") as f:
|
||||
pass
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Read the file {filepath} and tell me if it's empty or has content.",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
assert response is not None
|
||||
assert any(w in response.lower() for w in ["empty", "no content", "nothing", "0 bytes", "blank"])
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_edge_empty_file")
|
||||
return True
|
||||
|
||||
|
||||
def test_edge_special_chars_in_content():
|
||||
"""Edge case: Handle special characters in file content."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
special_content = 'line1\nline with "quotes"\nline with $variable\nline with `backticks`'
|
||||
filepath = os.path.join(tmpdir, "special.txt")
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Create a file at {filepath} containing special characters like quotes, dollar signs, and backticks. Content:\n{special_content}",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
assert os.path.exists(filepath), "File should exist"
|
||||
with open(filepath) as f:
|
||||
content = f.read()
|
||||
# Should have at least some content
|
||||
assert len(content) > 5
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_edge_special_chars_in_content")
|
||||
return True
|
||||
|
||||
|
||||
def test_edge_multiline_edit():
    """Edge case: Edit operation spanning multiple lines."""
    client = get_client()
    if not client:
        print("SKIP: No API key")
        return True

    with tempfile.TemporaryDirectory() as tmpdir:
        filepath = os.path.join(tmpdir, "multi.txt")
        original = """def old_function():
    # old implementation
    return "old"
"""
        with open(filepath, "w") as f:
            f.write(original)

        response, calls, _ = run_agent_loop(
            client,
            f"In {filepath}, replace the entire function 'old_function' with a new function called 'new_function' that returns 'new'.",
            V1_TOOLS,
            workdir=tmpdir
        )

        with open(filepath) as f:
            content = f.read()
        assert "new" in content.lower()

    print(f"Tool calls: {len(calls)}")
    print("PASS: test_edge_multiline_edit")
    return True
|
||||
|
||||
|
||||
def test_edge_nested_directory():
|
||||
"""Edge case: Create deeply nested directory structure."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
deep_path = os.path.join(tmpdir, "a", "b", "c", "deep.txt")
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Create a file at {deep_path} with content 'deep content'. The directories may not exist yet.",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
# Check if file was created (via write_file or bash mkdir -p)
|
||||
file_exists = os.path.exists(deep_path)
|
||||
dir_exists = os.path.exists(os.path.join(tmpdir, "a", "b", "c"))
|
||||
|
||||
assert file_exists or dir_exists, "Should create nested structure"
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_edge_nested_directory")
|
||||
return True
|
||||
|
||||
|
||||
def test_edge_large_output():
|
||||
"""Edge case: Handle large command output."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
# Create a file with many lines
|
||||
filepath = os.path.join(tmpdir, "large.txt")
|
||||
with open(filepath, "w") as f:
|
||||
for i in range(500):
|
||||
f.write(f"Line {i}: This is a test line with some content.\n")
|
||||
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"Count the number of lines in {filepath}.",
|
||||
[BASH_TOOL],
|
||||
workdir=tmpdir
|
||||
)
|
||||
|
||||
assert response is not None
|
||||
assert "500" in response or "lines" in response.lower()
|
||||
|
||||
print(f"Tool calls: {len(calls)}")
|
||||
print("PASS: test_edge_large_output")
|
||||
return True
|
||||
|
||||
|
||||
def test_edge_concurrent_files():
|
||||
"""Edge case: Create multiple files in sequence."""
|
||||
client = get_client()
|
||||
if not client:
|
||||
print("SKIP: No API key")
|
||||
return True
|
||||
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
response, calls, _ = run_agent_loop(
|
||||
client,
|
||||
f"""Create 5 numbered files in {tmpdir}:
|
||||
- file1.txt with content '1'
|
||||
- file2.txt with content '2'
|
||||
- file3.txt with content '3'
|
||||
- file4.txt with content '4'
|
||||
- file5.txt with content '5'
|
||||
Do this as efficiently as possible.""",
|
||||
V1_TOOLS,
|
||||
workdir=tmpdir,
|
||||
max_turns=20
|
||||
)
|
||||
|
||||
files_created = sum(1 for i in range(1, 6)
|
||||
if os.path.exists(os.path.join(tmpdir, f"file{i}.txt")))
|
||||
|
||||
assert files_created >= 4, f"Should create at least 4/5 files, got {files_created}"
|
||||
|
||||
print(f"Tool calls: {len(calls)}, Files created: {files_created}/5")
|
||||
print("PASS: test_edge_concurrent_files")
|
||||
return True
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Main
|
||||
# =============================================================================
|
||||
|
||||
if __name__ == "__main__":
    tests = [
        # v0: Bash only
        test_v0_bash_echo,
        test_v0_bash_pipeline,
        # v1: 4 core tools
        test_v1_read_file,
        test_v1_write_file,
        test_v1_edit_file,
        test_v1_read_edit_verify,
        # v2: Todo tracking
        test_v2_todo_single_task,
        test_v2_todo_multi_step,
        # Error handling
        test_error_file_not_found,
        test_error_command_fails,
        test_error_edit_string_not_found,
        # Complex workflows
        test_workflow_create_python_script,
        test_workflow_find_and_replace,
        test_workflow_directory_setup,
        # Edge cases
        test_edge_unicode_content,
        test_edge_empty_file,
        test_edge_special_chars_in_content,
        test_edge_multiline_edit,
        test_edge_nested_directory,
        test_edge_large_output,
        test_edge_concurrent_files,
    ]

    failed = []
    for test_fn in tests:
        name = test_fn.__name__
        print(f"\n{'='*60}")
        print(f"Running: {name}")
        print('='*60)
        try:
            if not test_fn():
                failed.append(name)
        except Exception as e:
            print(f"FAILED: {e}")
            import traceback
            traceback.print_exc()
            failed.append(name)

    print(f"\n{'='*60}")
    print(f"Results: {len(tests) - len(failed)}/{len(tests)} passed")
    print('='*60)

    if failed:
        print(f"FAILED: {failed}")
        sys.exit(1)
    else:
        print("All integration tests passed!")
        sys.exit(0)
|
||||
@ -1,644 +0,0 @@
|
||||
"""
|
||||
Unit tests for learn-claude-code agents.
|
||||
|
||||
These tests don't require API calls - they verify code structure and logic.
|
||||
"""
|
||||
import os
|
||||
import sys
|
||||
import importlib.util
|
||||
|
||||
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Import Tests
|
||||
# =============================================================================
|
||||
|
||||
def test_imports():
|
||||
"""Test that all agent modules can be imported."""
|
||||
agents = [
|
||||
"v0_bash_agent",
|
||||
"v0_bash_agent_mini",
|
||||
"v1_basic_agent",
|
||||
"v2_todo_agent",
|
||||
"v3_subagent",
|
||||
"v4_skills_agent"
|
||||
]
|
||||
|
||||
for agent in agents:
|
||||
spec = importlib.util.find_spec(agent)
|
||||
assert spec is not None, f"Failed to find {agent}"
|
||||
print(f" Found: {agent}")
|
||||
|
||||
print("PASS: test_imports")
|
||||
return True
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# TodoManager Tests
|
||||
# =============================================================================
|
||||
|
||||
def test_todo_manager_basic():
|
||||
"""Test TodoManager basic operations."""
|
||||
from v2_todo_agent import TodoManager
|
||||
|
||||
tm = TodoManager()
|
||||
|
||||
# Test valid update
|
||||
result = tm.update([
|
||||
{"content": "Task 1", "status": "pending", "activeForm": "Doing task 1"},
|
||||
{"content": "Task 2", "status": "in_progress", "activeForm": "Doing task 2"},
|
||||
])
|
||||
|
||||
assert "Task 1" in result
|
||||
assert "Task 2" in result
|
||||
assert len(tm.items) == 2
|
||||
|
||||
print("PASS: test_todo_manager_basic")
|
||||
return True
|
||||
|
||||
|
||||
def test_todo_manager_constraints():
    """Test TodoManager enforces constraints."""
    from v2_todo_agent import TodoManager

    tm = TodoManager()

    # Test: only one in_progress allowed (should raise or return error)
    try:
        result = tm.update([
            {"content": "Task 1", "status": "in_progress", "activeForm": "Doing 1"},
            {"content": "Task 2", "status": "in_progress", "activeForm": "Doing 2"},
        ])
        # If no exception, the returned text must report the error
        assert "error" in result.lower()
    except ValueError as e:
        # Exception is expected - constraint enforced
        assert "in_progress" in str(e).lower()

    # Test: max 20 items
    tm2 = TodoManager()
    many_items = [{"content": f"Task {i}", "status": "pending", "activeForm": f"Doing {i}"} for i in range(25)]
    try:
        tm2.update(many_items)
    except ValueError:
        pass  # Exception is fine
    assert len(tm2.items) <= 20

    print("PASS: test_todo_manager_constraints")
    return True
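The two constraints checked above (at most one `in_progress` item, at most 20 items) can be expressed as a small standalone validator. This is a minimal sketch under those stated rules only; `validate_todos` is a hypothetical function and does not reproduce the actual `TodoManager` in `v2_todo_agent`.

```python
VALID_STATUSES = {"pending", "in_progress", "completed"}


def validate_todos(items, max_items=20):
    """Return a cleaned copy of `items`, or raise ValueError if a rule is broken.

    Hypothetical sketch; the real TodoManager may differ.
    """
    if len(items) > max_items:
        raise ValueError(f"Too many todos: {len(items)} > {max_items}")

    cleaned = []
    for index, item in enumerate(items):
        content = str(item.get("content", "")).strip()
        status = item.get("status", "pending")
        active_form = str(item.get("activeForm", "")).strip()
        if not content:
            raise ValueError(f"Item {index}: content is required")
        if status not in VALID_STATUSES:
            raise ValueError(f"Item {index}: invalid status {status!r}")
        cleaned.append({"content": content, "status": status, "activeForm": active_form})

    # Only one task may be active at a time
    active = sum(1 for todo in cleaned if todo["status"] == "in_progress")
    if active > 1:
        raise ValueError("Only one item may be in_progress at a time")
    return cleaned


todos = validate_todos([
    {"content": "Task 1", "status": "completed", "activeForm": "Doing task 1"},
    {"content": "Task 2", "status": "in_progress", "activeForm": "Doing task 2"},
])
print(len(todos))
```

Raising `ValueError` rather than returning an error string keeps the rejected list from ever being stored, which is the behaviour the `len(tm2.items) <= 20` assertion relies on.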
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Reminder Tests
|
||||
# =============================================================================
|
||||
|
||||
def test_reminder_constants():
    """Test reminder constants are defined correctly."""
    from v2_todo_agent import INITIAL_REMINDER, NAG_REMINDER

    assert "<reminder>" in INITIAL_REMINDER
    assert "</reminder>" in INITIAL_REMINDER
    assert "<reminder>" in NAG_REMINDER
    assert "</reminder>" in NAG_REMINDER
    # Case-insensitive check covers both "todo" and "Todo"
    assert "todo" in NAG_REMINDER.lower()

    print("PASS: test_reminder_constants")
    return True
|
||||
|
||||
|
||||
def test_nag_reminder_in_agent_loop():
|
||||
"""Test NAG_REMINDER injection is inside agent_loop."""
|
||||
import inspect
|
||||
from v2_todo_agent import agent_loop, NAG_REMINDER
|
||||
|
||||
source = inspect.getsource(agent_loop)
|
||||
|
||||
# NAG_REMINDER should be referenced in agent_loop
|
||||
assert "NAG_REMINDER" in source, "NAG_REMINDER should be in agent_loop"
|
||||
assert "rounds_without_todo" in source, "rounds_without_todo check should be in agent_loop"
|
||||
assert "results.insert" in source or "results.append" in source, "Should inject into results"
|
||||
|
||||
print("PASS: test_nag_reminder_in_agent_loop")
|
||||
return True
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# Configuration Tests
|
||||
# =============================================================================
|
||||
|
||||
def test_env_config():
    """Test environment variable configuration."""
    # Save original values
    orig_model = os.environ.get("MODEL_ID")
    orig_base = os.environ.get("ANTHROPIC_BASE_URL")

    try:
        # Set test values
        os.environ["MODEL_ID"] = "test-model-123"
        os.environ["ANTHROPIC_BASE_URL"] = "https://test.example.com"

        # Re-import to pick up new env vars
        import importlib
        import v1_basic_agent
        importlib.reload(v1_basic_agent)

        assert v1_basic_agent.MODEL == "test-model-123", f"MODEL should be test-model-123, got {v1_basic_agent.MODEL}"

        print("PASS: test_env_config")
        return True

    finally:
        # Restore original values; compare against None so an empty string is restored too
        if orig_model is not None:
            os.environ["MODEL_ID"] = orig_model
        else:
            os.environ.pop("MODEL_ID", None)
        if orig_base is not None:
            os.environ["ANTHROPIC_BASE_URL"] = orig_base
        else:
            os.environ.pop("ANTHROPIC_BASE_URL", None)
|
||||
|
||||
|
||||
def test_default_model():
|
||||
"""Test default model when env var not set."""
|
||||
orig = os.environ.pop("MODEL_ID", None)
|
||||
|
||||
try:
|
||||
import importlib
|
||||
import v1_basic_agent
|
||||
importlib.reload(v1_basic_agent)
|
||||
|
||||
assert "claude" in v1_basic_agent.MODEL.lower(), f"Default model should contain 'claude': {v1_basic_agent.MODEL}"
|
||||
|
||||
print("PASS: test_default_model")
|
||||
return True
|
||||
|
||||
finally:
|
||||
if orig:
|
||||
os.environ["MODEL_ID"] = orig
|
||||
|
||||
|
||||
# =============================================================================
# Tool Schema Tests
# =============================================================================

def test_tool_schemas():
    """Test tool schemas are valid."""
    from v1_basic_agent import TOOLS

    required_tools = {"bash", "read_file", "write_file", "edit_file"}
    tool_names = {t["name"] for t in TOOLS}

    assert required_tools.issubset(tool_names), f"Missing tools: {required_tools - tool_names}"

    for tool in TOOLS:
        assert "name" in tool
        assert "description" in tool
        assert "input_schema" in tool
        assert tool["input_schema"].get("type") == "object"

    print("PASS: test_tool_schemas")
    return True


# =============================================================================
# TodoManager Edge Case Tests
# =============================================================================

def test_todo_manager_empty_list():
    """Test TodoManager handles empty list."""
    from v2_todo_agent import TodoManager

    tm = TodoManager()
    result = tm.update([])

    assert "No todos" in result or len(tm.items) == 0
    print("PASS: test_todo_manager_empty_list")
    return True


def test_todo_manager_status_transitions():
    """Test TodoManager status transitions."""
    from v2_todo_agent import TodoManager

    tm = TodoManager()

    # Start with pending
    tm.update([{"content": "Task", "status": "pending", "activeForm": "Doing task"}])
    assert tm.items[0]["status"] == "pending"

    # Move to in_progress
    tm.update([{"content": "Task", "status": "in_progress", "activeForm": "Doing task"}])
    assert tm.items[0]["status"] == "in_progress"

    # Complete
    tm.update([{"content": "Task", "status": "completed", "activeForm": "Doing task"}])
    assert tm.items[0]["status"] == "completed"

    print("PASS: test_todo_manager_status_transitions")
    return True


def test_todo_manager_missing_fields():
    """Test TodoManager rejects items with missing fields."""
    from v2_todo_agent import TodoManager

    tm = TodoManager()

    # Missing content
    try:
        tm.update([{"status": "pending", "activeForm": "Doing"}])
        assert False, "Should reject missing content"
    except ValueError:
        pass

    # Missing activeForm
    try:
        tm.update([{"content": "Task", "status": "pending"}])
        assert False, "Should reject missing activeForm"
    except ValueError:
        pass

    print("PASS: test_todo_manager_missing_fields")
    return True


def test_todo_manager_invalid_status():
    """Test TodoManager rejects invalid status values."""
    from v2_todo_agent import TodoManager

    tm = TodoManager()

    try:
        tm.update([{"content": "Task", "status": "invalid", "activeForm": "Doing"}])
        assert False, "Should reject invalid status"
    except ValueError as e:
        assert "status" in str(e).lower()

    print("PASS: test_todo_manager_invalid_status")
    return True


def test_todo_manager_render_format():
    """Test TodoManager render format."""
    from v2_todo_agent import TodoManager

    tm = TodoManager()
    tm.update([
        {"content": "Task A", "status": "completed", "activeForm": "A"},
        {"content": "Task B", "status": "in_progress", "activeForm": "B"},
        {"content": "Task C", "status": "pending", "activeForm": "C"},
    ])

    result = tm.render()
    assert "[x] Task A" in result
    assert "[>] Task B" in result
    assert "[ ] Task C" in result
    assert "1/3" in result  # Format may vary: "done" or "completed"

    print("PASS: test_todo_manager_render_format")
    return True


# =============================================================================
# v3 Agent Type Registry Tests
# =============================================================================

def test_v3_agent_types_structure():
    """Test v3 AGENT_TYPES structure."""
    from v3_subagent import AGENT_TYPES

    required_types = {"explore", "code", "plan"}
    assert set(AGENT_TYPES.keys()) == required_types

    for name, config in AGENT_TYPES.items():
        assert "description" in config, f"{name} missing description"
        assert "tools" in config, f"{name} missing tools"
        assert "prompt" in config, f"{name} missing prompt"

    print("PASS: test_v3_agent_types_structure")
    return True


def test_v3_get_tools_for_agent():
    """Test v3 get_tools_for_agent filters correctly."""
    from v3_subagent import get_tools_for_agent, BASE_TOOLS

    # explore: read-only
    explore_tools = get_tools_for_agent("explore")
    explore_names = {t["name"] for t in explore_tools}
    assert "bash" in explore_names
    assert "read_file" in explore_names
    assert "write_file" not in explore_names
    assert "edit_file" not in explore_names

    # code: all base tools
    code_tools = get_tools_for_agent("code")
    assert len(code_tools) == len(BASE_TOOLS)

    # plan: read-only
    plan_tools = get_tools_for_agent("plan")
    plan_names = {t["name"] for t in plan_tools}
    assert "write_file" not in plan_names

    print("PASS: test_v3_get_tools_for_agent")
    return True


def test_v3_get_agent_descriptions():
    """Test v3 get_agent_descriptions output."""
    from v3_subagent import get_agent_descriptions

    desc = get_agent_descriptions()
    assert "explore" in desc
    assert "code" in desc
    assert "plan" in desc
    assert "Read-only" in desc or "read" in desc.lower()

    print("PASS: test_v3_get_agent_descriptions")
    return True


def test_v3_task_tool_schema():
    """Test v3 Task tool schema."""
    from v3_subagent import TASK_TOOL, AGENT_TYPES

    assert TASK_TOOL["name"] == "Task"
    schema = TASK_TOOL["input_schema"]
    assert "description" in schema["properties"]
    assert "prompt" in schema["properties"]
    assert "agent_type" in schema["properties"]
    assert set(schema["properties"]["agent_type"]["enum"]) == set(AGENT_TYPES.keys())

    print("PASS: test_v3_task_tool_schema")
    return True


# =============================================================================
# v4 SkillLoader Tests
# =============================================================================

def test_v4_skill_loader_init():
    """Test v4 SkillLoader initialization."""
    from v4_skills_agent import SkillLoader
    from pathlib import Path
    import tempfile

    with tempfile.TemporaryDirectory() as tmpdir:
        # Empty skills dir
        loader = SkillLoader(Path(tmpdir))
        assert len(loader.skills) == 0

    print("PASS: test_v4_skill_loader_init")
    return True


def test_v4_skill_loader_parse_valid():
    """Test v4 SkillLoader parses valid SKILL.md."""
    from v4_skills_agent import SkillLoader
    from pathlib import Path
    import tempfile

    with tempfile.TemporaryDirectory() as tmpdir:
        skill_dir = Path(tmpdir) / "test-skill"
        skill_dir.mkdir()

        skill_md = skill_dir / "SKILL.md"
        skill_md.write_text("""---
name: test
description: A test skill for testing
---

# Test Skill

This is the body content.
""")

        loader = SkillLoader(Path(tmpdir))
        assert "test" in loader.skills
        assert loader.skills["test"]["description"] == "A test skill for testing"
        assert "body content" in loader.skills["test"]["body"]

    print("PASS: test_v4_skill_loader_parse_valid")
    return True


def test_v4_skill_loader_parse_invalid():
    """Test v4 SkillLoader rejects invalid SKILL.md."""
    from v4_skills_agent import SkillLoader
    from pathlib import Path
    import tempfile

    with tempfile.TemporaryDirectory() as tmpdir:
        skill_dir = Path(tmpdir) / "bad-skill"
        skill_dir.mkdir()

        # Missing frontmatter
        skill_md = skill_dir / "SKILL.md"
        skill_md.write_text("# No frontmatter\n\nJust content.")

        loader = SkillLoader(Path(tmpdir))
        assert "bad-skill" not in loader.skills

    print("PASS: test_v4_skill_loader_parse_invalid")
    return True


def test_v4_skill_loader_get_content():
    """Test v4 SkillLoader get_skill_content."""
    from v4_skills_agent import SkillLoader
    from pathlib import Path
    import tempfile

    with tempfile.TemporaryDirectory() as tmpdir:
        skill_dir = Path(tmpdir) / "demo"
        skill_dir.mkdir()

        (skill_dir / "SKILL.md").write_text("""---
name: demo
description: Demo skill
---

# Demo Instructions

Step 1: Do this
Step 2: Do that
""")

        # Add resources
        scripts_dir = skill_dir / "scripts"
        scripts_dir.mkdir()
        (scripts_dir / "helper.sh").write_text("#!/bin/bash\necho hello")

        loader = SkillLoader(Path(tmpdir))

        content = loader.get_skill_content("demo")
        assert content is not None
        assert "Demo Instructions" in content
        assert "helper.sh" in content  # Resources listed

        # Non-existent skill
        assert loader.get_skill_content("nonexistent") is None

    print("PASS: test_v4_skill_loader_get_content")
    return True


def test_v4_skill_loader_list_skills():
    """Test v4 SkillLoader list_skills."""
    from v4_skills_agent import SkillLoader
    from pathlib import Path
    import tempfile

    with tempfile.TemporaryDirectory() as tmpdir:
        # Create two skills
        for name in ["alpha", "beta"]:
            skill_dir = Path(tmpdir) / name
            skill_dir.mkdir()
            (skill_dir / "SKILL.md").write_text(f"""---
name: {name}
description: {name} skill
---

Content for {name}
""")

        loader = SkillLoader(Path(tmpdir))
        skills = loader.list_skills()
        assert "alpha" in skills
        assert "beta" in skills
        assert len(skills) == 2

    print("PASS: test_v4_skill_loader_list_skills")
    return True


def test_v4_skill_tool_schema():
    """Test v4 Skill tool schema."""
    from v4_skills_agent import SKILL_TOOL

    assert SKILL_TOOL["name"] == "Skill"
    schema = SKILL_TOOL["input_schema"]
    assert "skill" in schema["properties"]
    assert "skill" in schema["required"]

    print("PASS: test_v4_skill_tool_schema")
    return True


# =============================================================================
# Path Safety Tests
# =============================================================================

def test_v3_safe_path():
    """Test v3 safe_path prevents path traversal."""
    from v3_subagent import safe_path, WORKDIR

    # Valid path
    p = safe_path("test.txt")
    assert str(p).startswith(str(WORKDIR))

    # Path traversal attempt
    try:
        safe_path("../../../etc/passwd")
        assert False, "Should reject path traversal"
    except ValueError as e:
        assert "escape" in str(e).lower()

    print("PASS: test_v3_safe_path")
    return True


# =============================================================================
# Configuration Tests (Extended)
# =============================================================================

def test_base_url_config():
    """Test ANTHROPIC_BASE_URL configuration."""
    orig = os.environ.get("ANTHROPIC_BASE_URL")

    try:
        os.environ["ANTHROPIC_BASE_URL"] = "https://custom.api.com"

        import importlib
        import v1_basic_agent
        importlib.reload(v1_basic_agent)

        # Check client was created (we can't easily verify base_url without mocking)
        assert v1_basic_agent.client is not None

        print("PASS: test_base_url_config")
        return True

    finally:
        if orig:
            os.environ["ANTHROPIC_BASE_URL"] = orig
        else:
            os.environ.pop("ANTHROPIC_BASE_URL", None)


# =============================================================================
# Main
# =============================================================================

if __name__ == "__main__":
    tests = [
        # Basic tests
        test_imports,
        test_todo_manager_basic,
        test_todo_manager_constraints,
        test_reminder_constants,
        test_nag_reminder_in_agent_loop,
        test_env_config,
        test_default_model,
        test_tool_schemas,
        # TodoManager edge cases
        test_todo_manager_empty_list,
        test_todo_manager_status_transitions,
        test_todo_manager_missing_fields,
        test_todo_manager_invalid_status,
        test_todo_manager_render_format,
        # v3 tests
        test_v3_agent_types_structure,
        test_v3_get_tools_for_agent,
        test_v3_get_agent_descriptions,
        test_v3_task_tool_schema,
        # v4 tests
        test_v4_skill_loader_init,
        test_v4_skill_loader_parse_valid,
        test_v4_skill_loader_parse_invalid,
        test_v4_skill_loader_get_content,
        test_v4_skill_loader_list_skills,
        test_v4_skill_tool_schema,
        # Security tests
        test_v3_safe_path,
        # Config tests
        test_base_url_config,
    ]

    failed = []
    for test_fn in tests:
        name = test_fn.__name__
        print(f"\n{'='*50}")
        print(f"Running: {name}")
        print('='*50)
        try:
            if not test_fn():
                failed.append(name)
        except Exception as e:
            print(f"FAILED: {e}")
            import traceback
            traceback.print_exc()
            failed.append(name)

    print(f"\n{'='*50}")
    print(f"Results: {len(tests) - len(failed)}/{len(tests)} passed")
    print('='*50)

    if failed:
        print(f"FAILED: {failed}")
        sys.exit(1)
    else:
        print("All unit tests passed!")
        sys.exit(0)

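The configuration tests in the removed test file save and restore `MODEL_ID` and `ANTHROPIC_BASE_URL` by hand in `try`/`finally` blocks. As a rough sketch of the same idea, the standard library's `unittest.mock.patch.dict` can restore `os.environ` automatically. Here `read_model`, `check_env_override`, and `check_default` are hypothetical helpers written for illustration; they stand in for the module-level `MODEL = os.getenv(...)` lookup and are not functions from the repository.

```python
import os
from unittest import mock

# Default model ID used by the pre-change agents in this diff.
DEFAULT_MODEL = "claude-sonnet-4-5-20250929"


def read_model() -> str:
    """Hypothetical helper mirroring MODEL = os.getenv("MODEL_ID", default)."""
    return os.getenv("MODEL_ID", DEFAULT_MODEL)


def check_env_override() -> str:
    # patch.dict puts os.environ back on exit, even if an assertion fails inside.
    with mock.patch.dict(os.environ, {"MODEL_ID": "test-model-123"}):
        return read_model()


def check_default() -> str:
    # clear=True empties os.environ for the duration of the block only.
    with mock.patch.dict(os.environ, {}, clear=True):
        return read_model()
```

One reason to prefer this shape is that the restore step cannot be forgotten or get out of sync with the variables that were changed.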
@@ -47,17 +47,14 @@ Usage:
     python v0_bash_agent.py "explore src/ and summarize"
 """

-from anthropic import Anthropic
-from dotenv import load_dotenv
+from provider_utils import get_client, get_model
 import subprocess
 import sys
 import os

-load_dotenv(override=True)
-
-# Initialize Anthropic client (uses ANTHROPIC_API_KEY and ANTHROPIC_BASE_URL env vars)
-client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
-MODEL = os.getenv("MODEL_ID", "claude-sonnet-4-5-20250929")
+# Initialize API client and model using provider utilities
+client = get_client()
+MODEL = get_model()

 # The ONE tool that does everything
 # Notice how the description teaches the model common patterns AND how to spawn subagents

@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 """v0_bash_agent_mini.py - Mini Claude Code (Compact)"""
-from anthropic import Anthropic; from dotenv import load_dotenv; import subprocess as sp, sys, os
-load_dotenv(override=True); C = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")); M = os.getenv("MODEL_ID", "claude-sonnet-4-5-20250929")
+from provider_utils import get_client, get_model; import subprocess as sp, sys, os
+C = get_client(); M = get_model()
 T = [{"name":"bash","description":"Shell cmd. Read:cat/grep/find/rg/ls. Write:echo>/sed. Subagent(for complex subtask): python v0_bash_agent_mini.py 'task'","input_schema":{"type":"object","properties":{"command":{"type":"string"}},"required":["command"]}}]
 S = f"CLI agent at {os.getcwd()}. Use bash to solve problems. Spawn subagent for complex subtasks: python v0_bash_agent_mini.py 'task'. Subagent isolates context and returns summary. Be concise."


@@ -51,10 +51,16 @@ import subprocess
 import sys
 from pathlib import Path

-from anthropic import Anthropic
 from dotenv import load_dotenv

-load_dotenv(override=True)
+# Load configuration from .env file
+load_dotenv()
+
+# Import unified client provider
+try:
+    from provider_utils import get_client, get_model
+except ImportError:
+    sys.exit("Error: provider_utils.py not found. Please ensure you are in the project root.")


 # =============================================================================
@@ -62,8 +68,8 @@ load_dotenv(override=True)
 # =============================================================================

 WORKDIR = Path.cwd()
-MODEL = os.getenv("MODEL_ID", "claude-sonnet-4-5-20250929")
-client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
+MODEL = get_model()
+client = get_client()


 # =============================================================================

@@ -61,10 +61,14 @@ import subprocess
 import sys
 from pathlib import Path

-from anthropic import Anthropic
 from dotenv import load_dotenv

-load_dotenv(override=True)
+load_dotenv()
+
+try:
+    from provider_utils import get_client, get_model
+except ImportError:
+    sys.exit("Error: provider_utils.py not found. Please ensure you are in the project root.")


 # =============================================================================
@@ -73,8 +77,8 @@ load_dotenv(override=True)

 WORKDIR = Path.cwd()

-client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
-MODEL = os.getenv("MODEL_ID", "claude-sonnet-4-5-20250929")
+client = get_client()
+MODEL = get_model()


 # =============================================================================
@@ -414,7 +418,7 @@ def agent_loop(messages: list) -> list:

     Same core loop as v1, but now we track whether the model
     is using todos. If it goes too long without updating,
-    we inject a reminder into the next user message (tool results).
+    we'll inject a reminder in the main() function.
     """
     global rounds_without_todo

@@ -464,12 +468,6 @@ def agent_loop(messages: list) -> list:
             rounds_without_todo += 1

         messages.append({"role": "assistant", "content": response.content})
-
-        # Inject NAG_REMINDER into user message if model hasn't used todos
-        # This happens INSIDE the agent loop, so model sees it during task execution
-        if rounds_without_todo > 10:
-            results.insert(0, {"type": "text", "text": NAG_REMINDER})
-
         messages.append({"role": "user", "content": results})


@@ -484,8 +482,9 @@ def main():
     Key v2 addition: We inject "reminder" messages to encourage
     todo usage without forcing it. This is a soft constraint.

-    - INITIAL_REMINDER: injected at conversation start
-    - NAG_REMINDER: injected inside agent_loop when 10+ rounds without todo
+    Reminders are injected as part of the user message, not as
+    separate system prompts. The model sees them but doesn't
+    respond to them directly.
     """
     global rounds_without_todo

@@ -505,12 +504,16 @@ def main():
             break

         # Build user message content
+        # May include reminders as context hints
         content = []

         if first_message:
-            # Gentle reminder at start of conversation
+            # Gentle reminder at start
             content.append({"type": "text", "text": INITIAL_REMINDER})
             first_message = False
+        elif rounds_without_todo > 10:
+            # Nag if model hasn't used todos in a while
+            content.append({"type": "text", "text": NAG_REMINDER})

         content.append({"type": "text", "text": user_input})
         history.append({"role": "user", "content": content})

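The hunks above move the nag reminder out of `agent_loop` and into the code in `main()` that builds each user message. A minimal sketch of that message-building step, pulled into a standalone function so it can be exercised in isolation: `build_user_content` is a hypothetical helper, and the two reminder strings are placeholders, since the real constant values do not appear in this diff.

```python
# Placeholder reminder texts; the real values live in v2_todo_agent.py.
INITIAL_REMINDER = "<reminder>Use the todo tool for multi-step tasks.</reminder>"
NAG_REMINDER = "<reminder>No todo update for a while. Please update todos.</reminder>"


def build_user_content(user_input: str, first_message: bool, rounds_without_todo: int) -> list:
    """Hypothetical helper: assemble the content blocks for one user turn."""
    content = []
    if first_message:
        # Gentle reminder at the start of the conversation
        content.append({"type": "text", "text": INITIAL_REMINDER})
    elif rounds_without_todo > 10:
        # Nag only after more than 10 rounds without a todo update
        content.append({"type": "text", "text": NAG_REMINDER})
    # The user's own text always comes last
    content.append({"type": "text", "text": user_input})
    return content
```

Because the two branches are `if`/`elif`, a turn carries at most one reminder, and the first message never receives the nag even if the counter is already high.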
@@ -79,10 +79,14 @@ import sys
 import time
 from pathlib import Path

-from anthropic import Anthropic
 from dotenv import load_dotenv

-load_dotenv(override=True)
+load_dotenv()
+
+try:
+    from provider_utils import get_client, get_model
+except ImportError:
+    sys.exit("Error: provider_utils.py not found. Please ensure you are in the project root.")


 # =============================================================================
@@ -91,8 +95,8 @@ load_dotenv(override=True)

 WORKDIR = Path.cwd()

-client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
-MODEL = os.getenv("MODEL_ID", "claude-sonnet-4-5-20250929")
+client = get_client()
+MODEL = get_model()


 # =============================================================================

@@ -84,10 +84,14 @@ import sys
 import time
 from pathlib import Path

-from anthropic import Anthropic
 from dotenv import load_dotenv

-load_dotenv(override=True)
+load_dotenv()
+
+try:
+    from provider_utils import get_client, get_model
+except ImportError:
+    sys.exit("Error: provider_utils.py not found. Please ensure you are in the project root.")


 # =============================================================================
@@ -97,8 +101,8 @@ load_dotenv(override=True)
 WORKDIR = Path.cwd()
 SKILLS_DIR = WORKDIR / "skills"

-client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL"))
-MODEL = os.getenv("MODEL_ID", "claude-sonnet-4-5-20250929")
+client = get_client()
+MODEL = get_model()


 # =============================================================================

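Every agent file in this comparison now obtains its client and model from `provider_utils`, whose source is not part of the hunks shown here. Purely as an illustration of how the variables in the new `.env.example` (`AI_PROVIDER`, `MODEL_NAME`, `<PROVIDER>_API_KEY`, `<PROVIDER>_BASE_URL`) could be resolved, here is a sketch. `resolve_provider_config` and `DEFAULT_MODELS` are hypothetical names, and the real `get_client()` / `get_model()` may behave differently.

```python
import os
from typing import Mapping, Optional

# Assumption: only the Anthropic default is visible in this diff, so other
# providers must set MODEL_NAME explicitly in this sketch.
DEFAULT_MODELS = {"anthropic": "claude-sonnet-4-5-20250929"}


def resolve_provider_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read provider settings from an env-like mapping."""
    env = os.environ if env is None else env
    # AI_PROVIDER defaults to anthropic, as the .env.example comment states.
    provider = env.get("AI_PROVIDER", "anthropic").strip().lower()
    prefix = provider.upper()
    model = env.get("MODEL_NAME") or DEFAULT_MODELS.get(provider)
    if model is None:
        raise ValueError(f"MODEL_NAME must be set for provider {provider!r}")
    return {
        "provider": provider,
        "model": model,
        "api_key": env.get(f"{prefix}_API_KEY"),
        "base_url": env.get(f"{prefix}_BASE_URL"),
    }
```

Taking the environment as a parameter keeps the lookup testable without touching `os.environ`; passing nothing falls back to the process environment.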