refactor: organize agent harness courses

This commit is contained in:
Haoran
2026-06-16 00:10:35 +08:00
parent 20e7cbb72c
commit 8af5c24e46
491 changed files with 7961 additions and 564 deletions


@@ -0,0 +1,200 @@
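# s11: Trust and Execution Boundary: loading is gated by trust, execution relies on containers
> *Loading is managed inside core; execution is handed off to containers.*
> **Pi boundary**: the execution permission boundary. Resource loading depends on trust; the execution boundary is not built in and relies on containerization at the deployment layer.
[Previous: s10](../s10_runtime_modes/) → `s11` → [Next: s12](../s12_package_distribution/)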
---
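## Problem
core touches the local project in two ways: it loads project resources (s08), and its tools perform local actions (s04).
These two things carry **very different risks**: loading a resource is only a read, while executing an action can modify the system. So loading should have a switch: for an untrusted project, not even its resources are loaded, which keeps a malicious AGENTS.md or extension from slipping in.
For "execution", however, Pi's actual trade-off differs from what intuition suggests: **it does not restrict execution permissions inside core**. The filesystem, processes, and the network are all open, and the permissions are those of the user who launched it. To truly isolate execution, the deployment layer has to put the whole process inside a container.
s11 lays out the real division of labor between the two: loading is managed inside core via trust, and the execution boundary is handed off to containers.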
---
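## Solution
Two layers with a clear division of labor:
| Layer | Where | What it governs |
| --- | --- | --- |
| **Loading** | Inside core (trust) | Resources of untrusted projects are not loaded, guarding against malicious resources |
| **Execution** | Deployment layer (containerization) | The whole process is confined to a sandbox/container that restricts filesystem/process/network access |
There are three containerization patterns (see Pi's `containerization.md`):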
```text
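OpenShell      the whole pi process runs in a policy-controlled sandbox
Gondolin       pi stays on the host; tool execution is routed to a Linux micro-VM
Plain Docker   the whole pi process runs in a local container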
```
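> **Important**: the teaching version **no longer invents** an "in-core dryRun/allow switch" such as `ExecutionPolicy`/`Executor`, because it has no counterpart in Pi. The only thing inside core that can stop execution is the `beforeToolCall` hook from s05, which allows or blocks per tool. System-level execution isolation is delegated entirely to containers.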
---
## How it works
**First, define the trust switch.**
```ts
export type ProjectTrust = "trusted" | "untrusted";
```
**Resource loading depends on trust.** When trust is untrusted, `load(trust)` returns an empty array immediately, so core gets no project resources at all.
```ts
load(trust: ProjectTrust = "trusted"): ContextResource[] {
  if (trust === "untrusted") return [];
  return this.resources.map((r) => ({ ...r }));
}
```
`createTurnSnapshot` passes trust through to `load`, so whether this turn includes project resources is decided at the moment the snapshot is taken.
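To make the pass-through concrete, here is a minimal, self-contained sketch. It re-declares trimmed copies of `ResourceLoader` and `buildSystemPrompt` from this section; `snapshotPrompt` is a hypothetical stand-in for only the `systemPrompt` part of `createTurnSnapshot`, not the real function.

```ts
type ProjectTrust = "trusted" | "untrusted";
type ContextResource = { kind: "agents" | "skill" | "prompt"; name: string; content: string };

class ResourceLoader {
  constructor(private resources: ContextResource[]) {}
  // untrusted -> nothing is loaded; trusted -> shallow copies of every resource.
  load(trust: ProjectTrust = "trusted"): ContextResource[] {
    if (trust === "untrusted") return [];
    return this.resources.map((r) => ({ ...r }));
  }
}

function buildSystemPrompt(resources: ContextResource[]): string {
  return resources.map((r) => `[${r.kind}:${r.name}]\n${r.content}`).join("\n\n");
}

// Hypothetical helper: models only the systemPrompt field of a turn snapshot.
function snapshotPrompt(loader: ResourceLoader, trust: ProjectTrust = "trusted"): string {
  return buildSystemPrompt(loader.load(trust));
}

const loader = new ResourceLoader([
  { kind: "agents", name: "AGENTS.md", content: "Use concise engineering explanations." },
]);

const trustedPrompt = snapshotPrompt(loader, "trusted");
const untrustedPrompt = snapshotPrompt(loader, "untrusted");

console.log(JSON.stringify(trustedPrompt));
console.log(JSON.stringify(untrustedPrompt)); // empty string: no resources reached the prompt
```

Because the empty array flows through `buildSystemPrompt` unchanged, an untrusted project ends up with an empty system prompt rather than a special "untrusted" code path further down the pipeline.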
**Execution is not managed by core.** There is no `ExecutionPolicy` and no `Executor` here; the signature of `executeToolCall` goes back to its s05 form (no policy parameter):
```ts
export function executeToolCall(registry, hooks, call): ToolResultMessage {
  const before = hooks.beforeToolCall?.(call) ?? { type: "allow" };
  if (before.type === "block") return { /* blocked */ };
  // ... actually run the handler, with error capture ...
}
```
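The only thing that can stop execution is the `beforeToolCall` hook. It is an extension-layer, per-tool interception, not a system-level permission. For system-level isolation of execution, use containers at the deployment layer.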
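As an illustration of what per-tool interception looks like, the sketch below wires a `beforeToolCall` hook that blocks one tool by name. The deny list (`blockedTools`), the tool names, and `runWithHooks` (a trimmed stand-in for `executeToolCall` without the registry) are assumptions made for this example, not part of Pi.

```ts
type ToolCall = { id: string; name: string; input: Record<string, string> };
type BeforeToolCallResult = { type: "allow" } | { type: "block"; reason: string };
type ToolHooks = { beforeToolCall?: (call: ToolCall) => BeforeToolCallResult };

// Hypothetical deny list; the tool name is illustrative only.
const blockedTools = new Set(["bash"]);

const hooks: ToolHooks = {
  beforeToolCall: (call) =>
    blockedTools.has(call.name)
      ? { type: "block", reason: `tool "${call.name}" is disabled by an extension` }
      : { type: "allow" },
};

// Trimmed stand-in for executeToolCall: consult the hook, then run the handler.
function runWithHooks(
  toolHooks: ToolHooks,
  call: ToolCall,
  handler: (input: Record<string, string>) => string,
): string {
  const before: BeforeToolCallResult = toolHooks.beforeToolCall?.(call) ?? { type: "allow" };
  if (before.type === "block") return `blocked: ${before.reason}`;
  return handler(call.input);
}

const echo = (input: Record<string, string>): string => `ran: ${input.cmd}`;

const blockedResult = runWithHooks(hooks, { id: "1", name: "bash", input: { cmd: "ls" } }, echo);
const allowedResult = runWithHooks(hooks, { id: "2", name: "read", input: { cmd: "x" } }, echo);

console.log(blockedResult);
console.log(allowedResult);
```

Note what the hook cannot do: once a handler is allowed to run, nothing here limits which files, processes, or network endpoints it touches. That is the gap containers are meant to cover.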
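> What this section really establishes is the **execution permission boundary**, aligned with Pi's actual trade-off: **loading** is managed inside core via trust (guarding against malicious resources), while **execution** is not managed inside core and is handed off entirely to deployment-layer containerization. core stays lightweight and the "heavy lifting" of permissions is pushed to containers, which is exactly what the README means by "Pi has no built-in permission system".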
---
## Try it
Run (trusted by default):
```sh
npm run s11
```
The output looks something like:
```text
s11: Trust and Execution Boundary
[resources]
AGENTS.md
[execution boundary]
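Pi does not restrict execution permissions inside core. The execution boundary relies on deployment-layer containerization:
- OpenShell: the whole pi process runs in a policy-controlled sandbox
- Gondolin: pi stays on the host; tool execution is routed to a Linux micro-VM
- Plain Docker: the whole pi process runs in a local container
The only execution interception point inside core is the beforeToolCall hook from s05.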
```
Untrusted project (no resources loaded):
```sh
npm run s11 -- --trust untrusted
```
```text
[resources]
none (untrusted: no resources are loaded)
```
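What to observe: trust only governs whether resources are loaded. The execution-boundary part of the output spells out the rest: core has no dryRun/allow switch, and restricting execution for real requires a container.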
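The same observation can be checked in code. The sketch below is a simplified model under stated assumptions: `load` is a trimmed stand-in for `ResourceLoader.load`, and `echoTool` and `demo` are hypothetical helpers invented for this example. It runs the same tool under both trust values and shows that only the loaded resources differ.

```ts
type ProjectTrust = "trusted" | "untrusted";
type Resource = { name: string; content: string };

// Trimmed stand-in for ResourceLoader.load.
function load(resources: Resource[], trust: ProjectTrust): Resource[] {
  return trust === "untrusted" ? [] : resources.map((r) => ({ ...r }));
}

// Hypothetical tool handler; in the real loop it would be reached via executeToolCall.
function echoTool(input: Record<string, string>): string {
  return `echo: ${input.text}`;
}

function demo(trust: ProjectTrust): { loaded: string[]; toolResult: string } {
  const resources: Resource[] = [
    { name: "AGENTS.md", content: "Use concise engineering explanations." },
  ];
  return {
    loaded: load(resources, trust).map((r) => r.name),
    toolResult: echoTool({ text: "hi" }), // trust is never consulted on this path
  };
}

const trustedRun = demo("trusted");
const untrustedRun = demo("untrusted");

console.log(trustedRun);
console.log(untrustedRun);
```

In this model an untrusted project loses its resources but its tools still run with the full permissions of the launching user, which is why the section points to containers for real isolation.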
---
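## Wiring into the main line
s11 builds on s10. Changes relative to s10:
| Component | s10 | s11 |
| --- | --- | --- |
| New type | (none) | `ProjectTrust` |
| `ResourceLoader.load` | `load()` | **`load(trust)`** (U1, defaults to trusted) |
| `createTurnSnapshot` | `(state, registry, loader)` | One extra `trust` parameter (defaults to trusted) |
| Execution permissions | Hook only | **trust gates loading; execution relies on containerization, and core has no built-in permission system** |
**Weld point**: `loader.load(trust)` decides whether the context includes project resources; `createTurnSnapshot` passes trust through. `executeToolCall` keeps its s05 signature (no policy): the only execution interception is the `beforeToolCall` hook, and system-level isolation is left to containers.
> Note: this section removes the `ExecutionPolicy`/`Executor` from the earlier teaching version. They were invented to "self-demonstrate the execution boundary", but Pi has no such layer, and keeping them would make the kernel inconsistent with Pi.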
---
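## What's next
Right now tools, commands, and project resources are all defined piecemeal. If you want to reuse a set of capabilities, there is no manifest stating "what is in this package".
The next section organizes them into a package with a manifest, so they can be distributed and loaded as a unit.
Continue to the next section: [s12](../s12_package_distribution/).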
---
<details>
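<summary>Tracing the Pi source: no built-in permission system, relies on containerization</summary>
The teaching version uses trust to gate loading. Pi's actual situation deserves a specific note: **it has no built-in permission system**, and the permission boundary relies on external containerization.
### Where the source lives
- `packages/coding-agent/docs/containerization.md`: the three containerization options (official docs)
- `packages/coding-agent/src/core/project-trust.ts:45`: `resolveProjectTrusted`
- `packages/coding-agent/src/core/extensions/runner.ts`: trust events
- `packages/coding-agent/src/tools/bash.ts:66`: bash execution (no permission check)
### Verification: Pi really has no built-in permission system
The README says "Pi has no built-in permission system". The source confirms it: `createLocalBashOperations` (`bash.ts:66`) calls `spawn(shell, ...)` directly, **with no permission check of any kind**. The filesystem, processes, and the network are all open, and the permissions are those of the user who launched it.
### So what does trust govern
Pi's `ProjectTrust` (`project-trust.ts:45`) governs only **resource loading**, not execution: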
```ts
async function resolveProjectTrusted(options): Promise<boolean> {
  if (options.trustOverride !== undefined) return options.trustOverride;
  if (!hasProjectTrustInputs(options.cwd)) return true; // no trust-relevant inputs: trust directly
  const { result } = await emitProjectTrustEvent(...); // ask the extension hook
  if (result) return result.trusted === "yes";
  const decision = options.trustStore.get(options.cwd); // look up a past decision
  if (decision !== null) return decision;
  switch (options.defaultProjectTrust ?? "ask") { // default: ask the user
    case "always": return true;
    case "never": return false;
    case "ask": break;
  }
}
```
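trust decides "whether to load this project's extensions/resources" (guarding against a malicious AGENTS.md or extension); it **does not restrict** execution after loading.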
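The precedence in the excerpt above can be modeled as a small pure function. This is a simplified sketch, not Pi's code: `TrustOptions`, `TrustOutcome`, and `resolveTrustSketch` are hypothetical names, the extension hook and trust store are replaced by plain fields, and the `"ask-user"` outcome is a placeholder because the excerpt does not show what happens after the `"ask"` case.

```ts
// Hypothetical, simplified inputs; these are not Pi's actual types.
type TrustOptions = {
  trustOverride?: boolean;          // explicit override, checked first
  hasTrustInputs: boolean;          // does the project contain anything trust-relevant?
  hookDecision?: boolean;           // stand-in for the extension hook's answer
  storedDecision: boolean | null;   // stand-in for a remembered decision
  defaultProjectTrust?: "always" | "never" | "ask";
};
type TrustOutcome = boolean | "ask-user";

function resolveTrustSketch(o: TrustOptions): TrustOutcome {
  if (o.trustOverride !== undefined) return o.trustOverride; // 1. override
  if (!o.hasTrustInputs) return true;                        // 2. nothing to distrust
  if (o.hookDecision !== undefined) return o.hookDecision;   // 3. extension hook
  if (o.storedDecision !== null) return o.storedDecision;    // 4. remembered decision
  switch (o.defaultProjectTrust ?? "ask") {                  // 5. configured default
    case "always": return true;
    case "never": return false;
    default: return "ask-user"; // placeholder: the excerpt elides this step
  }
}

const base: TrustOptions = { hasTrustInputs: true, storedDecision: null };

console.log(resolveTrustSketch({ ...base, trustOverride: false, hookDecision: true }));
console.log(resolveTrustSketch({ ...base, hasTrustInputs: false }));
console.log(resolveTrustSketch({ ...base, hookDecision: false, storedDecision: true }));
console.log(resolveTrustSketch({ ...base, defaultProjectTrust: "never" }));
console.log(resolveTrustSketch(base));
```

Each step only runs when every earlier step had no opinion, so an explicit override always wins and the configured default is the last resort.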
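### The three containerization options
`containerization.md` describes three patterns:
| Option | How it works | When to use |
| --- | --- | --- |
| **OpenShell** | The whole pi process runs in a policy-controlled sandbox | You want comprehensive restrictions |
| **Gondolin extension** | pi stays on the host; tool execution is routed to a Linux micro-VM | You want to protect provider auth |
| **Plain Docker** | The whole pi process runs in a local container | Simple isolation |
### beforeToolCall is the only execution interception point
The only place Pi can intercept execution is the `beforeToolCall` hook from s05, where an extension can block a given tool. But this is extension-layer and per-tool, not a system-level permission system built into core.
### In one sentence
The teaching version uses trust to gate loading, matching Pi; the execution boundary matches too: it is **not built in** and is handled at the deployment layer through the three containerization options. The earlier teaching version invented `ExecutionPolicy`, but that was only for self-demonstration; Pi has no such thing, so this section removes it.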
</details>


@@ -0,0 +1,281 @@
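// s11: Trust and Execution Boundary: version 11 of mini Pi
//
// Aligned with Pi's actual design: trust controls resource loading; the execution boundary has no built-in permission system and relies on deployment-layer containerization.
// Vocabulary boundary: this chapter adds ProjectTrust / trust / trusted / untrusted, plus the three containerization options (covered in the README).
// Key point: the teaching version's ExecutionPolicy/Executor are removed (Pi has neither); executeToolCall goes back to the policy-free s05 version.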
declare const process: {
argv: string[];
exitCode?: number;
};
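// ============ New in s11: project trust (controls resource loading) ============
// Whether the project is trusted: decides whether its resources are loaded (guards against a malicious AGENTS.md / extension).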
export type ProjectTrust = "trusted" | "untrusted";
// --- Stop reason (since s04) ---
export type StopReason = "stop" | "toolUse" | "error";
// --- Messages ---
export type UserMessage = { role: "user"; content: string };
export type AssistantMessage = { role: "assistant"; content: string; stopReason: StopReason };
export type ToolResultMessage = { role: "toolResult"; toolCallId: string; content: string };
export type AgentMessage = UserMessage | AssistantMessage | ToolResultMessage;
// --- Session history (since s07) ---
export type SessionEntry = { id: string; parentId: string | null; message: AgentMessage };
export class SessionTree {
private entries = new Map<string, SessionEntry>();
private activeLeafId: string | null = null;
private counter = 0;
append(message: AgentMessage): SessionEntry {
const entry = { id: `e${++this.counter}`, parentId: this.activeLeafId, message };
this.entries.set(entry.id, entry);
this.activeLeafId = entry.id;
return entry;
}
moveTo(entryId: string): void {
if (!this.entries.has(entryId)) throw new Error(`unknown entry: ${entryId}`);
this.activeLeafId = entryId;
}
currentPath(): AgentMessage[] {
const path: AgentMessage[] = [];
let cursor = this.activeLeafId;
while (cursor) {
const entry = this.entries.get(cursor);
if (!entry) break;
path.push(entry.message);
cursor = entry.parentId;
}
return path.reverse();
}
allEntries(): SessionEntry[] { return [...this.entries.values()]; }
}
export type AgentState = { session: SessionTree; model: string };
// --- Tool contract ---
export type ToolSpec = { name: string; description: string; input: Record<string, string> };
export type ToolHandler = (input: Record<string, string>) => string;
export type ToolCall = { id: string; name: string; input: Record<string, string> };
export type Tool = { spec: ToolSpec; handler: ToolHandler };
export class ToolRegistry {
private tools = new Map<string, Tool>();
register(tool: Tool): void { this.tools.set(tool.spec.name, tool); }
getSpecs(): ToolSpec[] { return [...this.tools.values()].map((tool) => tool.spec); }
count(): number { return this.tools.size; }
run(call: ToolCall): string {
const tool = this.tools.get(call.name);
if (!tool) return `unknown tool: ${call.name}`;
return tool.handler(call.input);
}
}
// --- Context resources (since s08; s11: load takes a trust parameter, U1) ---
export type ContextResource = { kind: "agents" | "skill" | "prompt"; name: string; content: string };
export class ResourceLoader {
constructor(private resources: ContextResource[]) {}
// [U1 upgrade] Adds a trust parameter. untrusted -> no resources are loaded. Defaults to trusted.
load(trust: ProjectTrust = "trusted"): ContextResource[] {
if (trust === "untrusted") return [];
return this.resources.map((r) => ({ ...r }));
}
}
export function buildSystemPrompt(resources: ContextResource[]): string {
return resources.map((r) => `[${r.kind}:${r.name}]\n${r.content}`).join("\n\n");
}
// --- Provider-facing types ---
export type ProviderMessage =
| { role: "user" | "assistant"; content: string }
| { role: "toolResult"; toolCallId: string; content: string };
export type ProviderInput = { systemPrompt: string; messages: ProviderMessage[]; tools: ToolSpec[] };
export type ProviderEvent =
| { type: "message_start" }
| { type: "text_delta"; text: string }
| { type: "tool_call"; call: ToolCall }
| { type: "message_end"; stopReason: StopReason };
export interface Provider { stream(input: ProviderInput): AsyncGenerator<ProviderEvent>; }
export type Output = { log(line: string): void };
export function createConsoleOutput(): Output { return { log: (line) => console.log(line) }; }
// --- Since s05: execution hook point (no policy; Pi has no built-in execution permissions) ---
export type BeforeToolCallResult = { type: "allow" } | { type: "block"; reason: string };
export type ToolHooks = {
beforeToolCall?: (call: ToolCall) => BeforeToolCallResult;
afterToolCall?: (call: ToolCall, result: string) => string;
};
export function executeToolCall(registry: ToolRegistry, hooks: ToolHooks, call: ToolCall): ToolResultMessage {
const before = hooks.beforeToolCall?.(call) ?? { type: "allow" };
if (before.type === "block") {
return { role: "toolResult", toolCallId: call.id, content: `blocked: ${before.reason}` };
}
let result: string;
try { result = registry.run(call); }
catch (error) { result = `error: ${error instanceof Error ? error.message : String(error)}`; }
const finalResult = hooks.afterToolCall?.(call, result) ?? result;
return { role: "toolResult", toolCallId: call.id, content: finalResult };
}
// --- Since s06: snapshot (s11: createTurnSnapshot takes trust and passes it to load) ---
export type TurnSnapshot = { systemPrompt: string; messages: ProviderMessage[]; tools: ToolSpec[] };
function toProviderMessages(messages: AgentMessage[]): ProviderMessage[] {
return messages.map((message) => {
if (message.role === "toolResult") {
return { role: "toolResult", toolCallId: message.toolCallId, content: message.content };
}
return { role: message.role, content: message.content };
});
}
export function createTurnSnapshot(
state: AgentState, registry: ToolRegistry, loader: ResourceLoader, trust: ProjectTrust = "trusted",
): TurnSnapshot {
return {
systemPrompt: buildSystemPrompt(loader.load(trust)),
messages: toProviderMessages(state.session.currentPath()),
tools: registry.getSpecs(),
};
}
export function buildProviderInputFromSnapshot(snapshot: TurnSnapshot, state: AgentState): ProviderInput {
return {
systemPrompt: snapshot.systemPrompt,
messages: toProviderMessages(state.session.currentPath()),
tools: snapshot.tools,
};
}
export function createInitialState(model = "demo-small"): AgentState { return { session: new SessionTree(), model }; }
export function createUserMessage(content: string): UserMessage { return { role: "user", content }; }
const MAX_TURNS = 8;
export async function runEventedToolLoop(
state: AgentState, provider: Provider, registry: ToolRegistry,
hooks: ToolHooks, snapshot: TurnSnapshot, output: Output,
): Promise<AssistantMessage> {
let turns = 0;
while (true) {
turns += 1;
if (turns > MAX_TURNS) {
const stopped: AssistantMessage = { role: "assistant", content: "(max turns reached, stopping)", stopReason: "stop" };
state.session.append(stopped);
return stopped;
}
const providerInput = buildProviderInputFromSnapshot(snapshot, state);
let content = "";
let stopReason: StopReason = "stop";
let sawToolCall = false;
for await (const event of provider.stream(providerInput)) {
if (event.type === "message_start") output.log("message_start");
else if (event.type === "text_delta") { output.log(`text_delta: ${event.text}`); content += event.text; }
else if (event.type === "tool_call") {
sawToolCall = true;
output.log(`tool_call: ${event.call.name}`);
const resultMessage = executeToolCall(registry, hooks, event.call);
state.session.append(resultMessage);
output.log(`tool_result: ${resultMessage.content}`);
} else if (event.type === "message_end") { stopReason = event.stopReason; output.log(`message_end: ${stopReason}`); }
}
if (!sawToolCall || stopReason !== "toolUse") {
const assistant: AssistantMessage = { role: "assistant", content, stopReason };
state.session.append(assistant);
return assistant;
}
}
}
// --- Since s09: extension runtime (cumulative) ---
export type RuntimeEvent = { type: "message"; content: string } | { type: "done" };
type EventHandler<T extends RuntimeEvent["type"]> = (event: Extract<RuntimeEvent, { type: T }>) => void;
export type Command = { name: string; run: () => string };
export type ExtensionAPI = {
on<T extends RuntimeEvent["type"]>(type: T, handler: EventHandler<T>): void;
registerTool(tool: Tool): void;
registerCommand(command: Command): void;
};
export type Extension = (api: ExtensionAPI) => void;
export class ExtensionRuntime {
private commands = new Map<string, Command>();
private handlers: { type: RuntimeEvent["type"]; handler: (event: RuntimeEvent) => void }[] = [];
constructor(private registry: ToolRegistry) {}
createApi(): ExtensionAPI {
return {
on: (type, handler) => { this.handlers.push({ type, handler: handler as (event: RuntimeEvent) => void }); },
registerTool: (tool) => { this.registry.register(tool); },
registerCommand: (command) => { this.commands.set(command.name, command); },
};
}
use(extension: Extension): void { extension(this.createApi()); }
emit(event: RuntimeEvent): void {
for (const { type, handler } of this.handlers) if (type === event.type) handler(event);
}
runCommand(name: string): string {
const command = this.commands.get(name);
if (!command) return `unknown command: ${name}`;
return command.run();
}
}
// --- Since s10: runtime modes (cumulative) ---
export function createDemoRuntimeEvents(input: string): RuntimeEvent[] {
return [{ type: "message", content: `Received: ${input}` }, { type: "done" }];
}
export type RuntimeMode = { render(events: RuntimeEvent[]): void };
export class PrintMode implements RuntimeMode {
render(events: RuntimeEvent[]): void {
for (const event of events) if (event.type === "message") console.log(event.content);
}
}
export class JsonMode implements RuntimeMode {
render(events: RuntimeEvent[]): void {
for (const event of events) console.log(JSON.stringify(event));
}
}
// ============ Demo scaffolding ============
function readArg(name: string): string | undefined {
const index = process.argv.indexOf(name);
return index >= 0 ? process.argv[index + 1] : undefined;
}
function main(): void {
const output = createConsoleOutput();
const trust: ProjectTrust = readArg("--trust") === "untrusted" ? "untrusted" : "trusted";
const loader = new ResourceLoader([
{ kind: "agents", name: "AGENTS.md", content: "Use concise engineering explanations." },
]);
output.log("s11: Trust and Execution Boundary");
output.log("");
// Loading boundary: depends on trust. untrusted -> no resources are loaded (guards against malicious resources).
const resources = loader.load(trust);
output.log("[resources]");
if (resources.length === 0) {
output.log("none (untrusted: no resources are loaded)");
} else {
for (const resource of resources) {
output.log(resource.name);
}
}
output.log("");
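// Execution boundary: aligned with Pi. core has no built-in permission system and relies on deployment-layer containerization.
output.log("[execution boundary]");
output.log("Pi does not restrict execution permissions inside core. The execution boundary relies on deployment-layer containerization:");
output.log("- OpenShell: the whole pi process runs in a policy-controlled sandbox");
output.log("- Gondolin: pi stays on the host; tool execution is routed to a Linux micro-VM");
output.log("- Plain Docker: the whole pi process runs in a local container");
output.log("The only execution interception point inside core is the beforeToolCall hook from s05.");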
output.log("");
}
try {
main();
} catch (error: unknown) {
console.error(error);
process.exitCode = 1;
}