fix: build s09 memory system prompt once per request

This commit is contained in:
gui-yue
2026-06-04 00:09:09 +08:00
parent ec9ea874e6
commit 8aa8adb346
4 changed files with 9 additions and 11 deletions

View File

@@ -76,9 +76,9 @@ def write_memory_file(name, mem_type, description, body):
### Loading: Two Paths ### Loading: Two Paths
**Path 1: Index in SYSTEM.** `build_system()` reads `MEMORY.md` every turn and injects the memory catalog into the SYSTEM prompt. The index in SYSTEM can be cached by prompt cache, avoiding resending it every turn. **Path 1: Index in SYSTEM.** `build_system()` reads `MEMORY.md` once at the start of each user request and injects the memory catalog into the SYSTEM prompt. Memory extraction and consolidation run only when the turn ends, so SYSTEM does not need to be rebuilt repeatedly within the same user request.
**Path 2: Relevant memories on demand.** Before each LLM call, `load_memories()` sends the recent conversation and the memory catalog (name + description) to the LLM as a lightweight side-query, selects relevant filenames, then reads and injects their contents. Capped at 5 to control cost. **Path 2: Relevant memories on demand.** At the start of each user request, `load_memories()` sends the recent conversation and the memory catalog (name + description) to the LLM as a lightweight side-query, selects relevant filenames, then reads and injects their contents. Capped at 5 to control cost.
```python ```python
def select_relevant_memories(messages, max_items=5): def select_relevant_memories(messages, max_items=5):

View File

@@ -76,9 +76,9 @@ def write_memory_file(name, mem_type, description, body):
### 読み込み2 つのパス ### 読み込み2 つのパス
**パス 1インデックスを SYSTEM に常駐。** `build_system()`毎ターン SYSTEM を再構築する際に `MEMORY.md` を読み込み、記憶カタログを注入。SYSTEM prompt 内のインデックスは prompt cache でキャッシュ可能で、毎ターン再送不要 **パス 1インデックスを SYSTEM に常駐。** `build_system()`各ユーザーリクエストの開始時に 1 回だけ `MEMORY.md` を読み込み、記憶カタログを SYSTEM prompt に注入。記憶の抽出と整理はターン終了時にだけ実行されるため、同じユーザーリクエスト内で SYSTEM を繰り返し再構築する必要はない
**パス 2関連記憶をオンデマンド注入。** LLM 呼び出し前`load_memories()` は最近の会話と記憶カタログname + descriptionを LLM に軽量 side-query として送信し、関連するファイル名を選択、ファイル内容を読み込んで注入。上限 5 件でコストを制御。 **パス 2関連記憶をオンデマンド注入。**ユーザーリクエストの開始時に`load_memories()` は最近の会話と記憶カタログname + descriptionを LLM に軽量 side-query として送信し、関連するファイル名を選択、ファイル内容を読み込んで注入。上限 5 件でコストを制御。
```python ```python
def select_relevant_memories(messages, max_items=5): def select_relevant_memories(messages, max_items=5):

View File

@@ -76,9 +76,9 @@ def write_memory_file(name, mem_type, description, body):
### 加载:两条路径 ### 加载:两条路径
**路径一:索引常驻 SYSTEM。** `build_system()` 每轮重建 SYSTEM 时读取 `MEMORY.md`,把记忆清单注入。SYSTEM prompt 中的索引可以被 prompt cache 缓存,不需要每轮重新发送 **路径一:索引常驻 SYSTEM。** `build_system()` 在每次用户请求开始时读取 `MEMORY.md`,把记忆清单注入。记忆提取和整理只在本轮结束时触发,因此同一轮用户请求中不需要重复重建 SYSTEM
**路径二:相关记忆按需注入。**轮调用前`load_memories()` 把最近对话和记忆目录name + description一起发给 LLM 做一次轻量 side-query选出相关的文件名再读文件内容临时注入到当前 user turn。最多 5 条,控制开销。 **路径二:相关记忆按需注入。**次用户请求开始时`load_memories()` 把最近对话和记忆目录name + description一起发给 LLM 做一次轻量 side-query选出相关的文件名再读文件内容临时注入到当前 user turn。最多 5 条,控制开销。
```python ```python
def select_relevant_memories(messages, max_items=5): def select_relevant_memories(messages, max_items=5):

View File

@@ -344,8 +344,6 @@ def build_system() -> str:
"When the user says 'remember' or expresses a clear preference, extract it as a memory." "When the user says 'remember' or expresses a clear preference, extract it as a memory."
) )
SYSTEM = build_system()
SUB_SYSTEM = ( SUB_SYSTEM = (
f"You are a coding agent at {WORKDIR}. " f"You are a coding agent at {WORKDIR}. "
"Complete the task you were given, then return a concise summary. " "Complete the task you were given, then return a concise summary. "
@@ -553,10 +551,10 @@ def agent_loop(messages: list):
# s09: inject relevant memory content into the current user turn # s09: inject relevant memory content into the current user turn
memories_content = load_memories(messages) memories_content = load_memories(messages)
memory_turn = len(messages) - 1 if messages and isinstance(messages[-1].get("content"), str) else None memory_turn = len(messages) - 1 if messages and isinstance(messages[-1].get("content"), str) else None
while True: # s09: build system once per user turn; memory is updated after the loop returns
# s09: rebuild system with current memory index
system = build_system() system = build_system()
while True:
# s09: save pre-compression snapshot for accurate memory extraction # s09: save pre-compression snapshot for accurate memory extraction
pre_compress = [m if isinstance(m, dict) else {"role": m.get("role",""), pre_compress = [m if isinstance(m, dict) else {"role": m.get("role",""),
"content": str(m.get("content",""))} for m in messages] "content": str(m.get("content",""))} for m in messages]