v0.15.1: PA routes all tool requests to expert, dashboard integration test

- PA prompt updated: routes ANY task needing tools (DB, UI, buttons, machines)
  to expert. Only social chat stays with PA.
- Expert descriptions include UI capabilities (buttons, machines, tables)
- Dashboard integration test: expert creates/replaces buttons, machines,
  tables — all persist correctly across queries
- v4-eras scores: fast 27/28, expert 23/23, dashboard 15/15, progress 11/11

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nico 2026-03-29 17:28:44 +02:00
parent 1000411eb2
commit fda0d7cfce
2 changed files with 56 additions and 15 deletions

@@ -16,40 +16,47 @@ class PANode(Node):
     max_context_tokens = 4000
     SYSTEM = """You are the Personal Assistant (PA) — the user's companion in this cognitive runtime.
-You manage the conversation and route domain-specific work to the right expert.
+You manage the user's dashboard and route work to domain experts.
 Listener: {identity} on {channel}
 Available experts:
 {experts}
+Experts have these tools:
+- query_db SQL queries on their domain database
+- emit_actions create buttons on the dashboard
+- create_machine / add_state / reset_machine / destroy_machine interactive UI components
+- set_state persistent key-value store
+- emit_display formatted data display
 YOUR JOB:
 1. Understand what the user wants
-2. If it's a domain task: route to the right expert with a clear, self-contained job description
-3. If it's social/general: respond directly (no expert needed)
+2. Route to the expert for ANY task that needs tools (DB, UI, buttons, machines, counters, reports)
+3. Only respond directly for social chat (greetings, thanks, bye, small talk)
 Output ONLY valid JSON:
 {{
-"expert": "eras | plankiste | none",
-"job": "Self-contained task description for the expert. Include all context the expert needs — it has NO conversation history.",
-"thinking_message": "Short message shown to user while expert works (in user's language). e.g. 'Moment, ich schaue in der Datenbank nach...'",
+"expert": "{expert_names} | none",
+"job": "Self-contained task. Include ALL context — the expert has NO conversation history. Describe what to query, what UI to build, what the user expects to see.",
+"thinking_message": "Short message for user while expert works, in their language",
 "response_hint": "If expert=none, your direct response to the user.",
 "language": "de | en | mixed"
 }}
 Rules:
-- The expert has NO history. The job must be fully self-contained.
-- Include relevant facts from memory in the job (e.g. "customer Kathrin Jager, ID 2").
-- thinking_message should be natural and in the user's language.
-- For greetings, thanks, general chat: expert=none, write response_hint directly.
-- For DB queries, reports, data analysis: route to the domain expert.
-- When unsure which expert: expert=none, ask the user to clarify.
+- expert=none ONLY for social chat (hi, thanks, bye, how are you)
+- ANY request to create, build, show, query, investigate, count, list, describe route to expert
+- The job must be fully self-contained. Include relevant facts from memory.
+- thinking_message: natural, in user's language. e.g. "Moment, ich schaue nach..."
+- If the user mentions data, tables, customers, devices, buttons, counters expert
+- When unsure which expert: pick the one whose domain matches best
 {memory_context}"""
     EXPERT_DESCRIPTIONS = {
-        "eras": "eras — heating/energy customer database (eras2_production). Customers, devices, billing, consumption data.",
-        "plankiste": "plankiste — Kita planning database (plankiste_test). Children, care schedules, offers, pricing.",
+        "eras": "eras — heating/energy domain. Database: eras2_production (customers, devices, billing, consumption). Can also build dashboard UI (buttons, machines, counters, tables) for energy data workflows.",
+        "plankiste": "plankiste — Kita planning domain. Database: plankiste_test (children, care schedules, offers, pricing). Can build dashboard UI for education workflows and generate Angebote.",
     }
     def __init__(self, send_hud):
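The updated prompt pins the PA's reply to a five-field JSON object (`expert`, `job`, `thinking_message`, `response_hint`, `language`). The commit does not show the code that consumes this reply, so what follows is only a minimal sketch of how it could be parsed and routed. `Route` and `parse_pa_output` are hypothetical names, and treating an unknown expert name the same as `none` is an assumed fallback, not behavior taken from the source.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    expert: Optional[str]  # None means the PA answers the user directly
    job: str
    thinking_message: str
    response_hint: str
    language: str


def parse_pa_output(raw: str, available_experts: List[str]) -> Route:
    """Hypothetical consumer of the PA's JSON reply (not shown in this commit)."""
    data = json.loads(raw)
    name = str(data.get("expert", "none")).strip().lower()
    # Assumed fallback: anything that is not a registered expert, including
    # "none", keeps the turn with the PA.
    expert = name if name in available_experts else None
    job = str(data.get("job", "")).strip()
    if expert is not None and not job:
        # The expert sees no conversation history, so an empty job is unusable.
        raise ValueError(f"expert {expert!r} selected without a job")
    return Route(
        expert=expert,
        job=job,
        thinking_message=str(data.get("thinking_message", "")),
        response_hint=str(data.get("response_hint", "")),
        language=str(data.get("language", "mixed")),  # default is an assumption
    )


reply = json.dumps({
    "expert": "eras",
    "job": "Count all customers in eras2_production and report the total.",
    "thinking_message": "Moment, ich schaue nach...",
    "response_hint": "",
    "language": "de",
})
route = parse_pa_output(reply, ["eras", "plankiste"])
print(route.expert, "|", route.job)
```

Raising on an empty `job` mirrors the prompt's rule that the expert has no conversation history; a real pipeline might prefer to re-prompt the PA instead.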
@@ -79,10 +86,11 @@ Rules:
         if not expert_lines:
             expert_lines.append("- (no experts available — handle everything directly)")
+        expert_names = " | ".join(self._available_experts) if self._available_experts else "none"
         messages = [
             {"role": "system", "content": self.SYSTEM.format(
                 memory_context=memory_context, identity=identity, channel=channel,
-                experts="\n".join(expert_lines))},
+                experts="\n".join(expert_lines), expert_names=expert_names)},
         ]
         # Summarize recent history (PA sees full context)
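The second hunk builds `expert_names` from `self._available_experts` and passes it to `SYSTEM.format`. Because the prompt is a `str.format` template, the literal JSON braces are written as `{{` and `}}`, while `{expert_names}` is a real placeholder. The sketch below reproduces only that mechanism on a trimmed excerpt of the prompt; `TEMPLATE` and `render` are hypothetical illustrative names.

```python
from typing import List

# Trimmed, hypothetical excerpt of SYSTEM: doubled braces become literal JSON
# braces after str.format, while {expert_names} is substituted.
TEMPLATE = """Output ONLY valid JSON:
{{
"expert": "{expert_names} | none",
"language": "de | en | mixed"
}}"""


def render(available_experts: List[str]) -> str:
    # Same expression as the commit: join the names, or fall back to "none".
    expert_names = " | ".join(available_experts) if available_experts else "none"
    return TEMPLATE.format(expert_names=expert_names)


print(render(["eras", "plankiste"]))
print(render([]))
```

With no registered experts, the `"none"` fallback makes the field read `"none | none"`, which is redundant but still leaves `none` as the only valid choice.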

testcases/dashboard.md

@@ -0,0 +1,33 @@
# Dashboard Integration
Tests that experts can build UI on the shared dashboard:
buttons, machines, tables, state — all through the PA→Expert pipeline.
## Setup
- clear history
## Steps
### 1. Expert creates buttons
- send: create two buttons on my dashboard: Report and Export
- expect_actions: length >= 2
- expect_actions: any action contains "report" or "Report"
### 2. Buttons survive a query
- send: how many customers are there?
- expect_response: length > 5
- expect_actions: any action contains "report" or "Report"
### 3. Expert creates a machine
- send: create a navigation machine called "workflow" with initial state "start" showing a Next button that goes to "step2"
- expect_trace: has tool_call create_machine
### 4. Expert shows data table
- send: show me 5 customers in a table
- expect_trace: has tool_call
- expect_response: length > 10
### 5. Expert replaces buttons
- send: remove all buttons and create one button called Reset
- expect_actions: length >= 1
- expect_actions: any action contains "reset" or "Reset"
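The test file uses a small step language (`send`, `expect_actions`, `expect_trace`, `expect_response`) whose runner is not part of this commit. As an illustration only, here is a sketch of how the two `expect_actions` forms used above could be evaluated. It assumes actions arrive as a list of dicts and that "contains" means a substring match on each serialized action; `check_expect_actions` and the `label`/`action` keys are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

Action = Dict[str, Any]  # hypothetical action shape, e.g. {"label": ..., "action": ...}


def check_expect_actions(expr: str, actions: List[Action]) -> bool:
    """Hypothetical evaluator for the expect_actions forms used in dashboard.md."""
    expr = expr.strip()
    m = re.fullmatch(r"length\s*(>=|>|==)\s*(\d+)", expr)
    if m:
        op, n = m.group(1), int(m.group(2))
        count = len(actions)
        return {">=": count >= n, ">": count > n, "==": count == n}[op]
    if expr.startswith("any action contains"):
        # Collect every quoted alternative, e.g. "report" or "Report".
        needles = re.findall(r'"([^"]*)"', expr)
        # Assumption: "contains" is a substring match on the serialized action.
        return any(
            needle in json.dumps(action, ensure_ascii=False)
            for action in actions
            for needle in needles
        )
    raise ValueError(f"unsupported expect_actions expression: {expr!r}")


actions = [{"label": "Report", "action": "report"}, {"label": "Export", "action": "export"}]
print(check_expect_actions("length >= 2", actions))
print(check_expect_actions('any action contains "report" or "Report"', actions))
```

Matching both `"report"` and `"Report"` in the steps suggests the author wanted case tolerance; a case-insensitive comparison would be an alternative design, but the sketch keeps the literal alternatives as written.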