v0.15.1: PA routes all tool requests to expert, dashboard integration test

- PA prompt updated: routes ANY task needing tools (DB, UI, buttons, machines)
  to expert. Only social chat stays with PA.
- Expert descriptions include UI capabilities (buttons, machines, tables)
- Dashboard integration test: expert creates/replaces buttons, machines,
  tables — all persist correctly across queries
- v4-eras scores: fast 27/28, expert 23/23, dashboard 15/15, progress 11/11

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nico 2026-03-29 17:28:44 +02:00
parent 1000411eb2
commit fda0d7cfce
2 changed files with 56 additions and 15 deletions


@@ -16,40 +16,47 @@ class PANode(Node):
max_context_tokens = 4000
SYSTEM = """You are the Personal Assistant (PA) — the user's companion in this cognitive runtime.
-You manage the conversation and route domain-specific work to the right expert.
+You manage the user's dashboard and route work to domain experts.
Listener: {identity} on {channel}
Available experts:
{experts}
+Experts have these tools:
+- query_db SQL queries on their domain database
+- emit_actions create buttons on the dashboard
+- create_machine / add_state / reset_machine / destroy_machine interactive UI components
+- set_state persistent key-value store
+- emit_display formatted data display
YOUR JOB:
1. Understand what the user wants
-2. If it's a domain task: route to the right expert with a clear, self-contained job description
-3. If it's social/general: respond directly (no expert needed)
+2. Route to the expert for ANY task that needs tools (DB, UI, buttons, machines, counters, reports)
+3. Only respond directly for social chat (greetings, thanks, bye, small talk)
Output ONLY valid JSON:
{{
-"expert": "eras | plankiste | none",
-"job": "Self-contained task description for the expert. Include all context the expert needs — it has NO conversation history.",
-"thinking_message": "Short message shown to user while expert works (in user's language). e.g. 'Moment, ich schaue in der Datenbank nach...'",
+"expert": "{expert_names} | none",
+"job": "Self-contained task. Include ALL context — the expert has NO conversation history. Describe what to query, what UI to build, what the user expects to see.",
+"thinking_message": "Short message for user while expert works, in their language",
"response_hint": "If expert=none, your direct response to the user.",
"language": "de | en | mixed"
}}
Rules:
-- The expert has NO history. The job must be fully self-contained.
-- Include relevant facts from memory in the job (e.g. "customer Kathrin Jager, ID 2").
-- thinking_message should be natural and in the user's language.
-- For greetings, thanks, general chat: expert=none, write response_hint directly.
-- For DB queries, reports, data analysis: route to the domain expert.
-- When unsure which expert: expert=none, ask the user to clarify.
+- expert=none ONLY for social chat (hi, thanks, bye, how are you)
+- ANY request to create, build, show, query, investigate, count, list, describe route to expert
+- The job must be fully self-contained. Include relevant facts from memory.
+- thinking_message: natural, in user's language. e.g. "Moment, ich schaue nach..."
+- If the user mentions data, tables, customers, devices, buttons, counters expert
+- When unsure which expert: pick the one whose domain matches best
{memory_context}"""
EXPERT_DESCRIPTIONS = {
-"eras": "eras — heating/energy customer database (eras2_production). Customers, devices, billing, consumption data.",
-"plankiste": "plankiste — Kita planning database (plankiste_test). Children, care schedules, offers, pricing.",
+"eras": "eras — heating/energy domain. Database: eras2_production (customers, devices, billing, consumption). Can also build dashboard UI (buttons, machines, counters, tables) for energy data workflows.",
+"plankiste": "plankiste — Kita planning domain. Database: plankiste_test (children, care schedules, offers, pricing). Can build dashboard UI for education workflows and generate Angebote.",
}
def __init__(self, send_hud):
@@ -79,10 +86,11 @@ Rules:
if not expert_lines:
expert_lines.append("- (no experts available — handle everything directly)")
+expert_names = " | ".join(self._available_experts) if self._available_experts else "none"
messages = [
{"role": "system", "content": self.SYSTEM.format(
memory_context=memory_context, identity=identity, channel=channel,
-experts="\n".join(expert_lines))},
+experts="\n".join(expert_lines), expert_names=expert_names)},
]
# Summarize recent history (PA sees full context)
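The change above fills a new `{expert_names}` placeholder while the JSON braces in the prompt stay doubled so that `str.format` leaves them alone. A minimal sketch of that mechanism follows, together with one way the PA's JSON reply could be checked against the available experts. `TEMPLATE`, `build_prompt`, and `parse_decision` are hypothetical names for illustration, not part of this commit, and the fallback to `none` for unknown experts is an assumption.

```python
import json

# Cut-down stand-in for PANode.SYSTEM (hypothetical). Literal JSON braces are
# doubled so str.format keeps them and only fills the named placeholders.
TEMPLATE = """Available experts:
{experts}
Output ONLY valid JSON:
{{
"expert": "{expert_names} | none",
"job": "Self-contained task."
}}"""


def build_prompt(available: list[str], descriptions: dict[str, str]) -> str:
    """Format the template the same way the diff does, using stdlib only."""
    lines = [f"- {descriptions[name]}" for name in available]
    if not lines:
        lines.append("- (no experts available)")
    expert_names = " | ".join(available) if available else "none"
    return TEMPLATE.format(experts="\n".join(lines), expert_names=expert_names)


def parse_decision(raw: str, available: list[str]) -> dict:
    """Assumed validation step: an expert the runtime does not know becomes 'none'."""
    decision = json.loads(raw)
    if decision.get("expert") not in available:
        decision["expert"] = "none"
    return decision
```

Deriving the allowed values from the available experts keeps the prompt in step with whatever experts are registered, which the previously hardcoded `eras | plankiste | none` string could not do.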

testcases/dashboard.md Normal file

@@ -0,0 +1,33 @@
+# Dashboard Integration
+Tests that experts can build UI on the shared dashboard:
+buttons, machines, tables, state — all through the PA→Expert pipeline.
+## Setup
+- clear history
+## Steps
+### 1. Expert creates buttons
+- send: create two buttons on my dashboard: Report and Export
+- expect_actions: length >= 2
+- expect_actions: any action contains "report" or "Report"
+### 2. Buttons survive a query
+- send: how many customers are there?
+- expect_response: length > 5
+- expect_actions: any action contains "report" or "Report"
+### 3. Expert creates a machine
+- send: create a navigation machine called "workflow" with initial state "start" showing a Next button that goes to "step2"
+- expect_trace: has tool_call create_machine
+### 4. Expert shows data table
+- send: show me 5 customers in a table
+- expect_trace: has tool_call
+- expect_response: length > 10
+### 5. Expert replaces buttons
+- send: remove all buttons and create one button called Reset
+- expect_actions: length >= 1
+- expect_actions: any action contains "reset" or "Reset"
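The `expect_actions` lines in this test case read as small assertions over the actions currently on the dashboard. The sketch below shows one way such a rule could be evaluated. It is an illustration only: the real test harness is not part of this diff, `check_expect_actions` is a hypothetical name, and the assumption that each action is a dict with a `label` field is a guess.

```python
import re


def check_expect_actions(rule: str, actions: list[dict]) -> bool:
    """Evaluate one expect_actions rule (hypothetical harness semantics)."""
    rule = rule.strip()

    # Form 1: "length >= N" or "length > N"
    match = re.fullmatch(r"length\s*(>=|>)\s*(\d+)", rule)
    if match:
        op, n = match.group(1), int(match.group(2))
        return len(actions) >= n if op == ">=" else len(actions) > n

    # Form 2: any action contains "a" or "b"
    if rule.startswith("any action contains"):
        needles = re.findall(r'"([^"]*)"', rule)
        # Assumption: a rule passes if any needle is a substring of any label.
        return any(
            needle in action.get("label", "")
            for action in actions
            for needle in needles
        )

    raise ValueError(f"unsupported rule: {rule}")
```

Under these assumptions, step 2 ("Buttons survive a query") passes only if the Report button created in step 1 is still among the actions after the unrelated customer query.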