v0.15.1: PA routes all tool requests to expert, dashboard integration test

- PA prompt updated: routes ANY task needing tools (DB, UI, buttons, machines) to expert. Only social chat stays with PA.
- Expert descriptions include UI capabilities (buttons, machines, tables)
- Dashboard integration test: expert creates/replaces buttons, machines, tables; all persist correctly across queries
- v4-eras scores: fast 27/28, expert 23/23, dashboard 15/15, progress 11/11

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 17:28:44 +02:00
commit fda0d7cfce
parent 1000411eb2
2 changed files with 56 additions and 15 deletions
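
The routing rule in the commit message (tool work goes to an expert, only social chat stays with the PA) depends on the PA returning a well-formed JSON decision. The following is a minimal sketch of how a caller might normalise that reply. `parse_routing`, its fallback values, and the default language are assumptions for illustration and are not part of this commit; only the field names come from the prompt in the diff.

```python
import json


def parse_routing(raw: str, available_experts: list[str]) -> dict:
    """Parse the PA's JSON reply; fall back to expert="none" on anything unexpected."""
    # Hypothetical defaults; the field names mirror the prompt's JSON schema.
    fallback = {"expert": "none", "job": "", "thinking_message": "",
                "response_hint": "", "language": "en"}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(fallback)
    if not isinstance(data, dict):
        return dict(fallback)

    # Keep only the known fields and fill in missing ones from the fallback.
    decision = {**fallback, **{k: v for k, v in data.items() if k in fallback}}

    # An expert name that is not currently available is treated as "none".
    if decision["expert"] not in available_experts:
        decision["expert"] = "none"

    # The expert has no conversation history, so a routed decision
    # without a job description is not actionable.
    if decision["expert"] != "none" and not str(decision["job"]).strip():
        decision["expert"] = "none"
    return decision
```

For example, `parse_routing('{"expert": "eras", "job": "Count customers"}', ["eras", "plankiste"])` keeps `eras` as the target, while a reply naming an unknown expert or omitting the job degrades to `expert="none"` instead of raising.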
--- a/agent/nodes/pa_v1.py
+++ b/agent/nodes/pa_v1.py
@@ -16,40 +16,47 @@ class PANode(Node):
    max_context_tokens = 4000

    SYSTEM = """You are the Personal Assistant (PA) — the user's companion in this cognitive runtime.
-You manage the conversation and route domain-specific work to the right expert.
+You manage the user's dashboard and route work to domain experts.

 Listener: {identity} on {channel}

 Available experts:
 {experts}

+Experts have these tools:
+- query_db — SQL queries on their domain database
+- emit_actions — create buttons on the dashboard
+- create_machine / add_state / reset_machine / destroy_machine — interactive UI components
+- set_state — persistent key-value store
+- emit_display — formatted data display
+
 YOUR JOB:
 1. Understand what the user wants
-2. If it's a domain task: route to the right expert with a clear, self-contained job description
-3. If it's social/general: respond directly (no expert needed)
+2. Route to the expert for ANY task that needs tools (DB, UI, buttons, machines, counters, reports)
+3. Only respond directly for social chat (greetings, thanks, bye, small talk)

 Output ONLY valid JSON:
 {{
-  "expert": "eras | plankiste | none",
-  "job": "Self-contained task description for the expert. Include all context the expert needs — it has NO conversation history.",
-  "thinking_message": "Short message shown to user while expert works (in user's language). e.g. 'Moment, ich schaue in der Datenbank nach...'",
+  "expert": "{expert_names} | none",
+  "job": "Self-contained task. Include ALL context — the expert has NO conversation history. Describe what to query, what UI to build, what the user expects to see.",
+  "thinking_message": "Short message for user while expert works, in their language",
  "response_hint": "If expert=none, your direct response to the user.",
  "language": "de | en | mixed"
 }}

 Rules:
- The expert has NO history. The job must be fully self-contained.
- Include relevant facts from memory in the job (e.g. "customer Kathrin Jager, ID 2").
- thinking_message should be natural and in the user's language.
- For greetings, thanks, general chat: expert=none, write response_hint directly.
- For DB queries, reports, data analysis: route to the domain expert.
- When unsure which expert: expert=none, ask the user to clarify.
+- expert=none ONLY for social chat (hi, thanks, bye, how are you)
+- ANY request to create, build, show, query, investigate, count, list, describe → route to expert
+- The job must be fully self-contained. Include relevant facts from memory.
+- thinking_message: natural, in user's language. e.g. "Moment, ich schaue nach..."
+- If the user mentions data, tables, customers, devices, buttons, counters → expert
+- When unsure which expert: pick the one whose domain matches best

 {memory_context}"""

    EXPERT_DESCRIPTIONS = {
-        "eras": "eras — heating/energy customer database (eras2_production). Customers, devices, billing, consumption data.",
-        "plankiste": "plankiste — Kita planning database (plankiste_test). Children, care schedules, offers, pricing.",
+        "eras": "eras — heating/energy domain. Database: eras2_production (customers, devices, billing, consumption). Can also build dashboard UI (buttons, machines, counters, tables) for energy data workflows.",
+        "plankiste": "plankiste — Kita planning domain. Database: plankiste_test (children, care schedules, offers, pricing). Can build dashboard UI for education workflows and generate Angebote.",
    }

    def __init__(self, send_hud):
@@ -79,10 +86,11 @@ Rules:
        if not expert_lines:
            expert_lines.append("- (no experts available — handle everything directly)")

+        expert_names = " | ".join(self._available_experts) if self._available_experts else "none"
        messages = [
            {"role": "system", "content": self.SYSTEM.format(
                memory_context=memory_context, identity=identity, channel=channel,
-                experts="\n".join(expert_lines))},
+                experts="\n".join(expert_lines), expert_names=expert_names)},
        ]

        # Summarize recent history (PA sees full context)
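
The second hunk above feeds a new `expert_names` placeholder into `SYSTEM.format(...)`, while the literal JSON braces in the prompt are escaped as `{{` and `}}`. The sketch below reproduces that interaction on a trimmed-down template. `TEMPLATE`, `render`, and the way `expert_lines` is built are assumptions made for illustration; only the `expert_names` expression is copied from the diff.

```python
# Trimmed-down stand-in for PANode.SYSTEM; only the placeholders matter here.
TEMPLATE = """Available experts:
{experts}

Output ONLY valid JSON:
{{
  "expert": "{expert_names} | none"
}}"""


def render(available_experts, descriptions):
    """Format the template the way the patched code formats SYSTEM."""
    # Assumption: one bullet per available expert, taken from a description map.
    expert_lines = [f"- {descriptions[name]}" for name in available_experts]
    if not expert_lines:
        expert_lines.append("- (no experts available)")
    # Same expression as in the commit.
    expert_names = " | ".join(available_experts) if available_experts else "none"
    return TEMPLATE.format(experts="\n".join(expert_lines),
                           expert_names=expert_names)


if __name__ == "__main__":
    print(render(["eras", "plankiste"],
                 {"eras": "eras (energy)", "plankiste": "plankiste (Kita)"}))
```

With two experts the rendered schema line reads `"expert": "eras | plankiste | none"`, and the doubled braces collapse to single ones. With an empty expert list the same expression renders `"expert": "none | none"`, which is redundant but still only allows `none`.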
--- a/testcases/dashboard.md
+++ b/testcases/dashboard.md
@@ -0,0 +1,33 @@
+# Dashboard Integration
+
+Tests that experts can build UI on the shared dashboard:
+buttons, machines, tables, state — all through the PA→Expert pipeline.
+
+## Setup
+- clear history
+
+## Steps
+
+### 1. Expert creates buttons
+- send: create two buttons on my dashboard: Report and Export
+- expect_actions: length >= 2
+- expect_actions: any action contains "report" or "Report"
+
+### 2. Buttons survive a query
+- send: how many customers are there?
+- expect_response: length > 5
+- expect_actions: any action contains "report" or "Report"
+
+### 3. Expert creates a machine
+- send: create a navigation machine called "workflow" with initial state "start" showing a Next button that goes to "step2"
+- expect_trace: has tool_call create_machine
+
+### 4. Expert shows data table
+- send: show me 5 customers in a table
+- expect_trace: has tool_call
+- expect_response: length > 10
+
+### 5. Expert replaces buttons
+- send: remove all buttons and create one button called Reset
+- expect_actions: length >= 1
+- expect_actions: any action contains "reset" or "Reset"
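
The `expect_actions` clauses in the test case above come in two shapes: a length comparison and a substring check over the actions. The test runner itself is not part of this diff, so the following is only a sketch of how such clauses might be evaluated. `check_actions`, the supported operators, and the assumed shape of an action (a dict with a `label` key) are all hypothetical.

```python
import operator
import re

# Comparison operators accepted in "length <op> <n>" clauses (assumed set).
_OPS = {">=": operator.ge, ">": operator.gt, "==": operator.eq}


def check_actions(expectation: str, actions: list[dict]) -> bool:
    """Evaluate one expect_actions clause against a list of action dicts."""
    clause = expectation.strip()

    # Shape 1: length >= 2
    m = re.fullmatch(r"length\s*(>=|>|==)\s*(\d+)", clause)
    if m:
        return _OPS[m.group(1)](len(actions), int(m.group(2)))

    # Shape 2: any action contains "report" or "Report"
    if clause.startswith("any action contains"):
        needles = re.findall(r'"([^"]*)"', clause)
        # Case-sensitive substring match over each action's string form.
        return any(n in str(a) for a in actions for n in needles)

    raise ValueError(f"unsupported expectation: {expectation!r}")
```

Matching against `str(action)` keeps the check independent of which field carries the button text, at the cost of also matching key names. A stricter runner would compare against a specific field.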