v0.15.1: PA routes all tool requests to expert, dashboard integration test
- PA prompt updated: routes ANY task needing tools (DB, UI, buttons, machines) to expert. Only social chat stays with PA.
- Expert descriptions include UI capabilities (buttons, machines, tables)
- Dashboard integration test: expert creates/replaces buttons, machines, tables — all persist correctly across queries
- v4-eras scores: fast 27/28, expert 23/23, dashboard 15/15, progress 11/11

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 1000411eb2
commit fda0d7cfce
@@ -16,40 +16,47 @@ class PANode(Node):
     max_context_tokens = 4000

     SYSTEM = """You are the Personal Assistant (PA) — the user's companion in this cognitive runtime.
-You manage the conversation and route domain-specific work to the right expert.
+You manage the user's dashboard and route work to domain experts.

 Listener: {identity} on {channel}

 Available experts:
 {experts}

+Experts have these tools:
+- query_db — SQL queries on their domain database
+- emit_actions — create buttons on the dashboard
+- create_machine / add_state / reset_machine / destroy_machine — interactive UI components
+- set_state — persistent key-value store
+- emit_display — formatted data display
+
 YOUR JOB:
 1. Understand what the user wants
-2. If it's a domain task: route to the right expert with a clear, self-contained job description
-3. If it's social/general: respond directly (no expert needed)
+2. Route to the expert for ANY task that needs tools (DB, UI, buttons, machines, counters, reports)
+3. Only respond directly for social chat (greetings, thanks, bye, small talk)

 Output ONLY valid JSON:
 {{
-"expert": "eras | plankiste | none",
-"job": "Self-contained task description for the expert. Include all context the expert needs — it has NO conversation history.",
-"thinking_message": "Short message shown to user while expert works (in user's language). e.g. 'Moment, ich schaue in der Datenbank nach...'",
+"expert": "{expert_names} | none",
+"job": "Self-contained task. Include ALL context — the expert has NO conversation history. Describe what to query, what UI to build, what the user expects to see.",
+"thinking_message": "Short message for user while expert works, in their language",
 "response_hint": "If expert=none, your direct response to the user.",
 "language": "de | en | mixed"
 }}

 Rules:
-- The expert has NO history. The job must be fully self-contained.
-- Include relevant facts from memory in the job (e.g. "customer Kathrin Jager, ID 2").
-- thinking_message should be natural and in the user's language.
-- For greetings, thanks, general chat: expert=none, write response_hint directly.
-- For DB queries, reports, data analysis: route to the domain expert.
-- When unsure which expert: expert=none, ask the user to clarify.
+- expert=none ONLY for social chat (hi, thanks, bye, how are you)
+- ANY request to create, build, show, query, investigate, count, list, describe → route to expert
+- The job must be fully self-contained. Include relevant facts from memory.
+- thinking_message: natural, in user's language. e.g. "Moment, ich schaue nach..."
+- If the user mentions data, tables, customers, devices, buttons, counters → expert
+- When unsure which expert: pick the one whose domain matches best

 {memory_context}"""

     EXPERT_DESCRIPTIONS = {
-        "eras": "eras — heating/energy customer database (eras2_production). Customers, devices, billing, consumption data.",
-        "plankiste": "plankiste — Kita planning database (plankiste_test). Children, care schedules, offers, pricing.",
+        "eras": "eras — heating/energy domain. Database: eras2_production (customers, devices, billing, consumption). Can also build dashboard UI (buttons, machines, counters, tables) for energy data workflows.",
+        "plankiste": "plankiste — Kita planning domain. Database: plankiste_test (children, care schedules, offers, pricing). Can build dashboard UI for education workflows and generate Angebote.",
     }

     def __init__(self, send_hud):
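Review note: the prompt in this hunk asks the PA to reply with a single JSON object whose `expert` field is either an expert name or `none`. The code that consumes that reply is not part of this diff, so the following is only a minimal sketch of how such a reply could be validated before routing. The helper name `parse_routing`, the `KNOWN_EXPERTS` set, and the fallback behaviour are assumptions for illustration, not the project's actual implementation.

```python
import json

# Assumed: expert names mirror the keys of EXPERT_DESCRIPTIONS in the diff.
KNOWN_EXPERTS = {"eras", "plankiste"}


def parse_routing(raw, known_experts=KNOWN_EXPERTS):
    """Turn the PA's raw JSON reply into a routing decision (hypothetical helper)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Malformed output: treat the raw text as a direct reply to the user.
        return {"expert": "none", "job": "", "thinking_message": "",
                "response_hint": raw.strip(), "language": "en"}

    expert = str(data.get("expert", "none")).strip().lower()
    if expert not in known_experts:
        # Unknown or missing expert: do not route anywhere.
        expert = "none"

    return {
        "expert": expert,
        # A job only makes sense when an expert will actually receive it.
        "job": data.get("job", "") if expert != "none" else "",
        "thinking_message": data.get("thinking_message", ""),
        "response_hint": data.get("response_hint", ""),
        "language": data.get("language", "en"),
    }
```

Falling back to `expert="none"` on malformed output is one possible design choice. Since the new rules reserve `none` for social chat, a real implementation might prefer to retry the PA call instead.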
@@ -79,10 +86,11 @@ Rules:
         if not expert_lines:
             expert_lines.append("- (no experts available — handle everything directly)")

+        expert_names = " | ".join(self._available_experts) if self._available_experts else "none"
         messages = [
             {"role": "system", "content": self.SYSTEM.format(
                 memory_context=memory_context, identity=identity, channel=channel,
-                experts="\n".join(expert_lines))},
+                experts="\n".join(expert_lines), expert_names=expert_names)},
         ]

         # Summarize recent history (PA sees full context)
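Review note: this hunk fills the new `{expert_names}` placeholder through `str.format`, while the literal JSON braces in the prompt are escaped as `{{` and `}}`. The sketch below reproduces that mechanism on a shortened template to show what the rendered `expert` line looks like. `TEMPLATE` and `render` are hypothetical stand-ins, and the expert list is passed in directly rather than read from `self._available_experts`.

```python
# Shortened stand-in for PANode.SYSTEM; only the placeholders relevant here.
TEMPLATE = """Available experts:
{experts}

Output ONLY valid JSON:
{{
"expert": "{expert_names} | none"
}}"""


def render(available):
    """Format the template the way the hunk does (hypothetical helper)."""
    expert_lines = ["- " + name for name in available]
    if not expert_lines:
        expert_lines.append("- (no experts available)")
    # Same expression as in the diff, with a plain list instead of self._available_experts.
    expert_names = " | ".join(available) if available else "none"
    return TEMPLATE.format(experts="\n".join(expert_lines), expert_names=expert_names)


print(render(["eras", "plankiste"]))
```

In this sketch an empty expert list renders the field as `"none | none"`, because the fallback value and the template's literal ` | none` suffix both contribute. Whether the real node can reach that state depends on how `_available_experts` is populated, which the diff does not show.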
33
testcases/dashboard.md
Normal file
@@ -0,0 +1,33 @@
+# Dashboard Integration
+
+Tests that experts can build UI on the shared dashboard:
+buttons, machines, tables, state — all through the PA→Expert pipeline.
+
+## Setup
+- clear history
+
+## Steps
+
+### 1. Expert creates buttons
+- send: create two buttons on my dashboard: Report and Export
+- expect_actions: length >= 2
+- expect_actions: any action contains "report" or "Report"
+
+### 2. Buttons survive a query
+- send: how many customers are there?
+- expect_response: length > 5
+- expect_actions: any action contains "report" or "Report"
+
+### 3. Expert creates a machine
+- send: create a navigation machine called "workflow" with initial state "start" showing a Next button that goes to "step2"
+- expect_trace: has tool_call create_machine
+
+### 4. Expert shows data table
+- send: show me 5 customers in a table
+- expect_trace: has tool_call
+- expect_response: length > 10
+
+### 5. Expert replaces buttons
+- send: remove all buttons and create one button called Reset
+- expect_actions: length >= 1
+- expect_actions: any action contains "reset" or "Reset"
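Review note: the test case relies on two kinds of `expect_actions` assertions, a length comparison and a substring match over the current actions. The runner that interprets these lines is not part of this commit, so the following is only a rough sketch of how they might be evaluated. The function `check_actions` and the shape of an action (a dict with a `label` key) are assumptions for illustration.

```python
import re


def check_actions(expectation, actions):
    """Evaluate one expect_actions line against a list of actions (hypothetical)."""
    text = expectation.strip()

    # Form 1: "length >= 2", "length > 5", "length == 1"
    match = re.fullmatch(r"length\s*(>=|>|==)\s*(\d+)", text)
    if match:
        op, n = match.group(1), int(match.group(2))
        count = len(actions)
        return {">=": count >= n, ">": count > n, "==": count == n}[op]

    # Form 2: any action contains "report" or "Report"
    if text.startswith("any action contains"):
        needles = re.findall(r'"([^"]*)"', text)
        # Assumed: matching against the string form of each action is sufficient.
        return any(needle in str(action) for action in actions for needle in needles)

    raise ValueError("unsupported expectation: " + repr(text))


# Assumed action shape after step 1 of the test case.
actions = [{"label": "Report"}, {"label": "Export"}]
print(check_actions("length >= 2", actions))
print(check_actions('any action contains "report" or "Report"', actions))
```

Listing both `"report"` and `"Report"` in the test case suggests the real matcher is case-sensitive, which is how the sketch treats it.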