v0.15.1: PA routes all tool requests to expert, dashboard integration test
- PA prompt updated: routes ANY task needing tools (DB, UI, buttons, machines) to expert. Only social chat stays with PA.
- Expert descriptions include UI capabilities (buttons, machines, tables)
- Dashboard integration test: expert creates/replaces buttons, machines, tables; all persist correctly across queries
- v4-eras scores: fast 27/28, expert 23/23, dashboard 15/15, progress 11/11

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 1000411eb2
commit fda0d7cfce
@@ -16,40 +16,47 @@ class PANode(Node):
|
|||||||
max_context_tokens = 4000
|
max_context_tokens = 4000
|
||||||
|
|
||||||
SYSTEM = """You are the Personal Assistant (PA) — the user's companion in this cognitive runtime.
|
SYSTEM = """You are the Personal Assistant (PA) — the user's companion in this cognitive runtime.
|
||||||
You manage the conversation and route domain-specific work to the right expert.
|
You manage the user's dashboard and route work to domain experts.
|
||||||
|
|
||||||
Listener: {identity} on {channel}
|
Listener: {identity} on {channel}
|
||||||
|
|
||||||
Available experts:
|
Available experts:
|
||||||
{experts}
|
{experts}
|
||||||
|
|
||||||
|
Experts have these tools:
|
||||||
|
- query_db — SQL queries on their domain database
|
||||||
|
- emit_actions — create buttons on the dashboard
|
||||||
|
- create_machine / add_state / reset_machine / destroy_machine — interactive UI components
|
||||||
|
- set_state — persistent key-value store
|
||||||
|
- emit_display — formatted data display
|
||||||
|
|
||||||
YOUR JOB:
|
YOUR JOB:
|
||||||
1. Understand what the user wants
|
1. Understand what the user wants
|
||||||
2. If it's a domain task: route to the right expert with a clear, self-contained job description
|
2. Route to the expert for ANY task that needs tools (DB, UI, buttons, machines, counters, reports)
|
||||||
3. If it's social/general: respond directly (no expert needed)
|
3. Only respond directly for social chat (greetings, thanks, bye, small talk)
|
||||||
|
|
||||||
Output ONLY valid JSON:
|
Output ONLY valid JSON:
|
||||||
{{
|
{{
|
||||||
"expert": "eras | plankiste | none",
|
"expert": "{expert_names} | none",
|
||||||
"job": "Self-contained task description for the expert. Include all context the expert needs — it has NO conversation history.",
|
"job": "Self-contained task. Include ALL context — the expert has NO conversation history. Describe what to query, what UI to build, what the user expects to see.",
|
||||||
"thinking_message": "Short message shown to user while expert works (in user's language). e.g. 'Moment, ich schaue in der Datenbank nach...'",
|
"thinking_message": "Short message for user while expert works, in their language",
|
||||||
"response_hint": "If expert=none, your direct response to the user.",
|
"response_hint": "If expert=none, your direct response to the user.",
|
||||||
"language": "de | en | mixed"
|
"language": "de | en | mixed"
|
||||||
}}
|
}}
|
||||||
|
|
||||||
Rules:
|
Rules:
|
||||||
- The expert has NO history. The job must be fully self-contained.
|
- expert=none ONLY for social chat (hi, thanks, bye, how are you)
|
||||||
- Include relevant facts from memory in the job (e.g. "customer Kathrin Jager, ID 2").
|
- ANY request to create, build, show, query, investigate, count, list, describe → route to expert
|
||||||
- thinking_message should be natural and in the user's language.
|
- The job must be fully self-contained. Include relevant facts from memory.
|
||||||
- For greetings, thanks, general chat: expert=none, write response_hint directly.
|
- thinking_message: natural, in user's language. e.g. "Moment, ich schaue nach..."
|
||||||
- For DB queries, reports, data analysis: route to the domain expert.
|
- If the user mentions data, tables, customers, devices, buttons, counters → expert
|
||||||
- When unsure which expert: expert=none, ask the user to clarify.
|
- When unsure which expert: pick the one whose domain matches best
|
||||||
|
|
||||||
{memory_context}"""
|
{memory_context}"""
|
||||||
|
|
||||||
EXPERT_DESCRIPTIONS = {
|
EXPERT_DESCRIPTIONS = {
|
||||||
"eras": "eras — heating/energy customer database (eras2_production). Customers, devices, billing, consumption data.",
|
"eras": "eras — heating/energy domain. Database: eras2_production (customers, devices, billing, consumption). Can also build dashboard UI (buttons, machines, counters, tables) for energy data workflows.",
|
||||||
"plankiste": "plankiste — Kita planning database (plankiste_test). Children, care schedules, offers, pricing.",
|
"plankiste": "plankiste — Kita planning domain. Database: plankiste_test (children, care schedules, offers, pricing). Can build dashboard UI for education workflows and generate Angebote.",
|
||||||
}
|
}
|
||||||
|
|
||||||
def __init__(self, send_hud):
|
def __init__(self, send_hud):
|
||||||
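The prompt above asks the PA to emit a single JSON object with `expert`, `job`, `thinking_message`, `response_hint` and `language`. The diff does not show how the node consumes that output, so the following is only a minimal sketch of one way to parse it defensively. `RoutingDecision` and `parse_routing` are hypothetical names, and falling back to `expert="none"` on malformed output is an assumption, not the project's confirmed behavior.

```python
import json
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    """Hypothetical container for the fields named in the PA prompt."""
    expert: str
    job: str = ""
    thinking_message: str = ""
    response_hint: str = ""
    language: str = "en"


def parse_routing(raw: str, available_experts: list[str]) -> RoutingDecision:
    """Parse the PA's JSON reply; fall back to a direct answer if it is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: treat non-JSON output as a direct reply to the user.
        return RoutingDecision(expert="none", response_hint=raw.strip())
    if not isinstance(data, dict):
        return RoutingDecision(expert="none", response_hint=raw.strip())

    expert = str(data.get("expert", "none")).strip().lower()
    if expert not in available_experts:
        # Unknown or missing expert names are not routed anywhere.
        expert = "none"

    return RoutingDecision(
        expert=expert,
        job=str(data.get("job", "")),
        thinking_message=str(data.get("thinking_message", "")),
        response_hint=str(data.get("response_hint", "")),
        language=str(data.get("language", "en")),
    )
```

Validating `expert` against the list of available experts keeps a hallucinated name from being dispatched, which matters more now that the prompt tells the PA to pick the best-matching expert instead of asking the user to clarify.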
@@ -79,10 +86,11 @@ Rules:
|
|||||||
if not expert_lines:
|
if not expert_lines:
|
||||||
expert_lines.append("- (no experts available — handle everything directly)")
|
expert_lines.append("- (no experts available — handle everything directly)")
|
||||||
|
|
||||||
|
expert_names = " | ".join(self._available_experts) if self._available_experts else "none"
|
||||||
messages = [
|
messages = [
|
||||||
{"role": "system", "content": self.SYSTEM.format(
|
{"role": "system", "content": self.SYSTEM.format(
|
||||||
memory_context=memory_context, identity=identity, channel=channel,
|
memory_context=memory_context, identity=identity, channel=channel,
|
||||||
experts="\n".join(expert_lines))},
|
experts="\n".join(expert_lines), expert_names=expert_names)},
|
||||||
]
|
]
|
||||||
|
|
||||||
# Summarize recent history (PA sees full context)
|
# Summarize recent history (PA sees full context)
|
||||||
|
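The hunk above replaces the hard-coded `"eras | plankiste | none"` with an `{expert_names}` placeholder filled from the available experts. The sketch below reproduces that mechanism on a trimmed-down template to show how `str.format` treats the doubled braces and the new placeholder. `TEMPLATE` and `build_prompt` are illustrative stand-ins, not the real `SYSTEM` string or method, and the fallback line is shortened.

```python
# Trimmed-down, hypothetical stand-in for the SYSTEM template in the diff.
TEMPLATE = """Available experts:
{experts}

Output ONLY valid JSON:
{{
  "expert": "{expert_names} | none"
}}"""


def build_prompt(available_experts, descriptions):
    """Fill the template the same way the diff does, using plain arguments."""
    expert_lines = [
        f"- {descriptions[name]}"
        for name in available_experts
        if name in descriptions
    ]
    if not expert_lines:
        expert_lines.append("- (no experts available)")

    # Same expression as in the diff: join names with " | ", or use "none".
    expert_names = " | ".join(available_experts) if available_experts else "none"

    # "{{" and "}}" in the template come out as literal "{" and "}".
    return TEMPLATE.format(
        experts="\n".join(expert_lines),
        expert_names=expert_names,
    )


if __name__ == "__main__":
    print(build_prompt(
        ["eras", "plankiste"],
        {"eras": "eras: heating/energy domain", "plankiste": "plankiste: Kita planning domain"},
    ))
```

One thing this sketch makes visible: when no experts are available, `expert_names` is `"none"` and the rendered line reads `"expert": "none | none"`. That is harmless for the model but slightly redundant, so an empty-list case might deserve its own wording.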
|||||||
33
testcases/dashboard.md
Normal file
@@ -0,0 +1,33 @@
|
|||||||
|
# Dashboard Integration
|
||||||
|
|
||||||
|
Tests that experts can build UI on the shared dashboard:
|
||||||
|
buttons, machines, tables, state — all through the PA→Expert pipeline.
|
||||||
|
|
||||||
|
## Setup
|
||||||
|
- clear history
|
||||||
|
|
||||||
|
## Steps
|
||||||
|
|
||||||
|
### 1. Expert creates buttons
|
||||||
|
- send: create two buttons on my dashboard: Report and Export
|
||||||
|
- expect_actions: length >= 2
|
||||||
|
- expect_actions: any action contains "report" or "Report"
|
||||||
|
|
||||||
|
### 2. Buttons survive a query
|
||||||
|
- send: how many customers are there?
|
||||||
|
- expect_response: length > 5
|
||||||
|
- expect_actions: any action contains "report" or "Report"
|
||||||
|
|
||||||
|
### 3. Expert creates a machine
|
||||||
|
- send: create a navigation machine called "workflow" with initial state "start" showing a Next button that goes to "step2"
|
||||||
|
- expect_trace: has tool_call create_machine
|
||||||
|
|
||||||
|
### 4. Expert shows data table
|
||||||
|
- send: show me 5 customers in a table
|
||||||
|
- expect_trace: has tool_call
|
||||||
|
- expect_response: length > 10
|
||||||
|
|
||||||
|
### 5. Expert replaces buttons
|
||||||
|
- send: remove all buttons and create one button called Reset
|
||||||
|
- expect_actions: length >= 1
|
||||||
|
- expect_actions: any action contains "reset" or "Reset"
|
||||||
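The `expect_actions` lines in the testcase above use two kinds of checks: a minimum length and a case-variant substring match over any action. The test runner itself is not part of this commit, so the following is only a sketch of how such checks could be evaluated. The function names and the shape of an action (a dict with `label` and `action` keys) are assumptions made for illustration.

```python
import json


def actions_length_at_least(actions, minimum):
    """Sketch of `expect_actions: length >= N`."""
    return len(actions) >= minimum


def any_action_contains(actions, *needles):
    """Sketch of `expect_actions: any action contains "a" or "b"`.

    Each action is serialized to JSON so the substring can match in any
    field; the real runner may compare against specific fields instead.
    """
    serialized = [json.dumps(action, ensure_ascii=False) for action in actions]
    return any(needle in text for text in serialized for needle in needles)


if __name__ == "__main__":
    # Hypothetical dashboard state after step 1 of the testcase.
    after_step_1 = [
        {"label": "Report", "action": "report"},
        {"label": "Export", "action": "export"},
    ]
    print(actions_length_at_least(after_step_1, 2))
    print(any_action_contains(after_step_1, "report", "Report"))
```

Note that step 5 only asserts that a Reset button exists. Adding a negative check, such as verifying that no action still contains "Report", would also confirm that the old buttons were actually removed.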