v0.10.4: stateful UI engine — TDD counter test green (36/36)

RED->GREEN->REFACTOR cycle:
- UI node has state store (key-value), action bindings (op/var), and
  local action handlers (inc/dec/set/toggle — no LLM round-trip)
- Thinker self-model: knows its environment, that ACTIONS create real
  buttons, that UI handles state locally. Emits var/op payload for
  stateful actions.
- Thinker's context includes UI state so it can report current values
- /api/clear resets UI state, bindings, and controls
- Test runner: action_match for fuzzy action names, persistent actions
  across steps, _stream_text restored
- Counter test: 16/16 passed (create, read, inc, inc, dec, verify)
- Pub test: 20/20 passed (conversation, language switch, tool use, mood)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Nico 2026-03-28 15:50:37 +01:00
parent 3d71c651fc
commit 3f8886cbd2
8 changed files with 341 additions and 63 deletions

View File

@ -150,6 +150,9 @@ def register_routes(app):
if not _active_runtime: if not _active_runtime:
raise HTTPException(status_code=409, detail="No active session") raise HTTPException(status_code=409, detail="No active session")
_active_runtime.history.clear() _active_runtime.history.clear()
_active_runtime.ui_node.state.clear()
_active_runtime.ui_node.bindings.clear()
_active_runtime.ui_node.current_controls.clear()
return {"status": "cleared"} return {"status": "cleared"}
@app.get("/api/state") @app.get("/api/state")

View File

@ -42,13 +42,18 @@ Output ONLY valid JSON. No explanation, no markdown fences."""
"facts": [], "facts": [],
} }
def get_context_block(self, sensor_lines: list[str] = None) -> str: def get_context_block(self, sensor_lines: list[str] = None, ui_state: dict = None) -> str:
lines = sensor_lines or ["Sensors: (none)"] lines = sensor_lines or ["Sensors: (none)"]
lines.append("") lines.append("")
lines.append("Shared memory (from Memorizer):") lines.append("Shared memory (from Memorizer):")
for k, v in self.state.items(): for k, v in self.state.items():
if v: if v:
lines.append(f"- {k}: {v}") lines.append(f"- {k}: {v}")
if ui_state:
lines.append("")
lines.append("UI state (visible to user in workspace):")
for k, v in ui_state.items():
lines.append(f"- {k} = {v}")
return "\n".join(lines) return "\n".join(lines)
async def update(self, history: list[dict]): async def update(self, history: list[dict]):

View File

@ -24,28 +24,40 @@ TOOLS — write a ```python code block and it WILL be executed. Use print() for
- For math, databases, file ops, any computation: write python. NEVER describe code write it. - For math, databases, file ops, any computation: write python. NEVER describe code write it.
- For simple conversation: respond directly as text. - For simple conversation: respond directly as text.
YOUR ENVIRONMENT:
You are one node in a pipeline: Input (perceives) -> You (reason) -> Output (speaks) + UI (renders).
- Your text response goes to Output, which speaks it to the user.
- Your ACTIONS go to UI, which renders buttons/labels in a workspace panel.
- Button clicks come back to you as "ACTION: action_name".
- UI has a STATE STORE you can create variables and bind buttons to them.
- Simple actions (inc/dec/toggle) are handled by UI locally instant, no round-trip.
ACTIONS ALWAYS end your response with an ACTIONS: line containing a JSON array. ACTIONS ALWAYS end your response with an ACTIONS: line containing a JSON array.
The ACTIONS line MUST be the very last line of your response. The ACTIONS line MUST be the very last line of your response.
Format: ACTIONS: [json array of actions] Format: ACTIONS: [json array of actions]
Examples: STATEFUL ACTIONS to create UI state with buttons, include var/op in payload:
User asks about dog breeds: {{"label": "+1", "action": "increment", "payload": {{"var": "count", "op": "inc", "initial": 0}}}}
Here are three popular dog breeds: Golden Retriever, German Shepherd, and Poodle. {{"label": "-1", "action": "decrement", "payload": {{"var": "count", "op": "dec"}}}}
ACTIONS: [{{"label": "Golden Retriever", "action": "learn_breed", "payload": {{"breed": "Golden Retriever"}}}}, {{"label": "German Shepherd", "action": "learn_breed", "payload": {{"breed": "German Shepherd"}}}}, {{"label": "Poodle", "action": "learn_breed", "payload": {{"breed": "Poodle"}}}}] Ops: inc, dec, set, toggle. UI auto-creates the variable and a label showing its value.
User asks what time it is: SIMPLE ACTIONS for follow-ups that need your reasoning:
{{"label": "Learn More", "action": "learn_breed", "payload": {{"breed": "Poodle"}}}}
Examples:
Create a counter:
Counter created! Use the buttons to increment or decrement.
ACTIONS: [{{"label": "+1", "action": "increment", "payload": {{"var": "count", "op": "inc", "initial": 0}}}}, {{"label": "-1", "action": "decrement", "payload": {{"var": "count", "op": "dec"}}}}]
Simple conversation:
Es ist 14:30 Uhr. Es ist 14:30 Uhr.
ACTIONS: [] ACTIONS: []
After creating a database:
Done! Created 5 customers in the database.
ACTIONS: [{{"label": "Show All", "action": "show_all"}}, {{"label": "Add Customer", "action": "add_customer"}}]
Rules: Rules:
- ALWAYS include the ACTIONS: line, even if empty: ACTIONS: [] - ALWAYS include the ACTIONS: line, even if empty: ACTIONS: []
- Keep labels short (2-4 words), action is snake_case. - Keep labels short (2-4 words), action is snake_case.
- Only include meaningful actions empty array is fine for simple chat. - For state variables, use var/op in payload. UI handles the rest.
{memory_context}""" {memory_context}"""

View File

@ -1,8 +1,7 @@
"""UI Node: pure renderer — converts ThoughtResult actions + data into controls.""" """UI Node: stateful renderer — manages UI state, handles local actions, renders controls."""
import json import json
import logging import logging
import re
from .base import Node from .base import Node
from ..types import ThoughtResult from ..types import ThoughtResult
@ -17,31 +16,99 @@ class UINode(Node):
def __init__(self, send_hud): def __init__(self, send_hud):
super().__init__(send_hud) super().__init__(send_hud)
self.current_controls: list[dict] = [] self.current_controls: list[dict] = []
self.state: dict = {} # {"count": 0, "theme": "dark", ...}
self.bindings: dict = {} # {"increment": {"op": "inc", "var": "count"}, ...}
# --- State operations ---
def set_var(self, name: str, value) -> str:
self.state[name] = value
log.info(f"[ui] state: {name} = {value}")
return f"{name} = {value}"
def get_var(self, name: str):
return self.state.get(name)
def handle_local_action(self, action: str, payload: dict = None) -> str | None:
"""Try to handle an action locally. Returns result text or None if not local."""
binding = self.bindings.get(action)
if not binding:
return None
op = binding.get("op")
var = binding.get("var")
step = binding.get("step", 1)
if var not in self.state:
return None
if op == "inc":
self.state[var] += step
elif op == "dec":
self.state[var] -= step
elif op == "set" and payload and "value" in payload:
self.state[var] = payload["value"]
elif op == "toggle":
self.state[var] = not self.state[var]
else:
return None
log.info(f"[ui] local action {action}: {var} = {self.state[var]}")
return f"{var} is now {self.state[var]}"
# --- Action parsing from Thinker ---
def _parse_thinker_actions(self, actions: list[dict]) -> list[dict]:
"""Parse Thinker's actions, register bindings, return controls."""
controls = []
for action in actions:
act = action.get("action", "")
label = action.get("label", "Action")
payload = action.get("payload", {})
# Detect state-binding hints in payload
if "var" in payload and "op" in payload:
self.bindings[act] = {
"op": payload["op"],
"var": payload["var"],
"step": payload.get("step", 1),
}
# Auto-create state var if it doesn't exist
if payload["var"] not in self.state:
self.set_var(payload["var"], payload.get("initial", 0))
log.info(f"[ui] bound action '{act}' -> {payload['op']} {payload['var']}")
controls.append({
"type": "button",
"label": label,
"action": act,
"payload": payload,
})
return controls
# --- Table extraction ---
def _extract_table(self, tool_output: str) -> dict | None: def _extract_table(self, tool_output: str) -> dict | None:
"""Try to parse tabular data from tool output."""
if not tool_output: if not tool_output:
return None return None
lines = [l.strip() for l in tool_output.strip().split("\n") if l.strip()] lines = [l.strip() for l in tool_output.strip().split("\n") if l.strip()]
if len(lines) < 2: if len(lines) < 2:
return None return None
# Detect pipe-separated tables (e.g. "col1 | col2\nval1 | val2")
if " | " in lines[0]: if " | " in lines[0]:
columns = [c.strip() for c in lines[0].split(" | ")] columns = [c.strip() for c in lines[0].split(" | ")]
data = [] data = []
for line in lines[1:]: for line in lines[1:]:
if line.startswith("-") or line.startswith("="): if line.startswith("-") or line.startswith("="):
continue # separator line continue
vals = [v.strip() for v in line.split(" | ")] vals = [v.strip() for v in line.split(" | ")]
if len(vals) == len(columns): if len(vals) == len(columns):
data.append(dict(zip(columns, vals))) data.append(dict(zip(columns, vals)))
if data: if data:
return {"type": "table", "columns": columns, "data": data} return {"type": "table", "columns": columns, "data": data}
# Detect "Table: X" header format from sqlite wrapper
if lines[0].startswith("Table:"): if lines[0].startswith("Table:"):
table_name = lines[0].replace("Table:", "").strip()
if len(lines) >= 2 and " | " in lines[1]: if len(lines) >= 2 and " | " in lines[1]:
columns = [c.strip() for c in lines[1].split(" | ")] columns = [c.strip() for c in lines[1].split(" | ")]
data = [] data = []
@ -54,30 +121,34 @@ class UINode(Node):
return None return None
async def process(self, thought: ThoughtResult, history: list[dict], # --- Render controls ---
memory_context: str = "") -> list[dict]:
def _build_controls(self, thought: ThoughtResult) -> list[dict]:
controls = [] controls = []
# 1. Render actions from Thinker as buttons # 1. Parse actions from Thinker (registers bindings)
for action in thought.actions: if thought.actions:
controls.extend(self._parse_thinker_actions(thought.actions))
# 2. Add labels for bound state variables
for var, value in self.state.items():
controls.append({ controls.append({
"type": "button", "type": "label",
"label": action.get("label", "Action"), "id": f"var_{var}",
"action": action.get("action", "unknown"), "text": var,
"payload": action.get("payload", {}), "value": str(value),
}) })
# 2. Extract tables from tool output # 3. Extract tables from tool output
if thought.tool_output: if thought.tool_output:
table = self._extract_table(thought.tool_output) table = self._extract_table(thought.tool_output)
if table: if table:
controls.append(table) controls.append(table)
# 3. Add labels for key tool results (single-value outputs) # 4. Add label for short tool results (if no table and no state vars)
if thought.tool_used and thought.tool_output and not any(c["type"] == "table" for c in controls): if thought.tool_used and thought.tool_output and not any(c["type"] == "table" for c in controls):
output = thought.tool_output.strip() output = thought.tool_output.strip()
# Short single-line output → label if "\n" not in output and len(output) < 100 and not self.state:
if "\n" not in output and len(output) < 100:
controls.append({ controls.append({
"type": "label", "type": "label",
"id": "tool_result", "id": "tool_result",
@ -85,14 +156,46 @@ class UINode(Node):
"value": output, "value": output,
}) })
return controls
async def process(self, thought: ThoughtResult, history: list[dict],
memory_context: str = "") -> list[dict]:
controls = self._build_controls(thought)
if controls: if controls:
self.current_controls = controls self.current_controls = controls
await self.hud("controls", controls=controls) await self.hud("controls", controls=controls)
log.info(f"[ui] emitting {len(controls)} controls") log.info(f"[ui] emitting {len(controls)} controls")
else: else:
if self.current_controls: if self.current_controls:
# Keep previous controls visible
controls = self.current_controls controls = self.current_controls
await self.hud("decided", instruction="no new controls") await self.hud("decided", instruction="no new controls")
return controls return controls
async def process_local_action(self, action: str, payload: dict = None) -> tuple[str | None, list[dict]]:
"""Handle a local action. Returns (result_text, updated_controls) or (None, []) if not local."""
result = self.handle_local_action(action, payload)
if result is None:
return None, []
# Re-render controls with updated state
controls = []
# Re-add existing buttons
for ctrl in self.current_controls:
if ctrl["type"] == "button":
controls.append(ctrl)
# Update labels with current state
for var, value in self.state.items():
controls.append({
"type": "label",
"id": f"var_{var}",
"text": var,
"value": str(value),
})
self.current_controls = controls
await self.hud("controls", controls=controls)
log.info(f"[ui] re-rendered after local action: {action}")
return result, controls

View File

@ -57,6 +57,16 @@ class Runtime:
log.error(f"trace write error: {e}") log.error(f"trace write error: {e}")
self._broadcast(trace_entry) self._broadcast(trace_entry)
async def _stream_text(self, text: str):
"""Stream pre-formed text to the client as deltas."""
try:
chunk_size = 12
for i in range(0, len(text), chunk_size):
await self.ws.send_text(json.dumps({"type": "delta", "content": text[i:i + chunk_size]}))
await self.ws.send_text(json.dumps({"type": "done"}))
except Exception:
pass # WS may not be connected (e.g. API-only calls)
async def _run_output_and_ui(self, thought, mem_ctx): async def _run_output_and_ui(self, thought, mem_ctx):
"""Run Output and UI nodes in parallel. Returns (response_text, controls).""" """Run Output and UI nodes in parallel. Returns (response_text, controls)."""
output_task = asyncio.create_task( output_task = asyncio.create_task(
@ -75,16 +85,28 @@ class Runtime:
async def handle_action(self, action: str, data: dict = None): async def handle_action(self, action: str, data: dict = None):
"""Handle a structured UI action (button click etc.).""" """Handle a structured UI action (button click etc.)."""
self.sensor.note_user_activity()
# Try local UI action first (inc, dec, toggle — no LLM needed)
result, controls = await self.ui_node.process_local_action(action, data)
if result is not None:
# Local action handled — send controls update + short response
if controls:
await self.ws.send_text(json.dumps({"type": "controls", "controls": controls}))
await self._stream_text(result)
self.history.append({"role": "user", "content": f"[clicked {action}]"})
self.history.append({"role": "assistant", "content": result})
return
# Complex action — needs Thinker reasoning
action_desc = f"ACTION: {action}" action_desc = f"ACTION: {action}"
if data: if data:
action_desc += f" | data: {json.dumps(data)}" action_desc += f" | data: {json.dumps(data)}"
self.history.append({"role": "user", "content": action_desc}) self.history.append({"role": "user", "content": action_desc})
self.sensor.note_user_activity()
sensor_lines = self.sensor.get_context_lines() sensor_lines = self.sensor.get_context_lines()
mem_ctx = self.memorizer.get_context_block(sensor_lines=sensor_lines) mem_ctx = self.memorizer.get_context_block(sensor_lines=sensor_lines, ui_state=self.ui_node.state)
# Skip Input — this isn't speech, go straight to Thinker
command = Command(instruction=f"User clicked UI button: {action}", source_text=action_desc) command = Command(instruction=f"User clicked UI button: {action}", source_text=action_desc)
thought = await self.thinker.process(command, self.history, memory_context=mem_ctx) thought = await self.thinker.process(command, self.history, memory_context=mem_ctx)
@ -97,6 +119,18 @@ class Runtime:
self.history = self.history[-self.MAX_HISTORY:] self.history = self.history[-self.MAX_HISTORY:]
async def handle_message(self, text: str): async def handle_message(self, text: str):
# Detect ACTION: prefix from API/test runner
if text.startswith("ACTION:"):
parts = text.split("|", 1)
action = parts[0].replace("ACTION:", "").strip()
data = None
if len(parts) > 1:
try:
data = json.loads(parts[1].replace("data:", "").strip())
except (json.JSONDecodeError, Exception):
pass
return await self.handle_action(action, data)
envelope = Envelope( envelope = Envelope(
text=text, text=text,
user_id="nico", user_id="nico",
@ -108,7 +142,7 @@ class Runtime:
self.history.append({"role": "user", "content": text}) self.history.append({"role": "user", "content": text})
sensor_lines = self.sensor.get_context_lines() sensor_lines = self.sensor.get_context_lines()
mem_ctx = self.memorizer.get_context_block(sensor_lines=sensor_lines) mem_ctx = self.memorizer.get_context_block(sensor_lines=sensor_lines, ui_state=self.ui_node.state)
command = await self.input_node.process( command = await self.input_node.process(
envelope, self.history, memory_context=mem_ctx, envelope, self.history, memory_context=mem_ctx,

View File

@ -89,9 +89,13 @@ def _parse_command(text: str) -> dict | None:
if text.startswith("send:"): if text.startswith("send:"):
return {"type": "send", "text": text[5:].strip()} return {"type": "send", "text": text[5:].strip()}
# action: action_name # action: action_name OR action: first matching "pattern"
if text.startswith("action:"): if text.startswith("action:"):
return {"type": "action", "action": text[7:].strip()} val = text[7:].strip()
m = re.match(r'first matching "(.+)"', val)
if m:
return {"type": "action_match", "pattern": m.group(1)}
return {"type": "action", "action": val}
# expect_response: contains "foo" # expect_response: contains "foo"
if text.startswith("expect_response:"): if text.startswith("expect_response:"):
@ -142,13 +146,12 @@ class CogClient:
def _fetch_trace(self): def _fetch_trace(self):
r = self.client.get(f"{API}/trace?last=10", headers=HEADERS) r = self.client.get(f"{API}/trace?last=10", headers=HEADERS)
self.last_trace = r.json().get("lines", []) self.last_trace = r.json().get("lines", [])
# Extract actions from trace # Extract actions from trace — accumulate, don't replace
self.last_actions = []
for t in self.last_trace: for t in self.last_trace:
if t.get("event") == "controls": if t.get("event") == "controls":
for ctrl in t.get("controls", []): new_actions = [c for c in t.get("controls", []) if c.get("type") == "button"]
if ctrl.get("type") == "button": if new_actions:
self.last_actions.append(ctrl) self.last_actions = new_actions
def get_state(self) -> dict: def get_state(self) -> dict:
r = self.client.get(f"{API}/state", headers=HEADERS) r = self.client.get(f"{API}/state", headers=HEADERS)
@ -306,6 +309,26 @@ class CogTestRunner:
results.append({"step": step_name, "check": f"action: {cmd['action']}", "status": "FAIL", results.append({"step": step_name, "check": f"action: {cmd['action']}", "status": "FAIL",
"detail": str(e)}) "detail": str(e)})
elif cmd["type"] == "action_match":
# Find first action matching pattern in last_actions
pattern = cmd["pattern"].lower()
matched = None
for a in self.client.last_actions:
if pattern in a.get("action", "").lower() or pattern in a.get("label", "").lower():
matched = a["action"]
break
if matched:
try:
self.client.send_action(matched)
results.append({"step": step_name, "check": f"action: {matched}", "status": "PASS",
"detail": f"response: {self.client.last_response[:80]}"})
except Exception as e:
results.append({"step": step_name, "check": f"action: {matched}", "status": "FAIL",
"detail": str(e)})
else:
results.append({"step": step_name, "check": f"action matching '{pattern}'", "status": "FAIL",
"detail": f"no action matching '{pattern}' in {[a.get('action') for a in self.client.last_actions]}"})
elif cmd["type"] == "expect_response": elif cmd["type"] == "expect_response":
passed, detail = check_response(self.client.last_response, cmd["check"]) passed, detail = check_response(self.client.last_response, cmd["check"])
results.append({"step": step_name, "check": f"response: {cmd['check']}", results.append({"step": step_name, "check": f"response: {cmd['check']}",

View File

@ -23,15 +23,15 @@ and that UI handles local actions without round-tripping to Thinker.
- expect_response: contains "0" - expect_response: contains "0"
### 4. Increment ### 4. Increment
- action: increment - action: first matching "inc"
- expect_response: contains "1" - expect_response: contains "1"
### 5. Increment again ### 5. Increment again
- action: increment - action: first matching "inc"
- expect_response: contains "2" - expect_response: contains "2"
### 6. Decrement ### 6. Decrement
- action: decrement - action: first matching "dec"
- expect_response: contains "1" - expect_response: contains "1"
### 7. Verify memorizer tracks it ### 7. Verify memorizer tracks it

View File

@ -1,6 +1,104 @@
{ {
"timestamp": "2026-03-28 15:34:02", "timestamp": "2026-03-28 15:50:12",
"testcases": { "testcases": {
"Counter State": [
{
"step": "Setup",
"check": "clear",
"status": "PASS",
"detail": "cleared"
},
{
"step": "Create counter",
"check": "send: create a counter starting at 0 with incr",
"status": "PASS",
"detail": "response: Sure, here is a counter starting at 0. You can increment or decrement it using t"
},
{
"step": "Create counter",
"check": "response: contains \"counter\" or \"count\"",
"status": "PASS",
"detail": "found 'counter'"
},
{
"step": "Create counter",
"check": "actions: length >= 2",
"status": "PASS",
"detail": "2 actions >= 2"
},
{
"step": "Create counter",
"check": "actions: any action contains \"increment\" or \"inc\"",
"status": "PASS",
"detail": "found 'increment' in actions"
},
{
"step": "Create counter",
"check": "actions: any action contains \"decrement\" or \"dec\"",
"status": "PASS",
"detail": "found 'decrement' in actions"
},
{
"step": "Check state",
"check": "state: topic contains \"counter\" or \"count\" or \"button\"",
"status": "PASS",
"detail": "topic=javascript counter contains 'counter'"
},
{
"step": "Ask for current value",
"check": "send: what is the current count?",
"status": "PASS",
"detail": "response: The current count is 0.\n"
},
{
"step": "Ask for current value",
"check": "response: contains \"0\"",
"status": "PASS",
"detail": "found '0'"
},
{
"step": "Increment",
"check": "action: increment",
"status": "PASS",
"detail": "response: count is now 1"
},
{
"step": "Increment",
"check": "response: contains \"1\"",
"status": "PASS",
"detail": "found '1'"
},
{
"step": "Increment again",
"check": "action: increment",
"status": "PASS",
"detail": "response: count is now 2"
},
{
"step": "Increment again",
"check": "response: contains \"2\"",
"status": "PASS",
"detail": "found '2'"
},
{
"step": "Decrement",
"check": "action: decrement",
"status": "PASS",
"detail": "response: count is now 1"
},
{
"step": "Decrement",
"check": "response: contains \"1\"",
"status": "PASS",
"detail": "found '1'"
},
{
"step": "Verify memorizer tracks it",
"check": "state: topic contains \"count\"",
"status": "PASS",
"detail": "topic=javascript counter contains 'count'"
}
],
"Pub Conversation": [ "Pub Conversation": [
{ {
"step": "Setup", "step": "Setup",
@ -12,31 +110,31 @@
"step": "Set the scene", "step": "Set the scene",
"check": "send: Hey, Tina and I are heading to the pub t", "check": "send: Hey, Tina and I are heading to the pub t",
"status": "PASS", "status": "PASS",
"detail": "response: Das ist toll! Was trinkt ihr beide heute Abend?\n" "detail": "response: Sounds fun! Enjoy your night at the pub with Tina! What are your plans for the e"
}, },
{ {
"step": "Set the scene", "step": "Set the scene",
"check": "response: length > 10", "check": "response: length > 10",
"status": "PASS", "status": "PASS",
"detail": "length 48 > 10" "detail": "length 88 > 10"
}, },
{ {
"step": "Set the scene", "step": "Set the scene",
"check": "state: situation contains \"pub\" or \"Tina\"", "check": "state: situation contains \"pub\" or \"Tina\"",
"status": "PASS", "status": "PASS",
"detail": "situation=at a pub with tina, authenticated on https://cog.l contains 'pub'" "detail": "situation=at a pub with Tina contains 'pub'"
}, },
{ {
"step": "Language switch to German", "step": "Language switch to German",
"check": "send: Wir sind jetzt im Biergarten angekommen", "check": "send: Wir sind jetzt im Biergarten angekommen",
"status": "PASS", "status": "PASS",
"detail": "response: Super, genießt euer Biergarten-Erlebnis! Und was ist mit Tina? Trinkt sie auch e" "detail": "response: Super! Habt eine schöne Zeit im Biergarten!\n"
}, },
{ {
"step": "Language switch to German", "step": "Language switch to German",
"check": "response: length > 10", "check": "response: length > 10",
"status": "PASS", "status": "PASS",
"detail": "length 95 > 10" "detail": "length 44 > 10"
}, },
{ {
"step": "Language switch to German", "step": "Language switch to German",
@ -48,31 +146,31 @@
"step": "Context awareness", "step": "Context awareness",
"check": "send: Was sollen wir bestellen?", "check": "send: Was sollen wir bestellen?",
"status": "PASS", "status": "PASS",
"detail": "response: Kommt drauf an, worauf ihr Lust habt! Im Biergarten sind Klassiker wie **Helles*" "detail": "response: Hmm, bei dem schönen Wetter würde doch ein kühles Bier oder eine erfrischende Sc"
}, },
{ {
"step": "Context awareness", "step": "Context awareness",
"check": "response: length > 10", "check": "response: length > 10",
"status": "PASS", "status": "PASS",
"detail": "length 255 > 10" "detail": "length 121 > 10"
}, },
{ {
"step": "Context awareness", "step": "Context awareness",
"check": "state: topic contains \"bestell\" or \"order\" or \"pub\" or \"Biergarten\"", "check": "state: topic contains \"bestell\" or \"order\" or \"pub\" or \"Biergarten\"",
"status": "PASS", "status": "PASS",
"detail": "topic=ordering drinks contains 'order'" "detail": "topic=being at the Biergarten contains 'Biergarten'"
}, },
{ {
"step": "Tina speaks", "step": "Tina speaks",
"check": "send: Tina says: I'll have a Hefeweizen please", "check": "send: Tina says: I'll have a Hefeweizen please",
"status": "PASS", "status": "PASS",
"detail": "response: Ah, Tina bleibt ihren Vorlieben treu! Eine gute Wahl. Und für dich, Nico? Suchst" "detail": "response: Tina möchte also ein Hefeweizen. Was möchtest du bestellen, Nico?\n"
}, },
{ {
"step": "Tina speaks", "step": "Tina speaks",
"check": "response: length > 10", "check": "response: length > 10",
"status": "PASS", "status": "PASS",
"detail": "length 148 > 10" "detail": "length 66 > 10"
}, },
{ {
"step": "Tina speaks", "step": "Tina speaks",
@ -84,19 +182,19 @@
"step": "Ask for time (tool use)", "step": "Ask for time (tool use)",
"check": "send: wie spaet ist es eigentlich?", "check": "send: wie spaet ist es eigentlich?",
"status": "PASS", "status": "PASS",
"detail": "response: Du hast mich noch gar nicht danach gefragt. Es ist kurz vor halb 4. Also perfekt" "detail": "response: Es ist 15:49 Uhr.\n"
}, },
{ {
"step": "Ask for time (tool use)", "step": "Ask for time (tool use)",
"check": "response: matches \\d{1,2}:\\d{2}", "check": "response: matches \\d{1,2}:\\d{2}",
"status": "FAIL", "status": "PASS",
"detail": "/\\d{1,2}:\\d{2}/ not found in: Du hast mich noch gar nicht danach gefragt. Es ist kurz vor halb 4. Also perfekt, um den Feierabend " "detail": "matched /\\d{1,2}:\\d{2}/"
}, },
{ {
"step": "Back to English", "step": "Back to English",
"check": "send: Let's switch to English, what was the la", "check": "send: Let's switch to English, what was the la",
"status": "PASS", "status": "PASS",
"detail": "response: Okay, switching to English! 😉 The last thing Tina said was: \"I'll have a Hefewei" "detail": "response: Tina said she wants a Hefeweizen.\n"
}, },
{ {
"step": "Back to English", "step": "Back to English",
@ -114,7 +212,7 @@
"step": "Mood check", "step": "Mood check",
"check": "send: This is really fun!", "check": "send: This is really fun!",
"status": "PASS", "status": "PASS",
"detail": "response: Indeed! Glad you're having fun. It's always a pleasure chatting with you, Nico. " "detail": "response: I'm glad you're enjoying our conversation, Nico! It's fun for me too. What other"
}, },
{ {
"step": "Mood check", "step": "Mood check",
@ -125,7 +223,7 @@
] ]
}, },
"summary": { "summary": {
"passed": 19, "passed": 36,
"failed": 1 "failed": 0
} }
} }