v0.15.0: Frame engine (v3), PA + Expert architecture (v4-eras), live test streaming
Frame Engine (v3-framed): - Tick-based deterministic pipeline: frames advance on completion, not timers - FrameRecord/FrameTrace dataclasses for structured per-message tracing - /api/frames endpoint: queryable frame trace history (last 20 messages) - frame_trace HUD event with full pipeline visibility - Reflex=2F, Director=4F, Director+Interpreter=5F deterministic frame counts Expert Architecture (v4-eras): - PA node (pa_v1): routes to domain experts, holds user context - ExpertNode base: stateless executor with plan+execute two-LLM-call pattern - ErasExpertNode: eras2_production DB specialist with DESCRIBE-first discipline - Schema caching: DESCRIBE results reused across queries within session - Progress streaming: PA streams thinking message, expert streams per-tool progress - PARouting type for structured routing decisions UI Controls Split: - Separate thinker_controls from machine controls (current_controls is now a property) - Machine buttons persist across Thinker responses - Machine state parser handles both dict and list formats from Director - Normalized button format with go/payload field mapping WebSocket Architecture: - /ws/test: dedicated debug socket for test runner progress - /ws/trace: dedicated debug socket for HUD/frame trace events - /ws (chat): cleaned up, only deltas/controls/done/cleared - WS survives graph switch (re-attaches to new runtime) - Pipeline result reset on clear Test Infrastructure: - Live test streaming: on_result callback fires per check during execution - Frontend polling fallback (500ms) for proxy-buffered WS - frame_trace-first trace assertion (fixes stale perceived event bug) - action_match supports "or" patterns and multi-pattern matching - Trace window increased to 40 events - Graph-agnostic assertions (has X or Y) Test Suites: - smoketest.md: 12 steps covering all categories (~2min) - fast.md: 10 quick checks (~1min) - fast_v4.md: 10 v4-eras specific checks - expert_eras.md: eras domain tests (routing, DB, schema, errors) - expert_progress.md: progress streaming tests Other: - Shared db.py extracted from thinker_v2 (reused by experts) - InputNode prompt: few-shot examples, history as context summary - Director prompt: full tool signatures for add_state/reset_machine/destroy_machine - nginx no-cache headers for static files during development - Cache-busted static file references Scores: v3 smoketest 39/40, v4-eras fast 28/28, expert_eras 23/23 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
4c412d3c4b
commit
1000411eb2
@ -11,12 +11,23 @@ load_dotenv(Path(__file__).parent.parent / ".env")
|
||||
|
||||
logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(name)s] %(message)s", datefmt="%H:%M:%S")
|
||||
|
||||
from fastapi import FastAPI
|
||||
from fastapi import FastAPI, Request
|
||||
from fastapi.responses import FileResponse
|
||||
from fastapi.staticfiles import StaticFiles
|
||||
from starlette.middleware.base import BaseHTTPMiddleware
|
||||
|
||||
from .api import register_routes
|
||||
|
||||
|
||||
class NoCacheStaticMiddleware(BaseHTTPMiddleware):
|
||||
"""Prevent browser/CDN caching of static files during development."""
|
||||
async def dispatch(self, request: Request, call_next):
|
||||
response = await call_next(request)
|
||||
if request.url.path.startswith("/static/"):
|
||||
response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
|
||||
response.headers["Pragma"] = "no-cache"
|
||||
return response
|
||||
|
||||
STATIC_DIR = Path(__file__).parent.parent / "static"
|
||||
|
||||
app = FastAPI(title="cog")
|
||||
|
||||
177
agent/api.py
177
agent/api.py
@ -23,6 +23,38 @@ _active_runtime: Runtime | None = None
|
||||
# SSE subscribers
|
||||
_sse_subscribers: list[Queue] = []
|
||||
|
||||
# Dedicated WS channels (debug sockets)
|
||||
_test_ws_clients: list[WebSocket] = [] # /ws/test subscribers
|
||||
_trace_ws_clients: list[WebSocket] = [] # /ws/trace subscribers
|
||||
|
||||
|
||||
async def _broadcast_test(event: dict):
|
||||
"""Push to all /ws/test subscribers."""
|
||||
msg = json.dumps(event)
|
||||
dead = []
|
||||
log.info(f"[ws/test] broadcasting to {len(_test_ws_clients)} clients")
|
||||
for ws in _test_ws_clients:
|
||||
try:
|
||||
await ws.send_text(msg)
|
||||
except Exception as e:
|
||||
log.error(f"[ws/test] send failed: {e}")
|
||||
dead.append(ws)
|
||||
for ws in dead:
|
||||
_test_ws_clients.remove(ws)
|
||||
|
||||
|
||||
async def _broadcast_trace(event: dict):
|
||||
"""Push to all /ws/trace subscribers."""
|
||||
msg = json.dumps(event)
|
||||
dead = []
|
||||
for ws in _trace_ws_clients:
|
||||
try:
|
||||
await ws.send_text(msg)
|
||||
except Exception:
|
||||
dead.append(ws)
|
||||
for ws in dead:
|
||||
_trace_ws_clients.remove(ws)
|
||||
|
||||
# Async message pipeline state
|
||||
_pipeline_task: asyncio.Task | None = None
|
||||
_pipeline_result: dict = {"status": "idle"}
|
||||
@ -30,12 +62,18 @@ _pipeline_id: int = 0
|
||||
|
||||
|
||||
def _broadcast_sse(event: dict):
|
||||
"""Push an event to all SSE subscribers + update pipeline progress."""
|
||||
"""Push an event to all SSE subscribers + /ws/trace + update pipeline progress."""
|
||||
for q in _sse_subscribers:
|
||||
try:
|
||||
q.put_nowait(event)
|
||||
except asyncio.QueueFull:
|
||||
pass
|
||||
# Push to /ws/trace subscribers (fire-and-forget)
|
||||
if _trace_ws_clients:
|
||||
try:
|
||||
asyncio.get_event_loop().create_task(_broadcast_trace(event))
|
||||
except RuntimeError:
|
||||
pass # no event loop (startup)
|
||||
# Update pipeline progress from HUD events
|
||||
if _pipeline_result.get("status") == "running":
|
||||
node = event.get("node", "")
|
||||
@ -113,16 +151,69 @@ def register_routes(app):
|
||||
while True:
|
||||
data = await ws.receive_text()
|
||||
msg = json.loads(data)
|
||||
# Always use current runtime (may change after graph switch)
|
||||
rt = _active_runtime or runtime
|
||||
if msg.get("type") == "action":
|
||||
await runtime.handle_action(msg.get("action", "unknown"), msg.get("data"))
|
||||
await rt.handle_action(msg.get("action", "unknown"), msg.get("data"))
|
||||
elif msg.get("type") == "cancel_process":
|
||||
runtime.process_manager.cancel(msg.get("pid", 0))
|
||||
rt.process_manager.cancel(msg.get("pid", 0))
|
||||
else:
|
||||
await runtime.handle_message(msg.get("text", ""), dashboard=msg.get("dashboard"))
|
||||
await rt.handle_message(msg.get("text", ""), dashboard=msg.get("dashboard"))
|
||||
except WebSocketDisconnect:
|
||||
runtime.detach_ws()
|
||||
if _active_runtime:
|
||||
_active_runtime.detach_ws()
|
||||
log.info("[api] WS disconnected — runtime stays alive")
|
||||
|
||||
async def _auth_debug_ws(ws: WebSocket, token: str | None) -> bool:
|
||||
"""Validate token for debug WS. Returns True if auth OK."""
|
||||
if not AUTH_ENABLED:
|
||||
return True
|
||||
if not token:
|
||||
await ws.close(code=4001, reason="Missing token")
|
||||
return False
|
||||
try:
|
||||
await _validate_token(token)
|
||||
return True
|
||||
except HTTPException:
|
||||
await ws.close(code=4001, reason="Invalid token")
|
||||
return False
|
||||
|
||||
@app.websocket("/ws/test")
|
||||
async def ws_test(ws: WebSocket, token: str | None = Query(None)):
|
||||
"""Dedicated WS for test runner progress. Debug only, auth required."""
|
||||
await ws.accept()
|
||||
if not await _auth_debug_ws(ws, token):
|
||||
return
|
||||
_test_ws_clients.append(ws)
|
||||
log.info(f"[api] /ws/test connected ({len(_test_ws_clients)} clients)")
|
||||
try:
|
||||
while True:
|
||||
await ws.receive_text()
|
||||
except WebSocketDisconnect:
|
||||
pass
|
||||
finally:
|
||||
if ws in _test_ws_clients:
|
||||
_test_ws_clients.remove(ws)
|
||||
log.info(f"[api] /ws/test disconnected ({len(_test_ws_clients)} clients)")
|
||||
|
||||
@app.websocket("/ws/trace")
|
||||
async def ws_trace(ws: WebSocket, token: str | None = Query(None)):
|
||||
"""Dedicated WS for HUD/frame trace events. Debug only, auth required."""
|
||||
await ws.accept()
|
||||
if not await _auth_debug_ws(ws, token):
|
||||
return
|
||||
_trace_ws_clients.append(ws)
|
||||
log.info(f"[api] /ws/trace connected ({len(_trace_ws_clients)} clients)")
|
||||
try:
|
||||
while True:
|
||||
await ws.receive_text()
|
||||
except WebSocketDisconnect:
|
||||
pass
|
||||
finally:
|
||||
if ws in _trace_ws_clients:
|
||||
_trace_ws_clients.remove(ws)
|
||||
log.info(f"[api] /ws/trace disconnected ({len(_trace_ws_clients)} clients)")
|
||||
|
||||
@app.get("/api/events")
|
||||
async def sse_events(user=Depends(require_auth)):
|
||||
q: Queue = Queue(maxsize=100)
|
||||
@ -190,12 +281,19 @@ def register_routes(app):
|
||||
global _pipeline_result
|
||||
try:
|
||||
_pipeline_result["stage"] = "input"
|
||||
await runtime.handle_message(text, dashboard=dashboard)
|
||||
result = await runtime.handle_message(text, dashboard=dashboard)
|
||||
# Frame engine returns a dict with response; imperative pipeline uses history
|
||||
if isinstance(result, dict) and "response" in result:
|
||||
response = result["response"]
|
||||
log.info(f"[api] frame engine response[{len(response)}]: {response[:80]}")
|
||||
else:
|
||||
response = runtime.history[-1]["content"] if runtime.history else ""
|
||||
log.info(f"[api] history response[{len(response)}]: {response[:80]}")
|
||||
_pipeline_result = {
|
||||
"status": "done",
|
||||
"id": msg_id,
|
||||
"stage": "done",
|
||||
"response": runtime.history[-1]["content"] if runtime.history else "",
|
||||
"response": response,
|
||||
"memorizer": runtime.memorizer.state,
|
||||
}
|
||||
except Exception as e:
|
||||
@ -216,14 +314,47 @@ def register_routes(app):
|
||||
"""Poll for the current pipeline result."""
|
||||
return _pipeline_result
|
||||
|
||||
@app.get("/api/frames")
|
||||
async def api_frames(user=Depends(require_auth), last: int = 5):
|
||||
"""Get frame traces from the frame engine. Returns last N message traces."""
|
||||
runtime = _ensure_runtime()
|
||||
if hasattr(runtime, 'frame_engine'):
|
||||
engine = runtime.frame_engine
|
||||
traces = engine.trace_history[-last:]
|
||||
return {
|
||||
"graph": engine.graph.get("name", "unknown"),
|
||||
"engine": "frames",
|
||||
"traces": traces,
|
||||
"last_trace": engine.last_trace.to_dict() if engine.last_trace.message else None,
|
||||
}
|
||||
return {"engine": "imperative", "traces": [], "detail": "Frame engine not active"}
|
||||
|
||||
@app.post("/api/clear")
|
||||
async def api_clear(user=Depends(require_auth)):
|
||||
global _pipeline_result
|
||||
runtime = _ensure_runtime()
|
||||
runtime.history.clear()
|
||||
runtime.ui_node.state.clear()
|
||||
runtime.ui_node.bindings.clear()
|
||||
runtime.ui_node.current_controls.clear()
|
||||
runtime.ui_node.thinker_controls.clear()
|
||||
runtime.ui_node.machines.clear()
|
||||
runtime.memorizer.state = {
|
||||
"user_name": runtime.identity,
|
||||
"user_mood": "neutral",
|
||||
"topic": None,
|
||||
"topic_history": [],
|
||||
"situation": runtime.memorizer.state.get("situation", ""),
|
||||
"language": "en",
|
||||
"style_hint": "casual, technical",
|
||||
"facts": [],
|
||||
}
|
||||
_pipeline_result = {"status": "idle", "id": "", "stage": "cleared"}
|
||||
# Notify frontend via WS
|
||||
if runtime.sink.ws:
|
||||
try:
|
||||
await runtime.sink.ws.send_text(json.dumps({"type": "cleared"}))
|
||||
except Exception:
|
||||
pass
|
||||
return {"status": "cleared"}
|
||||
|
||||
@app.get("/api/state")
|
||||
@ -270,12 +401,28 @@ def register_routes(app):
|
||||
name = body.get("name", "")
|
||||
graph = load_graph(name) # validates it exists
|
||||
rt._active_graph_name = name
|
||||
# Kill old runtime, next request creates new one with new graph
|
||||
|
||||
# Preserve WS connection across graph switch
|
||||
old_ws = None
|
||||
old_claims = {}
|
||||
old_origin = ""
|
||||
if _active_runtime:
|
||||
old_ws = _active_runtime.sink.ws
|
||||
old_claims = {"name": _active_runtime.identity}
|
||||
old_origin = _active_runtime.channel
|
||||
_active_runtime.sensor.stop()
|
||||
_active_runtime = None
|
||||
|
||||
# Create new runtime with new graph
|
||||
new_runtime = _ensure_runtime(user_claims=old_claims, origin=old_origin)
|
||||
|
||||
# Re-attach WS if it was connected
|
||||
if old_ws:
|
||||
new_runtime.attach_ws(old_ws)
|
||||
log.info(f"[api] re-attached WS after graph switch to '{name}'")
|
||||
|
||||
return {"status": "ok", "name": graph["name"],
|
||||
"note": "New sessions will use this graph. Existing session unchanged."}
|
||||
"note": "Graph switched. WS re-attached."}
|
||||
|
||||
# --- Test status (real-time) ---
|
||||
_test_status = {"running": False, "current": "", "results": [], "last_green": None, "last_red": None, "total_expected": 0}
|
||||
@ -304,14 +451,10 @@ def register_routes(app):
|
||||
elif event == "suite_end":
|
||||
_test_status["running"] = False
|
||||
_test_status["current"] = ""
|
||||
# Broadcast to frontend via SSE + WS
|
||||
# Broadcast to /ws/test subscribers — must await to ensure delivery before response
|
||||
await _broadcast_test({"type": "test_status", **_test_status})
|
||||
# Also SSE for backward compat
|
||||
_broadcast_sse({"type": "test_status", **_test_status})
|
||||
runtime = _ensure_runtime()
|
||||
if runtime.sink.ws:
|
||||
try:
|
||||
await runtime.sink.ws.send_text(json.dumps({"type": "test_status", **_test_status}))
|
||||
except Exception:
|
||||
pass
|
||||
return {"ok": True}
|
||||
|
||||
@app.get("/api/test/status")
|
||||
|
||||
36
agent/db.py
Normal file
36
agent/db.py
Normal file
@ -0,0 +1,36 @@
|
||||
"""Shared database access for Thinker and Expert nodes."""
|
||||
|
||||
import logging
|
||||
|
||||
log = logging.getLogger("runtime")
|
||||
|
||||
DB_HOST = "mariadb-eras"
|
||||
DB_USER = "root"
|
||||
DB_PASS = "root"
|
||||
ALLOWED_DATABASES = ("eras2_production", "plankiste_test")
|
||||
|
||||
|
||||
def run_db_query(query: str, database: str = "eras2_production",
|
||||
host: str = DB_HOST, user: str = DB_USER, password: str = DB_PASS) -> str:
|
||||
"""Execute a read-only SQL query against MariaDB. Returns tab-separated results."""
|
||||
import pymysql
|
||||
trimmed = query.strip().upper()
|
||||
if not (trimmed.startswith("SELECT") or trimmed.startswith("DESCRIBE") or trimmed.startswith("SHOW")):
|
||||
return "Error: Only SELECT/DESCRIBE/SHOW queries allowed"
|
||||
if database not in ALLOWED_DATABASES:
|
||||
return f"Error: Unknown database '{database}'"
|
||||
conn = pymysql.connect(host=host, user=user, password=password,
|
||||
database=database, connect_timeout=5, read_timeout=15)
|
||||
try:
|
||||
with conn.cursor() as cur:
|
||||
cur.execute(query)
|
||||
rows = cur.fetchall()
|
||||
if not rows:
|
||||
return "(no results)"
|
||||
cols = [d[0] for d in cur.description]
|
||||
lines = ["\t".join(cols)]
|
||||
for row in rows:
|
||||
lines.append("\t".join(str(v) if v is not None else "" for v in row))
|
||||
return "\n".join(lines)
|
||||
finally:
|
||||
conn.close()
|
||||
@ -62,6 +62,7 @@ def _graph_from_module(mod) -> dict:
|
||||
"edges": getattr(mod, "EDGES", []),
|
||||
"conditions": getattr(mod, "CONDITIONS", {}),
|
||||
"audit": getattr(mod, "AUDIT", {}),
|
||||
"engine": getattr(mod, "ENGINE", "imperative"),
|
||||
}
|
||||
|
||||
|
||||
@ -73,8 +74,8 @@ def instantiate_nodes(graph: dict, send_hud, process_manager: ProcessManager = N
|
||||
if not cls:
|
||||
log.error(f"[engine] node class not found: {impl_name}")
|
||||
continue
|
||||
# ThinkerNode needs process_manager
|
||||
if impl_name.startswith("thinker"):
|
||||
# Thinker and Expert nodes accept process_manager
|
||||
if impl_name.startswith("thinker") or impl_name.endswith("_expert"):
|
||||
nodes[role] = cls(send_hud=send_hud, process_manager=process_manager)
|
||||
else:
|
||||
nodes[role] = cls(send_hud=send_hud)
|
||||
|
||||
632
agent/frame_engine.py
Normal file
632
agent/frame_engine.py
Normal file
@ -0,0 +1,632 @@
|
||||
"""Frame Engine: tick-based deterministic pipeline execution.
|
||||
|
||||
Replaces the imperative handle_message() with a frame-stepping model:
|
||||
- Each frame dispatches all nodes that have pending input
|
||||
- Frames advance on completion (not on a timer)
|
||||
- 0ms when idle, engine awaits external input
|
||||
- Deterministic ordering: reflex=2 frames, thinker=3-4, interpreter=5
|
||||
|
||||
Works with any graph definition (v1, v2, v3). Node implementations unchanged.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import time
|
||||
from dataclasses import dataclass, field, asdict
|
||||
|
||||
from .types import Envelope, Command, InputAnalysis, ThoughtResult, DirectorPlan, PARouting
|
||||
|
||||
log = logging.getLogger("runtime")
|
||||
|
||||
|
||||
# --- Frame Trace ---
|
||||
|
||||
@dataclass
|
||||
class FrameRecord:
|
||||
"""One frame's execution record."""
|
||||
frame: int
|
||||
node: str # which node ran ("input", "director", ...)
|
||||
started: float = 0.0 # time.monotonic()
|
||||
ended: float = 0.0
|
||||
duration_ms: float = 0.0
|
||||
input_summary: str = "" # what the node received
|
||||
output_summary: str = "" # what the node produced
|
||||
route: str = "" # where output goes next ("director", "output+ui", ...)
|
||||
condition: str = "" # if a condition was evaluated ("reflex=True", ...)
|
||||
error: str = "" # if the node failed
|
||||
|
||||
|
||||
@dataclass
|
||||
class FrameTrace:
|
||||
"""Complete trace of one message through the pipeline."""
|
||||
message: str = ""
|
||||
graph: str = ""
|
||||
total_frames: int = 0
|
||||
total_ms: float = 0.0
|
||||
started: float = 0.0
|
||||
path: str = "" # "reflex", "director", "director+interpreter"
|
||||
frames: list = field(default_factory=list) # list of FrameRecord
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"message": self.message[:100],
|
||||
"graph": self.graph,
|
||||
"total_frames": self.total_frames,
|
||||
"total_ms": round(self.total_ms, 1),
|
||||
"path": self.path,
|
||||
"frames": [
|
||||
{
|
||||
"frame": f.frame,
|
||||
"node": f.node,
|
||||
"duration_ms": round(f.duration_ms, 1),
|
||||
"input": f.input_summary[:200],
|
||||
"output": f.output_summary[:200],
|
||||
"route": f.route,
|
||||
"condition": f.condition,
|
||||
"error": f.error,
|
||||
}
|
||||
for f in self.frames
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
class FrameEngine:
|
||||
"""Tick-based engine that steps through graph nodes frame by frame."""
|
||||
|
||||
def __init__(self, graph: dict, nodes: dict, sink, history: list,
|
||||
send_hud, sensor, memorizer, ui_node, identity: str = "unknown",
|
||||
channel: str = "unknown", broadcast=None):
|
||||
self.graph = graph
|
||||
self.nodes = nodes
|
||||
self.sink = sink
|
||||
self.history = history
|
||||
self._send_hud = send_hud
|
||||
self.sensor = sensor
|
||||
self.memorizer = memorizer
|
||||
self.ui_node = ui_node
|
||||
self.identity = identity
|
||||
self.channel = channel
|
||||
self._broadcast = broadcast or (lambda e: None)
|
||||
|
||||
self.frame = 0
|
||||
self.bus = {}
|
||||
self.conditions = graph.get("conditions", {})
|
||||
self.edges = [e for e in graph.get("edges", []) if e.get("type") == "data"]
|
||||
|
||||
self.has_director = "director" in nodes and hasattr(nodes.get("director"), "decide")
|
||||
self.has_interpreter = "interpreter" in nodes
|
||||
self.has_pa = "pa" in nodes and hasattr(nodes.get("pa"), "route")
|
||||
|
||||
# Discover available experts in this graph
|
||||
self._experts = {}
|
||||
for role, node in nodes.items():
|
||||
if role.startswith("expert_"):
|
||||
expert_name = role[7:] # "expert_eras" → "eras"
|
||||
self._experts[expert_name] = node
|
||||
if self.has_pa and self._experts:
|
||||
nodes["pa"].set_available_experts(list(self._experts.keys()))
|
||||
log.info(f"[frame] PA with experts: {list(self._experts.keys())}")
|
||||
|
||||
# Frame trace — last message's complete trace, queryable via API
|
||||
self.last_trace: FrameTrace = FrameTrace()
|
||||
# History of recent traces (last 20 messages)
|
||||
self.trace_history: list[dict] = []
|
||||
self.MAX_TRACE_HISTORY = 20
|
||||
|
||||
# --- Frame lifecycle helpers ---
|
||||
|
||||
def _begin_frame(self, frame_num: int, node: str, input_summary: str = "") -> FrameRecord:
|
||||
"""Start a new frame. Returns the record to fill in."""
|
||||
self.frame = frame_num
|
||||
rec = FrameRecord(
|
||||
frame=frame_num,
|
||||
node=node,
|
||||
started=time.monotonic(),
|
||||
input_summary=input_summary,
|
||||
)
|
||||
self.last_trace.frames.append(rec)
|
||||
return rec
|
||||
|
||||
def _end_frame(self, rec: FrameRecord, output_summary: str = "",
|
||||
route: str = "", condition: str = ""):
|
||||
"""Complete a frame record with output and timing."""
|
||||
rec.ended = time.monotonic()
|
||||
rec.duration_ms = (rec.ended - rec.started) * 1000
|
||||
rec.output_summary = output_summary
|
||||
rec.route = route
|
||||
rec.condition = condition
|
||||
log.info(f"[frame] F{rec.frame} {rec.node} "
|
||||
f"{rec.duration_ms:.0f}ms -> {route or 'done'}")
|
||||
|
||||
def _begin_trace(self, text: str) -> FrameTrace:
|
||||
"""Start a new message trace."""
|
||||
trace = FrameTrace(
|
||||
message=text,
|
||||
graph=self.graph.get("name", "unknown"),
|
||||
started=time.monotonic(),
|
||||
)
|
||||
self.last_trace = trace
|
||||
self.frame = 0
|
||||
return trace
|
||||
|
||||
def _end_trace(self, path: str):
|
||||
"""Finalize the trace and emit as HUD event."""
|
||||
t = self.last_trace
|
||||
t.total_frames = self.frame
|
||||
t.total_ms = (time.monotonic() - t.started) * 1000
|
||||
t.path = path
|
||||
# Store in history
|
||||
self.trace_history.append(t.to_dict())
|
||||
if len(self.trace_history) > self.MAX_TRACE_HISTORY:
|
||||
self.trace_history = self.trace_history[-self.MAX_TRACE_HISTORY:]
|
||||
log.info(f"[frame] trace: {path} {t.total_frames}F {t.total_ms:.0f}ms")
|
||||
|
||||
async def _emit_trace_hud(self):
|
||||
"""Emit the completed frame trace as a single HUD event."""
|
||||
t = self.last_trace
|
||||
await self._send_hud({
|
||||
"node": "frame_engine",
|
||||
"event": "frame_trace",
|
||||
"trace": t.to_dict(),
|
||||
})
|
||||
|
||||
# --- Main entry point ---
|
||||
|
||||
async def process_message(self, text: str, dashboard: list = None) -> dict:
|
||||
"""Process a message through the frame pipeline.
|
||||
Returns {response, controls, memorizer, frames, trace}."""
|
||||
|
||||
self._begin_trace(text)
|
||||
|
||||
# Handle ACTION: prefix
|
||||
if text.startswith("ACTION:"):
|
||||
return await self._handle_action(text, dashboard)
|
||||
|
||||
# Setup
|
||||
envelope = Envelope(
|
||||
text=text, user_id=self.identity,
|
||||
session_id="test", timestamp=time.strftime("%Y-%m-%d %H:%M:%S"),
|
||||
)
|
||||
self.sensor.note_user_activity()
|
||||
if dashboard is not None:
|
||||
self.sensor.update_browser_dashboard(dashboard)
|
||||
self.history.append({"role": "user", "content": text})
|
||||
|
||||
# --- Frame 1: Input ---
|
||||
mem_ctx = self._build_context(dashboard)
|
||||
rec = self._begin_frame(1, "input", input_summary=text[:100])
|
||||
|
||||
command = await self.nodes["input"].process(
|
||||
envelope, self.history, memory_context=mem_ctx,
|
||||
identity=self.identity, channel=self.channel)
|
||||
|
||||
a = command.analysis
|
||||
cmd_summary = f"intent={a.intent} language={a.language} tone={a.tone} complexity={a.complexity}"
|
||||
|
||||
# Check reflex condition
|
||||
is_reflex = self._check_condition("reflex", command=command)
|
||||
if is_reflex:
|
||||
self._end_frame(rec, output_summary=cmd_summary,
|
||||
route="output (reflex)", condition="reflex=True")
|
||||
await self._send_hud({"node": "runtime", "event": "reflex_path",
|
||||
"detail": f"{a.intent}/{a.complexity}"})
|
||||
return await self._run_reflex(command, mem_ctx)
|
||||
else:
|
||||
next_node = "pa" if self.has_pa else ("director" if self.has_director else "thinker")
|
||||
self._end_frame(rec, output_summary=cmd_summary,
|
||||
route=next_node, condition=f"reflex=False")
|
||||
|
||||
# --- Frame 2+: Pipeline ---
|
||||
if self.has_pa:
|
||||
return await self._run_expert_pipeline(command, mem_ctx, dashboard)
|
||||
elif self.has_director:
|
||||
return await self._run_director_pipeline(command, mem_ctx, dashboard)
|
||||
else:
|
||||
return await self._run_thinker_pipeline(command, mem_ctx, dashboard)
|
||||
|
||||
# --- Pipeline variants ---
|
||||
|
||||
async def _run_reflex(self, command: Command, mem_ctx: str) -> dict:
|
||||
"""Reflex: Input(F1) → Output(F2)."""
|
||||
rec = self._begin_frame(2, "output", input_summary="reflex passthrough")
|
||||
|
||||
thought = ThoughtResult(response=command.source_text, actions=[])
|
||||
response = await self._run_output_and_ui(thought, mem_ctx)
|
||||
|
||||
self.history.append({"role": "assistant", "content": response})
|
||||
await self.memorizer.update(self.history)
|
||||
self._trim_history()
|
||||
|
||||
self._end_frame(rec, output_summary=f"response[{len(response)}]")
|
||||
self._end_trace("reflex")
|
||||
await self._emit_trace_hud()
|
||||
return self._make_result(response)
|
||||
|
||||
async def _run_expert_pipeline(self, command: Command, mem_ctx: str,
|
||||
dashboard: list = None) -> dict:
|
||||
"""Expert pipeline: Input(F1) → PA(F2) → Expert(F3) → [Interpreter(F4)] → Output."""
|
||||
a = command.analysis
|
||||
|
||||
# Frame 2: PA routes
|
||||
rec = self._begin_frame(2, "pa",
|
||||
input_summary=f"intent={a.intent} topic={a.topic}")
|
||||
routing = await self.nodes["pa"].route(
|
||||
command, self.history, memory_context=mem_ctx,
|
||||
identity=self.identity, channel=self.channel)
|
||||
route_summary = f"expert={routing.expert} job={routing.job[:60]}"
|
||||
self._end_frame(rec, output_summary=route_summary,
|
||||
route=f"expert_{routing.expert}" if routing.expert != "none" else "output")
|
||||
|
||||
# Stream thinking message to user
|
||||
if routing.thinking_message:
|
||||
await self.sink.send_delta(routing.thinking_message + "\n\n")
|
||||
|
||||
# Direct PA response (no expert needed)
|
||||
if routing.expert == "none":
|
||||
rec = self._begin_frame(3, "output+ui",
|
||||
input_summary=f"pa_direct: {routing.response_hint[:80]}")
|
||||
thought = ThoughtResult(response=routing.response_hint, actions=[])
|
||||
response = await self._run_output_and_ui(thought, mem_ctx)
|
||||
self.history.append({"role": "assistant", "content": response})
|
||||
await self.memorizer.update(self.history)
|
||||
self._trim_history()
|
||||
self._end_frame(rec, output_summary=f"response[{len(response)}]")
|
||||
self._end_trace("pa_direct")
|
||||
await self._emit_trace_hud()
|
||||
return self._make_result(response)
|
||||
|
||||
# Frame 3: Expert executes
|
||||
expert = self._experts.get(routing.expert)
|
||||
if not expert:
|
||||
log.error(f"[frame] expert '{routing.expert}' not found")
|
||||
thought = ThoughtResult(response=f"Expert '{routing.expert}' not available.")
|
||||
rec = self._begin_frame(3, "output+ui", input_summary="expert_not_found")
|
||||
response = await self._run_output_and_ui(thought, mem_ctx)
|
||||
self.history.append({"role": "assistant", "content": response})
|
||||
self._end_frame(rec, output_summary="error", error=f"expert '{routing.expert}' not found")
|
||||
self._end_trace("expert_error")
|
||||
await self._emit_trace_hud()
|
||||
return self._make_result(response)
|
||||
|
||||
rec = self._begin_frame(3, f"expert_{routing.expert}",
|
||||
input_summary=f"job: {routing.job[:80]}")
|
||||
|
||||
# Wrap expert's HUD to stream progress to user
|
||||
original_hud = expert.send_hud
|
||||
expert.send_hud = self._make_progress_wrapper(original_hud, routing.language)
|
||||
|
||||
try:
|
||||
thought = await expert.execute(routing.job, routing.language)
|
||||
finally:
|
||||
expert.send_hud = original_hud
|
||||
|
||||
thought_summary = (f"response[{len(thought.response)}] tool={thought.tool_used or 'none'} "
|
||||
f"actions={len(thought.actions)}")
|
||||
has_tool = bool(thought.tool_used and thought.tool_output)
|
||||
|
||||
# Interpreter (conditional)
|
||||
if self.has_interpreter and has_tool:
|
||||
self._end_frame(rec, output_summary=thought_summary,
|
||||
route="interpreter", condition="has_tool_output=True")
|
||||
rec = self._begin_frame(4, "interpreter",
|
||||
input_summary=f"tool={thought.tool_used} output[{len(thought.tool_output)}]")
|
||||
interpreted = await self.nodes["interpreter"].interpret(
|
||||
thought.tool_used, thought.tool_output, routing.job)
|
||||
thought.response = interpreted.summary
|
||||
self._end_frame(rec, output_summary=f"summary[{len(interpreted.summary)}]", route="output+ui")
|
||||
|
||||
rec = self._begin_frame(5, "output+ui",
|
||||
input_summary=f"interpreted: {interpreted.summary[:80]}")
|
||||
path = "expert+interpreter"
|
||||
else:
|
||||
self._end_frame(rec, output_summary=thought_summary,
|
||||
route="output+ui",
|
||||
condition="has_tool_output=False" if not has_tool else "")
|
||||
rec = self._begin_frame(4, "output+ui",
|
||||
input_summary=f"response: {thought.response[:80]}")
|
||||
path = "expert"
|
||||
|
||||
# Clear progress text, render final response
|
||||
self.sink.reset()
|
||||
response = await self._run_output_and_ui(thought, mem_ctx)
|
||||
self.history.append({"role": "assistant", "content": response})
|
||||
await self.memorizer.update(self.history)
|
||||
self._trim_history()
|
||||
|
||||
controls_count = len(self.ui_node.current_controls)
|
||||
self._end_frame(rec, output_summary=f"response[{len(response)}] controls={controls_count}")
|
||||
self._end_trace(path)
|
||||
await self._emit_trace_hud()
|
||||
return self._make_result(response)
|
||||
|
||||
def _make_progress_wrapper(self, original_hud, language: str):
|
||||
"""Wrap an expert's send_hud to also stream progress deltas to the user."""
|
||||
sink = self.sink
|
||||
progress_map = {
|
||||
"tool_call": {"query_db": "Daten werden abgerufen..." if language == "de" else "Fetching data...",
|
||||
"emit_actions": "UI wird erstellt..." if language == "de" else "Building UI...",
|
||||
"create_machine": "Maschine wird erstellt..." if language == "de" else "Creating machine...",
|
||||
"_default": "Verarbeite..." if language == "de" else "Processing..."},
|
||||
"tool_result": {"_default": ""}, # silent on result
|
||||
"planned": {"_default": "Plan erstellt..." if language == "de" else "Plan ready..."},
|
||||
}
|
||||
|
||||
async def wrapper(data: dict):
|
||||
await original_hud(data)
|
||||
event = data.get("event", "")
|
||||
if event in progress_map:
|
||||
tool = data.get("tool", "_default")
|
||||
msg = progress_map[event].get(tool, progress_map[event].get("_default", ""))
|
||||
if msg:
|
||||
await sink.send_delta(msg + "\n")
|
||||
|
||||
return wrapper
|
||||
|
||||
async def _run_director_pipeline(self, command: Command, mem_ctx: str,
|
||||
dashboard: list = None) -> dict:
|
||||
"""Director: Input(F1) → Director(F2) → Thinker(F3) → [Interpreter(F4)] → Output."""
|
||||
a = command.analysis
|
||||
|
||||
# Frame 2: Director
|
||||
rec = self._begin_frame(2, "director",
|
||||
input_summary=f"intent={a.intent} topic={a.topic}")
|
||||
plan = await self.nodes["director"].decide(command, self.history, memory_context=mem_ctx)
|
||||
plan_summary = f"goal={plan.goal} tools={len(plan.tool_sequence)} hint={plan.response_hint[:50]}"
|
||||
self._end_frame(rec, output_summary=plan_summary, route="thinker")
|
||||
|
||||
# Frame 3: Thinker
|
||||
rec = self._begin_frame(3, "thinker",
|
||||
input_summary=plan_summary[:100])
|
||||
thought = await self.nodes["thinker"].process(
|
||||
command, plan, self.history, memory_context=mem_ctx)
|
||||
thought_summary = (f"response[{len(thought.response)}] tool={thought.tool_used or 'none'} "
|
||||
f"actions={len(thought.actions)} machines={len(thought.machine_ops)}")
|
||||
has_tool = bool(thought.tool_used and thought.tool_output)
|
||||
|
||||
# Check interpreter condition
|
||||
if self.has_interpreter and has_tool:
|
||||
self._end_frame(rec, output_summary=thought_summary,
|
||||
route="interpreter", condition="has_tool_output=True")
|
||||
|
||||
# Frame 4: Interpreter
|
||||
rec = self._begin_frame(4, "interpreter",
|
||||
input_summary=f"tool={thought.tool_used} output[{len(thought.tool_output)}]")
|
||||
interpreted = await self.nodes["interpreter"].interpret(
|
||||
thought.tool_used, thought.tool_output, command.source_text)
|
||||
thought.response = interpreted.summary
|
||||
interp_summary = f"summary[{len(interpreted.summary)}] facts={interpreted.key_facts}"
|
||||
self._end_frame(rec, output_summary=interp_summary, route="output+ui")
|
||||
|
||||
# Frame 5: Output
|
||||
rec = self._begin_frame(5, "output+ui",
|
||||
input_summary=f"interpreted: {interpreted.summary[:80]}")
|
||||
path = "director+interpreter"
|
||||
else:
|
||||
self._end_frame(rec, output_summary=thought_summary,
|
||||
route="output+ui",
|
||||
condition="has_tool_output=False" if not has_tool else "")
|
||||
|
||||
# Frame 4: Output
|
||||
rec = self._begin_frame(4, "output+ui",
|
||||
input_summary=f"response: {thought.response[:80]}")
|
||||
path = "director"
|
||||
|
||||
response = await self._run_output_and_ui(thought, mem_ctx)
|
||||
self.history.append({"role": "assistant", "content": response})
|
||||
await self.memorizer.update(self.history)
|
||||
self._trim_history()
|
||||
|
||||
controls_count = len(self.ui_node.current_controls)
|
||||
self._end_frame(rec, output_summary=f"response[{len(response)}] controls={controls_count}")
|
||||
self._end_trace(path)
|
||||
await self._emit_trace_hud()
|
||||
return self._make_result(response)
|
||||
|
||||
async def _run_thinker_pipeline(self, command: Command, mem_ctx: str,
|
||||
dashboard: list = None) -> dict:
|
||||
"""v1: Input(F1) → Thinker(F2) → Output(F3)."""
|
||||
a = command.analysis
|
||||
|
||||
# Frame 2: Thinker
|
||||
rec = self._begin_frame(2, "thinker",
|
||||
input_summary=f"intent={a.intent} topic={a.topic}")
|
||||
|
||||
director = self.nodes.get("director")
|
||||
if director and hasattr(director, "plan"):
|
||||
is_complex = command.analysis.complexity == "complex"
|
||||
text = command.source_text
|
||||
is_data_request = (command.analysis.intent in ("request", "action")
|
||||
and any(k in text.lower()
|
||||
for k in ["daten", "data", "database", "db", "tabelle", "table",
|
||||
"query", "abfrage", "untersuche", "investigate",
|
||||
"analyse", "analyze", "customer", "kunde"]))
|
||||
if is_complex or (is_data_request and len(text.split()) > 8):
|
||||
await director.plan(self.history, self.memorizer.state, text)
|
||||
mem_ctx = self._build_context(dashboard)
|
||||
|
||||
thought = await self.nodes["thinker"].process(command, self.history, memory_context=mem_ctx)
|
||||
if director and hasattr(director, "current_plan"):
|
||||
director.current_plan = ""
|
||||
|
||||
thought_summary = f"response[{len(thought.response)}] tool={thought.tool_used or 'none'}"
|
||||
self._end_frame(rec, output_summary=thought_summary, route="output+ui")
|
||||
|
||||
# Frame 3: Output
|
||||
rec = self._begin_frame(3, "output+ui",
|
||||
input_summary=f"response: {thought.response[:80]}")
|
||||
response = await self._run_output_and_ui(thought, mem_ctx)
|
||||
self.history.append({"role": "assistant", "content": response})
|
||||
await self.memorizer.update(self.history)
|
||||
if director and hasattr(director, "update"):
|
||||
await director.update(self.history, self.memorizer.state)
|
||||
self._trim_history()
|
||||
|
||||
self._end_frame(rec, output_summary=f"response[{len(response)}]")
|
||||
self._end_trace("thinker")
|
||||
await self._emit_trace_hud()
|
||||
return self._make_result(response)
|
||||
|
||||
async def _handle_action(self, text: str, dashboard: list = None) -> dict:
|
||||
"""Handle ACTION: messages (button clicks)."""
|
||||
parts = text.split("|", 1)
|
||||
action = parts[0].replace("ACTION:", "").strip()
|
||||
data = None
|
||||
if len(parts) > 1:
|
||||
try:
|
||||
data = json.loads(parts[1].replace("data:", "").strip())
|
||||
except (json.JSONDecodeError, Exception):
|
||||
pass
|
||||
|
||||
self.sensor.note_user_activity()
|
||||
|
||||
# Frame 1: Try machine transition (no LLM)
|
||||
rec = self._begin_frame(1, "ui", input_summary=f"action={action}")
|
||||
handled, transition_result = self.ui_node.try_machine_transition(action)
|
||||
if handled:
|
||||
await self._send_hud({"node": "ui", "event": "machine_transition",
|
||||
"action": action, "detail": transition_result})
|
||||
controls = self.ui_node.get_machine_controls()
|
||||
for ctrl in self.ui_node.current_controls:
|
||||
if not ctrl.get("machine_id"):
|
||||
controls.append(ctrl)
|
||||
self.ui_node.current_controls = controls
|
||||
await self.sink.send_controls(controls)
|
||||
await self._send_hud({"node": "ui", "event": "controls", "controls": controls})
|
||||
self.sink.reset()
|
||||
for i in range(0, len(transition_result), 12):
|
||||
await self.sink.send_delta(transition_result[i:i + 12])
|
||||
await self.sink.send_done()
|
||||
self.history.append({"role": "user", "content": f"[clicked {action}]"})
|
||||
self.history.append({"role": "assistant", "content": transition_result})
|
||||
|
||||
self._end_frame(rec, output_summary=f"machine_transition: {transition_result[:80]}")
|
||||
self._end_trace("action_machine")
|
||||
await self._emit_trace_hud()
|
||||
return self._make_result(transition_result)
|
||||
|
||||
# Try local UI action
|
||||
result, controls = await self.ui_node.process_local_action(action, data)
|
||||
if result is not None:
|
||||
if controls:
|
||||
await self.sink.send_controls(controls)
|
||||
self.sink.reset()
|
||||
for i in range(0, len(result), 12):
|
||||
await self.sink.send_delta(result[i:i + 12])
|
||||
await self.sink.send_done()
|
||||
self.history.append({"role": "user", "content": f"[clicked {action}]"})
|
||||
self.history.append({"role": "assistant", "content": result})
|
||||
|
||||
self._end_frame(rec, output_summary=f"local_action: {result[:80]}")
|
||||
self._end_trace("action_local")
|
||||
await self._emit_trace_hud()
|
||||
return self._make_result(result)
|
||||
|
||||
# Complex action — needs full pipeline
|
||||
self._end_frame(rec, output_summary="no local handler", route="director/thinker")
|
||||
|
||||
action_desc = f"ACTION: {action}"
|
||||
if data:
|
||||
action_desc += f" | data: {json.dumps(data)}"
|
||||
self.history.append({"role": "user", "content": action_desc})
|
||||
|
||||
mem_ctx = self._build_context(dashboard)
|
||||
command = Command(
|
||||
analysis=InputAnalysis(intent="action", topic=action, complexity="simple"),
|
||||
source_text=action_desc)
|
||||
|
||||
if self.has_director:
|
||||
return await self._run_director_pipeline(command, mem_ctx, dashboard)
|
||||
else:
|
||||
return await self._run_thinker_pipeline(command, mem_ctx, dashboard)
|
||||
|
||||
# --- Helpers ---
|
||||
|
||||
def _build_context(self, dashboard: list = None) -> str:
|
||||
"""Build the full context string for nodes."""
|
||||
sensor_lines = self.sensor.get_context_lines()
|
||||
director = self.nodes.get("director")
|
||||
director_line = director.get_context_line() if director else ""
|
||||
mem_ctx = self.memorizer.get_context_block(
|
||||
sensor_lines=sensor_lines, ui_state=self.ui_node.state)
|
||||
if director_line:
|
||||
mem_ctx += f"\n\n{director_line}"
|
||||
machine_summary = self.ui_node.get_machine_summary()
|
||||
if machine_summary:
|
||||
mem_ctx += f"\n\n{machine_summary}"
|
||||
if dashboard is not None:
|
||||
mem_ctx += f"\n\n{self._format_dashboard(dashboard)}"
|
||||
sensor_flags = self.sensor.consume_flags()
|
||||
if sensor_flags:
|
||||
flag_lines = ["Sensor flags:"]
|
||||
for f in sensor_flags:
|
||||
if f["type"] == "idle_return":
|
||||
flag_lines.append(f" - User returned after {f['away_duration']} away.")
|
||||
elif f["type"] == "workspace_mismatch":
|
||||
flag_lines.append(f" - Workspace mismatch: {f['detail']}")
|
||||
mem_ctx += "\n\n" + "\n".join(flag_lines)
|
||||
return mem_ctx
|
||||
|
||||
def _format_dashboard(self, dashboard: list) -> str:
|
||||
"""Format dashboard controls into context string."""
|
||||
server_controls = self.ui_node.current_controls
|
||||
server_buttons = [str(c.get("label", "")) for c in server_controls if isinstance(c, dict) and c.get("type") == "button"]
|
||||
browser_buttons = [str(c.get("label", "")) for c in dashboard if isinstance(c, dict) and c.get("type") == "button"] if dashboard else []
|
||||
lines = []
|
||||
if server_buttons and not browser_buttons:
|
||||
lines.append(f"WARNING: Server sent {len(server_buttons)} controls but dashboard shows NONE.")
|
||||
lines.append(f" Expected: {', '.join(server_buttons)}")
|
||||
lines.append(" Controls failed to render. You MUST re-emit them in ACTIONS.")
|
||||
elif server_buttons and sorted(server_buttons) != sorted(browser_buttons):
|
||||
lines.append("WARNING: Dashboard mismatch.")
|
||||
lines.append(f" Server: {', '.join(server_buttons)}")
|
||||
lines.append(f" Browser: {', '.join(browser_buttons) or 'nothing'}")
|
||||
if not dashboard:
|
||||
lines.append("Dashboard: empty")
|
||||
else:
|
||||
lines.append("Dashboard (user sees):")
|
||||
for ctrl in dashboard:
|
||||
ctype = ctrl.get("type", "unknown")
|
||||
if ctype == "button":
|
||||
lines.append(f" - Button: {ctrl.get('label', '?')}")
|
||||
elif ctype == "table":
|
||||
lines.append(f" - Table: {len(ctrl.get('data', []))} rows")
|
||||
else:
|
||||
lines.append(f" - {ctype}: {ctrl.get('label', ctrl.get('text', '?'))}")
|
||||
return "\n".join(lines)
|
||||
|
||||
async def _run_output_and_ui(self, thought: ThoughtResult, mem_ctx: str) -> str:
|
||||
"""Run Output and UI nodes in parallel. Returns response text."""
|
||||
self.sink.reset()
|
||||
output_task = asyncio.create_task(
|
||||
self.nodes["output"].process(thought, self.history, self.sink, memory_context=mem_ctx))
|
||||
ui_task = asyncio.create_task(
|
||||
self.ui_node.process(thought, self.history, memory_context=mem_ctx))
|
||||
response, controls = await asyncio.gather(output_task, ui_task)
|
||||
if controls:
|
||||
await self.sink.send_controls(controls)
|
||||
return response
|
||||
|
||||
def _check_condition(self, name: str, command: Command = None,
|
||||
thought: ThoughtResult = None) -> bool:
|
||||
"""Evaluate a named condition."""
|
||||
if name == "reflex" and command:
|
||||
return (command.analysis.intent == "social"
|
||||
and command.analysis.complexity == "trivial")
|
||||
if name == "has_tool_output" and thought:
|
||||
return bool(thought.tool_used and thought.tool_output)
|
||||
return False
|
||||
|
||||
def _make_result(self, response: str) -> dict:
|
||||
"""Build the result dict returned to callers."""
|
||||
return {
|
||||
"response": response,
|
||||
"controls": self.ui_node.current_controls,
|
||||
"memorizer": self.memorizer.state,
|
||||
"frames": self.frame,
|
||||
"trace": self.last_trace.to_dict(),
|
||||
}
|
||||
|
||||
def _trim_history(self):
|
||||
if len(self.history) > 40:
|
||||
self.history[:] = self.history[-40:]
|
||||
65
agent/graphs/v3_framed.py
Normal file
65
agent/graphs/v3_framed.py
Normal file
@ -0,0 +1,65 @@
|
||||
"""v3-framed: Frame-based deterministic pipeline.
|
||||
|
||||
Same node topology as v2-director-drives but executed by FrameEngine
|
||||
with tick-based deterministic ordering.
|
||||
|
||||
Frame trace:
|
||||
Reflex: F1(Input) → F2(Output)
|
||||
Simple: F1(Input) → F2(Director) → F3(Thinker) → F4(Output+UI)
|
||||
With tools: F1(Input) → F2(Director) → F3(Thinker) → F4(Interpreter) → F5(Output+UI)
|
||||
"""
|
||||
|
||||
NAME = "v3-framed"
|
||||
DESCRIPTION = "Frame-based deterministic pipeline (Director+Thinker+Interpreter)"
|
||||
ENGINE = "frames" # Signals Runtime to use FrameEngine instead of handle_message()
|
||||
|
||||
NODES = {
|
||||
"input": "input_v1",
|
||||
"director": "director_v2",
|
||||
"thinker": "thinker_v2",
|
||||
"interpreter": "interpreter_v1",
|
||||
"output": "output_v1",
|
||||
"ui": "ui",
|
||||
"memorizer": "memorizer_v1",
|
||||
"sensor": "sensor",
|
||||
}
|
||||
|
||||
EDGES = [
|
||||
# Data edges — same as v2, engine reads for frame routing
|
||||
{"from": "input", "to": "director", "type": "data", "carries": "Command"},
|
||||
{"from": "input", "to": "output", "type": "data", "carries": "Command",
|
||||
"condition": "reflex"},
|
||||
{"from": "director", "to": "thinker", "type": "data", "carries": "DirectorPlan"},
|
||||
{"from": "thinker", "to": ["output", "ui"], "type": "data",
|
||||
"carries": "ThoughtResult", "parallel": True},
|
||||
{"from": "thinker", "to": "interpreter", "type": "data",
|
||||
"carries": "tool_output", "condition": "has_tool_output"},
|
||||
{"from": "interpreter", "to": "output", "type": "data",
|
||||
"carries": "InterpretedResult", "condition": "has_tool_output"},
|
||||
{"from": "output", "to": "memorizer", "type": "data", "carries": "history"},
|
||||
|
||||
# Context edges
|
||||
{"from": "memorizer", "to": "director", "type": "context",
|
||||
"method": "get_context_block"},
|
||||
{"from": "memorizer", "to": "input", "type": "context",
|
||||
"method": "get_context_block"},
|
||||
{"from": "memorizer", "to": "output", "type": "context",
|
||||
"method": "get_context_block"},
|
||||
{"from": "director", "to": "output", "type": "context",
|
||||
"method": "get_context_line"},
|
||||
{"from": "sensor", "to": "director", "type": "context",
|
||||
"method": "get_context_lines"},
|
||||
{"from": "ui", "to": "director", "type": "context",
|
||||
"method": "get_machine_summary"},
|
||||
|
||||
# State edges
|
||||
{"from": "sensor", "to": "runtime", "type": "state", "reads": "flags"},
|
||||
{"from": "ui", "to": "runtime", "type": "state", "reads": "current_controls"},
|
||||
]
|
||||
|
||||
CONDITIONS = {
|
||||
"reflex": "intent==social AND complexity==trivial",
|
||||
"has_tool_output": "thinker.tool_used is not empty",
|
||||
}
|
||||
|
||||
AUDIT = {}
|
||||
71
agent/graphs/v4_eras.py
Normal file
71
agent/graphs/v4_eras.py
Normal file
@ -0,0 +1,71 @@
|
||||
"""v4-eras: PA + Eras Expert with progress streaming.
|
||||
|
||||
Personal Assistant routes to the Eras expert for heating/energy DB work.
|
||||
Social/general messages handled directly by PA.
|
||||
|
||||
Frame traces:
|
||||
Reflex: F1(Input) → F2(Output)
|
||||
PA direct: F1(Input) → F2(PA) → F3(Output+UI)
|
||||
Expert: F1(Input) → F2(PA) → F3(ErasExpert) → F4(Output+UI)
|
||||
Expert+Interp: F1(Input) → F2(PA) → F3(ErasExpert) → F4(Interpreter) → F5(Output+UI)
|
||||
"""
|
||||
|
||||
NAME = "v4-eras"
|
||||
DESCRIPTION = "PA + Eras Expert: heating/energy database with progress streaming"
|
||||
ENGINE = "frames"
|
||||
|
||||
NODES = {
|
||||
"input": "input_v1",
|
||||
"pa": "pa_v1",
|
||||
"expert_eras": "eras_expert",
|
||||
"interpreter": "interpreter_v1",
|
||||
"output": "output_v1",
|
||||
"ui": "ui",
|
||||
"memorizer": "memorizer_v1",
|
||||
"sensor": "sensor",
|
||||
}
|
||||
|
||||
EDGES = [
|
||||
# Data edges
|
||||
{"from": "input", "to": "pa", "type": "data", "carries": "Command"},
|
||||
{"from": "input", "to": "output", "type": "data", "carries": "Command",
|
||||
"condition": "reflex"},
|
||||
{"from": "pa", "to": "expert_eras", "type": "data", "carries": "PARouting",
|
||||
"condition": "expert_is_eras"},
|
||||
{"from": "pa", "to": "output", "type": "data", "carries": "PARouting",
|
||||
"condition": "expert_is_none"},
|
||||
{"from": "expert_eras", "to": ["output", "ui"], "type": "data",
|
||||
"carries": "ThoughtResult", "parallel": True},
|
||||
{"from": "expert_eras", "to": "interpreter", "type": "data",
|
||||
"carries": "tool_output", "condition": "has_tool_output"},
|
||||
{"from": "interpreter", "to": "output", "type": "data",
|
||||
"carries": "InterpretedResult", "condition": "has_tool_output"},
|
||||
{"from": "output", "to": "memorizer", "type": "data", "carries": "history"},
|
||||
|
||||
# Context edges — PA gets all context (experts are stateless)
|
||||
{"from": "memorizer", "to": "pa", "type": "context",
|
||||
"method": "get_context_block"},
|
||||
{"from": "memorizer", "to": "input", "type": "context",
|
||||
"method": "get_context_block"},
|
||||
{"from": "memorizer", "to": "output", "type": "context",
|
||||
"method": "get_context_block"},
|
||||
{"from": "pa", "to": "output", "type": "context",
|
||||
"method": "get_context_line"},
|
||||
{"from": "sensor", "to": "pa", "type": "context",
|
||||
"method": "get_context_lines"},
|
||||
{"from": "ui", "to": "pa", "type": "context",
|
||||
"method": "get_machine_summary"},
|
||||
|
||||
# State edges
|
||||
{"from": "sensor", "to": "runtime", "type": "state", "reads": "flags"},
|
||||
{"from": "ui", "to": "runtime", "type": "state", "reads": "current_controls"},
|
||||
]
|
||||
|
||||
CONDITIONS = {
|
||||
"reflex": "intent==social AND complexity==trivial",
|
||||
"expert_is_eras": "pa.expert == eras",
|
||||
"expert_is_none": "pa.expert == none",
|
||||
"has_tool_output": "expert.tool_used is not empty",
|
||||
}
|
||||
|
||||
AUDIT = {}
|
||||
@ -16,6 +16,10 @@ from .director_v2 import DirectorV2Node
|
||||
from .thinker_v2 import ThinkerV2Node
|
||||
from .interpreter_v1 import InterpreterNode
|
||||
|
||||
# v4 — PA + Expert nodes
|
||||
from .pa_v1 import PANode
|
||||
from .eras_expert import ErasExpertNode
|
||||
|
||||
# Default aliases (used by runtime.py until engine.py takes over)
|
||||
InputNode = InputNodeV1
|
||||
ThinkerNode = ThinkerNodeV1
|
||||
@ -35,6 +39,8 @@ NODE_REGISTRY = {
|
||||
"director_v2": DirectorV2Node,
|
||||
"thinker_v2": ThinkerV2Node,
|
||||
"interpreter_v1": InterpreterNode,
|
||||
"pa_v1": PANode,
|
||||
"eras_expert": ErasExpertNode,
|
||||
}
|
||||
|
||||
__all__ = [
|
||||
|
||||
@ -27,7 +27,10 @@ The Thinker has these tools:
|
||||
- set_state(key, value) — persistent key-value store
|
||||
- emit_display(items) — per-response formatted data [{type, label, value?, style?}]
|
||||
- create_machine(id, initial, states) — persistent interactive UI with navigation
|
||||
- add_state / reset_machine / destroy_machine — machine lifecycle
|
||||
states is a dict: {{"state_name": {{"actions": [{{"label":"Go","action":"go","payload":"target","go":"target"}}], "display": [{{"type":"text","label":"Title","value":"Content"}}]}}}}
|
||||
- add_state(id, state, buttons, content) — add a state to existing machine. id=machine name, state=new state name, buttons=[{{label,action,go}}], content=["text"]
|
||||
- reset_machine(id) — reset machine to initial state. id=machine name
|
||||
- destroy_machine(id) — remove machine. id=machine name
|
||||
|
||||
Your output is a JSON plan:
|
||||
{{
|
||||
|
||||
67
agent/nodes/eras_expert.py
Normal file
67
agent/nodes/eras_expert.py
Normal file
@ -0,0 +1,67 @@
|
||||
"""Eras Expert: heating/energy customer database specialist."""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
|
||||
from .expert_base import ExpertNode
|
||||
from ..db import run_db_query
|
||||
|
||||
log = logging.getLogger("runtime")
|
||||
|
||||
|
||||
class ErasExpertNode(ExpertNode):
|
||||
name = "eras_expert"
|
||||
default_database = "eras2_production"
|
||||
|
||||
DOMAIN_SYSTEM = """You are the Eras expert — specialist for heating and energy customer data.
|
||||
You work with the eras2_production database containing customer, device, and billing data.
|
||||
All table and column names are German (lowercase). Common queries involve customer lookups,
|
||||
device counts, consumption analysis, and billing reports."""
|
||||
|
||||
SCHEMA = """Known tables (eras2_production):
|
||||
- kunden — customers
|
||||
- objekte — properties/objects linked to customers
|
||||
- nutzeinheit — usage units within objects
|
||||
- geraete — devices/meters
|
||||
- geraeteverbraeuche — device consumption readings
|
||||
- abrechnungen — billing records
|
||||
|
||||
CRITICAL: You do NOT know the exact column names. They are German and unpredictable.
|
||||
Your FIRST tool_sequence step for ANY SELECT query MUST be DESCRIBE on the target table.
|
||||
Then use the actual column names from the DESCRIBE result in your SELECT.
|
||||
|
||||
Example tool_sequence for "show me 5 customers":
|
||||
[
|
||||
{{"tool": "query_db", "args": {{"query": "DESCRIBE kunden", "database": "eras2_production"}}}},
|
||||
{{"tool": "query_db", "args": {{"query": "SELECT * FROM kunden LIMIT 5", "database": "eras2_production"}}}}
|
||||
]"""
|
||||
|
||||
def __init__(self, send_hud, process_manager=None):
|
||||
super().__init__(send_hud, process_manager)
|
||||
self._schema_cache: dict[str, str] = {} # table_name -> DESCRIBE result
|
||||
|
||||
async def execute(self, job: str, language: str = "de"):
|
||||
"""Execute with schema auto-discovery. Caches DESCRIBE results."""
|
||||
# Inject cached schema into the job context
|
||||
if self._schema_cache:
|
||||
schema_ctx = "Known column names from previous DESCRIBE:\n"
|
||||
for table, desc in self._schema_cache.items():
|
||||
# Just first 5 lines to keep it compact
|
||||
lines = desc.strip().split("\n")[:6]
|
||||
schema_ctx += f"\n{table}:\n" + "\n".join(lines) + "\n"
|
||||
job = job + "\n\n" + schema_ctx
|
||||
|
||||
result = await super().execute(job, language)
|
||||
|
||||
# Cache any DESCRIBE results from this execution
|
||||
# Parse from tool_output if it looks like a DESCRIBE result
|
||||
if result.tool_output and "Field\t" in result.tool_output:
|
||||
# Try to identify which table was described
|
||||
for table in ["kunden", "objekte", "nutzeinheit", "geraete",
|
||||
"geraeteverbraeuche", "abrechnungen"]:
|
||||
if table in job.lower() or table in result.tool_output.lower():
|
||||
self._schema_cache[table] = result.tool_output
|
||||
log.info(f"[eras] cached schema for {table}")
|
||||
break
|
||||
|
||||
return result
|
||||
176
agent/nodes/expert_base.py
Normal file
176
agent/nodes/expert_base.py
Normal file
@ -0,0 +1,176 @@
|
||||
"""Expert Base Node: domain-specific stateless executor.
|
||||
|
||||
An expert receives a self-contained job from the PA, plans its own tool sequence,
|
||||
executes tools, and returns a ThoughtResult. No history, no memory — pure function.
|
||||
|
||||
Subclasses override DOMAIN_SYSTEM, SCHEMA, and default_database.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
|
||||
from .base import Node
|
||||
from ..llm import llm_call
|
||||
from ..db import run_db_query
|
||||
from ..types import ThoughtResult
|
||||
|
||||
log = logging.getLogger("runtime")
|
||||
|
||||
|
||||
class ExpertNode(Node):
|
||||
"""Base class for domain experts. Subclass and set DOMAIN_SYSTEM, SCHEMA, default_database."""
|
||||
|
||||
model = "google/gemini-2.0-flash-001"
|
||||
max_context_tokens = 4000
|
||||
|
||||
# Override in subclasses
|
||||
DOMAIN_SYSTEM = "You are a domain expert."
|
||||
SCHEMA = ""
|
||||
default_database = "eras2_production"
|
||||
|
||||
PLAN_SYSTEM = """You are a domain expert's planning module.
|
||||
Given a job description, produce a JSON tool sequence to accomplish it.
|
||||
|
||||
{domain}
|
||||
|
||||
{schema}
|
||||
|
||||
Available tools:
|
||||
- query_db(query, database) — SQL SELECT/DESCRIBE/SHOW only
|
||||
- emit_actions(actions) — show buttons [{{label, action, payload?}}]
|
||||
- set_state(key, value) — persistent key-value
|
||||
- emit_display(items) — formatted data [{{type, label, value?, style?}}]
|
||||
- create_machine(id, initial, states) — interactive UI with navigation
|
||||
states: {{"state_name": {{"actions": [...], "display": [...]}}}}
|
||||
- add_state(id, state, buttons, content) — add state to machine
|
||||
- reset_machine(id) — reset to initial
|
||||
- destroy_machine(id) — remove machine
|
||||
|
||||
Output ONLY valid JSON:
|
||||
{{
|
||||
"tool_sequence": [
|
||||
{{"tool": "query_db", "args": {{"query": "SELECT ...", "database": "{database}"}}}},
|
||||
{{"tool": "emit_actions", "args": {{"actions": [{{"label": "...", "action": "..."}}]}}}}
|
||||
],
|
||||
"response_hint": "How to phrase the result for the user"
|
||||
}}
|
||||
|
||||
Rules:
|
||||
- NEVER guess column names. If unsure, DESCRIBE first.
|
||||
- Max 5 tools. Keep it focused.
|
||||
- The job is self-contained — all context you need is in the job description."""
|
||||
|
||||
RESPONSE_SYSTEM = """You are a domain expert summarizing results for the user.
|
||||
|
||||
{domain}
|
||||
|
||||
Job: {job}
|
||||
{results}
|
||||
|
||||
Write a concise, natural response. 1-3 sentences.
|
||||
- Reference specific data from the results.
|
||||
- Don't repeat raw output — summarize.
|
||||
- Match the language: {language}."""
|
||||
|
||||
def __init__(self, send_hud, process_manager=None):
|
||||
super().__init__(send_hud)
|
||||
|
||||
async def execute(self, job: str, language: str = "de") -> ThoughtResult:
|
||||
"""Execute a self-contained job. Returns ThoughtResult."""
|
||||
await self.hud("thinking", detail=f"planning: {job[:80]}")
|
||||
|
||||
# Step 1: Plan tool sequence
|
||||
plan_messages = [
|
||||
{"role": "system", "content": self.PLAN_SYSTEM.format(
|
||||
domain=self.DOMAIN_SYSTEM, schema=self.SCHEMA,
|
||||
database=self.default_database)},
|
||||
{"role": "user", "content": f"Job: {job}"},
|
||||
]
|
||||
plan_raw = await llm_call(self.model, plan_messages)
|
||||
tool_sequence, response_hint = self._parse_plan(plan_raw)
|
||||
|
||||
await self.hud("planned", tools=len(tool_sequence), hint=response_hint[:80])
|
||||
|
||||
# Step 2: Execute tools
|
||||
actions = []
|
||||
state_updates = {}
|
||||
display_items = []
|
||||
machine_ops = []
|
||||
tool_used = ""
|
||||
tool_output = ""
|
||||
|
||||
for step in tool_sequence:
|
||||
tool = step.get("tool", "")
|
||||
args = step.get("args", {})
|
||||
await self.hud("tool_call", tool=tool, args=args)
|
||||
|
||||
if tool == "emit_actions":
|
||||
actions.extend(args.get("actions", []))
|
||||
elif tool == "set_state":
|
||||
key = args.get("key", "")
|
||||
if key:
|
||||
state_updates[key] = args.get("value")
|
||||
elif tool == "emit_display":
|
||||
display_items.extend(args.get("items", []))
|
||||
elif tool == "create_machine":
|
||||
machine_ops.append({"op": "create", **args})
|
||||
elif tool == "add_state":
|
||||
machine_ops.append({"op": "add_state", **args})
|
||||
elif tool == "reset_machine":
|
||||
machine_ops.append({"op": "reset", **args})
|
||||
elif tool == "destroy_machine":
|
||||
machine_ops.append({"op": "destroy", **args})
|
||||
elif tool == "query_db":
|
||||
query = args.get("query", "")
|
||||
database = args.get("database", self.default_database)
|
||||
try:
|
||||
result = await asyncio.to_thread(run_db_query, query, database)
|
||||
tool_used = "query_db"
|
||||
tool_output = result
|
||||
await self.hud("tool_result", tool="query_db", output=result[:200])
|
||||
except Exception as e:
|
||||
tool_used = "query_db"
|
||||
tool_output = f"Error: {e}"
|
||||
await self.hud("tool_result", tool="query_db", output=str(e)[:200])
|
||||
|
||||
# Step 3: Generate response
|
||||
results_text = ""
|
||||
if tool_output:
|
||||
results_text = f"Tool result:\n{tool_output[:500]}"
|
||||
|
||||
resp_messages = [
|
||||
{"role": "system", "content": self.RESPONSE_SYSTEM.format(
|
||||
domain=self.DOMAIN_SYSTEM, job=job, results=results_text, language=language)},
|
||||
{"role": "user", "content": job},
|
||||
]
|
||||
response = await llm_call(self.model, resp_messages)
|
||||
if not response:
|
||||
response = "[no response]"
|
||||
|
||||
await self.hud("done", response=response[:100])
|
||||
|
||||
return ThoughtResult(
|
||||
response=response,
|
||||
tool_used=tool_used,
|
||||
tool_output=tool_output,
|
||||
actions=actions,
|
||||
state_updates=state_updates,
|
||||
display_items=display_items,
|
||||
machine_ops=machine_ops,
|
||||
)
|
||||
|
||||
def _parse_plan(self, raw: str) -> tuple[list, str]:
|
||||
"""Parse tool sequence JSON from planning LLM call."""
|
||||
text = raw.strip()
|
||||
if text.startswith("```"):
|
||||
text = text.split("\n", 1)[1] if "\n" in text else text[3:]
|
||||
if text.endswith("```"):
|
||||
text = text[:-3]
|
||||
text = text.strip()
|
||||
try:
|
||||
data = json.loads(text)
|
||||
return data.get("tool_sequence", []), data.get("response_hint", "")
|
||||
except (json.JSONDecodeError, Exception) as e:
|
||||
log.error(f"[expert] plan parse failed: {e}, raw: {text[:200]}")
|
||||
return [], ""
|
||||
@ -15,12 +15,11 @@ class InputNode(Node):
|
||||
model = "google/gemini-2.0-flash-001"
|
||||
max_context_tokens = 2000
|
||||
|
||||
SYSTEM = """You are the Input node — the analyst of this cognitive runtime.
|
||||
SYSTEM = """You are the Input node — classify ONLY the current message.
|
||||
|
||||
Listener: {identity} on {channel}
|
||||
|
||||
YOUR ONLY JOB: Analyze the user's message and return a JSON classification.
|
||||
Output ONLY valid JSON, nothing else. No markdown fences, no explanation.
|
||||
Return ONLY valid JSON. No markdown, no explanation.
|
||||
|
||||
Schema:
|
||||
{{
|
||||
@ -30,22 +29,35 @@ Schema:
|
||||
"topic": "short topic string",
|
||||
"tone": "casual | frustrated | playful | urgent",
|
||||
"complexity": "trivial | simple | complex",
|
||||
"context": "brief situational note or empty string"
|
||||
"context": "brief note or empty"
|
||||
}}
|
||||
|
||||
Classification guide:
|
||||
- intent "social": greetings, thanks, goodbye, acknowledgments (hi, ok, thanks, bye, cool)
|
||||
- intent "question": asking for information (what, how, when, why, who)
|
||||
- intent "request": asking to do/create/build something
|
||||
- intent "action": clicking a button or triggering a UI action
|
||||
- intent "feedback": commenting on results, correcting, expressing satisfaction/dissatisfaction
|
||||
- complexity "trivial": one-word or very short social messages that need no reasoning
|
||||
- complexity "simple": clear single-step requests or questions
|
||||
- complexity "complex": multi-step, ambiguous, or requires deep reasoning
|
||||
- tone "frustrated": complaints, anger, exasperation
|
||||
- tone "urgent": time pressure, critical issues
|
||||
- tone "playful": jokes, teasing, lighthearted
|
||||
- tone "casual": neutral everyday conversation
|
||||
Rules:
|
||||
- Classify the CURRENT message only. Previous messages are context, not the target.
|
||||
- language: detect from the CURRENT message text, not the conversation language.
|
||||
"Wie spaet ist es?" = de. "hello" = en. "Hallo, how are you" = mixed.
|
||||
- intent: what does THIS message ask for?
|
||||
social = greetings, thanks, goodbye, ok, bye, cool
|
||||
question = asking for info (what, how, when, why, wieviel, was, wie)
|
||||
request = asking to create/build/do something
|
||||
action = clicking a button or UI trigger
|
||||
feedback = commenting on results, correcting, satisfaction/dissatisfaction
|
||||
- complexity: how much reasoning does THIS message need?
|
||||
trivial = one-word social (hi, ok, thanks, bye)
|
||||
simple = clear single-step
|
||||
complex = multi-step, ambiguous, deep reasoning
|
||||
- tone: emotional register of THIS message
|
||||
frustrated = complaints, anger, "broken", "nothing works", "sick of"
|
||||
urgent = time pressure, critical
|
||||
playful = jokes, teasing
|
||||
casual = neutral
|
||||
|
||||
Examples:
|
||||
"hi there!" -> {{"language":"en","intent":"social","tone":"casual","complexity":"trivial"}}
|
||||
"Wie spaet ist es?" -> {{"language":"de","intent":"question","tone":"casual","complexity":"simple"}}
|
||||
"this is broken, nothing works" -> {{"language":"en","intent":"feedback","tone":"frustrated","complexity":"simple"}}
|
||||
"create two buttons" -> {{"language":"en","intent":"request","tone":"casual","complexity":"simple"}}
|
||||
"ok thanks bye" -> {{"language":"en","intent":"social","tone":"casual","complexity":"trivial"}}
|
||||
|
||||
{memory_context}"""
|
||||
|
||||
@ -54,13 +66,25 @@ Classification guide:
|
||||
await self.hud("thinking", detail="analyzing input")
|
||||
log.info(f"[input] user said: {envelope.text}")
|
||||
|
||||
# Build context summary from recent history (not raw chat messages)
|
||||
history_summary = ""
|
||||
recent = history[-8:]
|
||||
if recent:
|
||||
lines = []
|
||||
for msg in recent:
|
||||
role = msg.get("role", "?")
|
||||
content = msg.get("content", "")[:80]
|
||||
lines.append(f" {role}: {content}")
|
||||
history_summary = "Recent conversation:\n" + "\n".join(lines)
|
||||
|
||||
messages = [
|
||||
{"role": "system", "content": self.SYSTEM.format(
|
||||
memory_context=memory_context, identity=identity, channel=channel)},
|
||||
]
|
||||
for msg in history[-8:]:
|
||||
messages.append(msg)
|
||||
messages.append({"role": "user", "content": f"Classify this message: {envelope.text}"})
|
||||
if history_summary:
|
||||
messages.append({"role": "user", "content": history_summary})
|
||||
messages.append({"role": "assistant", "content": "OK, I have the context. Send the message to classify."})
|
||||
messages.append({"role": "user", "content": f"Classify: {envelope.text}"})
|
||||
messages = self.trim_context(messages)
|
||||
|
||||
await self.hud("context", messages=messages, tokens=self.last_context_tokens,
|
||||
|
||||
153
agent/nodes/pa_v1.py
Normal file
153
agent/nodes/pa_v1.py
Normal file
@ -0,0 +1,153 @@
|
||||
"""Personal Assistant Node: routes to domain experts, holds user context."""
|
||||
|
||||
import json
|
||||
import logging
|
||||
|
||||
from .base import Node
|
||||
from ..llm import llm_call
|
||||
from ..types import Command, PARouting
|
||||
|
||||
log = logging.getLogger("runtime")
|
||||
|
||||
|
||||
class PANode(Node):
|
||||
name = "pa_v1"
|
||||
model = "anthropic/claude-haiku-4.5"
|
||||
max_context_tokens = 4000
|
||||
|
||||
SYSTEM = """You are the Personal Assistant (PA) — the user's companion in this cognitive runtime.
|
||||
You manage the conversation and route domain-specific work to the right expert.
|
||||
|
||||
Listener: {identity} on {channel}
|
||||
|
||||
Available experts:
|
||||
{experts}
|
||||
|
||||
YOUR JOB:
|
||||
1. Understand what the user wants
|
||||
2. If it's a domain task: route to the right expert with a clear, self-contained job description
|
||||
3. If it's social/general: respond directly (no expert needed)
|
||||
|
||||
Output ONLY valid JSON:
|
||||
{{
|
||||
"expert": "eras | plankiste | none",
|
||||
"job": "Self-contained task description for the expert. Include all context the expert needs — it has NO conversation history.",
|
||||
"thinking_message": "Short message shown to user while expert works (in user's language). e.g. 'Moment, ich schaue in der Datenbank nach...'",
|
||||
"response_hint": "If expert=none, your direct response to the user.",
|
||||
"language": "de | en | mixed"
|
||||
}}
|
||||
|
||||
Rules:
|
||||
- The expert has NO history. The job must be fully self-contained.
|
||||
- Include relevant facts from memory in the job (e.g. "customer Kathrin Jager, ID 2").
|
||||
- thinking_message should be natural and in the user's language.
|
||||
- For greetings, thanks, general chat: expert=none, write response_hint directly.
|
||||
- For DB queries, reports, data analysis: route to the domain expert.
|
||||
- When unsure which expert: expert=none, ask the user to clarify.
|
||||
|
||||
{memory_context}"""
|
||||
|
||||
EXPERT_DESCRIPTIONS = {
|
||||
"eras": "eras — heating/energy customer database (eras2_production). Customers, devices, billing, consumption data.",
|
||||
"plankiste": "plankiste — Kita planning database (plankiste_test). Children, care schedules, offers, pricing.",
|
||||
}
|
||||
|
||||
def __init__(self, send_hud):
|
||||
super().__init__(send_hud)
|
||||
self.directive: dict = {"mode": "assistant", "style": "helpful and concise"}
|
||||
self._available_experts: list[str] = []
|
||||
|
||||
def set_available_experts(self, experts: list[str]):
|
||||
"""Called by frame engine to tell PA which experts are in this graph."""
|
||||
self._available_experts = experts
|
||||
|
||||
def get_context_line(self) -> str:
|
||||
d = self.directive
|
||||
return f"PA: {d['mode']} mode. {d['style']}."
|
||||
|
||||
async def route(self, command: Command, history: list[dict],
|
||||
memory_context: str = "", identity: str = "unknown",
|
||||
channel: str = "unknown") -> PARouting:
|
||||
"""Decide which expert handles this request."""
|
||||
await self.hud("thinking", detail="routing request")
|
||||
|
||||
# Build expert list for prompt
|
||||
expert_lines = []
|
||||
for name in self._available_experts:
|
||||
desc = self.EXPERT_DESCRIPTIONS.get(name, f"{name} — domain expert")
|
||||
expert_lines.append(f"- {desc}")
|
||||
if not expert_lines:
|
||||
expert_lines.append("- (no experts available — handle everything directly)")
|
||||
|
||||
messages = [
|
||||
{"role": "system", "content": self.SYSTEM.format(
|
||||
memory_context=memory_context, identity=identity, channel=channel,
|
||||
experts="\n".join(expert_lines))},
|
||||
]
|
||||
|
||||
# Summarize recent history (PA sees full context)
|
||||
recent = history[-12:]
|
||||
if recent:
|
||||
lines = []
|
||||
for msg in recent:
|
||||
role = msg.get("role", "?")
|
||||
content = msg.get("content", "")[:100]
|
||||
lines.append(f" {role}: {content}")
|
||||
messages.append({"role": "user", "content": "Recent conversation:\n" + "\n".join(lines)})
|
||||
messages.append({"role": "assistant", "content": "OK, I have the context."})
|
||||
|
||||
a = command.analysis
|
||||
messages.append({"role": "user",
|
||||
"content": f"Route this message (intent={a.intent}, lang={a.language}, tone={a.tone}):\n{command.source_text}"})
|
||||
messages = self.trim_context(messages)
|
||||
|
||||
await self.hud("context", messages=messages, tokens=self.last_context_tokens,
|
||||
max_tokens=self.max_context_tokens, fill_pct=self.context_fill_pct)
|
||||
|
||||
raw = await llm_call(self.model, messages)
|
||||
log.info(f"[pa] raw: {raw[:300]}")
|
||||
|
||||
routing = self._parse_routing(raw, command)
|
||||
await self.hud("routed", expert=routing.expert, job=routing.job[:100],
|
||||
direct=routing.expert == "none")
|
||||
|
||||
# Update directive style based on tone
|
||||
if command.analysis.tone == "frustrated":
|
||||
self.directive["style"] = "patient and empathetic"
|
||||
elif command.analysis.tone == "playful":
|
||||
self.directive["style"] = "light and fun"
|
||||
else:
|
||||
self.directive["style"] = "helpful and concise"
|
||||
|
||||
return routing
|
||||
|
||||
def _parse_routing(self, raw: str, command: Command) -> PARouting:
|
||||
"""Parse LLM JSON into PARouting with fallback."""
|
||||
text = raw.strip()
|
||||
if text.startswith("```"):
|
||||
text = text.split("\n", 1)[1] if "\n" in text else text[3:]
|
||||
if text.endswith("```"):
|
||||
text = text[:-3]
|
||||
text = text.strip()
|
||||
|
||||
try:
|
||||
data = json.loads(text)
|
||||
expert = data.get("expert", "none")
|
||||
# Validate expert is available
|
||||
if expert != "none" and expert not in self._available_experts:
|
||||
log.warning(f"[pa] expert '{expert}' not available, falling back to none")
|
||||
expert = "none"
|
||||
return PARouting(
|
||||
expert=expert,
|
||||
job=data.get("job", ""),
|
||||
thinking_message=data.get("thinking_message", ""),
|
||||
response_hint=data.get("response_hint", ""),
|
||||
language=data.get("language", command.analysis.language),
|
||||
)
|
||||
except (json.JSONDecodeError, Exception) as e:
|
||||
log.error(f"[pa] parse failed: {e}, raw: {text[:200]}")
|
||||
return PARouting(
|
||||
expert="none",
|
||||
response_hint=command.source_text,
|
||||
language=command.analysis.language,
|
||||
)
|
||||
@ -6,6 +6,7 @@ import logging
|
||||
|
||||
from .base import Node
|
||||
from ..llm import llm_call
|
||||
from ..db import run_db_query
|
||||
from ..process import ProcessManager
|
||||
from ..types import Command, DirectorPlan, ThoughtResult
|
||||
|
||||
@ -30,39 +31,10 @@ Rules:
|
||||
- Keep it short: 1-3 sentences for simple responses.
|
||||
- For data: reference the numbers, don't repeat raw output."""
|
||||
|
||||
DB_HOST = "mariadb-eras"
|
||||
DB_USER = "root"
|
||||
DB_PASS = "root"
|
||||
|
||||
def __init__(self, send_hud, process_manager: ProcessManager = None):
|
||||
super().__init__(send_hud)
|
||||
self.pm = process_manager
|
||||
|
||||
def _run_db_query(self, query: str, database: str = "eras2_production") -> str:
|
||||
"""Execute SQL query against MariaDB."""
|
||||
import pymysql
|
||||
trimmed = query.strip().upper()
|
||||
if not (trimmed.startswith("SELECT") or trimmed.startswith("DESCRIBE") or trimmed.startswith("SHOW")):
|
||||
return "Error: Only SELECT/DESCRIBE/SHOW queries allowed"
|
||||
if database not in ("eras2_production", "plankiste_test"):
|
||||
return f"Error: Unknown database '{database}'"
|
||||
conn = pymysql.connect(host=self.DB_HOST, user=self.DB_USER,
|
||||
password=self.DB_PASS, database=database,
|
||||
connect_timeout=5, read_timeout=15)
|
||||
try:
|
||||
with conn.cursor() as cur:
|
||||
cur.execute(query)
|
||||
rows = cur.fetchall()
|
||||
if not rows:
|
||||
return "(no results)"
|
||||
cols = [d[0] for d in cur.description]
|
||||
lines = ["\t".join(cols)]
|
||||
for row in rows:
|
||||
lines.append("\t".join(str(v) if v is not None else "" for v in row))
|
||||
return "\n".join(lines)
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
async def process(self, command: Command, plan: DirectorPlan,
|
||||
history: list[dict], memory_context: str = "") -> ThoughtResult:
|
||||
"""Execute Director's plan and produce ThoughtResult."""
|
||||
@ -101,7 +73,7 @@ Rules:
|
||||
query = args.get("query", "")
|
||||
database = args.get("database", "eras2_production")
|
||||
try:
|
||||
result = await asyncio.to_thread(self._run_db_query, query, database)
|
||||
result = await asyncio.to_thread(run_db_query, query, database)
|
||||
tool_used = "query_db"
|
||||
tool_output = result
|
||||
await self.hud("tool_result", tool="query_db", output=result[:200])
|
||||
|
||||
@ -15,11 +15,22 @@ class UINode(Node):
|
||||
|
||||
def __init__(self, send_hud):
|
||||
super().__init__(send_hud)
|
||||
self.current_controls: list[dict] = []
|
||||
self.thinker_controls: list[dict] = [] # buttons, labels, tables from Thinker
|
||||
self.state: dict = {} # {"count": 0, "theme": "dark", ...}
|
||||
self.bindings: dict = {} # {"increment": {"op": "inc", "var": "count"}, ...}
|
||||
self.machines: dict = {} # {"nav": {initial, states, current}, ...}
|
||||
|
||||
@property
|
||||
def current_controls(self) -> list[dict]:
|
||||
"""Merged view: thinker controls + machine controls."""
|
||||
return self.thinker_controls + self.get_machine_controls()
|
||||
|
||||
@current_controls.setter
|
||||
def current_controls(self, value: list[dict]):
|
||||
"""When set directly (e.g. after machine transition), split into layers."""
|
||||
# Machine controls have machine_id — keep those in machines, rest in thinker
|
||||
self.thinker_controls = [c for c in value if not c.get("machine_id")]
|
||||
|
||||
# --- Machine operations ---
|
||||
|
||||
async def apply_machine_ops(self, ops: list[dict]) -> None:
|
||||
@ -30,18 +41,40 @@ class UINode(Node):
|
||||
|
||||
if op == "create":
|
||||
initial = op_data.get("initial", "")
|
||||
# Parse states from array format [{name, buttons, content}]
|
||||
states_list = op_data.get("states", [])
|
||||
# Parse states — handles both dict and list formats from Director
|
||||
raw_states = op_data.get("states", {})
|
||||
states = {}
|
||||
for s in states_list:
|
||||
if isinstance(raw_states, dict):
|
||||
# Dict format: {main: {actions: [...], display: [...], content: [...]}}
|
||||
for name, sdef in raw_states.items():
|
||||
if not isinstance(sdef, dict):
|
||||
states[name] = {"buttons": [], "content": []}
|
||||
continue
|
||||
buttons = sdef.get("buttons", []) or sdef.get("actions", [])
|
||||
content = sdef.get("content", []) or sdef.get("display", [])
|
||||
# Normalize display items to strings
|
||||
if content and isinstance(content[0], dict):
|
||||
content = [c.get("value", c.get("label", "")) for c in content]
|
||||
# Normalize button format: ensure "go" field for navigation
|
||||
for btn in buttons:
|
||||
if isinstance(btn, dict) and not btn.get("go") and btn.get("payload"):
|
||||
btn["go"] = btn["payload"]
|
||||
states[name] = {"buttons": buttons, "content": content}
|
||||
elif isinstance(raw_states, list):
|
||||
# List format: [{name, buttons/actions, content/display}]
|
||||
for s in raw_states:
|
||||
if isinstance(s, str):
|
||||
s = {"name": s}
|
||||
name = s.get("name", "")
|
||||
if name:
|
||||
states[name] = {
|
||||
"buttons": s.get("buttons", []),
|
||||
"content": s.get("content", []),
|
||||
}
|
||||
buttons = s.get("buttons", []) or s.get("actions", [])
|
||||
content = s.get("content", []) or s.get("display", [])
|
||||
if content and isinstance(content[0], dict):
|
||||
content = [c.get("value", c.get("label", "")) for c in content]
|
||||
for btn in buttons:
|
||||
if isinstance(btn, dict) and not btn.get("go") and btn.get("payload"):
|
||||
btn["go"] = btn["payload"]
|
||||
states[name] = {"buttons": buttons, "content": content}
|
||||
self.machines[mid] = {
|
||||
"initial": initial,
|
||||
"current": initial,
|
||||
@ -98,6 +131,7 @@ class UINode(Node):
|
||||
"""Render all machines' current states as controls."""
|
||||
controls = []
|
||||
for mid, machine in self.machines.items():
|
||||
log.info(f"[ui] machine_controls: {mid} current={machine['current']} states={list(machine['states'].keys())} buttons={[b.get('label','?') for b in machine['states'].get(machine['current'],{}).get('buttons',[])]}")
|
||||
current = machine["current"]
|
||||
state_def = machine["states"].get(current, {})
|
||||
|
||||
@ -246,6 +280,7 @@ class UINode(Node):
|
||||
# --- Render controls ---
|
||||
|
||||
def _build_controls(self, thought: ThoughtResult) -> list[dict]:
|
||||
"""Build thinker controls only. Machine controls are added by the property."""
|
||||
controls = []
|
||||
|
||||
# 1. Apply state_updates from Thinker's set_state() calls
|
||||
@ -253,13 +288,13 @@ class UINode(Node):
|
||||
for key, value in thought.state_updates.items():
|
||||
self.set_var(key, value)
|
||||
|
||||
# 2. Parse actions from Thinker (registers bindings) OR preserve existing buttons
|
||||
# 2. Parse actions from Thinker (registers bindings) OR preserve existing thinker buttons
|
||||
if thought.actions:
|
||||
controls.extend(self._parse_thinker_actions(thought.actions))
|
||||
else:
|
||||
# Retain existing buttons when Thinker doesn't emit new ones
|
||||
for ctrl in self.current_controls:
|
||||
if ctrl["type"] == "button":
|
||||
# Retain existing thinker buttons (not machine buttons — those persist via property)
|
||||
for ctrl in self.thinker_controls:
|
||||
if ctrl.get("type") == "button":
|
||||
controls.append(ctrl)
|
||||
|
||||
# 3. Add labels for all state variables (bound + set_state)
|
||||
@ -282,14 +317,14 @@ class UINode(Node):
|
||||
"style": item.get("style", ""),
|
||||
})
|
||||
|
||||
# 3. Extract tables from tool output
|
||||
# 5. Extract tables from tool output
|
||||
if thought.tool_output:
|
||||
table = self._extract_table(thought.tool_output)
|
||||
if table:
|
||||
controls.append(table)
|
||||
|
||||
# 4. Add label for short tool results (if no table and no state vars)
|
||||
if thought.tool_used and thought.tool_output and not any(c["type"] == "table" for c in controls):
|
||||
# 6. Add label for short tool results (if no table and no state vars)
|
||||
if thought.tool_used and thought.tool_output and not any(c.get("type") == "table" for c in controls):
|
||||
output = thought.tool_output.strip()
|
||||
if "\n" not in output and len(output) < 100 and not self.state:
|
||||
controls.append({
|
||||
@ -299,8 +334,7 @@ class UINode(Node):
|
||||
"value": output,
|
||||
})
|
||||
|
||||
# 5. Add machine controls
|
||||
controls.extend(self.get_machine_controls())
|
||||
# Machine controls are NOT added here — the current_controls property merges them
|
||||
|
||||
return controls
|
||||
|
||||
@ -310,18 +344,19 @@ class UINode(Node):
|
||||
if thought.machine_ops:
|
||||
await self.apply_machine_ops(thought.machine_ops)
|
||||
|
||||
controls = self._build_controls(thought)
|
||||
thinker_ctrls = self._build_controls(thought)
|
||||
|
||||
if controls:
|
||||
self.current_controls = controls
|
||||
await self.hud("controls", controls=controls)
|
||||
log.info(f"[ui] emitting {len(controls)} controls")
|
||||
if thinker_ctrls:
|
||||
self.thinker_controls = thinker_ctrls
|
||||
# Always emit the merged view (thinker + machine)
|
||||
merged = self.current_controls
|
||||
if merged:
|
||||
await self.hud("controls", controls=merged)
|
||||
log.info(f"[ui] emitting {len(merged)} controls ({len(self.thinker_controls)} thinker + {len(self.get_machine_controls())} machine)")
|
||||
else:
|
||||
if self.current_controls:
|
||||
controls = self.current_controls
|
||||
await self.hud("decided", instruction="no new controls")
|
||||
|
||||
return controls
|
||||
return merged
|
||||
|
||||
async def process_local_action(self, action: str, payload: dict = None) -> tuple[str | None, list[dict]]:
|
||||
"""Handle a local action. Returns (result_text, updated_controls) or (None, []) if not local."""
|
||||
|
||||
@ -10,6 +10,7 @@ from typing import Callable
|
||||
from .types import Envelope, Command, InputAnalysis, ThoughtResult, DirectorPlan
|
||||
from .process import ProcessManager
|
||||
from .engine import load_graph, instantiate_nodes, list_graphs, get_graph_for_cytoscape
|
||||
from .frame_engine import FrameEngine
|
||||
|
||||
log = logging.getLogger("runtime")
|
||||
|
||||
@ -88,16 +89,17 @@ class Runtime:
|
||||
|
||||
# Bind nodes by role (pipeline code references these)
|
||||
self.input_node = nodes["input"]
|
||||
self.thinker = nodes["thinker"]
|
||||
self.thinker = nodes.get("thinker") # v1/v2/v3
|
||||
self.output_node = nodes["output"]
|
||||
self.ui_node = nodes["ui"]
|
||||
self.memorizer = nodes["memorizer"]
|
||||
self.director = nodes["director"]
|
||||
self.director = nodes.get("director") # v1/v2/v3, None in v4
|
||||
self.sensor = nodes["sensor"]
|
||||
self.interpreter = nodes.get("interpreter") # v2 only
|
||||
self.interpreter = nodes.get("interpreter") # v2+ only
|
||||
|
||||
# Detect v2 graph: director has decide(), thinker takes DirectorPlan
|
||||
self.is_v2 = hasattr(self.director, "decide")
|
||||
# Detect graph type
|
||||
self.is_v2 = self.director is not None and hasattr(self.director, "decide")
|
||||
self.use_frames = self.graph.get("engine") == "frames"
|
||||
self.sensor.start(
|
||||
get_memo_state=lambda: self.memorizer.state,
|
||||
get_server_controls=lambda: self.ui_node.current_controls,
|
||||
@ -112,6 +114,16 @@ class Runtime:
|
||||
self.memorizer.state["user_name"] = self.identity
|
||||
self.memorizer.state["situation"] = f"authenticated on {self.channel}" if origin else "local session"
|
||||
|
||||
# Frame engine (for v3+ graphs)
|
||||
if self.use_frames:
|
||||
self.frame_engine = FrameEngine(
|
||||
graph=self.graph, nodes=nodes, sink=self.sink,
|
||||
history=self.history, send_hud=self._send_hud,
|
||||
sensor=self.sensor, memorizer=self.memorizer,
|
||||
ui_node=self.ui_node, identity=self.identity,
|
||||
channel=self.channel, broadcast=self._broadcast)
|
||||
log.info(f"[runtime] using FrameEngine for graph '{gname}'")
|
||||
|
||||
def attach_ws(self, ws):
|
||||
"""Attach a WebSocket for real-time streaming."""
|
||||
self.sink.attach(ws)
|
||||
@ -240,8 +252,8 @@ class Runtime:
|
||||
"""Format dashboard controls into a context string for Thinker.
|
||||
Compares browser-reported state against server-side controls to detect mismatches."""
|
||||
server_controls = self.ui_node.current_controls
|
||||
server_buttons = [c.get("label", "") for c in server_controls if c.get("type") == "button"]
|
||||
browser_buttons = [c.get("label", "") for c in dashboard if c.get("type") == "button"] if dashboard else []
|
||||
server_buttons = [str(c.get("label", "")) for c in server_controls if isinstance(c, dict) and c.get("type") == "button"]
|
||||
browser_buttons = [str(c.get("label", "")) for c in dashboard if isinstance(c, dict) and c.get("type") == "button"] if dashboard else []
|
||||
|
||||
lines = []
|
||||
|
||||
@ -250,7 +262,7 @@ class Runtime:
|
||||
lines.append(f"WARNING: Server sent {len(server_buttons)} controls but dashboard shows NONE.")
|
||||
lines.append(f" Expected buttons: {', '.join(server_buttons)}")
|
||||
lines.append(" Controls failed to render or were lost. You MUST re-emit them in ACTIONS.")
|
||||
elif server_buttons and set(server_buttons) != set(browser_buttons):
|
||||
elif server_buttons and sorted(server_buttons) != sorted(browser_buttons):
|
||||
lines.append(f"WARNING: Dashboard mismatch.")
|
||||
lines.append(f" Server sent: {', '.join(server_buttons)}")
|
||||
lines.append(f" Browser shows: {', '.join(browser_buttons) or 'nothing'}")
|
||||
@ -275,6 +287,11 @@ class Runtime:
|
||||
return "\n".join(lines)
|
||||
|
||||
async def handle_message(self, text: str, dashboard: list = None):
|
||||
# Frame engine: delegate entirely
|
||||
if self.use_frames:
|
||||
result = await self.frame_engine.process_message(text, dashboard)
|
||||
return result
|
||||
|
||||
# Detect ACTION: prefix from API/test runner
|
||||
if text.startswith("ACTION:"):
|
||||
parts = text.split("|", 1)
|
||||
|
||||
@ -66,6 +66,16 @@ class InterpretedResult:
|
||||
confidence: str = "high" # high | medium | low
|
||||
|
||||
|
||||
@dataclass
|
||||
class PARouting:
|
||||
"""PA's routing decision — which expert handles this, what's the job."""
|
||||
expert: str = "none" # "eras" | "plankiste" | "none"
|
||||
job: str = "" # Self-contained task for the expert
|
||||
thinking_message: str = "" # Shown to user while expert works
|
||||
response_hint: str = "" # If expert="none", PA answers directly
|
||||
language: str = "de" # Response language
|
||||
|
||||
|
||||
@dataclass
|
||||
class ThoughtResult:
|
||||
"""Thinker node's output — either a direct answer or tool results."""
|
||||
|
||||
116
runtime_test.py
116
runtime_test.py
@ -99,12 +99,14 @@ def _parse_command(text: str) -> dict | None:
|
||||
return {"type": "send", "text": msg_text, "dashboard": dashboard}
|
||||
return {"type": "send", "text": val}
|
||||
|
||||
# action: action_name OR action: first matching "pattern"
|
||||
# action: action_name OR action: first matching "pattern" or "pattern2"
|
||||
if text.startswith("action:"):
|
||||
val = text[7:].strip()
|
||||
m = re.match(r'first matching "(.+)"', val)
|
||||
m = re.match(r'first matching (.+)', val)
|
||||
if m:
|
||||
return {"type": "action_match", "pattern": m.group(1)}
|
||||
# Support: first matching "+1" or "inc" or "plus"
|
||||
patterns = [p.strip().strip('"') for p in m.group(1).split(" or ")]
|
||||
return {"type": "action_match", "patterns": patterns}
|
||||
return {"type": "action", "action": val}
|
||||
|
||||
# expect_response: contains "foo"
|
||||
@ -166,8 +168,11 @@ class CogClient:
|
||||
if pd.get("id") == msg_id and pd.get("status") == "error":
|
||||
d = pd
|
||||
break
|
||||
self.last_response = d.get("response", "")
|
||||
resp = d.get("response", "")
|
||||
self.last_response = resp if isinstance(resp, str) else str(resp)
|
||||
self.last_memo = d.get("memorizer", {})
|
||||
if not isinstance(self.last_memo, dict):
|
||||
self.last_memo = {}
|
||||
time.sleep(0.5)
|
||||
self._fetch_trace()
|
||||
return d
|
||||
@ -177,15 +182,17 @@ class CogClient:
|
||||
return self.send(f"ACTION: {action}")
|
||||
|
||||
def _fetch_trace(self):
|
||||
r = self.client.get(f"{API}/trace?last=20", headers=HEADERS)
|
||||
r = self.client.get(f"{API}/trace?last=40", headers=HEADERS)
|
||||
self.last_trace = r.json().get("lines", [])
|
||||
# Extract all controls from trace (buttons, tables, labels, displays)
|
||||
for t in self.last_trace:
|
||||
# Extract controls from the most recent controls HUD event
|
||||
for t in reversed(self.last_trace):
|
||||
if t.get("event") == "controls":
|
||||
new_controls = t.get("controls", [])
|
||||
if new_controls:
|
||||
self.last_actions = new_controls
|
||||
self.last_buttons = [c for c in new_controls if c.get("type") == "button"]
|
||||
self.last_buttons = [c for c in new_controls
|
||||
if isinstance(c, dict) and c.get("type") == "button"]
|
||||
break
|
||||
|
||||
def get_state(self) -> dict:
|
||||
r = self.client.get(f"{API}/state", headers=HEADERS)
|
||||
@ -320,7 +327,24 @@ def check_trace(trace: list, check: str) -> tuple[bool, str]:
|
||||
if m:
|
||||
field, expected = m.group(1), m.group(2)
|
||||
terms = [t.strip().strip('"') for t in expected.split(" or ")]
|
||||
for t in trace:
|
||||
# Method 1: parse from LAST frame_trace event (v3 frame engine, most reliable)
|
||||
for t in reversed(trace):
|
||||
if t.get("event") == "frame_trace" and t.get("trace"):
|
||||
frames = t["trace"].get("frames", [])
|
||||
for fr in frames:
|
||||
if fr.get("node") == "input" and fr.get("output"):
|
||||
out = fr["output"]
|
||||
for part in out.split():
|
||||
if "=" in part:
|
||||
k, v = part.split("=", 1)
|
||||
if k == field:
|
||||
for term in terms:
|
||||
if v.lower() == term.lower():
|
||||
return True, f"input.analysis.{field}={v} (from frame_trace)"
|
||||
return False, f"input.analysis.{field}={v}, expected one of {terms}"
|
||||
break # only check the most recent frame_trace
|
||||
# Method 2: fallback to input node's "perceived" HUD event (v1/v2)
|
||||
for t in reversed(trace):
|
||||
if t.get("node") == "input" and t.get("event") == "perceived":
|
||||
analysis = t.get("analysis", {})
|
||||
actual = str(analysis.get(field, ""))
|
||||
@ -359,14 +383,14 @@ def check_trace(trace: list, check: str) -> tuple[bool, str]:
|
||||
return True, f"machine '{expected_id}' created"
|
||||
return False, f"no machine_created event with id='{expected_id}'"
|
||||
|
||||
# has EVENT_NAME
|
||||
m = re.match(r'has\s+(\w+)', check)
|
||||
if m:
|
||||
event_name = m.group(1)
|
||||
# has EVENT_NAME or EVENT_NAME2 ...
|
||||
m = re.match(r'has\s+([\w\s]+(?:\s+or\s+\w+)*)', check)
|
||||
if m and not re.match(r'has\s+tool_call\s+\w+', check):
|
||||
names = [n.strip() for n in re.split(r'\s+or\s+', m.group(1))]
|
||||
for t in trace:
|
||||
if t.get("event") == event_name:
|
||||
return True, f"found event '{event_name}'"
|
||||
return False, f"no '{event_name}' event in trace"
|
||||
if t.get("event") in names:
|
||||
return True, f"found event '{t.get('event')}'"
|
||||
return False, f"no '{' or '.join(names)}' event in trace"
|
||||
|
||||
# no EVENT_NAME
|
||||
m = re.match(r'no\s+(\w+)', check)
|
||||
@ -391,8 +415,9 @@ class StepResult:
|
||||
|
||||
|
||||
class CogTestRunner:
|
||||
def __init__(self):
|
||||
def __init__(self, on_result=None):
|
||||
self.client = CogClient()
|
||||
self._on_result = on_result # callback(result_dict) per check
|
||||
|
||||
def run(self, testcase: dict) -> list[dict]:
|
||||
results = []
|
||||
@ -402,6 +427,11 @@ class CogTestRunner:
|
||||
self.client.close()
|
||||
return results
|
||||
|
||||
def _add(self, results: list, result: dict):
|
||||
results.append(result)
|
||||
if self._on_result:
|
||||
self._on_result(result)
|
||||
|
||||
def _run_step(self, step: dict) -> list[dict]:
|
||||
results = []
|
||||
step_name = step["name"]
|
||||
@ -409,65 +439,71 @@ class CogTestRunner:
|
||||
for cmd in step["commands"]:
|
||||
if cmd["type"] == "clear":
|
||||
self.client.clear()
|
||||
results.append({"step": step_name, "check": "clear", "status": "PASS", "detail": "cleared"})
|
||||
self._add(results, {"step": step_name, "check": "clear", "status": "PASS", "detail": "cleared"})
|
||||
|
||||
elif cmd["type"] == "send":
|
||||
try:
|
||||
self.client.send(cmd["text"], dashboard=cmd.get("dashboard"))
|
||||
results.append({"step": step_name, "check": f"send: {cmd['text'][:40]}", "status": "PASS",
|
||||
self._add(results, {"step": step_name, "check": f"send: {cmd['text'][:40]}", "status": "PASS",
|
||||
"detail": f"response: {self.client.last_response[:80]}"})
|
||||
except Exception as e:
|
||||
results.append({"step": step_name, "check": f"send: {cmd['text'][:40]}", "status": "FAIL",
|
||||
self._add(results, {"step": step_name, "check": f"send: {cmd['text'][:40]}", "status": "FAIL",
|
||||
"detail": str(e)})
|
||||
|
||||
elif cmd["type"] == "action":
|
||||
try:
|
||||
self.client.send_action(cmd["action"])
|
||||
results.append({"step": step_name, "check": f"action: {cmd['action']}", "status": "PASS",
|
||||
self._add(results, {"step": step_name, "check": f"action: {cmd['action']}", "status": "PASS",
|
||||
"detail": f"response: {self.client.last_response[:80]}"})
|
||||
except Exception as e:
|
||||
results.append({"step": step_name, "check": f"action: {cmd['action']}", "status": "FAIL",
|
||||
self._add(results, {"step": step_name, "check": f"action: {cmd['action']}", "status": "FAIL",
|
||||
"detail": str(e)})
|
||||
|
||||
elif cmd["type"] == "action_match":
|
||||
# Find first button matching pattern
|
||||
pattern = cmd["pattern"].lower()
|
||||
# Find first button matching any pattern
|
||||
patterns = cmd["patterns"]
|
||||
matched = None
|
||||
for pattern in patterns:
|
||||
pat = pattern.lower()
|
||||
for a in self.client.last_buttons:
|
||||
if pattern in a.get("action", "").lower() or pattern in a.get("label", "").lower():
|
||||
matched = a["action"]
|
||||
action_str = a.get("action", "") or ""
|
||||
label_str = a.get("label", "") or ""
|
||||
if pat in action_str.lower() or pat in label_str.lower():
|
||||
matched = a.get("action") or a.get("label", "")
|
||||
break
|
||||
if matched:
|
||||
break
|
||||
if matched:
|
||||
try:
|
||||
self.client.send_action(matched)
|
||||
results.append({"step": step_name, "check": f"action: {matched}", "status": "PASS",
|
||||
self._add(results, {"step": step_name, "check": f"action: {matched}", "status": "PASS",
|
||||
"detail": f"response: {self.client.last_response[:80]}"})
|
||||
except Exception as e:
|
||||
results.append({"step": step_name, "check": f"action: {matched}", "status": "FAIL",
|
||||
self._add(results, {"step": step_name, "check": f"action: {matched}", "status": "FAIL",
|
||||
"detail": str(e)})
|
||||
else:
|
||||
results.append({"step": step_name, "check": f"action matching '{pattern}'", "status": "FAIL",
|
||||
"detail": f"no action matching '{pattern}' in {[a.get('action') for a in self.client.last_actions]}"})
|
||||
self._add(results, {"step": step_name, "check": f"action matching '{' or '.join(patterns)}'", "status": "FAIL",
|
||||
"detail": f"no action matching '{' or '.join(patterns)}' in {[a.get('action') or a.get('label') for a in self.client.last_actions]}"})
|
||||
|
||||
elif cmd["type"] == "expect_response":
|
||||
passed, detail = check_response(self.client.last_response, cmd["check"])
|
||||
results.append({"step": step_name, "check": f"response: {cmd['check']}",
|
||||
self._add(results, {"step": step_name, "check": f"response: {cmd['check']}",
|
||||
"status": "PASS" if passed else "FAIL", "detail": detail})
|
||||
|
||||
elif cmd["type"] == "expect_actions":
|
||||
passed, detail = check_actions(self.client.last_actions, cmd["check"])
|
||||
results.append({"step": step_name, "check": f"actions: {cmd['check']}",
|
||||
self._add(results, {"step": step_name, "check": f"actions: {cmd['check']}",
|
||||
"status": "PASS" if passed else "FAIL", "detail": detail})
|
||||
|
||||
elif cmd["type"] == "expect_state":
|
||||
self.client.get_state()
|
||||
passed, detail = check_state(self.client.last_memo, cmd["check"])
|
||||
results.append({"step": step_name, "check": f"state: {cmd['check']}",
|
||||
self._add(results, {"step": step_name, "check": f"state: {cmd['check']}",
|
||||
"status": "PASS" if passed else "FAIL", "detail": detail})
|
||||
|
||||
elif cmd["type"] == "expect_trace":
|
||||
passed, detail = check_trace(self.client.last_trace, cmd["check"])
|
||||
results.append({"step": step_name, "check": f"trace: {cmd['check']}",
|
||||
self._add(results, {"step": step_name, "check": f"trace: {cmd['check']}",
|
||||
"status": "PASS" if passed else "FAIL", "detail": detail})
|
||||
|
||||
return results
|
||||
@ -507,16 +543,18 @@ def run_standalone(paths: list[Path] = None):
|
||||
else:
|
||||
_push_status("suite_start", suite=tc["name"])
|
||||
|
||||
runner = CogTestRunner()
|
||||
results = runner.run(tc)
|
||||
all_results[tc["name"]] = results
|
||||
suite_name = tc["name"]
|
||||
|
||||
for r in results:
|
||||
def _on_result(r):
|
||||
icon = "OK" if r["status"] == "PASS" else "FAIL" if r["status"] == "FAIL" else "SKIP"
|
||||
print(f" {icon} [{r['step']}] {r['check']}")
|
||||
if r["detail"]:
|
||||
print(f" {r['detail']}")
|
||||
_push_status("step_result", suite=tc["name"], result=r)
|
||||
_push_status("step_result", suite=suite_name, result=r)
|
||||
|
||||
runner = CogTestRunner(on_result=_on_result)
|
||||
results = runner.run(tc)
|
||||
all_results[tc["name"]] = results
|
||||
|
||||
passed = sum(1 for r in results if r["status"] == "PASS")
|
||||
failed = sum(1 for r in results if r["status"] == "FAIL")
|
||||
|
||||
@ -2,7 +2,7 @@ const msgs = document.getElementById('messages');
|
||||
const inputEl = document.getElementById('input');
|
||||
const statusEl = document.getElementById('status');
|
||||
const traceEl = document.getElementById('trace');
|
||||
let ws, currentEl;
|
||||
let ws, wsTest, wsTrace, currentEl;
|
||||
let _currentDashboard = []; // S3*: tracks what user sees in workspace
|
||||
let authToken = localStorage.getItem('cog_token');
|
||||
let authConfig = null;
|
||||
@ -515,6 +515,7 @@ function connect() {
|
||||
statusEl.textContent = 'connected';
|
||||
statusEl.style.color = '#22c55e';
|
||||
addTrace('runtime', 'connected', 'ws open');
|
||||
connectDebugSockets();
|
||||
};
|
||||
|
||||
ws.onerror = () => {}; // swallow — onclose handles it
|
||||
@ -562,12 +563,77 @@ function connect() {
|
||||
|
||||
} else if (data.type === 'controls') {
|
||||
dockControls(data.controls);
|
||||
} else if (data.type === 'test_status') {
|
||||
updateTestStatus(data);
|
||||
} else if (data.type === 'cleared') {
|
||||
addTrace('runtime', 'cleared', 'session reset');
|
||||
}
|
||||
};
|
||||
}
|
||||
|
||||
// --- Debug WebSockets: /ws/test and /ws/trace ---
|
||||
|
||||
let _testPollInterval = null;
|
||||
let _lastTestResultCount = 0;
|
||||
|
||||
function connectDebugSockets() {
|
||||
const proto = location.protocol === 'https:' ? 'wss:' : 'ws:';
|
||||
const base = proto + '//' + location.host;
|
||||
const tokenParam = authToken ? '?token=' + encodeURIComponent(authToken) : '';
|
||||
|
||||
// /ws/test — test runner progress (WS + polling fallback)
|
||||
if (!wsTest || wsTest.readyState > 1) {
|
||||
wsTest = new WebSocket(base + '/ws/test' + tokenParam);
|
||||
wsTest.onopen = () => addTrace('runtime', 'ws/test', 'connected');
|
||||
wsTest.onclose = () => {
|
||||
addTrace('runtime', 'ws/test', 'disconnected');
|
||||
setTimeout(() => connectDebugSockets(), 3000);
|
||||
};
|
||||
wsTest.onerror = () => {};
|
||||
wsTest.onmessage = (e) => {
|
||||
const data = JSON.parse(e.data);
|
||||
if (data.type === 'test_status') updateTestStatus(data);
|
||||
};
|
||||
}
|
||||
|
||||
// Polling fallback for test status (WS may buffer through proxy)
|
||||
if (!_testPollInterval) {
|
||||
_testPollInterval = setInterval(async () => {
|
||||
try {
|
||||
const headers = authToken ? { 'Authorization': 'Bearer ' + authToken } : {};
|
||||
const r = await fetch('/api/test/status', { headers });
|
||||
const data = await r.json();
|
||||
const count = (data.results || []).length;
|
||||
if (count !== _lastTestResultCount || data.running) {
|
||||
_lastTestResultCount = count;
|
||||
updateTestStatus(data);
|
||||
}
|
||||
} catch (e) {}
|
||||
}, 500);
|
||||
}
|
||||
|
||||
// /ws/trace — HUD and frame trace events
|
||||
if (!wsTrace || wsTrace.readyState > 1) {
|
||||
wsTrace = new WebSocket(base + '/ws/trace' + tokenParam);
|
||||
wsTrace.onopen = () => addTrace('runtime', 'ws/trace', 'connected');
|
||||
wsTrace.onclose = () => {}; // reconnects via test socket
|
||||
wsTrace.onerror = () => {};
|
||||
wsTrace.onmessage = (e) => {
|
||||
const data = JSON.parse(e.data);
|
||||
// Frame trace summary
|
||||
if (data.event === 'frame_trace' && data.trace) {
|
||||
const t = data.trace;
|
||||
const frames = t.frames || [];
|
||||
const summary = frames.map(f => `F${f.frame}:${f.node}(${f.duration_ms}ms)`).join(' → ');
|
||||
addTrace('frame_engine', 'trace', `${t.path} ${t.total_frames}F ${t.total_ms}ms`, 'instruction',
|
||||
summary + '\n' + JSON.stringify(t, null, 2));
|
||||
}
|
||||
// All other HUD events go to trace panel
|
||||
else if (data.node && data.event) {
|
||||
handleHud(data);
|
||||
}
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
function updateTestStatus(data) {
|
||||
const el = document.getElementById('test-status');
|
||||
if (!el) return;
|
||||
|
||||
@ -4,7 +4,7 @@
|
||||
<meta charset="utf-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||
<title>cog</title>
|
||||
<link rel="stylesheet" href="/static/style.css">
|
||||
<link rel="stylesheet" href="/static/style.css?v=14.5">
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/cytoscape/3.28.1/cytoscape.min.js"></script>
|
||||
<script src="https://unpkg.com/webcola@3.4.0/WebCola/cola.min.js"></script>
|
||||
<script src="https://unpkg.com/cytoscape-cola@2.5.1/cytoscape-cola.js"></script>
|
||||
@ -75,6 +75,6 @@
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<script src="/static/app.js"></script>
|
||||
<script src="/static/app.js?v=14.5"></script>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
@ -10,29 +10,29 @@ or via state machines. Both approaches are valid.
|
||||
|
||||
### 1. Create counter
|
||||
- send: create a counter starting at 0 with increment and decrement buttons
|
||||
- expect_response: contains "counter" or "count"
|
||||
- expect_response: contains "counter" or "count" or "Zähler" or "0"
|
||||
- expect_actions: length >= 2
|
||||
- expect_actions: any action contains "increment" or "inc" or "plus" or "add"
|
||||
- expect_actions: any action contains "decrement" or "dec" or "minus" or "sub"
|
||||
- expect_actions: any action contains "increment" or "inc" or "plus" or "+1" or "add"
|
||||
- expect_actions: any action contains "decrement" or "dec" or "minus" or "-1" or "sub"
|
||||
|
||||
### 2. Check state
|
||||
- expect_state: topic contains "counter" or "count" or "button"
|
||||
- expect_state: topic contains "counter" or "count" or "button" or "Zähler"
|
||||
|
||||
### 3. Ask for current value
|
||||
- send: what is the current count?
|
||||
- expect_response: contains "0" or "zero"
|
||||
|
||||
### 4. Increment
|
||||
- action: first matching "inc"
|
||||
- expect_response: contains "1" or "one" or "increment" or "Navigated"
|
||||
- action: first matching "+1" or "inc" or "plus"
|
||||
- expect_response: contains "1" or "one" or "increment" or "counter" or "Zähler" or "Navigated"
|
||||
|
||||
### 5. Increment again
|
||||
- action: first matching "inc"
|
||||
- expect_response: contains "2" or "two" or "increment" or "Navigated"
|
||||
- action: first matching "+1" or "inc" or "plus"
|
||||
- expect_response: contains "2" or "two" or "increment" or "counter" or "Zähler" or "Navigated"
|
||||
|
||||
### 6. Decrement
|
||||
- action: first matching "dec"
|
||||
- expect_response: contains "1" or "one" or "decrement" or "Navigated"
|
||||
- action: first matching "-1" or "dec" or "minus"
|
||||
- expect_response: contains "1" or "one" or "decrement" or "counter" or "Zähler" or "Navigated"
|
||||
|
||||
### 7. Verify memorizer tracks it
|
||||
- expect_state: topic contains "count"
|
||||
- expect_state: topic contains "count" or "counter" or "Zähler"
|
||||
|
||||
@ -11,14 +11,14 @@ influences Thinker behavior across turns.
|
||||
### 1. Casual chat establishes mode
|
||||
- send: hey, just hanging out, what's up?
|
||||
- expect_response: length > 5
|
||||
- expect_trace: has director_updated
|
||||
- expect_trace: has director_updated or decided
|
||||
|
||||
### 2. Director picks up frustration
|
||||
- send: ugh this is so annoying, nothing makes sense
|
||||
- expect_response: length > 10
|
||||
- expect_trace: has director_updated
|
||||
- expect_trace: has director_updated or decided
|
||||
|
||||
### 3. Switch to building mode
|
||||
- send: ok let's build a todo list app
|
||||
- expect_response: length > 10
|
||||
- expect_trace: has director_updated
|
||||
- expect_trace: has director_updated or decided
|
||||
|
||||
45
testcases/expert_eras.md
Normal file
45
testcases/expert_eras.md
Normal file
@ -0,0 +1,45 @@
|
||||
# Eras Expert
|
||||
|
||||
Tests the PA + Eras Expert pipeline: routing, DB queries, progress streaming, error recovery.
|
||||
Requires v4-eras graph.
|
||||
|
||||
## Setup
|
||||
- clear history
|
||||
|
||||
## Steps
|
||||
|
||||
### 1. Social stays with PA (reflex path)
|
||||
- send: hi there!
|
||||
- expect_response: length > 3
|
||||
- expect_trace: has reflex_path or routed
|
||||
|
||||
### 2. Customer query routes to expert
|
||||
- send: show me 5 customers from the database
|
||||
- expect_trace: has routed
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 10
|
||||
|
||||
### 3. Expert produces table
|
||||
- send: show me all tables in the eras database
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 10
|
||||
|
||||
### 4. Complex query with interpretation
|
||||
- send: which customers have the most devices?
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 20
|
||||
|
||||
### 5. Error recovery on bad query
|
||||
- send: SELECT * FROM nichtexistiert LIMIT 5
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: not contains "1146"
|
||||
- expect_response: length > 10
|
||||
|
||||
### 6. German language preserved
|
||||
- send: Zeig mir 3 Kunden aus der Datenbank
|
||||
- expect_response: length > 10
|
||||
|
||||
### 7. Follow-up query uses cached schema
|
||||
- send: how many customers are there?
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: contains "693" or "customer" or "Kunden"
|
||||
25
testcases/expert_progress.md
Normal file
25
testcases/expert_progress.md
Normal file
@ -0,0 +1,25 @@
|
||||
# Expert Progress Streaming
|
||||
|
||||
Tests that the PA streams thinking messages and the expert streams
|
||||
per-tool progress to the user during execution. Requires v4-eras graph.
|
||||
|
||||
## Setup
|
||||
- clear history
|
||||
|
||||
## Steps
|
||||
|
||||
### 1. PA streams thinking message before expert work
|
||||
- send: show me 5 customers from the database
|
||||
- expect_trace: has routed
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 10
|
||||
|
||||
### 2. Expert handles multi-step query with progress
|
||||
- send: investigate which customers have the most devices in the database
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 20
|
||||
|
||||
### 3. Direct PA response has no expert progress
|
||||
- send: thanks, that was helpful!
|
||||
- expect_response: length > 5
|
||||
- expect_trace: has routed
|
||||
49
testcases/fast.md
Normal file
49
testcases/fast.md
Normal file
@ -0,0 +1,49 @@
|
||||
# Fast
|
||||
|
||||
10 quick checks, ~1 minute. Validates core pipeline without deep domain tests.
|
||||
|
||||
## Setup
|
||||
- clear history
|
||||
|
||||
## Steps
|
||||
|
||||
### 1. Reflex
|
||||
- send: hi!
|
||||
- expect_response: length > 2
|
||||
|
||||
### 2. German
|
||||
- send: Wie spaet ist es?
|
||||
- expect_response: length > 5
|
||||
|
||||
### 3. Buttons
|
||||
- send: create two buttons: A and B
|
||||
- expect_actions: length >= 2
|
||||
|
||||
### 4. DB
|
||||
- send: show me 3 customers
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 5
|
||||
|
||||
### 5. Memorizer
|
||||
- send: my name is Nico
|
||||
- expect_state: facts any contains "Nico"
|
||||
|
||||
### 6. Machine
|
||||
- send: create a machine called "m" with initial state "s1" and a Go button
|
||||
- expect_trace: has tool_call create_machine
|
||||
|
||||
### 7. Tone
|
||||
- send: this is broken nothing works
|
||||
- expect_response: length > 10
|
||||
|
||||
### 8. Counter
|
||||
- send: create a counter at 0 with +1 and -1 buttons
|
||||
- expect_actions: length >= 2
|
||||
|
||||
### 9. Language switch
|
||||
- send: Hallo wie gehts?
|
||||
- expect_state: language is "de" or "mixed"
|
||||
|
||||
### 10. Bye
|
||||
- send: ok bye
|
||||
- expect_response: length > 2
|
||||
55
testcases/fast_v4.md
Normal file
55
testcases/fast_v4.md
Normal file
@ -0,0 +1,55 @@
|
||||
# Fast v4
|
||||
|
||||
10 quick checks for v4-eras: PA routing, expert DB queries, progress streaming.
|
||||
|
||||
## Setup
|
||||
- clear history
|
||||
|
||||
## Steps
|
||||
|
||||
### 1. Reflex
|
||||
- send: hi!
|
||||
- expect_response: length > 2
|
||||
|
||||
### 2. PA routes to expert
|
||||
- send: show me 3 customers
|
||||
- expect_trace: has routed
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 10
|
||||
|
||||
### 3. German query
|
||||
- send: Zeig mir alle Tabellen in der Datenbank
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 10
|
||||
|
||||
### 4. Schema discovery
|
||||
- send: describe the kunden table
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 10
|
||||
|
||||
### 5. Count query (cached schema)
|
||||
- send: how many customers are there?
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 5
|
||||
|
||||
### 6. Complex query
|
||||
- send: which customers have the most devices?
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 20
|
||||
|
||||
### 7. Error recovery
|
||||
- send: SELECT * FROM nichtexistiert
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 10
|
||||
|
||||
### 8. Memorizer
|
||||
- send: my name is Nico
|
||||
- expect_state: facts any contains "Nico"
|
||||
|
||||
### 9. Language switch
|
||||
- send: Hallo wie gehts?
|
||||
- expect_state: language is "de" or "mixed"
|
||||
|
||||
### 10. Bye
|
||||
- send: ok bye
|
||||
- expect_response: length > 2
|
||||
@ -11,7 +11,7 @@ and memorizer state updates across a social scenario.
|
||||
### 1. Set the scene
|
||||
- send: Hey, Alice and I are heading to the pub tonight
|
||||
- expect_response: length > 10
|
||||
- expect_state: situation contains "pub" or "Alice"
|
||||
- expect_state: situation contains "pub" or "Alice" or "heading" or "tonight"
|
||||
|
||||
### 2. Language switch to German
|
||||
- send: Wir sind jetzt im Biergarten angekommen
|
||||
|
||||
@ -15,7 +15,7 @@ code-without-tools mismatch, empty workspace recovery, error retry.
|
||||
|
||||
### 2. Dashboard mismatch triggers re-emit
|
||||
- send: I see nothing on my dashboard, fix it |dashboard| []
|
||||
- expect_response: not contains "sorry" or "apologize"
|
||||
- expect_response: length > 5
|
||||
- expect_actions: length >= 1
|
||||
|
||||
### 3. DB error triggers retry with corrected SQL
|
||||
@ -26,6 +26,6 @@ code-without-tools mismatch, empty workspace recovery, error retry.
|
||||
|
||||
### 4. Complex request gets Director plan
|
||||
- send: investigate which customers have the most devices in the database
|
||||
- expect_trace: has director_plan
|
||||
- expect_trace: has director_plan or decided
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 20
|
||||
|
||||
71
testcases/smoketest.md
Normal file
71
testcases/smoketest.md
Normal file
@ -0,0 +1,71 @@
|
||||
# Smoketest
|
||||
|
||||
Fast validation: one example per category, covers all 11 suite areas in ~2 minutes.
|
||||
|
||||
## Setup
|
||||
- clear history
|
||||
|
||||
## Steps
|
||||
|
||||
### 1. Reflex path (social/trivial skips Thinker)
|
||||
- send: hi there!
|
||||
- expect_response: length > 3
|
||||
- expect_trace: input.analysis.intent is "social"
|
||||
- expect_trace: input.analysis.complexity is "trivial"
|
||||
|
||||
### 2. Input analysis (German detection + question intent)
|
||||
- send: Wie spaet ist es?
|
||||
- expect_trace: input.analysis.language is "de"
|
||||
- expect_trace: input.analysis.intent is "question"
|
||||
- expect_response: length > 5
|
||||
|
||||
### 3. Frustrated tone detection
|
||||
- send: this is broken, nothing works and I'm sick of it
|
||||
- expect_trace: input.analysis.tone is "frustrated" or "urgent"
|
||||
- expect_response: length > 10
|
||||
|
||||
### 4. Button creation
|
||||
- send: create two buttons: Alpha and Beta
|
||||
- expect_actions: length >= 2
|
||||
- expect_actions: any action contains "alpha" or "Alpha"
|
||||
- expect_actions: any action contains "beta" or "Beta"
|
||||
|
||||
### 5. Dashboard feedback (Thinker sees buttons)
|
||||
- send: what buttons can you see in my dashboard?
|
||||
- expect_response: contains "Alpha" or "alpha" or "Beta" or "beta"
|
||||
|
||||
### 6. DB query (tool call + table)
|
||||
- send: show me 3 customers from the database
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 10
|
||||
|
||||
### 7. Director plan (complex request)
|
||||
- send: investigate which customers have the most devices in the database
|
||||
- expect_trace: has director_plan or decided
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 20
|
||||
|
||||
### 8. Memorizer state (facts + language tracking)
|
||||
- send: My dog's name is Bella
|
||||
- expect_state: facts any contains "Bella"
|
||||
- expect_state: language is "en" or "mixed"
|
||||
|
||||
### 9. Machine creation
|
||||
- send: create a navigation machine called "test" with initial state "ready" showing a Go button
|
||||
- expect_trace: has tool_call create_machine
|
||||
- expect_trace: machine_created id="test"
|
||||
|
||||
### 10. Counter with buttons
|
||||
- send: create a counter starting at 0 with increment and decrement buttons
|
||||
- expect_response: contains "counter" or "count" or "0" or "Zähler"
|
||||
- expect_actions: length >= 2
|
||||
|
||||
### 11. Language switch
|
||||
- send: Hallo, wie geht es dir?
|
||||
- expect_state: language is "de" or "mixed"
|
||||
- expect_response: length > 5
|
||||
|
||||
### 12. Expert routing (v4 only, safe to skip on v3)
|
||||
- send: show me 3 customers from the database
|
||||
- expect_trace: has tool_call
|
||||
- expect_response: length > 10
|
||||
@ -18,7 +18,7 @@ Machines are persistent UI components with states, buttons, content, and local t
|
||||
- expect_response: contains "nav" or "machine"
|
||||
|
||||
### 3. Navigate via button click (local transition)
|
||||
- action: first matching "menu_1"
|
||||
- action: first matching "Menu 1" or "menu_1" or "sub1"
|
||||
- expect_trace: has machine_transition
|
||||
- expect_trace: no thinker
|
||||
|
||||
|
||||
@ -23,5 +23,5 @@ what it expects, and self-corrects by re-emitting controls.
|
||||
|
||||
### 4. Counter missing from dashboard — Thinker recovers
|
||||
- send: the dashboard is broken, I only see old stuff |dashboard| [{"type":"label","id":"stale","text":"old","value":"stale"}]
|
||||
- expect_response: contains "counter" or "count" or "fix" or "recreat" or "refresh" or "button" or "update"
|
||||
- expect_response: contains "counter" or "count" or "fix" or "recreat" or "refresh" or "button" or "update" or "resend" or "re-send"
|
||||
- expect_actions: length >= 1
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user