24 Commits

Author SHA1 Message Date
Nico
217d1a57d9 v0.16.0: Workspace component system — cards, lists, structured display
New workspace components:
- emit_card: structured detail card with title, subtitle, fields, actions
  Fields can be clickable links (action property)
  Used for: entity details (Kunde, Objekt, Auftrag)
- emit_list: vertical list of cards for multiple entities
  Used for: search results, navigation lists
- "WHEN TO USE WHAT" guide in expert prompt
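A minimal sketch of what card and list payloads could look like. It assumes the `emit_card`/`emit_list` tools return plain dicts. The `type` key and the `CardField`/`Card` names are hypothetical, not the project's actual schema:

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class CardField:
    label: str
    value: str
    action: Optional[str] = None  # if set, the frontend renders the field as a clickable link


@dataclass
class Card:
    title: str
    subtitle: str = ""
    fields: list = field(default_factory=list)   # list of CardField
    actions: list = field(default_factory=list)  # action ids rendered as buttons


def emit_card(title, subtitle="", fields=(), actions=()):
    """Build a card payload; the 'type' key is a hypothetical discriminator."""
    card = Card(title, subtitle, [CardField(**f) for f in fields], list(actions))
    return {"type": "card", **asdict(card)}  # asdict also converts the nested CardFields


def emit_list(title, cards):
    """Wrap several card payloads in one vertical list payload."""
    return {"type": "list", "title": title, "cards": list(cards)}
```

A UI node that passes these dicts through unchanged only needs to look at `type` to choose between `renderCard()` and the list container.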

Frontend rendering:
- renderCard() with key-value fields, clickable links, action buttons
- List container with title + stacked cards
- Full CSS: dark theme cards, hover states, link styling

Pipeline:
- ExpertNode handles emit_card/emit_list in tool execution
- UINode passes card/list through as-is (not wrapped in display)
- Test runner: check_actions supports "has card", "has list", "has X or Y"
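The "has X or Y" assertion form could be evaluated roughly as follows. This is a sketch that assumes workspace components are dicts with a `type` key. `check_component` is a hypothetical name and not the runner's actual `check_actions`:

```python
def check_component(assertion: str, workspace: list) -> bool:
    """Evaluate 'has X' or 'has X or Y' against the component types in the workspace."""
    text = assertion.strip()
    if not text.startswith("has "):
        raise ValueError(f"unsupported assertion: {assertion!r}")
    alternatives = [alt.strip() for alt in text[len("has "):].split(" or ")]
    present = {component.get("type") for component in workspace}
    # the assertion passes if any one of the alternatives is present
    return any(alt in present for alt in alternatives)
```

The "or" form keeps one test step valid across graphs that render the same answer differently, for example as a table in one graph and as a card in another.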

Workspace components test: 22/22

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 20:54:47 +02:00
Nico
d8ab778257 v0.15.9: Auto-DESCRIBE retry, unmapped table recovery, animation queue
Expert retry loop enhanced:
- On "Unknown column" error, auto-DESCRIBEs the failing table
- DESCRIBE result injected into re-plan context
- Unmapped tables handled via SELECT * LIMIT fallback
- Recovery test step 4: abrechnungsinformationen (unmapped) → success
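One possible shape for this recovery is sketched below. MySQL's "Unknown column" message does not name the table, so this sketch assumes the tables are taken from the failing statement's FROM/JOIN clauses with a simple regex. Both function names and the row limit of 5 are hypothetical:

```python
import re

# Table names that follow FROM or JOIN, optionally backtick-quoted.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+`?(\w+)`?", re.IGNORECASE)


def describe_failing_tables(sql: str, error: str) -> list:
    """DESCRIBE statements for every table a query with an unknown column touches."""
    if "Unknown column" not in error:
        return []
    tables = dict.fromkeys(TABLE_REF.findall(sql))  # de-duplicate, keep order
    return [f"DESCRIBE {table}" for table in tables]


def fallback_query(table: str, limit: int = 5) -> str:
    """For a table with no known mapping, sample real rows instead of guessing columns."""
    return f"SELECT * FROM {table} LIMIT {limit}"
```

The results of these statements would then be added to the re-plan context, as the bullets above describe.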

Graph animation queue:
- Events queued and played sequentially with 200ms interval
- Prevents bulk HUD events from canceling each other's animations
- Node pulses and edge flashes play one by one
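The real queue lives in the JavaScript frontend. The same idea can be sketched in Python with `asyncio.Queue`: events are buffered and a single consumer plays them one at a time, with a fixed pause between them. The class and method names are hypothetical:

```python
import asyncio


class AnimationQueue:
    """Play queued graph events sequentially, spaced by a fixed interval."""

    def __init__(self, play, interval: float = 0.2):
        self._play = play            # callback that animates one event (pulse/flash)
        self._interval = interval    # 200 ms between animations by default
        self._queue = asyncio.Queue()

    def push(self, event) -> None:
        # producers never animate directly, so a burst of HUD events cannot cancel itself
        self._queue.put_nowait(event)

    async def run(self) -> None:
        while True:
            event = await self._queue.get()
            self._play(event)
            self._queue.task_done()
            await asyncio.sleep(self._interval)

    async def drain(self) -> None:
        """Wait until every queued event has been played."""
        await self._queue.join()
```

Because only `run()` calls the animation callback, a bulk of events arriving in the same tick still plays in arrival order.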

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 19:43:33 +02:00
Nico
067cbccea6 v0.15.8: Expert retry loop, fixed geraete schema, action routing, stable nodes
Expert retry loop (max 3 attempts):
- On SQL error, re-plans with error context injected
- "PREVIOUS ATTEMPTS FAILED" section tells LLM what went wrong
- Breaks out of tool sequence on error, retries full plan
- Only reports failure after exhausting retries
- Recovery test: 13/13
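The retry loop above could be structured as follows. This is a minimal sketch that assumes `plan` is an LLM call returning a list of SQL statements and `execute` runs one statement. Both are passed in as plain callables, and `run_with_retries`/`failure_section` are hypothetical names:

```python
from typing import Callable, List

MAX_ATTEMPTS = 3


def failure_section(failures: List[str]) -> str:
    """Prompt section describing earlier failed attempts (empty on the first try)."""
    if not failures:
        return ""
    return "PREVIOUS ATTEMPTS FAILED:\n" + "\n".join(f"- {f}" for f in failures)


def run_with_retries(job: str,
                     plan: Callable[[str, str], List[str]],
                     execute: Callable[[str], str]) -> List[str]:
    failures: List[str] = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        steps = plan(job, failure_section(failures))  # re-plan with error context injected
        results: List[str] = []
        try:
            for sql in steps:
                results.append(execute(sql))
        except Exception as exc:
            # break out of the tool sequence and retry the full plan
            failures.append(f"attempt {attempt}: {sql!r} failed with {exc}")
            continue
        return results
    # only report failure after every attempt is exhausted
    raise RuntimeError(failure_section(failures))
```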

Schema fixes:
- geraete: Geraetenummer, Bezeichnung, Beschreibung (were Fabriknummer, Funkkennung)
- geraeteverbraeuche: all columns verified
- nutzer: all columns verified

Action routing:
- Button clicks route through PA→Expert in v4 (was missing has_pa check)
- WS handler catches exceptions, sends error HUD instead of crashing

Nodes panel:
- Fixed pipeline order, no re-sorting
- Normalized names (pa_v1→pa, expert_eras→eras)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 19:34:01 +02:00
Nico
faeb9d3254 v0.15.7: Fix action routing for v4, WS error handling, stable nodes panel
Action routing:
- Button clicks now route through PA→Expert in v4 (was missing has_pa check)
- Previously crashed with KeyError on missing thinker node

WS error handling:
- Exceptions in the WS handler are caught and logged instead of crashing the connection
- Frontend receives error HUD event instead of disconnect
- Prevents 1006 reconnect loops on action errors

Nodes panel:
- Fixed pipeline order (no re-sorting on events)
- Deduplicated node names (pa_v1→pa, expert_eras→eras)
- Normalized names in state tracker
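A sketch of the normalization and fixed ordering. The rules (strip a trailing `_vN`, strip an `expert_` prefix) are inferred only from the two examples above, and the helper names are hypothetical:

```python
import re


def normalize_node_name(name: str) -> str:
    name = re.sub(r"_v\d+$", "", name)       # drop version suffix: pa_v1 -> pa
    return name.removeprefix("expert_")      # drop role prefix: expert_eras -> eras (Python 3.9+)


def ordered_nodes(names, pipeline):
    """Deduplicate normalized names and order them by pipeline position, not event arrival."""
    rank = {node: i for i, node in enumerate(pipeline)}
    unique = dict.fromkeys(normalize_node_name(n) for n in names)
    return sorted(unique, key=lambda n: rank.get(n, len(rank)))  # unknown nodes go last
```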

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 19:16:15 +02:00
Nico
84fa0830d8 v0.15.6: Baked schema expert — no DESCRIBE at runtime, domain mastery 38/38
Expert knows the full eras2_production schema cold:
- All PKs, FKs, column names verified from DESCRIBE
- Junction tables: objektkunde (kunden↔objekte), objektadressen, kundenadressen
- Exact JOIN patterns baked into prompt
- No DESCRIBE/SHOW at runtime — plan once, execute
- Domain language responses (not SQL dumps)

Simplified ExpertNode.execute():
- Removed iterative DESCRIBE→re-plan loop
- Single plan+execute pass (schema is known)
- Faster: 1 LLM call for plan instead of 2-3

Domain mastery test (eras_domain.md): 38/38
- Customer overview, junction table JOINs, full hierarchy traversal
- Address lookup, Verbrauchsdaten, domain language, no DESCRIBE check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 19:03:52 +02:00
Nico
b9320693ed v0.15.5: Corrected FK mappings, objektkunde junction table, multi-hop query test
Schema corrections:
- kunden PK = ID (not Kundennummer)
- objekte PK = ID (not ObjektID)
- kunden↔objekte linked via objektkunde junction table (many-to-many)
- Removed guessed column names, only verified PKs/FKs in SCHEMA
- Added explicit JOIN patterns for the hierarchy

Domain context test: 25/25 (added multi-hop Jaeger query through 4 tables)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:54:52 +02:00
Nico
e19520db74 v0.15.4: Populate graph + nodes panel on page load
- /api/graph/active now includes node_details (model, max_tokens per node)
- graph.js calls initNodesFromGraph() after fetching active graph
- Nodes panel shows all nodes with models immediately on load (before first message)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:42:53 +02:00
Nico
2d649fa448 v0.15.3: Domain context, iterative plan-execute, FK mappings, ES6 node inspector
Eras Expert domain context:
- Full Heizkostenabrechnung business model (Kunde>Objekte>Nutzeinheiten>Geraete)
- Known PK/FK mappings: kunden.Kundennummer, objekte.KundenID, etc.
- Correct JOIN example in SCHEMA prompt
- PA knows domain hierarchy for better job formulation

Iterative plan-execute in ExpertNode:
- DESCRIBE queries execute first, results injected into re-plan
- Re-plan uses actual column names from DESCRIBE
- Eliminates "Unknown column" errors on first query

Frontend:
- Node inspector: per-node cards with model, tokens, progress, last event
- Graph switcher buttons in top bar
- Clear button in top bar
- Nodes panel 300px wide
- WS reconnect on 1006 (deploy) without showing login
- Model info emitted on context HUD events

Domain context test: 21/21 (hierarchy, JOINs, FK, PA job quality)
Default graph: v4-eras

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 18:34:42 +02:00
Nico
3a9c2795cf v0.15.2: ES6 module refactor, 2-row layout, dashboard test, PA routing fix
Frontend refactored to ES6 modules (no bundler):
  js/main.js    — entry point, wires all modules
  js/auth.js    — OIDC login, token management
  js/ws.js      — /ws, /ws/test, /ws/trace connections + HUD handler
  js/chat.js    — messages, send, streaming
  js/graph.js   — Cytoscape visualization + animation
  js/trace.js   — trace panel
  js/dashboard.js — workspace controls rendering
  js/awareness.js — state panel, sensors, meters
  js/tests.js   — test status display
  js/util.js    — shared utilities

New 2-row layout:
  Top:    test status | connection status
  Middle: Workspace | Node Details | Graph
  Bottom: Chat | Awareness | Trace

PA routing: routes ALL tool requests to expert (DB, UI, buttons, machines)
Dashboard integration test: 15/15

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 17:58:47 +02:00
Nico
fda0d7cfce v0.15.1: PA routes all tool requests to expert, dashboard integration test
- PA prompt updated: routes ANY task needing tools (DB, UI, buttons, machines)
  to expert. Only social chat stays with PA.
- Expert descriptions include UI capabilities (buttons, machines, tables)
- Dashboard integration test: expert creates/replaces buttons, machines,
  tables — all persist correctly across queries
- v4-eras scores: fast 27/28, expert 23/23, dashboard 15/15, progress 11/11

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 17:28:44 +02:00
Nico
1000411eb2 v0.15.0: Frame engine (v3), PA + Expert architecture (v4-eras), live test streaming
Frame Engine (v3-framed):
- Tick-based deterministic pipeline: frames advance on completion, not timers
- FrameRecord/FrameTrace dataclasses for structured per-message tracing
- /api/frames endpoint: queryable frame trace history (last 20 messages)
- frame_trace HUD event with full pipeline visibility
- Deterministic frame counts: Reflex = 2 frames, Director = 4, Director+Interpreter = 5
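A minimal sketch of per-message frame tracing in which a frame advances only when a node completes, and history is capped at the last 20 messages. The class names come from the bullets above, but their fields and the `FRAME_HISTORY` name are assumptions:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FrameRecord:
    frame: int   # 1-based frame index within one message
    node: str    # node whose completion advanced the frame
    event: str


@dataclass
class FrameTrace:
    message: str
    frames: list = field(default_factory=list)

    def advance(self, node: str, event: str) -> None:
        """Advance one frame because `node` finished, not because a timer fired."""
        self.frames.append(FrameRecord(len(self.frames) + 1, node, event))

    @property
    def frame_count(self) -> int:
        return len(self.frames)


# Bounded history: only the 20 most recent message traces are kept for /api/frames.
FRAME_HISTORY: deque = deque(maxlen=20)
```

Because frames are counted from completions, a given path through the graph always yields the same frame count. This is what makes counts such as "Reflex = 2 frames" usable as test assertions.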

Expert Architecture (v4-eras):
- PA node (pa_v1): routes to domain experts, holds user context
- ExpertNode base: stateless executor with plan+execute two-LLM-call pattern
- ErasExpertNode: eras2_production DB specialist with DESCRIBE-first discipline
- Schema caching: DESCRIBE results reused across queries within session
- Progress streaming: PA streams thinking message, expert streams per-tool progress
- PARouting type for structured routing decisions
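Session-scoped schema caching can be as small as a memoizing wrapper around the DESCRIBE call. In this sketch, `describe` is an injected callable (for example, one that runs `DESCRIBE <table>` through the shared db.py), and `SchemaCache` is a hypothetical name:

```python
from typing import Callable, Dict, List


class SchemaCache:
    """Reuse DESCRIBE results for the lifetime of one session."""

    def __init__(self, describe: Callable[[str], List[tuple]]):
        self._describe = describe
        self._tables: Dict[str, List[tuple]] = {}

    def columns(self, table: str) -> List[tuple]:
        if table not in self._tables:          # hit the database only on first use
            self._tables[table] = self._describe(table)
        return self._tables[table]
```

Creating one instance per session keeps the cache from outliving schema changes between sessions.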

UI Controls Split:
- Separate thinker_controls from machine controls (current_controls is now a property)
- Machine buttons persist across Thinker responses
- Machine state parser handles both dict and list formats from Director
- Normalized button format with go/payload field mapping

WebSocket Architecture:
- /ws/test: dedicated debug socket for test runner progress
- /ws/trace: dedicated debug socket for HUD/frame trace events
- /ws (chat): cleaned up, only deltas/controls/done/cleared
- WS survives graph switch (re-attaches to new runtime)
- Pipeline result reset on clear

Test Infrastructure:
- Live test streaming: on_result callback fires per check during execution
- Frontend polling fallback (500ms) for proxy-buffered WS
- frame_trace-first trace assertion (fixes stale perceived event bug)
- action_match supports "or" patterns and multi-pattern matching
- Trace window increased to 40 events
- Graph-agnostic assertions (has X or Y)

Test Suites:
- smoketest.md: 12 steps covering all categories (~2min)
- fast.md: 10 quick checks (~1min)
- fast_v4.md: 10 v4-eras specific checks
- expert_eras.md: eras domain tests (routing, DB, schema, errors)
- expert_progress.md: progress streaming tests

Other:
- Shared db.py extracted from thinker_v2 (reused by experts)
- InputNode prompt: few-shot examples, history as context summary
- Director prompt: full tool signatures for add_state/reset_machine/destroy_machine
- nginx no-cache headers for static files during development
- Cache-busted static file references

Scores: v3 smoketest 39/40, v4-eras fast 28/28, expert_eras 23/23

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 17:10:31 +02:00
Nico
4c412d3c4b v0.14.4: Interpreter wired in v2, tool_call convention, Haiku models, UI fix
- Wire Interpreter into v2 pipeline (after Thinker tool_output, before Output)
- Rename tool_exec -> tool_call everywhere (consistent convention across v1/v2)
- Switch Director v1+v2 to anthropic/claude-haiku-4.5 (was opus, reserved)
- Fix UI apply_machine_ops crash when states are strings instead of dicts
- Fix runtime_test.py async poll to match on message ID (prevent stale results)
- Add traceback to pipeline error logging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 06:06:13 +02:00
Nico
51f2929092 v0.14.2: Test runner with live frontend reporting
- Harness reports to /api/test/status with suite_start/step_result/suite_end
- Frontend shows x/44 progress, per-test duration, total elapsed time
- Auto-discovers test count from test modules (no hardcoded number)
- run_all.py --report URL pushes live results to browser
- Fix: suite_start with count only resets on first call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 05:08:55 +02:00
Nico
6f4d26ab82 v0.14.1: Decouple Runtime from WebSocket — persistent server-side runtime
- OutputSink: collects output, optionally streams to attached WS
- Runtime no longer requires WebSocket — works headless for MCP
- WS connects/disconnects via attach_ws()/detach_ws(), runtime persists
- /api/send/check + /api/send (async) + /api/result (poll with progress)
- Graph switch destroys old runtime, next request creates new one
- Director v2 model: claude-opus-4 (was claude-sonnet-4, reserved)
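The attach/detach pattern can be sketched as a sink that always buffers output and forwards it to a WebSocket only while one is attached. The only thing this assumes about the socket is an async `send_text(str)` method, which Starlette/FastAPI `WebSocket` objects provide. Dropping a socket after a failed send is a design choice of this sketch:

```python
from typing import Optional, Protocol


class TextSocket(Protocol):
    async def send_text(self, data: str) -> None: ...


class OutputSink:
    """Collect runtime output; stream it to a WebSocket only while one is attached."""

    def __init__(self) -> None:
        self.buffer: list = []
        self._ws: Optional[TextSocket] = None

    def attach_ws(self, ws: TextSocket) -> None:
        self._ws = ws

    def detach_ws(self) -> None:
        self._ws = None

    async def emit(self, text: str) -> None:
        self.buffer.append(text)  # always collected, so headless callers can poll the result
        if self._ws is not None:
            try:
                await self._ws.send_text(text)
            except Exception:
                self._ws = None   # a dead socket must not stop the runtime
```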

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 04:36:28 +02:00
Nico
5f447dfd53 v0.14.0: v2 Director-drives architecture + 3-pod K8s split
Architecture:
- director_v2: always-on brain, produces DirectorPlan with tool_sequence
- thinker_v2: pure executor, runs tools from DirectorPlan
- interpreter_v1: factual result summarizer, no hallucination
- v2_director_drives graph: Input -> Director -> Thinker -> Output

Infrastructure:
- Split into 3 pods: cog-frontend (nginx), cog-runtime (FastAPI), cog-mcp (SSE proxy)
- MCP survives runtime restarts (separate pod, proxies via HTTP)
- Async send pipeline: /api/send/check -> /api/send -> /api/result with progress
- Zero-downtime rolling updates (maxUnavailable: 0)
- Dynamic graph visualization (fetched from API, not hardcoded)

Tests: 22 new mocked unit tests (director_v2: 7, thinker_v2: 8, interpreter_v1: 7)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 04:17:44 +02:00
Nico
a2bc6347fc v0.13.0: Graph engine, versioned nodes, S3* audit, DB tools, Cytoscape
Architecture:
- Graph engine (engine.py) loads graph definitions, instantiates nodes
- Versioned nodes: input_v1, thinker_v1, output_v1, memorizer_v1, director_v1
- NODE_REGISTRY for dynamic node lookup by name
- Graph API: /api/graph/active, /api/graph/list, /api/graph/switch
- Graph definition: graphs/v1_current.py (7 nodes, 13 edges, 3 edge types)
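A registry keyed by versioned node name is commonly built with a class decorator. The sketch below assumes a graph definition maps roles to registered names. The `register`/`instantiate` helpers and the definition's shape are hypothetical:

```python
NODE_REGISTRY: dict = {}


def register(name: str):
    """Class decorator that records a node class under a versioned name."""
    def decorator(cls):
        NODE_REGISTRY[name] = cls
        return cls
    return decorator


@register("input_v1")
class InputNode:
    pass


@register("output_v1")
class OutputNode:
    pass


def instantiate(graph: dict) -> dict:
    """Create one node instance per role listed in a graph definition."""
    return {role: NODE_REGISTRY[node_name]() for role, node_name in graph["nodes"].items()}
```

Switching graphs then amounts to instantiating a different definition, since node classes are looked up by name at runtime.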

S3* Audit system:
- Workspace mismatch detection (server vs browser controls)
- Code-without-tools retry (Thinker wrote code but no tool calls)
- Intent-without-action retry (user requested an action but Thinker only produced text)
- Dashboard feedback: browser sends workspace state on every message
- Continuous sensor comparison on every 5s tick

State machines:
- create_machine / add_state / reset_machine / destroy_machine via function calling
- Local transitions (go:) resolve without LLM round-trip
- Button persistence across turns
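A local `go:` transition needs no LLM call when the target state is already known. This sketch assumes a machine is a dict holding its states, each with its buttons, plus the current state. The shape and both function names are hypothetical:

```python
def handle_click(machine: dict, button: dict) -> bool:
    """Apply a button's 'go' transition locally if possible.

    Returns True when the click is fully handled without an LLM round-trip,
    False when it has to be forwarded to the pipeline.
    """
    target = button.get("go")
    if target is None or target not in machine["states"]:
        return False
    machine["current"] = target
    return True


def current_buttons(machine: dict) -> list:
    """Buttons of the current state; they persist until the state changes."""
    return machine["states"][machine["current"]]
```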

Database tools:
- query_db tool via pymysql to MariaDB K3s pod (eras2_production)
- Table rendering in workspace (tab-separated parsing)
- Director pre-planning with Opus for complex data requests
- Error retry with corrected SQL
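Tab-separated query output can be turned into a workspace table as follows. This is a sketch that assumes the first non-empty line holds the column names. The payload keys and the function name are hypothetical:

```python
def parse_tsv_table(output: str) -> dict:
    """Turn tab-separated query output into a workspace table payload."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return {"type": "table", "columns": [], "rows": []}
    columns = lines[0].split("\t")                  # header row
    rows = [line.split("\t") for line in lines[1:]]
    return {"type": "table", "columns": columns, "rows": rows}
```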

Frontend:
- Cytoscape.js pipeline graph with real-time node animations
- Overlay scrollbars (CSS-only, no reflow)
- Tool call/result trace events
- S3* audit events in trace

Testing:
- 167 integration tests (11 test suites)
- 22 node-level unit tests (test_nodes/)
- Three test levels: node unit, graph integration, scenario

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 00:18:45 +01:00
Nico
3f8886cbd2 v0.10.4: stateful UI engine — TDD counter test green (36/36)
RED->GREEN->REFACTOR cycle:
- UI node has state store (key-value), action bindings (op/var), and
  local action handlers (inc/dec/set/toggle — no LLM round-trip)
- Thinker self-model: knows its environment, that ACTIONS create real
  buttons, that UI handles state locally. Emits var/op payload for
  stateful actions.
- Thinker's context includes UI state so it can report current values
- /api/clear resets UI state, bindings, and controls
- Test runner: action_match for fuzzy action names, persistent actions
  across steps, _stream_text restored
- Counter test: 16/16 passed (create, read, inc, inc, dec, verify)
- Pub test: 20/20 passed (conversation, language switch, tool use, mood)
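The local handlers named above (inc/dec/set/toggle on a key-value store) can be sketched as one function that mutates the UI state in place. `apply_op` is a hypothetical name, and the defaults for missing variables (0 for counters, False for toggles) are assumptions:

```python
def apply_op(state: dict, var: str, op: str, value=None):
    """Apply a button's op to the UI state store without calling the LLM."""
    if op == "inc":
        state[var] = state.get(var, 0) + 1
    elif op == "dec":
        state[var] = state.get(var, 0) - 1
    elif op == "set":
        state[var] = value
    elif op == "toggle":
        state[var] = not state.get(var, False)
    else:
        raise ValueError(f"unknown op: {op}")
    return state[var]
```

The counter test sequence (create, inc, inc, dec) then ends with a value of 1.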

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:50:37 +01:00
Nico
3d71c651fc v0.10.0: test framework with markdown testcases and web UI
- testcases/*.md: declarative test definitions (send, expect_response,
  expect_state, expect_actions, action)
- runtime_test.py: standalone runner + pytest integration via conftest.py
- /tests route: web UI showing last run results from results.json
- /api/tests: serves results JSON
- Two initial testcases: counter_state (UI actions) and pub_conversation
  (multi-turn, language switch, tool use, memorizer state)
- pub_conversation: 19/20 passed on first run
- Fix nm-text vertical overflow in node metrics bar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:36:19 +01:00
Nico
acc0dff4e5 v0.9.9: deterministic UI — Thinker declares actions, UI renders without LLM
- Thinker emits ACTIONS: JSON line with follow-up buttons
- UI node is now pure code (no LLM call): renders actions as buttons,
  extracts tables from pipe-separated tool output, and shows single values as labels
- Controls only in workspace panel, not duplicated in chat
- Process cards only in awareness panel, failed auto-remove after 30s
- Auth expiry detection: 403/1006 shows login button, stops reconnect loop
- Sensor timezone fix: zoneinfo.ZoneInfo("Europe/Berlin") for proper DST
- Cache-Control: no-cache on index.html
- Markdown rendering in chat messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:06:24 +01:00
Nico
b6ca02f864 v0.9.2: dedicated UI node, strict node roles, markdown rendering
6-node pipeline: Input -> Thinker -> Output (voice) + UI (screen) in parallel

- Output: text only (markdown, emoji). Never emits HTML or controls.
- UI: dedicated node for labels, buttons, tables. Tracks workspace state.
  Replaces entire workspace on each update. Runs parallel with Output.
- Input: strict one-sentence perception. No more hallucinating responses.
- Thinker: controls removed from prompt, focuses on reasoning + tools.
- Frontend: markdown rendered in chat (bold, italic, code blocks, lists).
  Label control type added. UI node meter in top bar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 14:12:15 +01:00
Nico
f6939d47f5 v0.8.5: smart Output renderer + awareness panel
Output node upgraded from dumb echo to device-aware renderer:
- Knows it's rendering to HTML/browser, uses markdown formatting
- Receives full ThoughtResult (response + tool output + controls)
- Always in pipeline: Input perceives, Thinker reasons, Output renders
- Keeps user's language, weaves tool results into natural responses

Awareness panel (3-column layout):
- State: mood, topic, language, facts from Memorizer
- Sensors: clock, idle, memo deltas from Sensor ticks
- Processes: live cards with cancel during tool execution
- Workspace: docked controls (tables/buttons) persist across messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 02:02:41 +01:00
Nico
4e2cd4ed59 v0.8.3: fix SQL double-wrap bug in Thinker tool parsing
Python code blocks containing SQL keywords (SELECT, CREATE) were
incorrectly re-wrapped in the SQL template. Now only blocks explicitly
tagged as sql/sqlite get wrapped.
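The fix boils down to deciding by the fence's language tag rather than by the block's contents. A minimal sketch follows. The function name is hypothetical, and the SQL template itself is not shown, because the message does not describe it:

```python
import re

# A fenced block: three backticks, an optional language tag, a newline, the body.
FENCE = re.compile(r"`{3}(\w*)\n(.*?)`{3}", re.DOTALL)
SQL_TAGS = {"sql", "sqlite"}


def classify_blocks(text: str) -> list:
    """Return (kind, body) pairs; only blocks tagged sql/sqlite count as SQL."""
    blocks = []
    for tag, body in FENCE.findall(text):
        # a python block that mentions SELECT stays "code"; keywords are never inspected
        kind = "sql" if tag.lower() in SQL_TAGS else "code"
        blocks.append((kind, body.strip()))
    return blocks
```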

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 01:49:21 +01:00
Nico
231f81bc52 v0.8.2: fix pipeline — skip Output for tools, process HUD, inline controls, structured actions
- Thinker tool results stream directly to user, skipping Output node (halves latency)
- ProcessManager process_start/process_done events render as live cards in chat
- UI controls sent before response text, not after
- Button clicks route to handle_action(), skip Input, go straight to Thinker
- Fix Thinker model: gemini-2.5-flash-preview -> gemini-2.5-flash (old ID expired)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 01:43:07 +01:00
Nico
7458b2ea35 v0.8.0: refactor agent.py into modular package
Split 1161-line monolith into agent/ package:
auth, llm, types, process, runtime, api, and
nodes/ (base, sensor, input, output, thinker, memorizer).
No logic changes — pure structural split.
uvicorn agent:app entrypoint unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 01:36:41 +01:00