agent-runtime

Author	SHA1	Message	Date
Nico	fda0d7cfce	v0.15.1: PA routes all tool requests to expert, dashboard integration test - PA prompt updated: routes ANY task needing tools (DB, UI, buttons, machines) to expert. Only social chat stays with PA. - Expert descriptions include UI capabilities (buttons, machines, tables) - Dashboard integration test: expert creates/replaces buttons, machines, tables — all persist correctly across queries - v4-eras scores: fast 27/28, expert 23/23, dashboard 15/15, progress 11/11 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 17:28:44 +02:00
Nico	1000411eb2	v0.15.0: Frame engine (v3), PA + Expert architecture (v4-eras), live test streaming Frame Engine (v3-framed): - Tick-based deterministic pipeline: frames advance on completion, not timers - FrameRecord/FrameTrace dataclasses for structured per-message tracing - /api/frames endpoint: queryable frame trace history (last 20 messages) - frame_trace HUD event with full pipeline visibility - Reflex=2F, Director=4F, Director+Interpreter=5F deterministic frame counts Expert Architecture (v4-eras): - PA node (pa_v1): routes to domain experts, holds user context - ExpertNode base: stateless executor with plan+execute two-LLM-call pattern - ErasExpertNode: eras2_production DB specialist with DESCRIBE-first discipline - Schema caching: DESCRIBE results reused across queries within session - Progress streaming: PA streams thinking message, expert streams per-tool progress - PARouting type for structured routing decisions UI Controls Split: - Separate thinker_controls from machine controls (current_controls is now a property) - Machine buttons persist across Thinker responses - Machine state parser handles both dict and list formats from Director - Normalized button format with go/payload field mapping WebSocket Architecture: - /ws/test: dedicated debug socket for test runner progress - /ws/trace: dedicated debug socket for HUD/frame trace events - /ws (chat): cleaned up, only deltas/controls/done/cleared - WS survives graph switch (re-attaches to new runtime) - Pipeline result reset on clear Test Infrastructure: - Live test streaming: on_result callback fires per check during execution - Frontend polling fallback (500ms) for proxy-buffered WS - frame_trace-first trace assertion (fixes stale perceived event bug) - action_match supports "or" patterns and multi-pattern matching - Trace window increased to 40 events - Graph-agnostic assertions (has X or Y) Test Suites: - smoketest.md: 12 steps covering all categories (~2min) - fast.md: 10 quick checks (~1min) - fast_v4.md: 10 v4-eras specific checks - expert_eras.md: eras domain tests (routing, DB, schema, errors) - expert_progress.md: progress streaming tests Other: - Shared db.py extracted from thinker_v2 (reused by experts) - InputNode prompt: few-shot examples, history as context summary - Director prompt: full tool signatures for add_state/reset_machine/destroy_machine - nginx no-cache headers for static files during development - Cache-busted static file references Scores: v3 smoketest 39/40, v4-eras fast 28/28, expert_eras 23/23 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 17:10:31 +02:00
Nico	4c412d3c4b	v0.14.4: Interpreter wired in v2, tool_call convention, Haiku models, UI fix - Wire Interpreter into v2 pipeline (after Thinker tool_output, before Output) - Rename tool_exec -> tool_call everywhere (consistent convention across v1/v2) - Switch Director v1+v2 to anthropic/claude-haiku-4.5 (was opus, reserved) - Fix UI apply_machine_ops crash when states are strings instead of dicts - Fix runtime_test.py async poll to match on message ID (prevent stale results) - Add traceback to pipeline error logging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 06:06:13 +02:00
Nico	da92109550	v0.14.3: Integration test runner — real pipeline, both graphs - send_and_wait: POST /api/send + poll /api/result with timeout - 5 test cases: greeting, german, DB count, buttons, show tables - Clears state between tests for predictability - --graph both: runs v1 + v2 back to back - Reports live to frontend via /api/test/status - Both graphs 5/5 green Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 05:21:14 +02:00
Nico	51f2929092	v0.14.2: Test runner with live frontend reporting - Harness reports to /api/test/status with suite_start/step_result/suite_end - Frontend shows x/44 progress, per-test duration, total elapsed time - Auto-discovers test count from test modules (no hardcoded number) - run_all.py --report URL pushes live results to browser - Fix: suite_start with count only resets on first call Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 05:08:55 +02:00
Nico	6f4d26ab82	v0.14.1: Decouple Runtime from WebSocket — persistent server-side runtime - OutputSink: collects output, optionally streams to attached WS - Runtime no longer requires WebSocket — works headless for MCP - WS connects/disconnects via attach_ws()/detach_ws(), runtime persists - /api/send/check + /api/send (async) + /api/result (poll with progress) - Graph switch destroys old runtime, next request creates new one - Director v2 model: claude-opus-4 (was claude-sonnet-4, reserved) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 04:36:28 +02:00
Nico	5f447dfd53	v0.14.0: v2 Director-drives architecture + 3-pod K8s split Architecture: - director_v2: always-on brain, produces DirectorPlan with tool_sequence - thinker_v2: pure executor, runs tools from DirectorPlan - interpreter_v1: factual result summarizer, no hallucination - v2_director_drives graph: Input -> Director -> Thinker -> Output Infrastructure: - Split into 3 pods: cog-frontend (nginx), cog-runtime (FastAPI), cog-mcp (SSE proxy) - MCP survives runtime restarts (separate pod, proxies via HTTP) - Async send pipeline: /api/send/check -> /api/send -> /api/result with progress - Zero-downtime rolling updates (maxUnavailable: 0) - Dynamic graph visualization (fetched from API, not hardcoded) Tests: 22 new mocked unit tests (director_v2: 7, thinker_v2: 8, interpreter_v1: 7) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 04:17:44 +02:00
Nico	a2bc6347fc	v0.13.0: Graph engine, versioned nodes, S3* audit, DB tools, Cytoscape Architecture: - Graph engine (engine.py) loads graph definitions, instantiates nodes - Versioned nodes: input_v1, thinker_v1, output_v1, memorizer_v1, director_v1 - NODE_REGISTRY for dynamic node lookup by name - Graph API: /api/graph/active, /api/graph/list, /api/graph/switch - Graph definition: graphs/v1_current.py (7 nodes, 13 edges, 3 edge types) S3* Audit system: - Workspace mismatch detection (server vs browser controls) - Code-without-tools retry (Thinker wrote code but no tool calls) - Intent-without-action retry (request intent but Thinker only produced text) - Dashboard feedback: browser sends workspace state on every message - Sensor continuous comparison on 5s tick State machines: - create_machine / add_state / reset_machine / destroy_machine via function calling - Local transitions (go:) resolve without LLM round-trip - Button persistence across turns Database tools: - query_db tool via pymysql to MariaDB K3s pod (eras2_production) - Table rendering in workspace (tab-separated parsing) - Director pre-planning with Opus for complex data requests - Error retry with corrected SQL Frontend: - Cytoscape.js pipeline graph with real-time node animations - Overlay scrollbars (CSS-only, no reflow) - Tool call/result trace events - S3* audit events in trace Testing: - 167 integration tests (11 test suites) - 22 node-level unit tests (test_nodes/) - Three test levels: node unit, graph integration, scenario Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 00:18:45 +01:00
Nico	3f8886cbd2	v0.10.4: stateful UI engine — TDD counter test green (36/36) RED->GREEN->REFACTOR cycle: - UI node has state store (key-value), action bindings (op/var), and local action handlers (inc/dec/set/toggle — no LLM round-trip) - Thinker self-model: knows its environment, that ACTIONS create real buttons, that UI handles state locally. Emits var/op payload for stateful actions. - Thinker's context includes UI state so it can report current values - /api/clear resets UI state, bindings, and controls - Test runner: action_match for fuzzy action names, persistent actions across steps, _stream_text restored - Counter test: 16/16 passed (create, read, inc, inc, dec, verify) - Pub test: 20/20 passed (conversation, language switch, tool use, mood) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 15:50:37 +01:00
Nico	3d71c651fc	v0.10.0: test framework with markdown testcases and web UI - testcases/*.md: declarative test definitions (send, expect_response, expect_state, expect_actions, action) - runtime_test.py: standalone runner + pytest integration via conftest.py - /tests route: web UI showing last run results from results.json - /api/tests: serves results JSON - Two initial testcases: counter_state (UI actions) and pub_conversation (multi-turn, language switch, tool use, memorizer state) - pub_conversation: 19/20 passed on first run - Fix nm-text vertical overflow in node metrics bar Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 15:36:19 +01:00
Nico	acc0dff4e5	v0.9.9: deterministic UI — Thinker declares actions, UI renders without LLM - Thinker emits ACTIONS: JSON line with follow-up buttons - UI node is now pure code (no LLM call) — renders actions as buttons, extracts tables from pipe-separated tool output, labels for single values - Controls only in workspace panel, not duplicated in chat - Process cards only in awareness panel, failed auto-remove after 30s - Auth expiry detection: 403/1006 shows login button, stops reconnect loop - Sensor timezone fix: zoneinfo.ZoneInfo("Europe/Berlin") for proper DST - Cache-Control: no-cache on index.html - Markdown rendering in chat messages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 15:06:24 +01:00
Nico	b6ca02f864	v0.9.2: dedicated UI node, strict node roles, markdown rendering 6-node pipeline: Input -> Thinker -> Output (voice) + UI (screen) in parallel - Output: text only (markdown, emoji). Never emits HTML or controls. - UI: dedicated node for labels, buttons, tables. Tracks workspace state. Replaces entire workspace on each update. Runs parallel with Output. - Input: strict one-sentence perception. No more hallucinating responses. - Thinker: controls removed from prompt, focuses on reasoning + tools. - Frontend: markdown rendered in chat (bold, italic, code blocks, lists). Label control type added. UI node meter in top bar. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 14:12:15 +01:00
Nico	f6939d47f5	v0.8.5: smart Output renderer + awareness panel Output node upgraded from dumb echo to device-aware renderer: - Knows it's rendering to HTML/browser, uses markdown formatting - Receives full ThoughtResult (response + tool output + controls) - Always in pipeline: Input perceives, Thinker reasons, Output renders - Keeps user's language, weaves tool results into natural responses Awareness panel (3-column layout): - State: mood, topic, language, facts from Memorizer - Sensors: clock, idle, memo deltas from Sensor ticks - Processes: live cards with cancel during tool execution - Workspace: docked controls (tables/buttons) persist across messages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 02:02:41 +01:00
Nico	4e2cd4ed59	v0.8.3: fix SQL double-wrap bug in Thinker tool parsing Python code blocks containing SQL keywords (SELECT, CREATE) were incorrectly re-wrapped in the SQL template. Now only blocks explicitly tagged as sql/sqlite get wrapped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 01:49:21 +01:00
Nico	231f81bc52	v0.8.2: fix pipeline — skip Output for tools, process HUD, inline controls, structured actions - Thinker tool results stream directly to user, skipping Output node (halves latency) - ProcessManager process_start/process_done events render as live cards in chat - UI controls sent before response text, not after - Button clicks route to handle_action(), skip Input, go straight to Thinker - Fix Thinker model: gemini-2.5-flash-preview -> gemini-2.5-flash (old ID expired) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 01:43:07 +01:00
Nico	7458b2ea35	v0.8.0: refactor agent.py into modular package Split 1161-line monolith into agent/ package: auth, llm, types, process, runtime, api, and nodes/ (base, sensor, input, output, thinker, memorizer). No logic changes — pure structural split. uvicorn agent:app entrypoint unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 01:36:41 +01:00
Nico	20363a1f2f	v0.7.2: UI controls + ProcessManager + Thinker upgrade (WIP) - ProcessManager: observable tool execution with start/stop/status - UI controls protocol: buttons, tables, process cards - Frontend renders controls in chat, clicks route back as actions - Thinker upgraded to gemini-2.5-flash-preview - Auto-detect SQL/python/tool_code blocks for execution - SQL blocks auto-wrapped in Python sqlite3 script - WIP: tool execution path needs tuning, controls not yet triggered Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 01:16:26 +01:00
Nico	8b69e6dd0d	v0.6.2: Thinker node with python tool execution (S3 Control) - ThinkerNode: reasons about perception, decides tool use vs direct answer - Python tool: subprocess execution with 10s timeout - Auto-detects python code blocks in LLM output and executes them - Tool call/result visible in trace + HUD - Thinker meter in frontend (token budget: 4K) - Flow: Input (perceive) -> Thinker (reason + tools) -> Output (speak) - Tested: math (42*137=5754), SQLite (create+query), time, greetings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 01:04:22 +01:00
Nico	5c7aece397	v0.5.5: node token meters in frontend - Per-node context fill bars (input/output/memorizer/sensor) - Color-coded: green <50%, amber 50-80%, red >80% - Sensor meter shows tick count + latest deltas - Token info in trace context events Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 00:51:43 +01:00
Nico	ab661775ef	v0.5.4: sensor node, perceiver model, context budgets, API send - SensorNode: 5s tick loop with delta-only emissions (clock, idle, memo changes) - Input reframed as perceiver (describes what it heard, not commands) - Output reframed as voice (acts on perception, never echoes it) - Per-node token budgets: Input 2K, Output 4K, Memorizer 3K - fit_context() trims oldest messages to stay within budget - History sliding window: 40 messages max - Facts capped at 20, trace file rotates at 500KB - /api/send + /api/clear endpoints for programmatic testing - test_cog.py test suite - Listener context: physical/social/security awareness Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 00:42:02 +01:00
Nico	569a6022fe	cognitive agent runtime v0.4.6: 3-node graph + Zitadel auth + K3s deploy - Input/Output/Memorizer nodes with OpenRouter (Gemini Flash) - Zitadel OIDC auth with PKCE flow, service token for Titan - SSE event stream + poll endpoint for external observers - Identity from Zitadel userinfo, listener context in Input prompt - Trace logging to file + SSE broadcast - K3s deployment on IONOS with Let's Encrypt TLS - Frontend: chat + trace view, OIDC login Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 23:21:51 +01:00

21 Commits