Nico 1000411eb2 v0.15.0: Frame engine (v3), PA + Expert architecture (v4-eras), live test streaming

Frame Engine (v3-framed):
- Tick-based deterministic pipeline: frames advance on completion, not timers
- FrameRecord/FrameTrace dataclasses for structured per-message tracing
- /api/frames endpoint: queryable frame trace history (last 20 messages)
- frame_trace HUD event with full pipeline visibility
- Deterministic frame counts per path: Reflex=2F, Director=4F, Director+Interpreter=5F
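
A minimal sketch of how completion-driven frames could be recorded. The class names FrameRecord and FrameTrace come from this commit, but every field, the advance() method, and the node names "input" and "reflex" are assumptions for illustration, not the actual implementation:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FrameRecord:
    frame: int        # 1-based tick index (hypothetical field)
    node: str         # node that completed on this tick (hypothetical field)
    output: str = ""  # short summary of what the node produced


@dataclass
class FrameTrace:
    message: str
    frames: List[FrameRecord] = field(default_factory=list)

    def advance(self, node: str, output: str = "") -> FrameRecord:
        # A frame is appended only when a node completes, never on a timer,
        # so the same path always yields the same frame count.
        record = FrameRecord(frame=len(self.frames) + 1, node=node, output=output)
        self.frames.append(record)
        return record

    @property
    def frame_count(self) -> int:
        return len(self.frames)


# Assumed two-node path standing in for the Reflex=2F case.
trace = FrameTrace(message="hi")
trace.advance("input")
trace.advance("reflex", output="hello!")
print(trace.frame_count)  # 2
```

Because the counter depends only on which nodes completed, a test can assert on an exact frame count without tolerating timing jitter.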

Expert Architecture (v4-eras):
- PA node (pa_v1): routes to domain experts, holds user context
- ExpertNode base: stateless executor with plan+execute two-LLM-call pattern
- ErasExpertNode: eras2_production DB specialist with DESCRIBE-first discipline
- Schema caching: DESCRIBE results reused across queries within session
- Progress streaming: PA streams thinking message, expert streams per-tool progress
- PARouting type for structured routing decisions
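
The DESCRIBE-first discipline with session-level schema caching might look roughly like the sketch below. SchemaCache, run_query, and the "customers" table are hypothetical names; the real ErasExpertNode and db.py interfaces are not shown in this commit message:

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, str]


class SchemaCache:
    """Caches DESCRIBE results for the lifetime of one session (assumed scope)."""

    def __init__(self, run_query: Callable[[str], List[Row]]):
        self._run_query = run_query
        self._cache: Dict[str, List[Row]] = {}

    def describe(self, table: str) -> List[Row]:
        # Only hit the database the first time a table is inspected.
        if table not in self._cache:
            self._cache[table] = self._run_query(f"DESCRIBE {table}")
        return self._cache[table]


# Stand-in for a real database call, so the sketch runs without a server.
calls: List[str] = []


def fake_run_query(sql: str) -> List[Row]:
    calls.append(sql)
    return [("id", "int"), ("name", "varchar(255)")]


cache = SchemaCache(fake_run_query)
first = cache.describe("customers")
second = cache.describe("customers")
print(len(calls))  # 1
```

The second lookup is served from memory, which is the behaviour the "reused across queries within session" bullet describes.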

UI Controls Split:
- Separate thinker_controls from machine controls (current_controls is now a property)
- Machine buttons persist across Thinker responses
- Machine state parser handles both dict and list formats from Director
- Normalized button format with go/payload field mapping
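
One way to picture the split, as a sketch only: thinker_controls and current_controls are named in this commit, while ControlState, machine_controls, on_thinker_response, the button dict shape, and the ordering of the merged list are assumptions:

```python
from typing import Dict, List

Control = Dict[str, str]


class ControlState:
    def __init__(self) -> None:
        self.machine_controls: List[Control] = []  # hypothetical attribute name
        self.thinker_controls: List[Control] = []

    @property
    def current_controls(self) -> List[Control]:
        # Derived on every read, so the two sources cannot drift apart.
        return self.machine_controls + self.thinker_controls

    def on_thinker_response(self, controls: List[Control]) -> None:
        # A new Thinker response replaces only the Thinker's own controls.
        self.thinker_controls = list(controls)


state = ControlState()
state.machine_controls = [{"label": "Next", "go": "step_2"}]
state.on_thinker_response([{"label": "Yes", "go": "confirm"}])
state.on_thinker_response([])  # a later response with no controls
print(state.current_controls)  # [{'label': 'Next', 'go': 'step_2'}]
```

The machine button is still present after the second Thinker response, which matches the persistence behaviour described above.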

WebSocket Architecture:
- /ws/test: dedicated debug socket for test runner progress
- /ws/trace: dedicated debug socket for HUD/frame trace events
- /ws (chat): cleaned up, now carries only deltas/controls/done/cleared
- WS survives graph switch (re-attaches to new runtime)
- Pipeline result reset on clear

Test Infrastructure:
- Live test streaming: on_result callback fires per check during execution
- Frontend polling fallback (500ms) for proxy-buffered WS
- frame_trace-first trace assertion (fixes stale perceived event bug)
- action_match supports "or" patterns and multi-pattern matching
- Trace window increased to 40 events
- Graph-agnostic assertions (has X or Y)
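
The per-check streaming can be sketched as a runner that invokes a callback as soon as each check finishes. The on_result name comes from this commit; run_checks, its signature, and the sample checks are hypothetical:

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], bool]]


def run_checks(checks: List[Check], on_result: Callable[[str, bool], None]) -> int:
    """Run checks in order and report each result immediately."""
    passed = 0
    for name, check in checks:
        ok = bool(check())
        on_result(name, ok)  # fires during execution, not after the whole suite
        passed += ok
    return passed


streamed: List[Tuple[str, bool]] = []
score = run_checks(
    [
        ("response length", lambda: len("hello there, world") > 10),
        ("language", lambda: "de" in ("en", "mixed")),
    ],
    on_result=lambda name, ok: streamed.append((name, ok)),
)
print(f"{score}/{len(streamed)}")  # 1/2
```

In a real runner the callback would presumably push each result over the /ws/test socket; here it only appends to a list so the sketch stays self-contained.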

Test Suites:
- smoketest.md: 12 steps covering all categories (~2min)
- fast.md: 10 quick checks (~1min)
- fast_v4.md: 10 v4-eras specific checks
- expert_eras.md: eras domain tests (routing, DB, schema, errors)
- expert_progress.md: progress streaming tests

Other:
- Shared db.py extracted from thinker_v2 (reused by experts)
- InputNode prompt: few-shot examples, history as context summary
- Director prompt: full tool signatures for add_state/reset_machine/destroy_machine
- nginx no-cache headers for static files during development
- Cache-busted static file references

Scores: v3 smoketest 39/40, v4-eras fast 28/28, expert_eras 23/23

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-29 17:10:31 +02:00

1.2 KiB


Pub Conversation

Tests multi-turn conversation with context tracking, language switching, and memorizer state updates across a social scenario.

Setup

clear history

Steps

1. Set the scene

send: Hey, Alice and I are heading to the pub tonight
expect_response: length > 10
expect_state: situation contains "pub" or "Alice" or "heading" or "tonight"

2. Language switch to German

send: Wir sind jetzt im Biergarten angekommen
expect_response: length > 10
expect_state: language is "de" or "mixed"

3. Context awareness

send: Was sollen wir bestellen?
expect_response: length > 10
expect_state: topic contains "bestell" or "order" or "pub" or "Biergarten"

4. Alice speaks

send: Alice says: I'll have a Hefeweizen please
expect_response: length > 10
expect_state: facts any contains "Alice" or "Hefeweizen"

5. Ask for time (tool use)

send: wie spaet ist es eigentlich?
expect_response: matches \d{1,2}:\d{2}
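
As a rough illustration of this check, assuming the runner applies the pattern with search semantics anywhere in the response (the helper name and sample replies are hypothetical):

```python
import re

TIME_PATTERN = re.compile(r"\d{1,2}:\d{2}")


def response_matches(response: str) -> bool:
    # search() accepts the pattern anywhere in the reply, not only at the start.
    return TIME_PATTERN.search(response) is not None


print(response_matches("Es ist gerade 17:10 Uhr."))  # True
print(response_matches("Keine Ahnung, sorry."))      # False
```

The pattern only checks the shape of a clock time, so a reply containing "99:99" would also pass.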

6. Back to English

send: Let's switch to English, what was the last thing Alice said?
expect_state: language is "en" or "mixed"
expect_response: contains "Alice" or "Hefeweizen"

7. Mood check

send: This is really fun!
expect_state: user_mood is "happy" or "playful" or "excited"
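
The expect_state lines above use three operators: is, contains, and any contains, each followed by quoted alternatives joined with "or". A minimal evaluator for that shape could look like the sketch below. This is not the project's actual parser; check_state, the case-insensitive matching, and the sample state are assumptions:

```python
import re
from typing import Any, Dict, List

EXPECTATION = re.compile(r"^(\w+)\s+(any contains|contains|is)\s+(.+)$")


def parse_alternatives(text: str) -> List[str]:
    # '"pub" or "Alice"' -> ['pub', 'Alice']
    return re.findall(r'"([^"]*)"', text)


def check_state(expr: str, state: Dict[str, Any]) -> bool:
    match = EXPECTATION.match(expr.strip())
    if match is None:
        raise ValueError(f"unsupported expectation: {expr!r}")
    key, op, rest = match.groups()
    options = parse_alternatives(rest)
    value = state.get(key)
    if value is None:
        return False
    if op == "is":
        return value in options
    if op == "contains":
        return any(opt.lower() in str(value).lower() for opt in options)
    # "any contains": value is expected to be a list of strings
    return any(opt.lower() in str(item).lower() for item in value for opt in options)


state = {
    "situation": "Heading to the pub with Alice",
    "language": "de",
    "facts": ["Alice ordered a Hefeweizen"],
}
print(check_state('situation contains "pub" or "Alice"', state))    # True
print(check_state('language is "en" or "mixed"', state))            # False
print(check_state('facts any contains "Alice" or "Hefeweizen"', state))  # True
```

Treating "or" as "any alternative passes" keeps the assertions tolerant of how the memorizer phrases its state, which seems to be the intent of listing several acceptable values per step.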
