- Wire Interpreter into v2 pipeline (after Thinker tool_output, before Output)
- Rename tool_exec -> tool_call everywhere (consistent convention across v1/v2)
- Switch Director v1+v2 to anthropic/claude-haiku-4.5 (previously Opus, which is now reserved)
- Fix UI apply_machine_ops crash when states are strings instead of dicts
- Fix runtime_test.py async poll to match on message ID (prevent stale results)
- Add traceback to pipeline error logging
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
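The runtime_test.py fix above (match on message ID so a stale result is not accepted) can be sketched as a poll loop. This is a minimal sketch, not the actual harness code. It assumes, hypothetically, that each result is a dict carrying an "id" key and that `fetch` is a caller-supplied callable wrapping GET /api/result that returns None when nothing is ready.

```python
import time
from typing import Callable, Optional


def poll_result(
    fetch: Callable[[], Optional[dict]],
    msg_id: str,
    timeout: float = 30.0,
    interval: float = 0.5,
) -> dict:
    """Poll `fetch` until it yields the result for `msg_id` or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch()
        # A result left over from a previous message may still be served;
        # accept only the one tagged with the ID of the message just sent.
        if result is not None and result.get("id") == msg_id:
            return result
        time.sleep(interval)
    raise TimeoutError(f"no result for message {msg_id!r} within {timeout}s")
```

Injecting `fetch` keeps the matching logic testable without a running server; a leftover result for an earlier message is simply skipped until the matching one arrives or the deadline passes.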
- send_and_wait: POST /api/send + poll /api/result with timeout
- 5 test cases: greeting, German, DB count, buttons, show tables
- Clears state between tests for predictability
- --graph both: runs v1 + v2 back to back
- Reports live to frontend via /api/test/status
- Both graphs 5/5 green
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
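The `--graph both` flag and the per-test state reset described above could be wired roughly like this. It is a hedged sketch: the `choices` list, `required=True`, and the helper names `graphs_to_run`, `run_suite`, and the `reset` hook are hypothetical, not taken from the actual script.

```python
import argparse
from typing import Callable, Dict, List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="runtime test harness (sketch)")
    # "both" runs the v1 graph and then the v2 graph in one invocation
    parser.add_argument("--graph", choices=["v1", "v2", "both"], required=True)
    return parser.parse_args(argv)


def graphs_to_run(choice: str) -> List[str]:
    # Expand "both" into the two graphs, in v1-then-v2 order
    return ["v1", "v2"] if choice == "both" else [choice]


def run_suite(
    graph: str,
    cases: Dict[str, Callable[[str], bool]],
    reset: Callable[[str], None],
) -> Dict[str, bool]:
    results: Dict[str, bool] = {}
    for name, case in cases.items():
        reset(graph)  # clear state first so no test sees leftovers from the previous one
        results[name] = case(graph)
    return results
```

Calling `reset` before every case, rather than once per suite, is what makes each result independent of test order.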
- Harness reports to /api/test/status with suite_start/step_result/suite_end
- Frontend shows x/44 progress, per-test duration, total elapsed time
- Auto-discovers test count from test modules (no hardcoded number)
- run_all.py --report URL pushes live results to browser
- Fix: suite_start with count only resets on first call
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
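The status protocol and the suite_start fix above can be sketched as a small event handler, together with one way to auto-discover the test count. This is a minimal sketch under stated assumptions: the event field names ("event", "count"), the test_* naming convention, and the `StatusBoard` / `discover_count` names are hypothetical, and the reading of "only resets on first call" (ignore later suite_start events while a run is in progress) is an interpretation.

```python
import inspect
import time
import types
from typing import Iterable, List, Optional


def discover_count(modules: Iterable[types.ModuleType]) -> int:
    # Assumed convention: every module-level function named test_* is one test.
    return sum(
        1
        for mod in modules
        for name, _ in inspect.getmembers(mod, inspect.isfunction)
        if name.startswith("test_")
    )


class StatusBoard:
    """Server-side view of a test run, fed by suite_start/step_result/suite_end."""

    def __init__(self) -> None:
        self.total = 0
        self.results: List[dict] = []
        self.started_at: Optional[float] = None
        self.running = False

    def handle(self, event: dict) -> None:
        kind = event["event"]
        if kind == "suite_start":
            # Only the first suite_start carrying a count resets the board;
            # later suites in the same run keep the results collected so far.
            if "count" in event and not self.running:
                self.total = event["count"]
                self.results = []
                self.started_at = time.monotonic()
                self.running = True
        elif kind == "step_result":
            self.results.append(event)
        elif kind == "suite_end":
            # The run is over once every expected step has reported.
            if len(self.results) >= self.total:
                self.running = False

    def progress(self) -> str:
        return f"{len(self.results)}/{self.total}"

    def elapsed(self) -> float:
        return 0.0 if self.started_at is None else time.monotonic() - self.started_at
```

With this shape, run_all.py can send one suite_start per module without the frontend counter dropping back to 0/N between modules.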