# S3* Audit Corrections

Tests that the S3* audit system detects and corrects Thinker failures: code-without-tools mismatch, empty workspace recovery, error retry.

## Setup

- clear history

## Steps

### 1. Tool calls produce results (baseline)

- send: create two buttons: Alpha and Beta
- expect_actions: length >= 1
- expect_actions: any action contains "alpha" or "Alpha"

### 2. Dashboard mismatch triggers re-emit

- send: I see nothing on my dashboard, fix it |dashboard| []
- expect_response: length > 5
- expect_actions: length >= 1

### 3. DB error triggers retry with corrected SQL

- send: SELECT * FROM NichtExistent LIMIT 5
- expect_trace: has tool_call
- expect_response: not contains "1146"
- expect_response: length > 10

### 4. Complex request gets Director plan

- send: investigate which customers have the most devices in the database
- expect_trace: has director_plan or decided
- expect_trace: has tool_call
- expect_response: length > 20
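
The `expect_response` lines in the steps above follow a small `key: condition` pattern. As a rough illustration only, here is a minimal sketch of how such conditions might be evaluated against a response string. The function name `check_text` and the exact matching rules are assumptions made for this example; the real test runner's parser and semantics are not shown in this document.

```python
import re


def check_text(condition: str, text: str) -> bool:
    """Evaluate one expect_response-style condition against text.

    Hypothetical helper: supports only the three condition shapes that
    appear in this file ("length > N", "length >= N", and
    "[not ]contains "a" or "b"").
    """
    condition = condition.strip()

    # Length comparisons, e.g. "length > 5" or "length >= 1".
    match = re.fullmatch(r"length\s*(>=|>)\s*(\d+)", condition)
    if match:
        op, limit = match.group(1), int(match.group(2))
        return len(text) >= limit if op == ">=" else len(text) > limit

    # Optional negation, e.g. 'not contains "1146"'.
    negate = condition.startswith("not ")
    if negate:
        condition = condition[4:].strip()

    # Substring checks; several quoted alternatives are treated as "or".
    if condition.startswith("contains"):
        needles = re.findall(r'"([^"]*)"', condition)
        found = any(needle in text for needle in needles)
        return not found if negate else found

    raise ValueError(f"unsupported condition: {condition!r}")
```

Under these assumptions, step 3 would pass only if the response both hides the raw error code and is long enough, that is, if `check_text('not contains "1146"', response)` and `check_text('length > 10', response)` are both true.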