agent-runtime/testcases/s3_audit.md

# S3* Audit Corrections

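Verifies that the S3* audit system detects and corrects Thinker failures:
code emitted without tool calls, an empty workspace that needs recovery, and
tool errors that need a retry. The final step checks that a complex request
gets a Director plan.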

## Setup
- clear history

## Steps

### 1. Tool calls produce results (baseline)
- send: create two buttons: Alpha and Beta
- expect_actions: length >= 1
- expect_actions: any action contains "alpha" or "Alpha"

### 2. Dashboard mismatch triggers re-emit
- send: I see nothing on my dashboard, fix it |dashboard| []
- expect_response: not contains "sorry" or "apologize"
- expect_actions: length >= 1
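
The mismatch check in this step can be sketched roughly as follows. This is a minimal illustration, not the runtime's actual logic: it assumes the `|dashboard|` marker is followed by a JSON value describing what the user currently sees, and `split_dashboard` and `needs_reemit` are hypothetical names.

```python
import json

# Marker as it appears in the `send:` line of this step; its semantics are assumed here.
DASHBOARD_MARKER = "|dashboard|"


def split_dashboard(payload):
    """Split a send payload into (message, dashboard).

    dashboard is None when the payload carries no dashboard marker.
    """
    if DASHBOARD_MARKER not in payload:
        return payload.strip(), None
    message, _, raw = payload.partition(DASHBOARD_MARKER)
    # Assumption: everything after the marker is a JSON value.
    return message.strip(), json.loads(raw.strip())


def needs_reemit(previous_actions, dashboard):
    """Hypothetical audit rule: actions were emitted earlier, yet the
    reported dashboard is empty, so the actions should be emitted again."""
    return dashboard is not None and len(dashboard) == 0 and len(previous_actions) > 0
```

Under these assumptions, the step 2 message yields an empty dashboard list, and the buttons created in step 1 would make `needs_reemit` return `True`.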

### 3. DB error triggers retry with corrected SQL
- send: SELECT * FROM NichtExistent LIMIT 5
- expect_trace: has tool_call
- expect_response: not contains "1146"
- expect_response: length > 10
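
The retry behavior this step expects could look something like the sketch below. In MySQL, 1146 is the "table doesn't exist" error code, which is presumably why the raw code must not leak into the response; that the backing database is MySQL-compatible is an assumption. `QueryError`, `run_with_retry`, `execute`, and `correct` are hypothetical names, not part of the runtime.

```python
class QueryError(Exception):
    """Hypothetical error type carrying the database error code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def run_with_retry(sql, execute, correct, max_attempts=2):
    """Run `sql` with `execute`; on QueryError, ask `correct` for a revised
    statement and try again, up to `max_attempts` executions in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return execute(sql)
        except QueryError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Assumption: `correct` returns a rewritten statement based on the error.
            sql = correct(sql, err)
```

Passing `execute` and `correct` as callables keeps the sketch independent of any particular database driver or model call.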

### 4. Complex request gets Director plan
- send: investigate which customers have the most devices in the database
- expect_trace: has director_plan
- expect_trace: has tool_call
- expect_response: length > 20
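
For reference, the `expect_response` checks used in the steps above could be evaluated along these lines. This is only a sketch of one plausible reading: it assumes substring matching is case-sensitive and that `not contains "a" or "b"` means the response contains neither string. `check_response` is a hypothetical helper, not the test runner's actual implementation.

```python
import re


def check_response(expectation, response):
    """Evaluate one `expect_response:` expression against a response string."""
    expectation = expectation.strip()
    quoted = re.findall(r'"([^"]*)"', expectation)
    if expectation.startswith("not contains"):
        # Assumed reading: none of the quoted strings may appear.
        return not any(q in response for q in quoted)
    if expectation.startswith("contains"):
        return any(q in response for q in quoted)
    match = re.fullmatch(r"length\s*(>=|>)\s*(\d+)", expectation)
    if match:
        op, limit = match.group(1), int(match.group(2))
        return len(response) >= limit if op == ">=" else len(response) > limit
    raise ValueError(f"unsupported expectation: {expectation!r}")
```

Only the forms that appear in this file (`contains`, `not contains`, `length >`, `length >=`) are handled; anything else raises rather than silently passing.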