agent-runtime/testcases/smoketest.md

# Smoketest

Fast validation: one example per category, covering all 11 suite areas in about 2 minutes.

## Setup
- clear history

## Steps

### 1. Reflex path (social/trivial skips Thinker)
- send: hi there!
- expect_response: length > 3
- expect_trace: input.analysis.intent is "social"
- expect_trace: input.analysis.complexity is "trivial"
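
How the runner resolves an `expect_trace` path is not specified in this file. A minimal sketch, assuming the trace is available as nested dicts and that `is` accepts one or more quoted alternatives joined by `or` (the `lookup` and `check_is` helpers and the trace shape are hypothetical):

```python
import re
from typing import Any


def lookup(data: dict, dotted: str) -> Any:
    """Walk a dotted path such as 'input.analysis.intent' through nested dicts."""
    node: Any = data
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # missing path: treated as "no value" in this sketch
        node = node[key]
    return node


def check_is(data: dict, assertion: str) -> bool:
    """Evaluate '<path> is "a" or "b"' against data."""
    path, _, rhs = assertion.partition(" is ")
    allowed = re.findall(r'"([^"]*)"', rhs)  # every quoted alternative
    return lookup(data, path.strip()) in allowed


# Hypothetical trace shape, assumed for illustration only.
trace = {"input": {"analysis": {"intent": "social", "complexity": "trivial"}}}

print(check_is(trace, 'input.analysis.intent is "social"'))
print(check_is(trace, 'input.analysis.complexity is "trivial" or "simple"'))
```

A missing path evaluates to a failed assertion rather than an error here; a real runner might prefer to report it separately.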

### 2. Input analysis (German detection + question intent)
- send: Wie spaet ist es?
- expect_trace: input.analysis.language is "de"
- expect_trace: input.analysis.intent is "question"
- expect_response: length > 5
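
The `length > N` checks could be evaluated along these lines. This is a sketch under assumptions: only `>` and `>=` appear in this file, the other operators are included for completeness, and `check_length` is a hypothetical helper, not the runner's API.

```python
import operator
import re

# Comparison operators the sketch accepts; only '>' and '>=' occur in this file.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

LENGTH_RE = re.compile(r"length\s*(>=|<=|==|>|<)\s*(\d+)")


def check_length(value, assertion: str) -> bool:
    """Evaluate 'length <op> <n>' against len(value)."""
    match = LENGTH_RE.fullmatch(assertion.strip())
    if match is None:
        raise ValueError(f"unsupported length assertion: {assertion!r}")
    op, bound = match.group(1), int(match.group(2))
    return OPS[op](len(value), bound)


print(check_length("Es ist 14 Uhr.", "length > 5"))
print(check_length(["a"], "length >= 2"))
```

The same helper works for strings (`expect_response`) and lists (`expect_actions`), since both support `len`.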

### 3. Frustrated tone detection
- send: this is broken, nothing works and I'm sick of it
- expect_trace: input.analysis.tone is "frustrated" or "urgent"
- expect_response: length > 10

### 4. Button creation
- send: create two buttons: Alpha and Beta
- expect_actions: length >= 2
- expect_actions: any action contains "alpha" or "Alpha"
- expect_actions: any action contains "beta" or "Beta"
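
The spec lists both `"alpha"` and `"Alpha"`, which suggests `contains` is case-sensitive. A minimal sketch of that check, assuming actions arrive as JSON-serializable dicts (the `any_contains` helper and the action shape are hypothetical):

```python
import json
import re


def any_contains(items: list, assertion: str) -> bool:
    """Evaluate 'any ... contains "x" or "y"' over a list of items.

    Each non-string item is serialized to JSON so the substring search
    covers every field. Matching is case-sensitive by assumption.
    """
    needles = re.findall(r'"([^"]*)"', assertion)
    haystacks = [
        item if isinstance(item, str) else json.dumps(item, ensure_ascii=False)
        for item in items
    ]
    return any(needle in hay for hay in haystacks for needle in needles)


# Hypothetical action payloads, assumed for illustration only.
actions = [
    {"type": "button", "label": "Alpha"},
    {"type": "button", "label": "Beta"},
]

print(any_contains(actions, 'any action contains "alpha" or "Alpha"'))
print(any_contains(actions, 'any action contains "gamma"'))
```

Listing both casings in the assertion keeps the test robust whether the agent emits the label verbatim or lowercases it into an id.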

### 5. Dashboard feedback (Thinker sees buttons)
- send: what buttons can you see in my dashboard?
- expect_response: contains "Alpha" or "alpha" or "Beta" or "beta"

### 6. DB query (tool call + table)
- send: show me 3 customers from the database
- expect_trace: has tool_call
- expect_response: length > 10

### 7. Director plan (complex request)
- send: investigate which customers have the most devices in the database
- expect_trace: has director_plan or decided
- expect_trace: has tool_call
- expect_response: length > 20

### 8. Memorizer state (facts + language tracking)
- send: My dog's name is Bella
- expect_state: facts any contains "Bella"
- expect_state: language is "en" or "mixed"

### 9. Machine creation
- send: create a navigation machine called "test" with initial state "ready" showing a Go button
- expect_trace: has tool_call create_machine
- expect_trace: machine_created id="test"

### 10. Counter with buttons
- send: create a counter starting at 0 with increment and decrement buttons
- expect_response: contains "counter" or "count" or "0" or "Zähler"
- expect_actions: length >= 2

### 11. Language switch
- send: Hallo, wie geht es dir?
- expect_state: language is "de" or "mixed"
- expect_response: length > 5

### 12. Expert routing (v4 only, safe to skip on v3)
- send: show me 3 customers from the database
- expect_trace: has tool_call
- expect_response: length > 10
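
The layout used above (numbered `###` headings followed by `- key: value` bullets) could be loaded roughly as follows. This is a sketch, not the runner's actual parser; `Step` and `parse_steps` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Step:
    number: int
    title: str
    # (directive, argument) pairs, e.g. ("send", "hi there!")
    directives: list = field(default_factory=list)


HEADING = re.compile(r"^###\s+(\d+)\.\s+(.*)$")
BULLET = re.compile(r"^-\s+(\w+):\s*(.*)$")


def parse_steps(text: str) -> list:
    """Collect numbered steps and their 'key: value' bullets, in file order."""
    steps = []
    for raw in text.splitlines():
        line = raw.strip()
        heading = HEADING.match(line)
        if heading:
            steps.append(Step(int(heading.group(1)), heading.group(2)))
            continue
        bullet = BULLET.match(line)
        if steps and bullet:  # bullets before the first step (Setup) are skipped
            steps[-1].directives.append((bullet.group(1), bullet.group(2)))
    return steps


sample = "\n".join([
    "## Setup",
    "- clear history",
    "",
    "### 1. Reflex path (social/trivial skips Thinker)",
    "- send: hi there!",
    "- expect_response: length > 3",
])

for step in parse_steps(sample):
    print(step.number, step.title, step.directives)
```

Setup bullets such as `clear history` have no `key:` prefix, so a real loader would need a separate rule for them.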