agent-runtime/testcases/expectation_tracking.md
Nico 925fff731f v0.17.0: User expectation tracking, PA retry loop, machine state in PA context
- Memorizer tracks user_expectation (conversational/delegated/waiting_input/observing)
- Output node adjusts phrasing per expectation
- PA retry loop: reformulates job on expert failure (all retries exhausted or tool skip)
- Machine state in PA context: get_machine_summary includes current state, buttons, stored data
- Expert writes to machine state via update_machine + transition_machine
- Expanded baked schema coverage
- Awareness panel shows color-coded expectation state
- Dashboard and workspace component updates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 19:03:07 +02:00


# Expectation Tracking
Tests that the memorizer tracks user_expectation and that the tracked value influences PA/Output behavior.
Exercises machine features (update_machine, transition_machine) alongside expectation transitions.
## Setup
- clear history
## Steps
### 1. Greeting sets conversational
- send: hi there!
- expect_response: length > 2
- expect_state: user_expectation is "conversational"
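The `expect_response` and `expect_state` lines used throughout these steps can be read as small assertion rules. The sketch below is one possible way to evaluate them, not the runtime's actual test harness: the function names `check_response` and `check_state` are hypothetical, and the case-insensitive matching for `contains` is an assumption.

```python
import re


def check_response(rule: str, response: str) -> bool:
    """Evaluate a rule like 'length > 2' or 'contains "a" or "b"' (hypothetical helper)."""
    rule = rule.strip()
    length_rule = re.fullmatch(r"length\s*>\s*(\d+)", rule)
    if length_rule:
        return len(response) > int(length_rule.group(1))
    if rule.startswith("contains"):
        # Every quoted string is one acceptable alternative.
        options = re.findall(r'"([^"]*)"', rule)
        # Assumption: matching ignores case.
        return any(opt.lower() in response.lower() for opt in options)
    raise ValueError(f"unsupported rule: {rule!r}")


def check_state(rule: str, state: dict) -> bool:
    """Evaluate a rule like 'user_expectation is "a" or "b"' (hypothetical helper)."""
    parsed = re.fullmatch(r"(\w+)\s+is\s+(.+)", rule.strip())
    if not parsed:
        raise ValueError(f"unsupported rule: {rule!r}")
    key, alternatives = parsed.groups()
    allowed = re.findall(r'"([^"]*)"', alternatives)
    return state.get(key) in allowed
```

Listing several quoted values joined by `or` lets a step accept more than one classification, which is how the later steps tolerate variation in the memorizer's output.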
### 2. Create a wizard machine
- send: create a machine called "project" with states: planning (initial) and executing
- expect_trace: has machine_created
### 3. Delegate a task
- send: build me a summary report of the top 5 customers by device count
- expect_response: length > 20
- expect_state: user_expectation is "delegated" or "observing"
### 4. Ask about wizard (status check stays in flow)
- send: what state is my project machine in?
- expect_response: contains "planning" or "project"
- expect_state: user_expectation is "conversational" or "delegated"
### 5. Store data on machine
- send: use update_machine to store status=in_progress on the project machine
- expect_response: length > 5
### 6. Transition machine
- send: use transition_machine to move project to executing state
- expect_response: length > 5
### 7. Verify machine state and data
- send: what is the current state and data of the project machine?
- expect_response: contains "executing" or "in_progress"
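Steps 2 and 5 to 7 assume that a machine keeps a current state plus stored data, and that storing data and changing state are separate operations. A minimal sketch of that model follows. The `Machine` class and the signatures of `update_machine` and `transition_machine` are hypothetical stand-ins, since the real tool signatures are not shown in this file.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical in-memory model of a machine with states and stored data."""
    name: str
    states: list[str]
    current: str
    data: dict[str, str] = field(default_factory=dict)


def update_machine(machine: Machine, **values: str) -> None:
    # Store key/value data on the machine without changing its state.
    machine.data.update(values)


def transition_machine(machine: Machine, target: str) -> None:
    # Move to another state; unknown states are rejected.
    if target not in machine.states:
        raise ValueError(f"unknown state: {target!r}")
    machine.current = target


# Mirrors steps 2, 5 and 6: create, store data, then transition.
project = Machine("project", ["planning", "executing"], current="planning")
update_machine(project, status="in_progress")
transition_machine(project, "executing")
summary = f"{project.name}: state={project.current}, data={project.data}"
```

Under this model, a reply built from `summary` would contain both "executing" and "in_progress", so either alternative in step 7 would match.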
### 8. Short nudge triggers waiting_input
- send: und?
- expect_response: length > 5
- expect_state: user_expectation is "waiting_input" or "conversational"
### 9. Quick thanks (observing)
- send: ok danke
- expect_response: length > 0
- expect_state: user_expectation is "observing" or "observational" or "conversational"
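
The step format above (a `### N. title` heading followed by `- action: argument` lines) is regular enough to parse mechanically. The following sketch shows one way a runner might load it. `parse_steps` is a hypothetical helper, not the runtime's actual loader, and it ignores the Setup section.

```python
from __future__ import annotations

import re


def parse_steps(text: str) -> list[dict]:
    """Split a test case into numbered steps with (action, argument) pairs."""
    steps: list[dict] = []
    for line in text.splitlines():
        heading = re.match(r"###\s+(\d+)\.\s+(.*)", line)
        if heading:
            steps.append({
                "number": int(heading.group(1)),
                "title": heading.group(2),
                "actions": [],
            })
            continue
        action = re.match(r"-\s+(\w+):\s*(.*)", line)
        # Action lines that appear before the first step heading are skipped.
        if action and steps:
            steps[-1]["actions"].append((action.group(1), action.group(2)))
    return steps


sample = """### 8. Short nudge triggers waiting_input
- send: und?
- expect_response: length > 5
"""
parsed = parse_steps(sample)
```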