agent-runtime/testcases/expectation_tracking.md
Nico 925fff731f v0.17.0: User expectation tracking, PA retry loop, machine state in PA context
- Memorizer tracks user_expectation (conversational/delegated/waiting_input/observing)
- Output node adjusts phrasing per expectation
- PA retry loop: reformulates job on expert failure (all retries exhausted or tool skip)
- Machine state in PA context: get_machine_summary includes current state, buttons, stored data
- Expert writes to machine state via update_machine + transition_machine
- Expanded baked schema coverage
- Awareness panel shows color-coded expectation state
- Dashboard and workspace component updates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 19:03:07 +02:00

Expectation Tracking

Tests that the memorizer tracks user_expectation and that the tracked value influences PA and Output behavior. The test also exercises the machine features (update_machine, transition_machine) alongside the expectation transitions.
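For orientation, the four user_expectation values named in the v0.17.0 notes can be modeled as a small enumeration. This is only a sketch: the class name `UserExpectation` and the helper `parse_expectation` are hypothetical and are not taken from the agent-runtime code.

```python
from enum import Enum


class UserExpectation(str, Enum):
    """The four expectation values listed in the v0.17.0 notes.

    The class itself is a hypothetical illustration.
    """

    CONVERSATIONAL = "conversational"
    DELEGATED = "delegated"
    WAITING_INPUT = "waiting_input"
    OBSERVING = "observing"


def parse_expectation(raw: str) -> UserExpectation:
    """Normalize a raw string into a UserExpectation.

    Raises ValueError for values outside the four known states.
    """
    return UserExpectation(raw.strip().lower())
```

For example, `parse_expectation(" Delegated ")` yields `UserExpectation.DELEGATED`.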

Setup

  • clear history

Steps

1. Greeting sets conversational

  • send: hi there!
  • expect_response: length > 2
  • expect_state: user_expectation is "conversational"

2. Create a wizard machine

  • send: create a machine called "project" with states: planning (initial) and executing
  • expect_trace: has machine_created

3. Delegate a task

  • send: build me a summary report of the top 5 customers by device count
  • expect_response: length > 20
  • expect_state: user_expectation is "delegated" or "observing"

4. Ask about the wizard machine (a status check stays in flow)

  • send: what state is my project machine in?
  • expect_response: contains "planning" or "project"
  • expect_state: user_expectation is "conversational" or "delegated"
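The expectation lines used in these steps follow three shapes: `length > N`, `contains "a" or "b"`, and `<key> is "a" or "b"`. A minimal sketch of how such lines might be evaluated is shown here. The function names `check_response` and `check_state`, the state dictionary, and the case-insensitive matching for `contains` are all assumptions, not the actual test runner.

```python
import re


def check_response(expectation: str, response: str) -> bool:
    """Evaluate 'length > N' or 'contains "a" or "b"' against a response."""
    text = expectation.strip()
    m = re.fullmatch(r"length\s*>\s*(\d+)", text)
    if m:
        return len(response) > int(m.group(1))
    if text.startswith("contains"):
        # Any quoted alternative may match; case-insensitivity is an assumption.
        needles = re.findall(r'"([^"]*)"', text)
        return any(n.lower() in response.lower() for n in needles)
    raise ValueError(f"unsupported expectation: {expectation!r}")


def check_state(expectation: str, state: dict) -> bool:
    """Evaluate '<key> is "a" or "b"' against a state dictionary."""
    text = expectation.strip()
    m = re.match(r"(\w+)\s+is\s", text)
    if not m:
        raise ValueError(f"unsupported expectation: {expectation!r}")
    allowed = re.findall(r'"([^"]*)"', text)
    return state.get(m.group(1)) in allowed
```

Listing several quoted alternatives keeps a step tolerant of the variation a language model can show in how it classifies or phrases the same turn.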

5. Store data on machine

  • send: use update_machine to store status=in_progress on the project machine
  • expect_response: length > 5

6. Transition machine

  • send: use transition_machine to move project to executing state
  • expect_response: length > 5

7. Verify machine state and data

  • send: what is the current state and data of the project machine?
  • expect_response: contains "executing" or "in_progress"
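Steps 2 and 5 to 7 can be pictured with a small in-memory model of the machine. The sketch below is an assumption about the shape of the data only: the names update_machine, transition_machine, and get_machine_summary come from the v0.17.0 notes, but their signatures and the `Machine` class are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical in-memory model of a machine."""

    name: str
    states: list[str]
    current: str
    data: dict[str, str] = field(default_factory=dict)


def update_machine(machine: Machine, **values: str) -> None:
    """Store key/value data on the machine (step 5)."""
    machine.data.update(values)


def transition_machine(machine: Machine, target: str) -> None:
    """Move the machine to another of its declared states (step 6)."""
    if target not in machine.states:
        raise ValueError(f"unknown state: {target!r}")
    machine.current = target


def get_machine_summary(machine: Machine) -> str:
    """Summarize current state and stored data (step 7)."""
    return f"{machine.name}: state={machine.current}, data={machine.data}"


# Mirror the test: create "project", store data, then transition.
project = Machine("project", ["planning", "executing"], "planning")
update_machine(project, status="in_progress")
transition_machine(project, "executing")
summary = get_machine_summary(project)
```

Under these assumptions the summary mentions both "executing" and "in_progress", which is what step 7 checks for in the response.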

8. Short nudge triggers waiting_input ("und?" is German for "and?")

  • send: und?
  • expect_response: length > 5
  • expect_state: user_expectation is "waiting_input" or "conversational"

9. Quick thanks (observing; "ok danke" is German for "ok, thanks")

  • send: ok danke
  • expect_response: length > 0
  • expect_state: user_expectation is "observing" or "observational" or "conversational"