Scenario-Based Datasets - Quraite Documentation

Overview

A scenario-based dataset is a collection of test cases written as free-form scenarios. Use these datasets to evaluate how your agent handles realistic user interactions. Each scenario test case primarily consists of two parts:

Scenario

A natural language description of the user’s persona, intent, and relevant context. This defines who the user is, why they interact with the agent, and any background information that shapes the conversation.

Expected Behavior

A step-by-step description of how the agent should respond, written in plain language.

How It Works

Quraite uses the scenario description to generate realistic user messages and invokes the agent across multiple turns.

Fail Fast

Multi-turn evaluations have a common challenge: if the agent deviates from the expected path early in the conversation, subsequent turns become meaningless. Quraite addresses this by evaluating after every turn. When the agent’s response fails to match the expected behavior, the evaluation stops immediately. This catches failures early and saves time and tokens.

Scenario Completion

When a turn passes, the test case automatically advances to the next turn. This repeats until the scenario completes.
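The turn-by-turn flow described above can be sketched as a small loop. This is a minimal illustration, not Quraite's implementation: `run_scenario`, `TurnResult`, and the three callables (`simulate_user`, `agent`, `judge`) are hypothetical names introduced here, and the judge is assumed to return a simple pass/fail per turn.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TurnResult:
    """Outcome of one evaluated turn (hypothetical structure)."""
    turn: int
    user_message: str
    agent_response: str
    passed: bool


def run_scenario(
    simulate_user: Callable[[List[TurnResult]], Optional[str]],
    agent: Callable[[str], str],
    judge: Callable[[int, str], bool],
    max_turns: int = 10,
) -> List[TurnResult]:
    """Run a multi-turn scenario, evaluating after every turn.

    simulate_user returns the next user message, or None once the
    scenario is complete. judge(turn, response) returns True when the
    response matches the expected behavior for that turn.
    """
    history: List[TurnResult] = []
    for turn in range(max_turns):
        user_message = simulate_user(history)
        if user_message is None:
            break  # scenario completed: nothing left for the user to say
        response = agent(user_message)
        passed = judge(turn, response)
        history.append(TurnResult(turn, user_message, response, passed))
        if not passed:
            break  # fail fast: later turns would be meaningless
    return history
```

Because the loop stops at the first failing turn, the returned history always ends either with the failing turn or with the last turn of a completed scenario.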

When to Use Scenario-Based Datasets

Scenario-based datasets work best when:

Production traces are unavailable. New agents or features lack real user data. Scenario-based datasets let teams define test cases in natural language before launch.
Evaluations cover multiple user personas. Different users interact differently. A frustrated customer repeats questions and expresses impatience. A first-time user asks for clarification. A non-native speaker uses simpler vocabulary or unconventional phrasing.
User messages need natural variation. Quraite generates different phrasings for each run based on the scenario description. A scenario like “user asks about refund policy” produces varied messages: “How do I get a refund?”, “What’s your return policy?”, “I want my money back.”
The same scenario runs under different contexts. Test how context affects agent behavior. A pricing question from a free-tier user requires a different response than the same question from an enterprise customer.
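As a rough illustration of the last point, one base intent can be combined with several contexts to produce one scenario description per context. The pricing intent, the context wording, and the `build_scenarios` helper below are all hypothetical; in Quraite itself, scenarios are plain text entered in the dashboard.

```python
from typing import Dict

# Hypothetical base intent shared by every variant.
BASE_INTENT = "You want to know how much it costs to add more seats to your account."

# Hypothetical contexts; each one should lead to a different expected behavior.
CONTEXTS: Dict[str, str] = {
    "free_tier": "You are on the free tier and have never paid for the product.",
    "enterprise": "You are an enterprise customer with an annual contract.",
}


def build_scenarios(base_intent: str, contexts: Dict[str, str]) -> Dict[str, str]:
    """Return one scenario description per context, keyed by context name."""
    return {name: f"{base_intent} {context}" for name, context in contexts.items()}


scenarios = build_scenarios(BASE_INTENT, CONTEXTS)
for name, text in scenarios.items():
    print(f"[{name}] {text}")
```

Each variant would still need its own expected behavior, since the point of the comparison is that the correct response differs by context.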

Create and Run Scenario-Based Test Cases

This guide uses the Retail Agent in the Default Project. Quraite creates this project automatically at signup.

Navigate to the Projects page

In the Quraite dashboard, navigate to the Projects page.

Navigate to the Default Project

Click on the Default Project in the list of projects.

Navigate to the Datasets page

Click Datasets in the left sidebar.

Select the Scenario dataset

Click the Scenario-based Dataset in the list of datasets.

The Scenario-based Dataset includes sample test cases. Ignore these for now.

Create a new test case

Click + Test Case.

This test case checks that the agent can cancel an order.

Enter the scenario details

  You want to cancel an order that contains a Skateboard and Headphones. You are Ava Lopez. Your zipcode is 92168. Your user ID is ava_lopez_2676.
  
  INSTRUCTIONS:
  You do not remember your order ID. When asked for a cancellation reason, say it is no longer needed. Confirm the cancellation when prompted.

Enter the expected behavior as a list of steps

  1. Agent greets the user back.

  2. Agent uses the `get_user_details` and `get_order_details` tools to fetch the order information without asking for the order ID from the user.

  3. Agent requests the user to choose the cancellation reason from the following options: `No longer needed` or `Ordered by mistake`.

  4. Agent must ask for the user's confirmation to cancel the order.

  5. Agent uses the `cancel_pending_order` tool to cancel the order.
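For readers who prefer to see the two parts side by side, the test case above can be pictured as a small record with a scenario string and an ordered list of expected steps. The `ScenarioTestCase` class is a hypothetical illustration and not a Quraite API; the text values are copied from the fields entered above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScenarioTestCase:
    """Hypothetical container mirroring the two dashboard fields."""
    scenario: str                 # persona, intent, context, and instructions
    expected_behavior: List[str]  # ordered steps the agent should follow


cancel_order = ScenarioTestCase(
    scenario=(
        "You want to cancel an order that contains a Skateboard and Headphones. "
        "You are Ava Lopez. Your zipcode is 92168. Your user ID is ava_lopez_2676.\n\n"
        "INSTRUCTIONS:\n"
        "You do not remember your order ID. When asked for a cancellation reason, "
        "say it is no longer needed. Confirm the cancellation when prompted."
    ),
    expected_behavior=[
        "Agent greets the user back.",
        "Agent uses the `get_user_details` and `get_order_details` tools to fetch "
        "the order information without asking for the order ID from the user.",
        "Agent requests the user to choose the cancellation reason from the "
        "following options: `No longer needed` or `Ordered by mistake`.",
        "Agent must ask for the user's confirmation to cancel the order.",
        "Agent uses the `cancel_pending_order` tool to cancel the order.",
    ],
)

# Print the steps in the order the agent is expected to follow them.
for number, step in enumerate(cancel_order.expected_behavior, start=1):
    print(f"{number}. {step}")
```

Keeping the steps ordered matters because the evaluation checks the agent turn by turn and stops at the first step that is not met.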

Run the test case

Select Retail Agent from the Select Agent dropdown.
Click Run.

Defining scenarios takes time, but thorough tests build confidence in your agent. Automatic scenario generation is coming soon.

Next Steps

Run the remaining sample test cases in the Scenario-based Dataset.
Create a new project and run test cases against your own agent.
