
Overview

A script-based dataset is a collection of test cases written as conversation scripts, which gives you full control over the conversation flow and user messages. Each script test case consists of one or more conversation turns. At every turn, you specify the exact user message to use and, optionally, the expected agent behavior. Quraite provides three ways to define the expected agent response:

Exact Match

The agent’s response must match the expected response exactly.

Regex Match

The agent’s response must match the expected response pattern using regex.

Semantic Match

The agent’s response must be judged by an LLM to match the expected response.
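The first two matching modes can be sketched in a few lines of Python. This is an illustration of the matching semantics, not Quraite's implementation; the function names are hypothetical:

```python
import re

def exact_match(response: str, expected: str) -> bool:
    # Exact Match: the response must equal the expected text, character for character.
    return response == expected

def regex_match(response: str, pattern: str) -> bool:
    # Regex Match: the response must contain a match for the expected pattern.
    return re.search(pattern, response) is not None

# Semantic Match is omitted here: it asks an LLM judge whether the response
# matches the expected content, so it requires a model call rather than a
# string comparison.
```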

Quraite also supports evaluating tool calls at each turn. For multiple tool calls, specify the expected order:

In Order

Tool calls match the specified sequence.

Any Order

Tool calls match regardless of sequence.

Quraite offers flexible evaluation options: check tool names only, or both names and arguments.
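The two ordering modes and the name-only versus name-plus-arguments options combine as sketched below. Again, this is an illustrative helper, not Quraite's implementation, and the call shape (`name`/`args` dicts) is an assumption:

```python
def tool_calls_match(actual, expected, in_order=True, check_args=True):
    # Reduce each call to the fields being evaluated:
    # the tool name only, or the name plus its arguments.
    def key(call):
        if check_args:
            return (call["name"], tuple(sorted(call.get("args", {}).items())))
        return call["name"]

    actual_keys = [key(c) for c in actual]
    expected_keys = [key(c) for c in expected]
    if in_order:
        # In Order: the sequences must be identical.
        return actual_keys == expected_keys
    # Any Order: the same calls must appear, in any sequence.
    return sorted(actual_keys) == sorted(expected_keys)
```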

How It Works

Quraite invokes the agent using the exact user messages specified at each turn. If expected behavior is defined, Quraite evaluates the agent’s response against it.
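The evaluation loop can be sketched as follows. This is a minimal illustration under assumed names (`agent` is any callable mapping a user message to a response; `check` is a per-turn predicate), not Quraite's implementation:

```python
def run_script_test_case(agent, turns):
    # Replay each scripted user message against the agent.
    results = []
    for turn in turns:
        response = agent(turn["user_message"])
        # Only turns that define expected behavior are evaluated;
        # the rest are replayed just to advance the conversation.
        check = turn.get("check")
        results.append(check(response) if check else None)
    return results

# Example with a stub agent that always asks for authentication details.
stub_agent = lambda msg: "Please share the email on your account."
turns = [
    {"user_message": "Hi, I want to check my order",
     "check": lambda r: "email" in r},
]
print(run_script_test_case(stub_agent, turns))  # [True]
```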

When to Use Script-Based Datasets

Script-based datasets work best when:
  • Replaying production traces. Test real conversations exactly as they occurred.
  • Debugging issues. Reproduce specific user problems with precise message sequences.
  • Running validation tests. Fast, predictable checks with deterministic inputs.

Create and Run Script-Based Test Cases

This guide uses the Retail Agent in the Default Project. Quraite creates this project automatically at signup.
Step 1: Navigate to the Projects page

In the Quraite dashboard, navigate to the Projects page.
Step 2: Navigate to the Default Project

Click on the Default Project in the list of projects.
Step 3: Navigate to the Datasets page

Click Datasets in the left sidebar.
Step 4: Select the Script-based Dataset

Click the Script-based Dataset in the list of datasets.
Step 5: Create a new test case

This test case tests retrieving order status.

Click + Test Case.

Enter Turn 1 details:

User message:
Hi, I want to check status of my order W3372648

Expected agent behavior:
  1. Select Evaluation Approach as LLM.
  2. Enter Expected Content:
     Agent asks for user authentication details.

Enter Turn 2 details:

User message:
My email address is yara.johansson3155@example.com

Expected agent behavior:
  1. Select Evaluation Approach as Regex.
  2. Enter Expected Content:
     pending
  3. Click + Tool Call.
  4. Select Tool Call Evaluation Type as In Order.
  5. Enter the tool call name:
     find_user_id_by_email
  6. Enter the tool call arguments:
     email: "yara.johansson3155@example.com"
  7. Click + Tool Call.
  8. Enter the tool call name:
     get_order_details
  9. Enter the tool call arguments:
     order_id: "#W3372648"
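For reference, the test case entered above could be represented as plain data. The schema below is purely illustrative (key names like `turns` and `expected_tool_calls` are assumptions, not a Quraite format):

```python
# Hypothetical data representation of the two-turn test case above.
test_case = {
    "turns": [
        {
            "user_message": "Hi, I want to check status of my order W3372648",
            "evaluation": {
                "approach": "LLM",
                "expected_content": "Agent asks for user authentication details.",
            },
        },
        {
            "user_message": "My email address is yara.johansson3155@example.com",
            "evaluation": {"approach": "Regex", "expected_content": "pending"},
            "tool_call_evaluation_type": "In Order",
            "expected_tool_calls": [
                {"name": "find_user_id_by_email",
                 "args": {"email": "yara.johansson3155@example.com"}},
                {"name": "get_order_details",
                 "args": {"order_id": "#W3372648"}},
            ],
        },
    ],
}
```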
Step 6: Run the test case

  1. Select Retail Agent from the Select Agent dropdown.
  2. Click Run.

Code-based script test case definitions are coming soon.

Next Steps

  • Run the remaining sample test cases in the Script-based Dataset.
  • Create a new project and run test cases against your own agent.