Overview
An Experiment is a single execution of a dataset against a specific agent. It binds together an agent, a dataset (or a subset of it), and a set of metrics to produce a structured evaluation run. Each experiment run produces:
- Trace-level metrics — latency, cost, and token usage
- Turn-by-turn conversation logs — including all tool calls
- Per-testcase pass/fail signals — with reasoning
- Metric evaluations
In short, each experiment answers: “How does this agent perform on this dataset, with this model or prompt and these metrics?”
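The outputs listed above can be sketched as a plain result record. This is an illustrative sketch only; the class and field names below are hypothetical and do not reflect Quraite's actual schema.

```python
from dataclasses import dataclass


@dataclass
class TraceMetrics:
    # Trace-level metrics for one testcase run (hypothetical fields).
    latency_ms: float
    cost_usd: float
    tokens: int


@dataclass
class TestcaseResult:
    # One testcase's outcome within an experiment run (hypothetical shape).
    passed: bool            # pass/fail signal
    reasoning: str          # why the testcase passed or failed
    turns: list             # turn-by-turn conversation log, including tool calls
    trace: TraceMetrics     # latency / cost / token usage
    metric_scores: dict     # metric name -> evaluated score


result = TestcaseResult(
    passed=True,
    reasoning="Agent resolved the request within policy.",
    turns=[
        {"role": "user", "content": "I want a refund"},
        {"role": "assistant", "content": "Let me check.", "tool_calls": ["lookup_order"]},
    ],
    trace=TraceMetrics(latency_ms=840.0, cost_usd=0.0031, tokens=1250),
    metric_scores={"helpfulness": 0.9},
)
print(result.passed, result.trace.tokens)
```

Grouping per-testcase results this way keeps the trace metrics, logs, and metric scores from one run together, which is what makes cross-experiment comparison possible.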
How Experiments Are Executed
When an experiment runs, Quraite loads the dataset and the selected agent, then invokes the agent against every testcase. How each testcase is executed depends on its type:
- Script-based testcases replay the predefined turns in sequence
- Scenario-based testcases dynamically generate user messages and continue turns until the scenario concludes

In both cases, the agent endpoint is called once per turn.
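The turn loop for the two testcase types can be sketched as follows. This is a minimal illustration, not Quraite's implementation: the `call_agent` callable stands in for the agent endpoint, and the testcase keys (`type`, `turns`, `next_user_message`) are hypothetical.

```python
def run_testcase(testcase, call_agent):
    """Drive one testcase to completion, calling the agent endpoint once per turn."""
    transcript = []

    def take_turn(user_msg):
        # One turn = one user message followed by one agent endpoint call.
        transcript.append({"role": "user", "content": user_msg})
        reply = call_agent(transcript)
        transcript.append({"role": "assistant", "content": reply})

    if testcase["type"] == "script":
        # Script-based: replay the predefined turns in sequence.
        for user_msg in testcase["turns"]:
            take_turn(user_msg)
    else:
        # Scenario-based: generate each user message from the transcript so far,
        # continuing until the scenario generator signals completion (None here).
        while (user_msg := testcase["next_user_message"](transcript)) is not None:
            take_turn(user_msg)
    return transcript


# Stub agent endpoint for demonstration.
echo_agent = lambda transcript: f"ack: {transcript[-1]['content']}"

script_case = {"type": "script", "turns": ["hello", "goodbye"]}
log = run_testcase(script_case, echo_agent)
print(len(log))  # 2 user turns + 2 assistant replies = 4 entries
```

The same `take_turn` helper serves both modes, which reflects the point above: only the source of the next user message differs, while the per-turn endpoint call is identical.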
Configurability
The following can be configured when running an experiment:
- Concurrent testcases — how many testcases run in parallel
- Concurrent metrics per testcase — how many metrics are evaluated simultaneously
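These two concurrency limits can be pictured as nested semaphores: one bounding testcases in flight, one bounding metric evaluations within each testcase. A minimal `asyncio` sketch, assuming hypothetical `run_case` and `eval_metric` coroutines (none of these names come from Quraite's API):

```python
import asyncio


async def run_experiment(testcases, metrics, run_case, eval_metric,
                         concurrent_testcases=4, concurrent_metrics=2):
    # Outer limit: how many testcases run in parallel.
    case_sem = asyncio.Semaphore(concurrent_testcases)

    async def handle_case(tc):
        async with case_sem:
            transcript = await run_case(tc)
            # Inner limit: metrics evaluated simultaneously for this testcase.
            metric_sem = asyncio.Semaphore(concurrent_metrics)

            async def score(metric):
                async with metric_sem:
                    return metric, await eval_metric(metric, transcript)

            scores = dict(await asyncio.gather(*(score(m) for m in metrics)))
            return {"testcase": tc, "scores": scores}

    # gather preserves input order, so results line up with testcases.
    return await asyncio.gather(*(handle_case(tc) for tc in testcases))


# Demonstration with stub coroutines.
async def run_case(tc):
    return [{"role": "user", "content": tc}]

async def eval_metric(metric, transcript):
    return 1.0

results = asyncio.run(run_experiment(["tc1", "tc2"], ["m1", "m2"],
                                     run_case, eval_metric))
print(len(results))
```

Raising the outer limit increases throughput at the cost of more simultaneous agent traffic; the inner limit caps concurrent metric (often LLM-judge) calls per testcase.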