Regular testing is one of the most effective ways to maintain response quality. Run batch tests after every significant change to your agent’s configuration or knowledge base.
Accessing batch tests
Navigate to Testing in the left sidebar. You’ll see a list of all existing tests and their current status.
Creating a test
Enter test details
Give your test a descriptive name and select the agent you want to test. The name should reflect the purpose of the test — for example, “Post-KB Update — Product X” or “Guardrail Compliance Check”.
Add questions
Add the questions you want your agent to answer. You can use any of the three methods below — or combine them.
Adding questions
Compass provides three ways to add questions to a batch test:
- Generate
- Upload CSV
- Manual
Let Compass’s AI generate questions automatically based on your agent’s configuration, knowledge base, and guardrails. This is the fastest way to build broad test coverage.
Click Generate and Compass creates a set of relevant questions tailored to your agent. You can review, edit, or remove any generated question before running the test.
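If you plan to use the Upload CSV method, it can help to prepare the file with a short script. The sketch below is illustrative only: this page does not describe the CSV layout Compass expects, so the single `question` column and the helper name `build_questions_csv` are assumptions. Check the upload dialog or template for the actual format before relying on it.

```python
import csv
import io


def build_questions_csv(questions):
    """Return CSV text with one question per row under a 'question' header.

    The single-column layout and the 'question' header are assumptions,
    not a documented Compass format.
    """
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["question"])
    for raw in questions:
        text = " ".join(raw.split())  # collapse stray whitespace and newlines
        key = text.casefold()
        if not text or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        writer.writerow([text])
    return buffer.getvalue()


csv_text = build_questions_csv([
    "What is the refund policy?",
    "  what is the refund   policy? ",
    "Do you ship, internationally?",
])
```

Write `csv_text` to a file (for example with `pathlib.Path("questions.csv").write_text(csv_text, encoding="utf-8")`) and upload that file. Questions containing commas are quoted automatically by the `csv` module.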
Test statuses
Each batch test moves through the following statuses:
| Status | Description |
|---|---|
| Draft | The test has been created but not yet run. You can still add or edit questions. |
| Scheduled | The test is queued and will begin shortly. |
| Running | The test is actively in progress. A progress bar shows completion status. |
| Completed | All questions have been processed and results are available for review. |
| Failed | The test encountered an error and could not complete. |
While a test is running, the progress bar updates every few seconds. You can navigate away and return — the test will continue running in the background.
Reviewing results
Once a test completes, click into it to review the results. You can filter the results by:
- Answer status: Success, Failed, Timeout, or Pending.
- Rating: Good, Acceptable, Poor, or Unrated.
Rating system
Each answer can be rated to track quality over time:
| Rating | Meaning |
|---|---|
| Good | The response is accurate, compliant, and well-written. |
| Acceptable | The response is adequate but could be improved. |
| Poor | The response is inaccurate, non-compliant, or unhelpful. |
| Unrated | The response has not yet been reviewed. |
Question detail
Click any question in the results to see the full detail view:
- Question: The full text of the test question.
- Answer: The complete agent response.
- Rating: Set or change the rating.
- Notes: Add internal notes about the response quality or any issues identified.
Test actions
From the test list or within a test, you can perform the following actions:
- Rename: Change the test name.
- Duplicate: Create a copy of the test with all its questions (useful for re-running after changes).
- Export CSV: Download the test results as a CSV file for offline review or reporting.
- Delete: Permanently remove the test and its results.
- Add questions: Add additional questions to an existing test at any time.
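For offline review, an exported CSV can be summarized with a few lines of Python. This is a minimal sketch under stated assumptions: the column names `question`, `status`, and `rating` are hypothetical (the actual export headers are not documented here), and `summarize_export` is an illustrative helper, not part of Compass. The status and rating values match the filters described under Reviewing results.

```python
import csv
import io
from collections import Counter


def summarize_export(csv_text):
    """Tally ratings and collect questions that need a closer look.

    Assumes columns named 'question', 'status', and 'rating';
    the real export may use different headers.
    """
    ratings = Counter()
    needs_review = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Treat a missing or empty rating as Unrated.
        rating = (row.get("rating") or "").strip() or "Unrated"
        status = (row.get("status") or "").strip()
        ratings[rating] += 1
        # Flag poor ratings and answers that never completed.
        if rating == "Poor" or status in {"Failed", "Timeout"}:
            needs_review.append(row.get("question") or "")
    return ratings, needs_review


sample = (
    "question,status,rating\n"
    "What is the refund policy?,Success,Good\n"
    "Can you give legal advice?,Success,Poor\n"
    "Where is my order?,Timeout,\n"
)
ratings, needs_review = summarize_export(sample)
```

With the sample data, `ratings` counts one Good, one Poor, and one Unrated answer, and `needs_review` holds the second and third questions.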
Best practices
Test after KB changes
Run batch tests after updating your Knowledge Base to ensure the agent still provides accurate responses with the new content.
Test guardrail changes
After modifying guardrails or compliance settings, run tests that specifically probe the boundaries you’ve configured.
Build a regression suite
Create a standard set of test questions that you run regularly. Duplicate tests to quickly re-run them after making changes.
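One way to get value from a regression suite is to compare ratings between two runs of the same questions. The sketch below assumes you have already loaded each run into a dictionary mapping question text to its rating (for example, from exported CSV files). The helper `find_regressions` and the ordering Good > Acceptable > Poor are illustrative assumptions based on the rating meanings above, not a Compass feature.

```python
# Assumed ordering of ratings, from worst to best.
RATING_ORDER = {"Poor": 0, "Acceptable": 1, "Good": 2}


def find_regressions(before, after):
    """Return (question, old_rating, new_rating) for every rating that dropped.

    `before` and `after` map question text to a rating string. Questions
    that are unrated or missing in either run are skipped because they
    cannot be compared.
    """
    regressions = []
    for question, old_rating in before.items():
        new_rating = after.get(question)
        if old_rating not in RATING_ORDER or new_rating not in RATING_ORDER:
            continue
        if RATING_ORDER[new_rating] < RATING_ORDER[old_rating]:
            regressions.append((question, old_rating, new_rating))
    return regressions


before = {"Refund policy?": "Good", "Shipping time?": "Acceptable"}
after = {"Refund policy?": "Acceptable", "Shipping time?": "Acceptable"}
regressions = find_regressions(before, after)
```

Here `regressions` contains only the refund question, whose rating dropped from Good to Acceptable after the change.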
Combine methods
Use AI-generated questions for broad coverage and manual questions for targeted edge cases that automatic generation may miss.

