Batch Testing is Compass’s quality assurance system for validating agent responses. It lets you define sets of test questions, run them against your agent, and review the results — ensuring your agent provides accurate, compliant responses before (and after) going live.
Regular testing is one of the most effective ways to maintain response quality. Run batch tests after every significant change to your agent’s configuration or knowledge base.

Accessing batch tests

Navigate to Testing in the left sidebar. You’ll see a list of all existing tests and their current status.

Creating a test

1. Navigate to Testing
Click Testing in the left sidebar to open the batch testing page.

2. Click Create Test
Click the Create Test button in the top-right corner.

3. Enter test details
Give your test a descriptive name and select the agent you want to test. The name should reflect the purpose of the test — for example, “Post-KB Update — Product X” or “Guardrail Compliance Check”.

4. Add questions
Add the questions you want your agent to answer. You can use any of the three methods below — or combine them.

5. Run the test
Click Run Test to begin. Compass will send each question to your agent and capture the response.

Adding questions

Compass provides three ways to add questions to a batch test:
Let Compass’s AI generate questions automatically based on your agent’s configuration, knowledge base, and guardrails. This is the fastest way to create comprehensive test coverage. Simply click Generate and Compass will create a set of relevant questions tailored to your agent. You can review, edit, or remove any generated question before running the test.
AI-generated questions are a great starting point, but always review them to ensure they cover the specific scenarios you care about.
You can add more questions to an existing test at any time — even after the test has been run. This makes it easy to expand your test coverage iteratively.

Test statuses

Each batch test moves through the following statuses:
  • Draft — The test has been created but not yet run. You can still add or edit questions.
  • Scheduled — The test is queued and will begin shortly.
  • Running — The test is actively in progress. A progress bar shows completion status.
  • Completed — All questions have been processed and results are available for review.
  • Failed — The test encountered an error and could not complete.
While a test is running, the progress bar updates every few seconds. You can navigate away and return — the test will continue running in the background.
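The lifecycle above can be sketched as a simple state machine. This is purely illustrative — the status names come from the table above, but the transition map is an assumption based on the described flow (Draft → Scheduled → Running → Completed/Failed), and Compass does not expose these statuses programmatically:

```python
# Illustrative sketch of the batch-test status lifecycle described above.
# The transition map is an assumption inferred from the status table;
# it is not part of any Compass API.
TRANSITIONS = {
    "Draft": {"Scheduled"},
    "Scheduled": {"Running"},
    "Running": {"Completed", "Failed"},
    "Completed": set(),  # terminal state
    "Failed": set(),     # terminal state
}

def can_transition(current: str, target: str) -> bool:
    """Return True if a test may move from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())
```

Note that Draft is the only state in which questions can still be edited, while Completed and Failed are terminal — a finished test is re-run by duplicating it (see Test actions below).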

Reviewing results

Once a test completes, click into it to review the results. You can filter the results by:
  • Answer status — Success, Failed, Timeout, or Pending.
  • Rating — Good, Acceptable, Poor, or Unrated.

Rating system

Each answer can be rated to track quality over time:
  • Good — The response is accurate, compliant, and well-written.
  • Acceptable — The response is adequate but could be improved.
  • Poor — The response is inaccurate, non-compliant, or unhelpful.
  • Unrated — The response has not yet been reviewed.

Question detail

Click any question in the results to see the full detail view:
  • Question — The full text of the test question.
  • Answer — The complete agent response.
  • Rating — Set or change the rating.
  • Notes — Add internal notes about the response quality or any issues identified.

Test actions

From the test list or within a test, you can perform the following actions:
  • Rename — Change the test name.
  • Duplicate — Create a copy of the test with all its questions (useful for re-running after changes).
  • Export CSV — Download the test results as a CSV file for offline review or reporting.
  • Delete — Permanently remove the test and its results.
  • Add questions — Add additional questions to an existing test at any time.
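Because results can be exported as CSV, they are easy to summarize offline. The sketch below uses hypothetical column names (`status`, `rating`) — check the header row of your actual export before relying on it:

```python
import csv
from collections import Counter

def summarize_results(path: str) -> dict:
    """Tally answer statuses and ratings in an exported batch-test CSV.

    Assumes columns named 'status' and 'rating' — hypothetical names;
    verify them against the header row of your actual export.
    """
    statuses, ratings = Counter(), Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            statuses[row.get("status", "Unknown")] += 1
            ratings[row.get("rating", "Unrated")] += 1
    return {"statuses": dict(statuses), "ratings": dict(ratings)}
```

A summary like this is a quick way to track, for example, the share of Good ratings across successive runs of a duplicated regression test.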

Best practices

Test after KB changes

Run batch tests after updating your Knowledge Base to ensure the agent still provides accurate responses with the new content.

Test guardrail changes

After modifying guardrails or compliance settings, run tests that specifically probe the boundaries you’ve configured.

Build a regression suite

Create a standard set of test questions that you run regularly. Duplicate tests to quickly re-run them after making changes.

Combine methods

Use AI-generated questions for broad coverage and manual questions for targeted edge cases. This gives you the best of both worlds.