How to Use Ashr | Step-by-Step AI Agent Evaluation Guide

As AI agents move from experimental chatbots to production-grade workers, testing their reliability is critical. Ashr provides an 'Agent-in-the-Loop' evaluation platform to stress-test these autonomous systems. This guide will walk you through setting up and using Ashr to ensure your agents perform consistently in real-world scenarios.

Step 1: Connect Your AI Agent

The first step is integrating Ashr with your existing environment. Ashr is provider-agnostic and works via a lightweight API or SDK. You simply provide the endpoint where your agent lives. This allows Ashr to send synthetic user requests to your agent and monitor its reasoning and tool-call logic without requiring major changes to your codebase.

Step 2: Define User Archetypes and Stories

To test how an agent handles diverse interactions, you must define 'User Archetypes.' Within the Ashr dashboard, you can describe different personas—such as an angry customer, a technical power user, or even a malicious user attempting prompt injection. Ashr uses these archetypes to generate thousands of unique, branching user stories that simulate authentic production journeys.

Step 3: Set Custom Evaluation Metrics

One of Ashr’s most powerful features is the ability to define business-specific success criteria. You can set rules such as: 'Agent must never provide medical advice' or 'Agent must confirm account details before updating a subscription.' Ashr will automatically evaluate every simulated journey against these metrics, grading the agent’s logic and adherence to your safety guardrails.

Step 4: Run Simulations and Stress Tests

Once your archetypes and metrics are set, trigger a simulation run. Ashr will execute thousands of parallel conversations with your agent, mimicking rapid-fire user interactions. It specifically looks for edge cases—situations where the agent might hallucinate, fail to call a tool correctly, or deviate from the intended workflow. This high-volume stress testing uncovers bugs that manual QA would likely miss.

Step 5: Analyze the Health Score and Iterated

After the simulation completes, Ashr provides a comprehensive 'Health Score' and detailed failure reports. You can dive into specific journeys that failed, seeing exactly where the agent's reasoning broke down. Use these insights to refine your prompts, update your agent's tool-call documentation, or adjust your fine-tuning data. Repeat the process until your agent achieves a reliability score high enough for production deployment.

How to Use Ashr: A Step-by-Step Guide for AI Agent Testing

Step 1: Connect Your AI Agent

Step 2: Define User Archetypes and Stories

Step 3: Set Custom Evaluation Metrics

Step 4: Run Simulations and Stress Tests

Step 5: Analyze the Health Score and Iterated

Similar Guides

How to Use Monday (Complete Beginner to Advanced Guide)

How to Use Canva

How to Use Notion: Complete Beginner to Advanced Guide

Get the best tools delivered