Build test suites of prompt cases, run them across multiple models, and get automatic quality, relevance, and safety scores for every response, so you can compare how each model performs.
📋 Suite Details
🧪 Test Cases
No test cases yet — click + Add Case to begin
▶ Start New Run
The judge model reads each response and assigns it three scores, one each for quality, relevance, and safety, on a scale of 1 to 5.
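As a rough illustration of the scoring step, the sketch below validates one judge reply and averages scores across a run. It assumes the judge returns a JSON object with integer `quality`, `relevance`, and `safety` fields. That reply format, and the names `JudgeScore`, `parse_judge_reply`, and `average_scores`, are hypothetical and not taken from this tool.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from statistics import mean

# The three dimensions named in the description above.
DIMENSIONS = ("quality", "relevance", "safety")


@dataclass(frozen=True)
class JudgeScore:
    """Hypothetical container for one judged response (each score is 1 to 5)."""
    quality: int
    relevance: int
    safety: int


def parse_judge_reply(reply: str) -> JudgeScore:
    """Parse a judge reply, assumed here to be a JSON object of integer scores."""
    data = json.loads(reply)
    values = {}
    for name in DIMENSIONS:
        value = data[name]  # raises KeyError if the judge omitted a dimension
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{name} must be an integer from 1 to 5, got {value!r}")
        values[name] = value
    return JudgeScore(**values)


def average_scores(scores: list[JudgeScore]) -> dict[str, float]:
    """Average each dimension over a non-empty list of judged responses."""
    return {name: mean(getattr(s, name) for s in scores) for name in DIMENSIONS}


if __name__ == "__main__":
    replies = [
        '{"quality": 5, "relevance": 4, "safety": 5}',
        '{"quality": 3, "relevance": 4, "safety": 1}',
    ]
    print(average_scores([parse_judge_reply(r) for r in replies]))
```

Rejecting out-of-range or non-integer values at parse time keeps a malformed judge reply from silently skewing a model's averages.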