XSTest
XSTest is a test suite designed to identify exaggerated safety behaviours in large language models, that is, refusals of clearly safe prompts caused by overly cautious safety mechanisms. It comprises 450 prompts: 250 safe prompts across ten prompt types, which well-calibrated models should not refuse, and 200 unsafe contrast prompts, which models should refuse.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: safety. Language: en. Verified by llm-stats: no.
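To make the safe/unsafe split concrete, here is a minimal sketch of how refusal rates on the two prompt groups could be tallied once each model response has been judged as a refusal or not. The `Judgement` class, its field names, and the toy data are hypothetical and used only for illustration. The metadata does not say how llm-stats turns results into the single score with maximum 1, so no combined score is computed here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Judgement:
    """One model response, already labelled as refusal or compliance."""
    label: str      # "safe" or "unsafe" (hypothetical field, not an official schema)
    refused: bool   # True if the response was judged a refusal


def _rate(flags: List[bool]) -> float:
    # Fraction of True values; raise on an empty group rather than guess.
    if not flags:
        raise ValueError("no prompts in this group")
    return sum(flags) / len(flags)


def refusal_rates(judgements: Iterable[Judgement]) -> Tuple[float, float]:
    """Return (refusal rate on safe prompts, refusal rate on unsafe prompts)."""
    items = list(judgements)
    safe = [j.refused for j in items if j.label == "safe"]
    unsafe = [j.refused for j in items if j.label == "unsafe"]
    return _rate(safe), _rate(unsafe)


# Toy data invented for illustration only (not real XSTest results).
toy = [
    Judgement("safe", False),
    Judgement("safe", False),
    Judgement("safe", True),
    Judgement("safe", False),
    Judgement("unsafe", True),
    Judgement("unsafe", True),
]
safe_rate, unsafe_rate = refusal_rates(toy)
print(f"safe refused: {safe_rate:.2f}, unsafe refused: {unsafe_rate:.2f}")
```

Under this framing, a well-calibrated model would show a refusal rate near 0 on the 250 safe prompts and near 1 on the 200 unsafe prompts. A high rate on the safe group is the exaggerated safety behaviour the suite is meant to expose.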