XSTest

safety

XSTest is a test suite designed to identify exaggerated safety behaviours in large language models. It comprises 450 prompts: 250 safe prompts across ten prompt types that a well-calibrated model should answer, and 200 unsafe contrast prompts that it should refuse. The benchmark systematically tests whether overly cautious safety mechanisms lead a model to refuse clearly safe prompts.
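As a rough illustration of the safe/unsafe split described above, the sketch below computes two refusal rates from per-prompt labels. The `Judgment` record, its field names, and the `refusal_rates` helper are hypothetical and not part of XSTest itself. How a response gets labelled as a refusal is assumed to happen elsewhere, and this page does not state how the leaderboard score is defined, so the sketch does not attempt to reproduce it.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One labelled model response (hypothetical record, for illustration only)."""
    prompt_is_safe: bool  # True for a safe prompt, False for an unsafe contrast prompt
    refused: bool         # assumed label: did the model refuse to answer?


def refusal_rates(judgments):
    """Return the fraction of refusals on safe and on unsafe prompts."""
    safe = [j for j in judgments if j.prompt_is_safe]
    unsafe = [j for j in judgments if not j.prompt_is_safe]
    if not safe or not unsafe:
        raise ValueError("need at least one safe and one unsafe prompt")
    return {
        # Refusals on safe prompts indicate exaggerated safety; lower is better.
        "safe_refusal_rate": sum(j.refused for j in safe) / len(safe),
        # Refusals on unsafe prompts are the desired behaviour; higher is better.
        "unsafe_refusal_rate": sum(j.refused for j in unsafe) / len(unsafe),
    }


# Toy labels only (not real results): 250 safe prompts with 5 refusals,
# 200 unsafe prompts with 190 refusals.
toy = (
    [Judgment(True, True)] * 5 + [Judgment(True, False)] * 245
    + [Judgment(False, True)] * 190 + [Judgment(False, False)] * 10
)
rates = refusal_rates(toy)
```

Reporting the two rates separately keeps over-refusal on safe prompts from being masked by correct refusals on unsafe ones.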

Leaderboard

Model                 Score
Gemini 1.5 Pro        98.8%
Gemini 1.5 Flash      97.0%
Gemini 1.5 Flash 8B   92.6%