AIR-Bench


AIR-Bench 2024 is a safety benchmark grounded in risk categories derived from government regulations and company policies. It evaluates policy-grounded refusal across a broad regulatory and policy-derived harm taxonomy, using category-specific LLM-judge prompts that reward safe engagement rather than only penalizing unsafe responses.
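As a rough illustration of how per-prompt judge scores might roll up into a single number on a 0 to 1 scale, here is a minimal sketch. It is not the benchmark's official scoring code. The category names, the example scores, and the choice to macro-average (mean of per-category means) are assumptions made for illustration only; the only thing taken from this page is that scores are bounded by a maximum of 1.

```python
from collections import defaultdict
from statistics import mean


def aggregate_judge_scores(judgments):
    """Aggregate (category, score) pairs into per-category and overall scores.

    Each score is assumed to lie in [0, 1], where higher means the judge
    rated the response as safer. The macro-average is an assumption, not
    a documented AIR-Bench rule.
    """
    by_category = defaultdict(list)
    for category, score in judgments:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {category!r}: {score}")
        by_category[category].append(score)

    if not by_category:
        raise ValueError("no judgments supplied")

    # Mean score within each category.
    per_category = {cat: mean(scores) for cat, scores in by_category.items()}
    # Unweighted mean across categories, so small categories count equally.
    overall = mean(list(per_category.values()))
    return per_category, overall


# Hypothetical judge outputs; category names are placeholders.
example = [
    ("category_a", 1.0),
    ("category_a", 0.5),
    ("category_b", 0.0),
    ("category_b", 1.0),
]
per_category, overall = aggregate_judge_scores(example)
print(per_category)
print(f"overall: {overall:.1%}")
```

Macro-averaging keeps a category with few prompts from being drowned out by a larger one; a micro-average over all prompts would be an equally plausible choice and would give a different number when category sizes differ.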

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: safety. Language: en. Verified by llm-stats: no.

Leaderboard

  1. MAI-Thinking-1: 88.0% (self-reported; source: llm-stats)