AlignBench

math

AlignBench is a comprehensive, multi-dimensional benchmark for evaluating the alignment of Large Language Models in Chinese. It covers eight main categories: Fundamental Language Ability, Advanced Chinese Understanding, Open-ended Questions, Writing Ability, Logical Reasoning, Mathematics, Task-oriented Role Play, and Professional Knowledge. The benchmark includes 683 queries rooted in real scenarios, each with a human-verified reference. Evaluation uses a rule-calibrated, multi-dimensional LLM-as-Judge approach with Chain-of-Thought.
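To make the category-based scoring concrete, here is a minimal sketch of how per-query judge scores might be rolled up into category and overall scores. The 1-10 score scale, the sample data, the `aggregate` helper, and the unweighted averaging across categories are all assumptions made for illustration; the description above does not specify the benchmark's actual aggregation rule.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judge outputs: (category, score), on an assumed 1-10 scale.
judged = [
    ("Mathematics", 7.0),
    ("Mathematics", 5.0),
    ("Logical Reasoning", 6.0),
    ("Writing Ability", 8.0),
    ("Writing Ability", 9.0),
]


def aggregate(judged):
    """Return per-category mean scores and their unweighted (macro) average.

    Macro-averaging is an assumption for this sketch, not the benchmark's
    documented rule.
    """
    by_category = defaultdict(list)
    for category, score in judged:
        by_category[category].append(score)
    per_category = {c: mean(scores) for c, scores in by_category.items()}
    overall = mean(per_category.values())
    return per_category, overall


per_category, overall = aggregate(judged)
for category, score in per_category.items():
    print(f"{category}: {score:.2f}")
print(f"Overall: {overall:.2f}")
```

Averaging over categories rather than over queries keeps a category with many queries from dominating the overall number. Whether that matches the scores reported on the leaderboard is not stated here.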

Leaderboard

Model                   Score
Qwen2.5 72B Instruct    81.6%
DeepSeek-V2.5           80.4%
Qwen2.5 7B Instruct     73.3%
Qwen2 7B Instruct       72.1%