MMLU

math

Massive Multitask Language Understanding (MMLU) is a benchmark that tests knowledge across 57 diverse subjects, including STEM, humanities, social sciences, and professional domains.

Leaderboard

Showing 20 of 105 results

Model                          Score
GPT-5                          92.5%
o1                             91.8%
GPT-4.5                        90.8%
o1-preview                     90.8%
Qwen3 VL 235B A22B Thinking    90.6%
Sarvam-105B                    90.6%
Claude 3.5 Sonnet              90.4%
Claude 3.5 Sonnet              90.4%
Kimi K2 0905                   90.2%
GPT-4.1                        90.2%
GPT OSS 120B                   90.0%
LongCat-Flash-Chat             89.7%
Kimi K2 Instruct               89.5%
Kimi K2-Instruct-0905          89.5%
MiMo-V2.5-Pro                  89.4%
Qwen3 VL 235B A22B Instruct    88.8%
GPT-4o                         88.7%
Claude 3.5 Sonnet              88.7%
GPT-4o                         88.7%
Qwen3 VL 32B Thinking          88.7%