IMO-AnswerBench

math

IMO-AnswerBench is a benchmark that evaluates mathematical reasoning on International Mathematical Olympiad (IMO) problems, with a focus on answer generation and verification.

Leaderboard

Showing 15 of 15 results

Qwen3.7 Max

90.0%

DeepSeek-V4-Pro-Max

89.8%

DeepSeek-V4-Flash-Max

88.4%

Kimi K2.6

86.0%

Step-3.5-Flash

85.4%

GLM-5.1

83.8%

Qwen3.6 Plus

83.8%

GLM-4.7

82.0%

Kimi K2.5

81.8%

Qwen3.5-397B-A17B

80.9%

Qwen3.6-27B

80.8%

Qwen3.6-35B-A3B

78.9%

Kimi K2-Thinking-0905

78.6%

LongCat-Flash-Thinking-2601

78.6%

DeepSeek-V3.2

78.3%
