HiddenMath

Category: math

HiddenMath is Google DeepMind's internal mathematical reasoning benchmark. It uses novel problems that models did not encounter during training, so it measures genuine mathematical reasoning rather than memorization.

Leaderboard

13 models, ranked by HiddenMath score (highest first):

Gemini 2.0 Flash: 63.0%
Gemma 3 27B: 60.3%
Gemini 2.0 Flash-Lite: 55.3%
Gemma 3 12B: 54.5%
Gemini 1.5 Pro: 52.0%
Gemini 1.5 Flash: 47.2%
Gemma 3 4B: 43.0%
Gemma 3n E4B Instructed: 37.7%
Gemma 3n E4B Instructed LiteRT Preview: 37.7%
Gemini 1.5 Flash 8B: 32.8%
Gemma 3n E2B Instructed: 27.7%
Gemma 3n E2B Instructed LiteRT (Preview): 27.7%
Gemma 3 1B: 15.8%