AGIEval

math

A human-centric benchmark for evaluating foundation models on standardized exams, including college entrance exams (Gaokao, SAT), law school admission tests (LSAT), math competitions, lawyer qualification tests, and civil service exams. It contains 20 tasks (18 multiple-choice, 2 cloze) designed to assess understanding, knowledge, reasoning, and calculation in real-world academic and professional contexts.
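
As a rough illustration of how results on a mix of multiple-choice and cloze tasks could be turned into a single percentage, the sketch below computes per-task accuracy and then averages across tasks. This is an assumption for illustration only: the page does not state how the leaderboard scores are aggregated, and the `Item` type, the task names, and the normalization rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    task: str   # hypothetical task name, e.g. "task-a"
    kind: str   # "mc" (multiple-choice) or "cloze"
    gold: str   # reference answer
    pred: str   # model answer


def is_correct(item: Item) -> bool:
    """Exact match after light normalization (an assumed scoring rule)."""
    gold = item.gold.strip()
    pred = item.pred.strip()
    if item.kind == "mc":
        # Multiple-choice: compare option letters case-insensitively.
        return pred.upper() == gold.upper()
    # Cloze: compare the free-form answer strings case-insensitively.
    return pred.lower() == gold.lower()


def per_task_accuracy(items: list[Item]) -> dict[str, float]:
    """Fraction of correct answers for each task."""
    totals: dict[str, tuple[int, int]] = {}
    for item in items:
        hits, count = totals.get(item.task, (0, 0))
        totals[item.task] = (hits + int(is_correct(item)), count + 1)
    return {task: hits / count for task, (hits, count) in totals.items()}


def macro_average(accuracy: dict[str, float]) -> float:
    """Unweighted mean over tasks (one possible aggregation, assumed here)."""
    return sum(accuracy.values()) / len(accuracy)
```

A macro average weights every task equally regardless of size; averaging over all items instead would favor the larger tasks, so the two choices can yield different headline numbers.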

Leaderboard

Model                        Score
Mistral Small 3 24B Base     65.8%
Ministral 3 (14B Base 2512)  64.8%
Ministral 3 (8B Base 2512)   59.1%
Hermes 3 70B                 56.2%
Gemma 2 27B                  55.1%
Gemma 2 9B                   52.8%
Ministral 3 (3B Base 2512)   51.1%
Granite 3.3 8B Base          49.3%
Ministral 8B Instruct        48.3%
ERNIE 4.5                    28.5%