AGIEval
A human-centric benchmark for evaluating foundation models on standardized exams including college entrance exams (Gaokao, SAT), law school admission tests (LSAT), math competitions, lawyer qualification tests, and civil service exams. Contains 20 tasks (18 multiple-choice, 2 cloze) designed to assess understanding, knowledge, reasoning, and calculation abilities in real-world academic and professional contexts.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, legal, math, reasoning. Language: en. Verified by llm-stats: no.
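A maximum score of 1 is consistent with reporting a fraction of correctly answered items. The sketch below illustrates one way such a score could be computed for the two task formats named above. It is not taken from the AGIEval or llm-stats scoring code: the `Item` type, the `normalize` rule, and the choice of exact-match comparison are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record for one exam question (not an AGIEval type)."""
    kind: str        # "mc" for multiple-choice, "cloze" for fill-in-the-blank
    gold: str        # reference answer
    prediction: str  # model output after answer extraction


def normalize(text: str) -> str:
    # Assumed normalization: trim, lowercase, collapse internal whitespace.
    return " ".join(text.strip().lower().split())


def is_correct(item: Item) -> bool:
    # Assumed rule: exact match after normalization, for both formats.
    return normalize(item.prediction) == normalize(item.gold)


def score(items: list[Item]) -> float:
    """Fraction of correct items, so the result lies in [0, 1]."""
    if not items:
        raise ValueError("no items to score")
    return sum(is_correct(i) for i in items) / len(items)


def score_by_kind(items: list[Item]) -> dict[str, float]:
    """Same fraction, reported separately per task format."""
    groups: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        groups[item.kind].append(item)
    return {kind: score(group) for kind, group in groups.items()}


items = [
    Item("mc", "B", "b"),         # correct after lowercasing
    Item("mc", "D", "A"),         # wrong option
    Item("cloze", "42", " 42 "),  # correct after trimming
    Item("cloze", "3/4", "0.75"), # counted wrong under exact match
]

print(score(items))
print(score_by_kind(items))
```

The last item shows the main limitation of exact matching for cloze answers: numerically equivalent forms are counted as wrong unless an extra equivalence check is added.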