OpenAI MMLU
MMLU (Massive Multitask Language Understanding) is a comprehensive benchmark that measures a text model's multitask accuracy across 57 diverse academic and professional subjects. The test covers elementary mathematics, US history, computer science, law, morality, business ethics, clinical knowledge, and many other domains spanning STEM, humanities, social sciences, and professional fields. To attain high accuracy, models must possess extensive world knowledge and problem-solving ability.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: chemistry, economics, finance, general, healthcare, legal, math, physics, psychology, reasoning. Language: en. Verified by llm-stats: no.
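As a rough illustration of how a multitask accuracy on a 0 to 1 scale could be computed, the sketch below scores multiple-choice answers per subject and then aggregates them. The function name `mmlu_accuracy`, the `(subject, predicted, gold)` record layout, and the subject labels are hypothetical and chosen only for this example. The metadata above does not say whether the reported score is averaged over questions (micro) or over subjects (macro), so both are returned.

```python
from collections import defaultdict


def mmlu_accuracy(records):
    """Score multiple-choice answers grouped by subject.

    `records` is an iterable of (subject, predicted, gold) tuples, where
    `predicted` and `gold` are answer letters such as "A".."D".
    This layout is an assumption made for the example, not a fixed format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, gold in records:
        total[subject] += 1
        # Compare answer letters case-insensitively, ignoring stray whitespace.
        if predicted.strip().upper() == gold.strip().upper():
            correct[subject] += 1

    if not total:
        raise ValueError("no records to score")

    # Accuracy within each subject, each in the range [0, 1].
    per_subject = {s: correct[s] / total[s] for s in total}
    # Micro average: every question counts equally.
    micro = sum(correct.values()) / sum(total.values())
    # Macro average: every subject counts equally.
    macro = sum(per_subject.values()) / len(per_subject)
    return per_subject, micro, macro


if __name__ == "__main__":
    sample = [
        ("elementary_mathematics", "A", "A"),
        ("elementary_mathematics", "B", "C"),
        ("us_history", "d", "D"),
    ]
    per_subject, micro, macro = mmlu_accuracy(sample)
    print(per_subject, micro, macro)
```

The two averages differ whenever subjects have unequal numbers of questions, so it is worth checking which one a given leaderboard reports before comparing scores.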