OpenAI MMLU


MMLU (Massive Multitask Language Understanding) is a comprehensive benchmark that measures a text model's multitask accuracy across 57 diverse academic and professional subjects. The test covers elementary mathematics, US history, computer science, law, morality, business ethics, clinical knowledge, and many other domains spanning STEM, humanities, social sciences, and professional fields. To attain high accuracy, models must possess extensive world knowledge and problem-solving ability.
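The page reports a single accuracy score with a maximum of 1. It does not say how results from the 57 subjects are combined. The Python sketch below shows two common ways to do it. It assumes a hypothetical list of `(subject, predicted, correct)` records. The first aggregate is micro-averaged accuracy over all questions. The second is a macro average that weights every subject equally. The record layout and function names are illustrative only. They are not the scoring code used by llm-stats or by the benchmark authors.

```python
from collections import defaultdict

# Hypothetical record layout: (subject, predicted_choice, correct_choice),
# where choices are answer letters such as "A" to "D".
Record = tuple[str, str, str]


def accuracy(records: list[Record]) -> float:
    """Micro-averaged accuracy: fraction of all questions answered correctly."""
    if not records:
        return 0.0
    correct = sum(1 for _, pred, gold in records if pred == gold)
    return correct / len(records)


def per_subject_accuracy(records: list[Record]) -> dict[str, float]:
    """Accuracy computed separately for each subject."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for subject, pred, gold in records:
        totals[subject] += 1
        hits[subject] += pred == gold  # True counts as 1, False as 0
    return {s: hits[s] / totals[s] for s in totals}


def macro_accuracy(records: list[Record]) -> float:
    """Unweighted mean of per-subject accuracies (each subject counts equally)."""
    per_subject = per_subject_accuracy(records)
    if not per_subject:
        return 0.0
    return sum(per_subject.values()) / len(per_subject)
```

Take `records = [("math", "A", "A"), ("math", "B", "C"), ("law", "D", "D")]` as an example. Here `accuracy` returns 2/3, while `macro_accuracy` returns 0.75. The single law question carries as much weight as both math questions together. In general, the two aggregates diverge whenever subjects have different numbers of questions.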

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1 (shown as 100% on the leaderboard). Categories: chemistry, economics, finance, general, healthcare, legal, math, physics, psychology, reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Gemma 3n E4B Instructed: 35.6% (self-reported, llm-stats)
  2. Gemma 3n E2B Instructed: 22.3% (self-reported, llm-stats)
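The maximum score is 1, so the leaderboard percentages are presumably the fractional scores multiplied by 100. The sketch below ranks entries from highest to lowest score and renders each score as a percentage with one decimal place. It assumes a hypothetical `Entry` record that holds the fractional score and a self-reported flag. This record is not part of any llm-stats API.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical leaderboard row."""
    model: str
    score: float  # fraction in [0, 1]; this benchmark's max score is 1
    self_reported: bool


def render_leaderboard(entries: list[Entry]) -> list[str]:
    """Rank entries by score (highest first) and format scores as percentages."""
    ranked = sorted(entries, key=lambda e: e.score, reverse=True)
    lines = []
    for rank, e in enumerate(ranked, start=1):
        tag = " (self-reported)" if e.self_reported else ""
        lines.append(f"{rank}. {e.model}: {e.score * 100:.1f}%{tag}")
    return lines
```

Pass `Entry("Gemma 3n E2B Instructed", 0.223, True)` and `Entry("Gemma 3n E4B Instructed", 0.356, True)` in either order. The function returns `"1. Gemma 3n E4B Instructed: 35.6% (self-reported)"` followed by `"2. Gemma 3n E2B Instructed: 22.3% (self-reported)"`.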