MMMLU


The Multilingual Massive Multitask Language Understanding (MMMLU) dataset, released by OpenAI, consists of MMLU test questions professionally translated into 14 languages: Arabic, Bengali, German, Spanish, French, Hindi, Indonesian, Italian, Japanese, Korean, Portuguese, Swahili, Yoruba, and Chinese. Each language contains approximately 15,908 multiple-choice questions covering 57 subjects.
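Because every item is a four-option multiple-choice question (A to D, as in MMLU), per-item scoring reduces to comparing the model's chosen letter with the answer key. The sketch below is illustrative only: the `MCItem` record layout, the prompt wording, and the regex-based answer extraction are hypothetical choices, not the official MMMLU evaluation harness, which uses its own prompts and parsing rules.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MCItem:
    # Hypothetical record layout: question text, four options keyed "A".."D",
    # and the gold answer letter.
    question: str
    options: Dict[str, str]
    answer: str


def build_prompt(item: MCItem) -> str:
    """Format an item as a plain-text prompt ending with an answer instruction."""
    lines = [item.question]
    for letter in "ABCD":
        lines.append(f"{letter}. {item.options[letter]}")
    lines.append("Answer with a single letter (A, B, C, or D).")
    return "\n".join(lines)


# Assumed output convention: prefer an explicit "Answer: X" (case-insensitive),
# otherwise fall back to the first standalone uppercase A-D in the response.
_ANSWER_RE = re.compile(r"Answer\s*:\s*([ABCD])\b", re.IGNORECASE)
_LETTER_RE = re.compile(r"\b([ABCD])\b")


def extract_choice(response: str) -> Optional[str]:
    """Return the extracted answer letter, or None if no letter is found."""
    m = _ANSWER_RE.search(response) or _LETTER_RE.search(response)
    return m.group(1).upper() if m else None


def score_item(item: MCItem, response: str) -> int:
    """Return 1 if the extracted letter matches the gold answer, else 0."""
    return int(extract_choice(response) == item.answer)
```

For example, with `item = MCItem("2 + 2 = ?", {"A": "3", "B": "4", "C": "5", "D": "22"}, "B")`, the response `"Answer: B"` scores 1, while `"The answer is C"` scores 0. A response with no recognizable letter counts as wrong rather than being skipped, which keeps the denominator fixed at the number of questions.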

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1 (the leaderboard below shows scores as percentages). Categories: general, language, math, reasoning. Language: en. Verified by llm-stats: no.
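With a maximum score of 1, a leaderboard figure such as 92.7% corresponds to a score of 0.927. The metadata does not say how results from the 14 languages are combined, so the sketch below simply assumes an unweighted mean of per-language accuracies; the function names and the per-item results in the usage example are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def accuracy(correct: List[int]) -> float:
    """Fraction of items answered correctly (1 = correct, 0 = wrong); lies in [0, 1]."""
    if not correct:
        raise ValueError("no items scored")
    return sum(correct) / len(correct)


def macro_average(per_language: Dict[str, List[int]]) -> float:
    """Unweighted mean of per-language accuracies (assumed aggregation)."""
    return mean(accuracy(results) for results in per_language.values())


def as_percent(score: float) -> str:
    """Render a [0, 1] score as a percentage with one decimal place."""
    return f"{score * 100:.1f}%"
```

For instance, `macro_average({"FR": [1, 1, 1, 0], "SW": [1, 0, 1, 0]})` averages 0.75 and 0.5 to 0.625, which `as_percent` renders as `"62.5%"`. Because the mean is unweighted, each language counts equally regardless of how many questions it contributes.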

Leaderboard

  1. Claude Mythos Preview: 92.7% (self-reported)
  2. Gemini 3.1 Pro: 92.6% (self-reported)
  3. Gemini 3 Flash: 91.8% (self-reported)
  4. Gemini 3 Pro: 91.8% (self-reported)
  5. Claude Opus 4.7: 91.5% (self-reported)
  6. Claude Opus 4.6: 91.1% (self-reported)
  7. Claude Opus 4.5: 90.8% (self-reported)
  8. Qwen3.7 Max: 90.3% (self-reported)
  9. GPT-5.2: 89.6% (self-reported)
  10. Claude Opus 4.1: 89.5% (self-reported)
  11. Qwen3.6 Plus: 89.5% (self-reported)
  12. Claude Sonnet 4.6: 89.3% (self-reported)
  13. Claude Sonnet 4.5: 89.1% (self-reported)
  14. Gemini 3.1 Flash-Lite: 88.9% (self-reported)
  15. Claude Opus 4: 88.8% (self-reported)
  16. Qwen3.5-397B-A17B: 88.5% (self-reported)
  17. Gemma 4 31B: 88.4% (self-reported)
  18. o1: 87.7% (self-reported)
  19. GPT-4.1: 87.3% (self-reported)
  20. Qwen3 235B A22B: 86.7% (self-reported)