MMMU


MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark designed to evaluate multimodal models on college-level subject knowledge and deliberate reasoning. It contains 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks. The questions cover six core disciplines (Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering), spanning 30 subjects and 183 subfields.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: general, healthcare, multimodal, reasoning, vision. Language: en. Verified by llm-stats: no.
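The metadata lists a max score of 1, while leaderboard entries are shown as percentages. A minimal sketch of that conversion follows, assuming scores are stored as fractions of the max score and displayed with one decimal place. The function name `to_percent` is hypothetical and is not part of any llm-stats API.

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score as a one-decimal percentage string.

    Assumption: the raw score lies in [0, max_score]; llm-stats may
    store or round values differently.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return f"{score / max_score * 100:.1f}%"


# Example: a raw score of 0.86 on a 0-to-1 scale.
print(to_percent(0.86))
```

Under these assumptions, a raw score of 0.86 would be displayed as 86.0%.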

Leaderboard

  1. Qwen3.6 Plus self-reported llm-stats
    86.0%
  2. GPT-5.1 self-reported llm-stats
    85.4%
  3. GPT-5.1 Instant self-reported llm-stats
    85.4%
  4. GPT-5.1 Thinking self-reported llm-stats
    85.4%
  5. GPT-5 self-reported llm-stats
    84.2%
  6. Qwen3.5-122B-A10B self-reported llm-stats
    83.9%
  7. o3 self-reported llm-stats
    82.9%
  8. Qwen3.6-27B self-reported llm-stats
    82.9%
  9. Qwen3.5-27B self-reported llm-stats
    82.3%
  10. Gemini 2.5 Pro Preview 06-05 self-reported llm-stats
    82.0%
  11. o4-mini self-reported llm-stats
    81.6%
  12. Qwen3.5-35B-A3B self-reported llm-stats
    81.4%
  13. Gemini 2.5 Flash self-reported llm-stats
    79.7%
  14. Gemini 2.5 Pro self-reported llm-stats
    79.6%
  15. Grok-3 self-reported llm-stats
    78.0%
  16. o1 self-reported llm-stats
    77.6%
  17. 76.1%
  18. Gemini 2.0 Flash Thinking self-reported llm-stats
    75.4%
  19. GPT-4.5 self-reported llm-stats
    75.2%
  20. Claude 3.7 Sonnet self-reported llm-stats
    75.0%