MMMU (validation)


Validation set of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark. It features college-level multimodal questions from 6 core disciplines (Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, Tech & Engineering), spanning 30 subjects and 183 subfields. Questions use diverse image types, including charts, diagrams, maps, and tables.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: general, healthcare, multimodal, reasoning, vision. Language: en. Verified by llm-stats: no.
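
The metadata above gives a maximum score of 1, while leaderboard entries are shown as percentages. As a minimal sketch, assuming the score is plain accuracy (fraction of questions answered correctly) and the percentage is that fraction scaled by 100, the relationship could look like the following. The function names `accuracy` and `as_percent` are hypothetical and are not part of llm-stats or any MMMU tooling.

```python
def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the reference answers (0 to 1)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def as_percent(score, max_score=1.0):
    """Format a score on a 0..max_score scale as a one-decimal percentage string."""
    # Assumption: leaderboard percentages are score / max_score * 100.
    return f"{100 * score / max_score:.1f}%"


if __name__ == "__main__":
    # Toy data for illustration only, not real benchmark output.
    preds = ["A", "B", "C", "D"]
    gold = ["A", "B", "C", "A"]
    print(as_percent(accuracy(preds, gold)))
```

Under this assumption, a listed result of 80.7% would correspond to a raw score of 0.807.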

Leaderboard

  1. Claude Opus 4.5 self-reported llm-stats
    80.7%
  2. Claude Opus 4.1 self-reported llm-stats
    77.1%
  3. Claude Opus 4 self-reported llm-stats
    76.5%
  4. Claude Haiku 4.5 self-reported llm-stats
    73.2%