MMMU (val)

Validation set of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark. It consists of college-level multimodal questions across six core disciplines (Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, Tech & Engineering), spanning 30 subjects and 183 subfields, with diverse image types including charts, diagrams, maps, and tables.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: general, healthcare, multimodal, reasoning, vision. Language: en. Verified by llm-stats: no.
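The metadata gives a maximum score of 1, while the leaderboard lists percentages. A minimal sketch of how the two could relate, assuming the displayed percentage is simply the normalized score multiplied by 100 (the page does not state this). The names `BenchmarkResult` and `to_percent` are hypothetical and are not part of any llm-stats schema or API.

```python
from dataclasses import dataclass

MAX_SCORE = 1.0  # "Max score: 1" from the benchmark metadata


@dataclass(frozen=True)
class BenchmarkResult:
    """One leaderboard row. Field names are illustrative only."""
    model: str
    score: float                # normalized score, assumed to lie in [0, MAX_SCORE]
    self_reported: bool = True  # every row on this page is marked self-reported


def to_percent(score: float, max_score: float = MAX_SCORE) -> str:
    """Format a normalized score as a percentage with one decimal place."""
    if not 0.0 <= score <= max_score:
        raise ValueError(f"score {score} is outside [0, {max_score}]")
    return f"{100.0 * score / max_score:.1f}%"


# Assumed normalized value for the top entry on the leaderboard.
result = BenchmarkResult(model="Qwen3 VL 32B Thinking", score=0.781)
print(result.model, to_percent(result.score))
```

Validating the range before formatting catches a score that was already stored as a percentage (for example 78.1 instead of 0.781), which would otherwise be silently displayed as 7810.0%.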

Leaderboard

All scores are self-reported and sourced from llm-stats.

  1. Qwen3 VL 32B Thinking: 78.1%
  2. Qwen3 VL 30B A3B Thinking: 76.0%
  3. Qwen3 VL 32B Instruct: 76.0%
  4. Qwen3 VL 30B A3B Instruct: 74.2%
  5. Qwen3 VL 8B Thinking: 74.1%
  6. Qwen3 VL 4B Thinking: 70.8%
  7. Qwen3 VL 8B Instruct: 69.6%
  8. Qwen3 VL 4B Instruct: 67.4%
  9. Gemma 3 27B: 64.9%
  10. Gemma 3 12B: 59.6%
  11. Gemma 3 4B: 48.8%