MMMUval

Validation set of the MMMU (Massive Multi-discipline Multimodal Understanding and Reasoning) benchmark. MMMU evaluates multimodal models on multi-discipline tasks that demand college-level subject knowledge and deliberate reasoning across six disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: general, healthcare, multimodal, reasoning, vision. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Qwen3 VL 235B A22B Thinking: 80.6% (self-reported, via llm-stats)
  2. Qwen3 VL 235B A22B Instruct: 78.7% (self-reported, via llm-stats)
  3. Claude Sonnet 4.5: 77.8% (self-reported, via llm-stats)
  4. Qwen2-VL-72B-Instruct: 64.5% (self-reported, via llm-stats)
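
The leaderboard entries above can be handled as simple records. The following is a minimal Python sketch, not part of the llm-stats tooling: the `Entry` class and the `ranked` and `gap_to_leader` helpers are hypothetical names introduced for illustration. It assumes that the displayed percentages are fractions of the metadata's "Max score: 1", which the page does not state explicitly.

```python
from dataclasses import dataclass

# "Max score: 1" is taken from the benchmark metadata.
MAX_SCORE = 1.0


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    percent: float  # score as displayed on the leaderboard, in percent

    @property
    def score(self) -> float:
        # Assumption: the percentage is a fraction of MAX_SCORE.
        return self.percent / 100.0 * MAX_SCORE


# Values copied from the leaderboard; all are self-reported.
LEADERBOARD = [
    Entry("Qwen3 VL 235B A22B Thinking", 80.6),
    Entry("Qwen3 VL 235B A22B Instruct", 78.7),
    Entry("Claude Sonnet 4.5", 77.8),
    Entry("Qwen2-VL-72B-Instruct", 64.5),
]


def ranked(entries):
    """Return entries sorted from highest to lowest score."""
    return sorted(entries, key=lambda e: e.percent, reverse=True)


def gap_to_leader(entries):
    """Map each model to its distance from the top score, in percentage points."""
    ordered = ranked(entries)
    top = ordered[0]
    return {e.model: round(top.percent - e.percent, 1) for e in ordered}


if __name__ == "__main__":
    for rank, entry in enumerate(ranked(LEADERBOARD), start=1):
        print(f"{rank}. {entry.model}: {entry.percent:.1f}% (score {entry.score:.3f})")
```

Because every result here is self-reported and not verified by llm-stats, small gaps between entries should be read with caution.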