MLVU-M

general

MLVU-M benchmark

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Qwen3 VL 32B Instruct: 82.1% (self-reported, llm-stats)
  2. Qwen3 VL 30B A3B Instruct: 81.3% (self-reported, llm-stats)
  3. Qwen3 VL 30B A3B Thinking: 78.9% (self-reported, llm-stats)
  4. Qwen3 VL 8B Instruct: 78.1% (self-reported, llm-stats)
  5. Qwen3 VL 4B Thinking: 75.7% (self-reported, llm-stats)
  6. Qwen3 VL 4B Instruct: 75.3% (self-reported, llm-stats)
  7. Qwen3 VL 8B Thinking: 75.1% (self-reported, llm-stats)
  8. Qwen2.5 VL 72B Instruct: 74.6% (self-reported, llm-stats)
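
The Methodology line gives a maximum score of 1, while the leaderboard shows percentages. A minimal sketch of how the two scales might relate, assuming a displayed percentage is simply the raw score divided by the maximum and multiplied by 100 (the page does not confirm this). The helper names `to_raw_score` and `gap_to_leader` are hypothetical and not part of any llm-stats API.

```python
# Leaderboard rows copied from the list above: (model, score in percent).
LEADERBOARD = [
    ("Qwen3 VL 32B Instruct", 82.1),
    ("Qwen3 VL 30B A3B Instruct", 81.3),
    ("Qwen3 VL 30B A3B Thinking", 78.9),
    ("Qwen3 VL 8B Instruct", 78.1),
    ("Qwen3 VL 4B Thinking", 75.7),
    ("Qwen3 VL 4B Instruct", 75.3),
    ("Qwen3 VL 8B Thinking", 75.1),
    ("Qwen2.5 VL 72B Instruct", 74.6),
]

MAX_SCORE = 1.0  # "Max score: 1" from the Methodology line


def to_raw_score(percent, max_score=MAX_SCORE):
    """Convert a displayed percentage to the raw scale.

    Assumption: percent = raw / max_score * 100.
    """
    return round(percent / 100.0 * max_score, 4)


def gap_to_leader(rows):
    """Return (model, percentage points behind the top score) pairs."""
    top = max(score for _, score in rows)
    return [(model, round(top - score, 1)) for model, score in rows]


if __name__ == "__main__":
    for (model, pct), (_, gap) in zip(LEADERBOARD, gap_to_leader(LEADERBOARD)):
        print(f"{model}: {pct:.1f}% (raw {to_raw_score(pct):.3f}, -{gap:.1f} pts)")
```

All eight entries are self-reported and not verified by llm-stats, so the gaps computed this way compare vendor-published numbers rather than independently reproduced results.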