MM-Vet


MM-Vet is an evaluation benchmark that tests large multimodal models on complex multimodal tasks requiring integrated capabilities. It assesses six core vision-language capabilities (recognition, knowledge, spatial awareness, language generation, OCR, and math) through questions that each require one or more of them.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: general, math, multimodal, reasoning, spatial_reasoning, vision. Language: en. Verified by llm-stats: no.
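
As a rough illustration of how a score on a 0-to-1 scale relates to the percentages shown in a leaderboard, the sketch below averages per-question scores both overall and per capability. The sample records, the `aggregate` and `as_percent` helpers, and the simple-averaging rule are assumptions made for illustration; they are not taken from the llm-stats metadata or from the benchmark's official scorer.

```python
from collections import defaultdict

# Hypothetical per-question records (not real MM-Vet data): each has a score
# in [0, 1] and the set of capabilities the question is assumed to require.
SAMPLES = [
    {"score": 1.0, "capabilities": {"recognition", "ocr"}},
    {"score": 0.5, "capabilities": {"ocr", "math"}},
    {"score": 0.0, "capabilities": {"recognition", "knowledge", "language generation"}},
    {"score": 1.0, "capabilities": {"spatial awareness"}},
]


def aggregate(samples):
    """Return (overall mean, per-capability mean) under a simple-averaging assumption."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sample in samples:
        # A question counts toward every capability it requires.
        for capability in sample["capabilities"]:
            totals[capability] += sample["score"]
            counts[capability] += 1
    per_capability = {cap: totals[cap] / counts[cap] for cap in totals}
    overall = sum(s["score"] for s in samples) / len(samples)
    return overall, per_capability


def as_percent(score):
    """Format a 0-to-1 score as a percentage string with one decimal place."""
    return f"{score * 100:.1f}%"


overall, per_capability = aggregate(SAMPLES)
print("overall:", as_percent(overall))
for capability, score in sorted(per_capability.items()):
    print(f"{capability}: {as_percent(score)}")
```

Because a question can require several capabilities, the per-capability averages in this sketch overlap and do not sum to the overall score.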

Leaderboard

  1. Qwen2.5 VL 72B Instruct: 76.2% (self-reported, via llm-stats)
  2. Qwen2.5 VL 7B Instruct: 67.1% (self-reported, via llm-stats)