MMT-Bench


MMT-Bench is a comprehensive multimodal benchmark for evaluating Large Vision-Language Models (LVLMs) on progress toward multitask AGI. It comprises 31,325 curated multiple-choice visual questions drawn from multimodal scenarios such as vehicle driving and embodied navigation, and it covers 32 core meta-tasks and 162 subtasks in multimodal understanding.

Methodology

Imported from llm-stats public benchmark metadata.

  Modality: multimodal
  Max score: 1
  Categories: general, multimodal, reasoning, vision
  Language: en
  Verified by llm-stats: no
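
The metadata gives a maximum score of 1, while the leaderboard reports percentages. The source does not say how the score is computed. As a minimal sketch, assuming the score is plain accuracy over multiple-choice questions (an assumption, not something the metadata confirms), the relationship between the two scales could look like this. The function names `multiple_choice_accuracy` and `as_percent` are hypothetical and are not part of any MMT-Bench or llm-stats tooling.

```python
from typing import Sequence


def multiple_choice_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of questions answered correctly, in [0, 1].

    Assumption: each prediction and each answer is a single option
    label such as "A" or "B"; comparison ignores case and whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


def as_percent(score: float) -> str:
    """Format a score on the 0-to-1 scale as a one-decimal percentage."""
    return f"{score * 100:.1f}%"


if __name__ == "__main__":
    # Toy data for illustration only; these are not MMT-Bench questions.
    score = multiple_choice_accuracy(["A", "b", "C", "D"], ["A", "B", "D", "D"])
    print(score, as_percent(score))
```

Under this assumption, a score of 0.636 on the 0-to-1 scale corresponds to the 63.6% shown on the leaderboard.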

Leaderboard

  1. DeepSeek VL2: 63.6% (self-reported, llm-stats)
  2. Qwen2.5 VL 7B Instruct: 63.6% (self-reported, llm-stats)
  3. DeepSeek VL2 Small: 62.9% (self-reported, llm-stats)
  4. DeepSeek VL2 Tiny: 53.2% (self-reported, llm-stats)