MMBench-Video


MMBench-Video is a long-form, multi-shot benchmark for holistic video understanding. It draws on approximately 600 web videos from YouTube spanning 16 major categories, each between 30 seconds and 6 minutes long, and includes roughly 2,000 original question-answer pairs covering 26 fine-grained capabilities.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: multimodal, reasoning, video, vision. Language: en. Verified by llm-stats: no.
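For readers who want to handle this metadata programmatically, the fields above could be captured in a small record. This is only a sketch: the class name `BenchmarkMetadata` and its field names are hypothetical and do not reflect any actual llm-stats schema; the values are copied from the line above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BenchmarkMetadata:
    """Hypothetical container for the metadata fields listed above."""
    name: str
    modality: str
    max_score: float
    categories: Tuple[str, ...]
    language: str
    verified: bool


# Values taken verbatim from the Methodology line; field names are assumptions.
mmbench_video = BenchmarkMetadata(
    name="MMBench-Video",
    modality="multimodal",
    max_score=1.0,
    categories=("multimodal", "reasoning", "video", "vision"),
    language="en",
    verified=False,
)
```

A frozen dataclass is used here so the imported record cannot be modified by accident after loading.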

Leaderboard

  1. Qwen2.5 VL 72B Instruct (self-reported, llm-stats)
    2.0%
  2. Qwen2.5 VL 32B Instruct (self-reported, llm-stats)
    1.9%
  3. Qwen2.5 VL 7B Instruct (self-reported, llm-stats)
    1.8%