VideoMMMU

reasoning

Video-MMMU evaluates how well large multimodal models (LMMs) acquire knowledge from expert-level professional videos. It covers six disciplines (Art, Business, Science, Medicine, Humanities, and Engineering) and three cognitive stages: perception, comprehension, and adaptation. The benchmark contains 300 videos and 900 human-annotated questions.
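To make the structure of a score concrete, here is a minimal sketch of how per-question results could be rolled up into per-stage and overall accuracy. It assumes the reported percentage is plain accuracy over all questions, which this page does not confirm. The `score` function and the `(stage, is_correct)` record format are hypothetical and not part of any official evaluation toolkit.

```python
from collections import defaultdict

# The three cognitive stages named in the benchmark description.
STAGES = ("perception", "comprehension", "adaptation")


def score(records):
    """Aggregate (stage, is_correct) pairs into accuracy percentages.

    Hypothetical record format; assumes the overall score is unweighted
    accuracy over all questions.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for stage, is_correct in records:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        totals[stage] += 1
        correct[stage] += int(bool(is_correct))

    # Per-stage accuracy, skipping stages that have no questions.
    per_stage = {
        s: 100.0 * correct[s] / totals[s] for s in STAGES if totals[s]
    }
    n = sum(totals.values())
    overall = 100.0 * sum(correct.values()) / n if n else 0.0
    return per_stage, overall


if __name__ == "__main__":
    # Toy data for illustration only, not real benchmark results.
    demo = [
        ("perception", True),
        ("perception", False),
        ("comprehension", True),
        ("adaptation", True),
    ]
    per_stage, overall = score(demo)
    print(per_stage, overall)
```

Reporting per-stage numbers next to the overall figure shows whether a model is strong at perceiving video content but weaker at adapting what it learned to new problems, which a single percentage hides.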

Leaderboard

Showing 20 of 25 results

Model                         Score
Gemini 3 Pro                  87.6%
Gemini 3 Flash                86.9%
Kimi K2.5                     86.6%
GPT-5.2                       85.9%
Gemini 3.1 Flash-Lite         84.8%
GPT-5                         84.6%
MiniMax M3                    84.6%
Qwen3.6-27B                   84.4%
Qwen3.6 Plus                  84.0%
Qwen3.6-35B-A3B               83.7%
Gemini 2.5 Pro Preview 06-05  83.6%
o3                            83.3%
Qwen3.5-27B                   82.3%
Qwen3.5-122B-A10B             82.0%
Qwen3.5-35B-A3B               80.4%
Qwen3 VL 235B A22B Thinking   80.0%
Qwen3 VL 32B Thinking         79.0%
Qwen3 VL 30B A3B Thinking     75.0%
Qwen3 VL 235B A22B Instruct   74.7%
Qwen3 VL 8B Thinking          72.8%