MME

reasoning

A comprehensive evaluation benchmark for Multimodal Large Language Models (MLLMs) that measures both perception and cognition abilities across 14 subtasks. It uses manually designed instruction-answer pairs to avoid data leakage and provides a systematic, quantitative assessment of MLLM capabilities.

Leaderboard

DeepSeek VL2: 22.5%
DeepSeek VL2 Small: 21.2%
DeepSeek VL2 Tiny: 19.1%