Video-MME (w/ sub)

reasoning

Video-MME is the first comprehensive evaluation benchmark for multi-modal large language models (MLLMs) in video analysis. It consists of 900 videos (254 hours in total) spanning 6 domains and 30 sub-categories, with 2,700 high-quality multiple-choice questions. Videos range in duration from 11 seconds to 1 hour, and inputs are multi-modal: video frames, subtitles, and audio. The benchmark assesses the perception, reasoning, and temporal understanding capabilities of MLLMs.
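Because the questions are multiple-choice, a natural way to score a model is accuracy: the fraction of questions where the predicted option letter matches the reference. The sketch below illustrates that idea only. The record layout (the keys "group", "answer", and "prediction") and both function names are assumptions made for this example, not the benchmark's official data format or evaluation script.

```python
from collections import defaultdict


def _is_correct(record):
    # Compare option letters case-insensitively, ignoring surrounding whitespace.
    return record["prediction"].strip().upper() == record["answer"].strip().upper()


def overall_accuracy(records):
    """Fraction of records whose prediction matches the reference answer."""
    records = list(records)
    if not records:
        return 0.0
    return sum(_is_correct(r) for r in records) / len(records)


def accuracy_by_group(records):
    """Accuracy per value of the hypothetical 'group' key (e.g. a duration bucket)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["group"]] += 1
        if _is_correct(r):
            correct[r["group"]] += 1
    return {g: correct[g] / total[g] for g in total}


# Toy records, invented for illustration; these are not benchmark data.
sample = [
    {"group": "short", "answer": "A", "prediction": "a"},
    {"group": "short", "answer": "B", "prediction": "C"},
    {"group": "long", "answer": "D", "prediction": "D "},
]
print(overall_accuracy(sample))
print(accuracy_by_group(sample))
```

Grouping by a field such as duration or domain makes it easier to see whether a model's score is driven by one type of video rather than by the benchmark as a whole.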

Leaderboard

No results yet.