Video-MME (w sub)

Video-MME is the first comprehensive evaluation benchmark for multi-modal large language models (MLLMs) in video analysis. It consists of 900 videos (254 hours in total) across 6 domains and 30 sub-categories, with 2,700 high-quality multiple-choice questions. The videos range in duration from 11 seconds to 1 hour, and the multi-modal inputs include video frames, subtitles, and audio. The benchmark uses them to assess perception, reasoning, and temporal understanding.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: multimodal, reasoning, vision. Language: en. Verified by llm-stats: no.
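The metadata above gives a maximum score of 1 but does not say how the score is computed. As a minimal sketch, assuming the score is plain accuracy over the multiple-choice questions (the fraction answered correctly), it could be calculated as below. The function name `accuracy`, the letter-based answer format, and the toy data are illustrative assumptions, not part of the benchmark's published tooling.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of questions answered correctly, in [0, 1].

    Assumption: each prediction and answer is an option letter such as "A".
    Comparison ignores case and surrounding whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Hypothetical toy data: option letters for four questions.
predicted = ["A", "c", "B", "D"]
gold = ["A", "C", "D", "D"]
print(accuracy(predicted, gold))  # 3 of 4 correct
```

Under this assumption a model that answers every question correctly scores exactly 1, which matches the stated maximum.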

Leaderboard

No results yet.