Video-MME (long, no subtitles)


Video-MME is presented as the first comprehensive evaluation benchmark for Multi-modal Large Language Models (MLLMs) in video analysis. This variant covers long videos (30 to 60 minutes) evaluated without subtitle inputs, testing how well models track context over long durations. The benchmark spans 6 primary visual domains and 30 subfields; the domains include knowledge, film & television, sports competition, life record, and multilingual content.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: multimodal, video, vision. Language: en. Verified by llm-stats: no.
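
The metadata gives a maximum score of 1, while the leaderboard reports percentages. The following is a minimal sketch of how the two might relate, assuming the displayed percentage is simply the normalized score scaled by 100. The `LeaderboardEntry` record and the `as_percent` helper are hypothetical names used for illustration only; they are not part of any llm-stats schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaderboardEntry:
    # Hypothetical record shape; field names are illustrative only.
    model: str
    score: float          # normalized score in [0, max_score]
    self_reported: bool


def as_percent(score: float, max_score: float = 1.0) -> str:
    """Format a normalized score as a one-decimal percentage string."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= score <= max_score:
        raise ValueError("score must lie in [0, max_score]")
    return f"{100 * score / max_score:.1f}%"


# Assumed normalized value corresponding to the listed leaderboard row.
entry = LeaderboardEntry(model="GPT-4.1", score=0.72, self_reported=True)
print(as_percent(entry.score))
```

Validating the range before formatting keeps a score reported on a different scale (for example 72 instead of 0.72) from being silently displayed as 7200.0%.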

Leaderboard

  1. GPT-4.1: 72.0% (self-reported; source: llm-stats)