EgoSchema

Category: reasoning

A diagnostic benchmark for very long-form video-language understanding, consisting of over 5,000 human-curated multiple-choice questions based on 3-minute video clips from Ego4D and covering a broad range of natural human activities and behaviors.
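Because every question is multiple choice, a model's score can be read as the share of questions it answers correctly. The sketch below assumes the leaderboard percentages are plain accuracy over the answer key. The question IDs, the integer option indices and the function name `multiple_choice_accuracy` are all hypothetical and do not reflect EgoSchema's actual file format or official evaluation script.

```python
from typing import Mapping


def multiple_choice_accuracy(predictions: Mapping[str, int],
                             answers: Mapping[str, int]) -> float:
    """Fraction of questions whose predicted option index matches the key.

    Hypothetical format: both mappings go from a question ID to an option
    index. A question with no prediction is counted as wrong.
    """
    if not answers:
        raise ValueError("answer key is empty")
    correct = sum(1 for qid, gold in answers.items()
                  if predictions.get(qid) == gold)
    return correct / len(answers)


# Toy example with made-up IDs: q1 and q3 are right, q2 is wrong, q4 is unanswered.
answers = {"q1": 0, "q2": 3, "q3": 1, "q4": 4}
predictions = {"q1": 0, "q2": 2, "q3": 1}
print(f"{multiple_choice_accuracy(predictions, answers):.1%}")  # 50.0%
```

Counting unanswered questions as wrong keeps the denominator fixed at the full question set. This way a model cannot raise its score by skipping hard items.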

Leaderboard


Rank  Model                    Score
1     Qwen2-VL-72B-Instruct    77.9%
2     Qwen2.5 VL 72B Instruct  76.2%
3     GPT-4o                   72.2%
4     Nova Pro                 72.1%
5     Gemini 2.0 Flash         71.5%
6     Nova Lite                71.4%
7     Qwen2.5-Omni-7B          68.6%
8     Gemini 2.0 Flash-Lite    67.2%
9     Gemini 1.0 Pro           55.7%