TempCompass

reasoning

TempCompass is a comprehensive benchmark for evaluating the temporal perception capabilities of Video Large Language Models (Video LLMs). It constructs conflicting videos that share identical static content but differ in a specific temporal aspect, which prevents models from exploiting single-frame bias. The benchmark evaluates several temporal aspects, including action, motion, speed, temporal order, and attribute change, across diverse task formats: multiple-choice QA, yes/no QA, caption matching, and caption generation.
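The following is a toy sketch of why conflicting videos defeat single-frame bias. It is not TempCompass's actual data format or scoring code. The `Clip` class, its fields, and the example pair are hypothetical. A model that sees only one frame gets the same input from both videos in a conflicting pair, so it must give both the same answer. Because the correct temporal answers differ, it can be right on at most one of the two.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    # Hypothetical fields for illustration only.
    static_frame: str    # what any single frame shows (identical within a pair)
    temporal_label: str  # the correct answer about the temporal aspect


# A hypothetical conflicting pair: same static content, opposite motion
# (for example, one clip played forward and the other reversed).
pair = [
    Clip(static_frame="car on road", temporal_label="moves left"),
    Clip(static_frame="car on road", temporal_label="moves right"),
]


def single_frame_model(clip: Clip) -> str:
    """A model whose answer depends only on static content, not on time."""
    return {"car on road": "moves left"}.get(clip.static_frame, "unknown")


def accuracy(model, clips) -> float:
    """Fraction of clips for which the model's answer matches the label."""
    correct = sum(model(c) == c.temporal_label for c in clips)
    return correct / len(clips)


print(accuracy(single_frame_model, pair))  # 0.5
```

Any answer that ignores temporal information, whatever it is, scores at most 0.5 on such a pair. Only a model that actually perceives the temporal difference can answer both clips correctly.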

Leaderboard


Qwen2.5 VL 72B Instruct

74.8%

Qwen2.5 VL 7B Instruct

71.7%
