QVHighlights


QVHighlights is a video moment retrieval benchmark for detecting moments and highlights in videos from natural language queries. Given a query, the model must localize the start and end times of the relevant moments in the video. Predictions are evaluated with metrics such as Recall@1 at a temporal IoU (intersection over union) threshold of 0.5.
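
To make the metric concrete, here is a minimal sketch of temporal IoU and Recall@1 at a 0.5 threshold. It is not the benchmark's official evaluation code. The function names `temporal_iou` and `recall_at_1` are hypothetical, and the sketch assumes exactly one ground-truth span per query, which is a simplification.

```python
from typing import List, Tuple

# A moment is a (start_seconds, end_seconds) pair. Hypothetical type alias.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two time spans on the video timeline."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(top1_preds: List[Span], gts: List[Span],
                iou_threshold: float = 0.5) -> float:
    """Fraction of queries whose top-ranked span reaches the IoU threshold.

    Assumes one ground-truth span per query (a simplification).
    """
    if not top1_preds:
        return 0.0
    hits = sum(
        temporal_iou(pred, gt) >= iou_threshold
        for pred, gt in zip(top1_preds, gts)
    )
    return hits / len(top1_preds)


if __name__ == "__main__":
    predictions = [(0.0, 10.0), (0.0, 10.0)]
    ground_truth = [(2.0, 10.0), (20.0, 30.0)]
    print(recall_at_1(predictions, ground_truth))
```

In this example the first prediction overlaps its ground truth with IoU 0.8 and counts as a hit, while the second does not overlap at all, so one of the two queries is recalled.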

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: multimodal, video, vision. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Nova 2 Lite: 77.2% (self-reported, llm-stats)
  2. Nova 2 Omni: 76.7% (self-reported, llm-stats)
  3. Nova 2 Pro: 76.7% (self-reported, llm-stats)