QVHighlights
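QVHighlights is a video moment retrieval benchmark for detecting moments and highlights in videos from natural language queries. Given a query, the model must localize the start and end times of the relevant moments in the video. Performance is reported with metrics such as Recall@1 at a temporal IoU threshold of 0.5, that is, the fraction of queries whose top-ranked predicted span overlaps a ground-truth moment with an intersection-over-union of at least 0.5.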
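To make the metric concrete, here is a minimal sketch of temporal IoU and Recall@1 at a 0.5 threshold. It is illustrative only and is not the official evaluation script. The function names (`temporal_iou`, `recall_at_1`) and the input layout are hypothetical, and it assumes a query counts as a hit when its top-1 span reaches the threshold against any of that query's ground-truth spans.

```python
from typing import List, Tuple

# A moment is a (start_seconds, end_seconds) pair.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two time spans on the video timeline."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(
    top1_preds: List[Span],
    gt_spans: List[List[Span]],
    iou_threshold: float = 0.5,
) -> float:
    """Fraction of queries whose top-ranked span matches a ground-truth span.

    Assumption (not taken from the benchmark's scripts): a query is a hit
    if the prediction reaches the threshold against ANY of its ground truths.
    """
    if not top1_preds:
        return 0.0
    hits = 0
    for pred, gts in zip(top1_preds, gt_spans):
        best_iou = max((temporal_iou(pred, gt) for gt in gts), default=0.0)
        if best_iou >= iou_threshold:
            hits += 1
    return hits / len(top1_preds)


# Two queries: the first prediction overlaps its ground truth well,
# the second misses entirely.
predictions = [(0.0, 10.0), (20.0, 30.0)]
ground_truth = [[(2.0, 10.0)], [(40.0, 50.0)]]
print(recall_at_1(predictions, ground_truth))  # 0.5
```

Since the score is a fraction of queries, it falls between 0 and 1, which is consistent with the maximum score of 1 listed in the metadata.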
Methodology
Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: multimodal, video, vision. Language: en. Verified by llm-stats: no.