VATEX


VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research. It contains over 41,250 videos and 825,000 captions in English and Chinese, including over 206,000 English-Chinese parallel translation pairs. The dataset supports two tasks: multilingual video captioning and video-guided machine translation.
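
To put the dataset size in per-video terms, the short sketch below divides the quoted totals by the number of videos. It uses only the three figures stated above; because the source gives them as "over" values, the resulting ratios are approximate.

```python
# Totals quoted in the VaTeX description (stated as "over" figures,
# so the ratios below are approximate).
NUM_VIDEOS = 41_250
NUM_CAPTIONS = 825_000          # English and Chinese captions combined
NUM_PARALLEL_PAIRS = 206_000    # English-Chinese parallel translation pairs

captions_per_video = NUM_CAPTIONS / NUM_VIDEOS
pairs_per_video = NUM_PARALLEL_PAIRS / NUM_VIDEOS

print(f"captions per video: {captions_per_video:.1f}")
print(f"parallel pairs per video: {pairs_per_video:.2f}")
```

This works out to about 20 captions per video across the two languages and roughly 5 parallel English-Chinese pairs per video.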

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1 (leaderboard scores are shown as percentages of this maximum). Categories: language, multimodal, video, vision. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Nova Lite: 77.8% (self-reported, via llm-stats)
  2. Nova Pro: 77.8% (self-reported, via llm-stats)