VATEX
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research. The dataset contains over 41,250 videos and 825,000 captions in English and Chinese, including over 206,000 English-Chinese parallel translation pairs. It supports two tasks: multilingual video captioning and video-guided machine translation.
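As a rough sanity check on the figures quoted above, the per-video ratios can be worked out directly. This is a minimal sketch, not part of the benchmark tooling: the variable names are illustrative, the counts are the rounded lower bounds from the description, and the even English/Chinese split is an assumption that this card does not state.

```python
# Counts quoted in the description (each is given as "over ...", so treat as lower bounds).
videos = 41_250
captions = 825_000
parallel_pairs = 206_000

# Captions per video, both languages combined.
captions_per_video = captions / videos

# ASSUMPTION: captions are split evenly between English and Chinese.
captions_per_language = captions_per_video / 2

# English-Chinese parallel translation pairs per video.
pairs_per_video = parallel_pairs / videos

print(f"captions per video:       {captions_per_video:.2f}")
print(f"captions per language:    {captions_per_language:.2f}")
print(f"parallel pairs per video: {pairs_per_video:.2f}")
```

The ratios come out to about 20 captions per video and roughly 5 parallel pairs per video, so under the even-split assumption about half of each video's captions in a given language would have a parallel translation.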
Methodology
Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: language, multimodal, video, vision. Language: en. Verified by llm-stats: no.