VQAv2 (test)

reasoning

VQA v2.0 (Visual Question Answering v2.0) is a balanced dataset designed to counter language priors in visual question answering. It consists of complementary image pairs where the same question yields different answers, forcing models to rely on visual understanding rather than language bias. The dataset contains 1,105,904 questions across 204,721 COCO images, requiring understanding of vision, language, and commonsense knowledge.

Leaderboard

Showing 1 of 1 result

Llama 3.2 11B Instruct

75.2%

i