InfoVQA


InfoVQA is a dataset of 30,000 questions over 5,000 infographic images. Answering the questions requires joint reasoning over document layout, textual content, graphical elements, and data visualizations, together with elementary reasoning and arithmetic skills.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: multimodal, vision. Language: en. Verified by llm-stats: no.
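
The metadata gives a maximum score of 1, while the leaderboard reports percentages. The page does not say how the two relate. As a minimal sketch, assuming the leaderboard values are the raw score scaled linearly to 100, the conversion could look like the following. `to_percent` is a hypothetical helper written for illustration, not part of any llm-stats API.

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage of max_score with one decimal place.

    Assumption: the leaderboard percentage is score / max_score * 100.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    return f"{100 * score / max_score:.1f}%"


if __name__ == "__main__":
    # A raw score of 0.834 on a 0-to-1 scale is displayed as 83.4%.
    print(to_percent(0.834))
```

Under this assumption, the top leaderboard entry of 83.4% would correspond to a raw score of 0.834 out of 1.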

Leaderboard

All scores are self-reported; source: llm-stats.

  1. Qwen2.5 VL 32B Instruct: 83.4%
  2. Qwen2.5 VL 7B Instruct: 82.6%
  3. DeepSeek VL2: 78.1%
  4. DeepSeek VL2 Small: 75.8%
  5. Phi-4-multimodal-instruct: 72.7%
  6. Gemma 3 27B: 70.6%
  7. DeepSeek VL2 Tiny: 66.1%
  8. Gemma 3 12B: 64.9%
  9. Gemma 3 4B: 50.0%