ScienceQA

math official site →

ScienceQA is the first large-scale multimodal science question answering benchmark with 21,208 multiple-choice questions covering 3 subjects (natural science, language science, social science), 26 topics, 127 categories, and 379 skills. The benchmark includes both text and image modalities, featuring detailed explanations and Chain-of-Thought reasoning to diagnose multi-hop reasoning ability.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: math, multimodal, reasoning, vision. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Phi-3.5-vision-instruct self-reported llm-stats
    91.3%