ChartQA

reasoning

ChartQA is a large-scale benchmark comprising 9.6K human-written questions and 23.1K questions generated from human-written chart summaries, designed to evaluate models' abilities in visual and logical reasoning over charts.

Leaderboard

Showing 20 of 24 results

Model | Score
Claude 3.5 Sonnet | 90.8%
Llama 4 Maverick | 90.0%
Qwen2.5 VL 72B Instruct | 89.5%
Nova Pro | 89.2%
Llama 4 Scout | 88.8%
Qwen2-VL-72B-Instruct | 88.3%
Pixtral Large | 88.1%
Mistral Small 3.2 24B Instruct | 87.4%
Qwen2.5 VL 7B Instruct | 87.3%
Nova Lite | 86.8%
DeepSeek VL2 | 86.0%
GPT-4o | 85.7%
Llama 3.2 90B Instruct | 85.5%
Qwen2.5-Omni-7B | 85.3%
DeepSeek VL2 Small | 84.5%
Llama 3.2 11B Instruct | 83.4%
Phi-3.5-vision-instruct | 81.8%
Pixtral-12B | 81.8%
Phi-4-multimodal-instruct | 81.4%
DeepSeek VL2 Tiny | 81.0%