CharXiv-D
CharXiv-D is the descriptive-question subset of the CharXiv benchmark, designed to assess how well multimodal large language models extract basic information from scientific charts. Its questions cover information extraction, enumeration, pattern recognition, and counting across 2,323 diverse charts from arXiv papers, all curated and verified by human experts.
Methodology
Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: multimodal, reasoning, structured_output, vision. Language: en. Verified by llm-stats: no.
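For readers who want to work with these fields programmatically, the listing above could be held in a small record like the one sketched below. This is an illustration only: the class name `BenchmarkMetadata`, its field names, and the `normalize` helper are assumptions made for this example and are not an llm-stats schema or API. Only the field values come from the listing.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BenchmarkMetadata:
    """Hypothetical container for the metadata fields listed above."""

    name: str
    modality: str
    max_score: float
    categories: Tuple[str, ...]
    language: str
    verified: bool

    def normalize(self, raw_score: float) -> float:
        """Scale a raw score into [0, 1] relative to max_score."""
        if not 0 <= raw_score <= self.max_score:
            raise ValueError(
                f"score {raw_score} is outside [0, {self.max_score}]"
            )
        return raw_score / self.max_score


# Values copied from the listing; the structure itself is an assumption.
CHARXIV_D = BenchmarkMetadata(
    name="CharXiv-D",
    modality="multimodal",
    max_score=1.0,
    categories=("multimodal", "reasoning", "structured_output", "vision"),
    language="en",
    verified=False,
)
```

Because the maximum score is 1, `normalize` leaves any valid score unchanged here; the range check simply rejects values that fall outside the documented scale.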