Grok-1.5V

A multimodal model that processes text together with visual inputs, including documents, diagrams, charts, screenshots, and photographs. It is notable for strong real-world spatial understanding.

Benchmark results

Benchmark     Score   Tags            Source
AI2D          88.3%   self-reported   llm-stats
ChartQA       76.1%   self-reported   llm-stats
DocVQA        85.6%   self-reported   llm-stats
MathVista     52.8%   self-reported   llm-stats
MMMU          53.6%   self-reported   llm-stats
RealWorldQA   68.7%   self-reported   llm-stats
TextVQA       78.1%   self-reported   llm-stats