Grok-1.5V
A multimodal model that processes text together with visual input, including documents, diagrams, charts, screenshots, and photographs. Notable for strong real-world spatial understanding.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AI2D | 88.3% | self-reported llm-stats |
| ChartQA | 76.1% | self-reported llm-stats |
| DocVQA | 85.6% | self-reported llm-stats |
| MathVista | 52.8% | self-reported llm-stats |
| MMMU | 53.6% | self-reported llm-stats |
| RealWorldQA | 68.7% | self-reported llm-stats |
| TextVQA | 78.1% | self-reported llm-stats |