Pixtral Large

A 124B-parameter multimodal model built on Mistral Large 2, with frontier-level image understanding. It excels at understanding documents, charts, and natural images while maintaining strong text-only performance. The architecture pairs a 123B-parameter multimodal decoder with a 1B-parameter vision encoder, and its 128K context window supports up to 30 high-resolution images.
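The figures in the description can be turned into a quick pre-flight check. The following is a minimal sketch, not part of any official SDK: `RequestBudget`, `total_params_b`, and `check_budget` are hypothetical names, the per-image token counts are supplied by the caller (the description does not say how many tokens an image consumes), and "128K" is assumed to mean 128,000 tokens.

```python
from __future__ import annotations

from dataclasses import dataclass

# Figures quoted in the model description.
DECODER_PARAMS_B = 123        # multimodal decoder, billions of parameters
VISION_ENCODER_PARAMS_B = 1   # vision encoder, billions of parameters
MAX_IMAGES = 30               # high-resolution images per context window
# Assumption: "128K" is read as 128,000 tokens; the exact limit may differ.
CONTEXT_WINDOW_TOKENS = 128_000


@dataclass
class RequestBudget:
    """Hypothetical summary of one request (illustrative only)."""
    text_tokens: int
    image_tokens: list[int]  # one caller-supplied token count per image


def total_params_b() -> int:
    """Sum the two components to get the headline parameter count."""
    return DECODER_PARAMS_B + VISION_ENCODER_PARAMS_B


def check_budget(req: RequestBudget) -> list[str]:
    """Return a list of limit violations; an empty list means the request fits."""
    problems: list[str] = []
    if len(req.image_tokens) > MAX_IMAGES:
        problems.append(
            f"{len(req.image_tokens)} images exceeds the limit of {MAX_IMAGES}"
        )
    total = req.text_tokens + sum(req.image_tokens)
    if total > CONTEXT_WINDOW_TOKENS:
        problems.append(
            f"{total} tokens exceeds the assumed window of {CONTEXT_WINDOW_TOKENS}"
        )
    return problems
```

Calling `total_params_b()` reproduces the 124B headline figure from its two components, and `check_budget` reports both limits independently, so a request can fail on image count, on token count, or on both.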

Benchmark results

Benchmark     Score   Tags            Source
AI2D          93.8%   self-reported   llm-stats
ChartQA       88.1%   self-reported   llm-stats
DocVQA        93.3%   self-reported   llm-stats
MathVista     69.4%   self-reported   llm-stats
MM-MT-Bench   74      self-reported   llm-stats
MMMU          64.0%   self-reported   llm-stats
VQAv2         80.9%   self-reported   llm-stats