Llama 3.2 90B Instruct

Llama 3.2 90B Instruct is a large multimodal language model optimized for visual recognition, image reasoning, and captioning tasks. It supports a context length of 128,000 tokens and targets image understanding and generative tasks. Within the Llama 3.2 family, the edge and mobile deployment target applies to the lightweight 1B and 3B text models, not to this 90B model.

Benchmark results

Benchmark        Score   Tags            Source
AI2D             92.3%   self-reported   llm-stats
ChartQA          85.5%   self-reported   llm-stats
DocVQA           90.1%   self-reported   llm-stats
GPQA             46.7%   self-reported   llm-stats
InfographicsQA   56.8%   self-reported   llm-stats
MATH             68.0%   self-reported   llm-stats
MathVista        57.3%   self-reported   llm-stats
MGSM             86.9%   self-reported   llm-stats
MMLU             86.0%   self-reported   llm-stats
MMMU             60.3%   self-reported   llm-stats
MMMU-Pro         45.2%   self-reported   llm-stats
TextVQA          73.5%   self-reported   llm-stats
VQAv2            78.1%   self-reported   llm-stats
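
For readers who want to work with these numbers programmatically, the following is a minimal sketch that holds the scores in a dictionary, ranks them, and separates image-input benchmarks from text-only ones. The names SCORES, TEXT_ONLY, ranked, and split_by_modality are hypothetical, and the image/text grouping is an assumption added for illustration rather than something the table states.

```python
# Scores copied from the benchmark table, in percent (all self-reported).
SCORES = {
    "AI2D": 92.3,
    "ChartQA": 85.5,
    "DocVQA": 90.1,
    "GPQA": 46.7,
    "InfographicsQA": 56.8,
    "MATH": 68.0,
    "MathVista": 57.3,
    "MGSM": 86.9,
    "MMLU": 86.0,
    "MMMU": 60.3,
    "MMMU-Pro": 45.2,
    "TextVQA": 73.5,
    "VQAv2": 78.1,
}

# Assumption: this grouping is illustrative and not part of the source table.
# These four benchmarks take text-only input; the rest involve images.
TEXT_ONLY = {"GPQA", "MATH", "MGSM", "MMLU"}


def ranked(scores):
    """Return (benchmark, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def split_by_modality(scores, text_only=TEXT_ONLY):
    """Split scores into (image-input, text-only) dictionaries."""
    vision = {name: s for name, s in scores.items() if name not in text_only}
    text = {name: s for name, s in scores.items() if name in text_only}
    return vision, text


if __name__ == "__main__":
    for name, score in ranked(SCORES):
        print(f"{name:<16}{score:5.1f}%")
```

Because the benchmarks measure different skills on different scales of difficulty, the ranking is only a convenient way to scan the table and should not be read as a comparison of task difficulty.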