Llama 3.2 90B Instruct
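Llama 3.2 90B Instruct is a large multimodal language model that takes image and text input and produces text output. It is optimized for visual recognition, image reasoning, captioning, and answering questions about images. It supports a context length of 128,000 tokens. At 90B parameters it is meant for server-side deployment rather than edge or mobile devices; within the Llama 3.2 family, the lightweight 1B and 3B text-only models fill that role. Its self-reported results on image-understanding benchmarks are listed below.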
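A 128,000-token context window limits the prompt and the generated output together, as is typical for decoder-only models. The following minimal sketch shows that budget arithmetic. It assumes the token counts have already been produced by the model's own tokenizer. The helper names `fits_in_context` and `max_generation_budget` are hypothetical and are not part of any Llama API.

```python
# Hypothetical helpers: token counts are assumed to come from the model's own
# tokenizer; this sketch only checks the context-window arithmetic.
CONTEXT_LENGTH = 128_000  # context window stated in the model description


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if the prompt plus the requested generation fits the window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context_length


def max_generation_budget(prompt_tokens: int,
                          context_length: int = CONTEXT_LENGTH) -> int:
    """Largest number of new tokens that still fits after the prompt (0 if none)."""
    return max(0, context_length - prompt_tokens)


if __name__ == "__main__":
    print(fits_in_context(120_000, 4_000))  # True: 124,000 <= 128,000
    print(fits_in_context(126_000, 4_000))  # False: 130,000 > 128,000
    print(max_generation_budget(126_000))   # 2000
```

In practice, a serving stack would usually cap `max_new_tokens` at `max_generation_budget(prompt_tokens)` rather than reject the request outright.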
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AI2D | 92.3% | self-reported, llm-stats |
| ChartQA | 85.5% | self-reported, llm-stats |
| DocVQA | 90.1% | self-reported, llm-stats |
| GPQA | 46.7% | self-reported, llm-stats |
| InfographicsQA | 56.8% | self-reported, llm-stats |
| MATH | 68.0% | self-reported, llm-stats |
| MathVista | 57.3% | self-reported, llm-stats |
| MGSM | 86.9% | self-reported, llm-stats |
| MMLU | 86.0% | self-reported, llm-stats |
| MMMU | 60.3% | self-reported, llm-stats |
| MMMU-Pro | 45.2% | self-reported, llm-stats |
| TextVQA | 73.5% | self-reported, llm-stats |
| VQAv2 | 78.1% | self-reported, llm-stats |
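The table mixes image-text benchmarks with four text-only ones: GPQA, MATH, MGSM, and MMLU. The following sketch stores the self-reported scores as data so they can be viewed by modality. The modality grouping and the helper names `split_by_modality` and `ranked` are editorial assumptions and are not part of the source.

```python
# Self-reported scores (%) copied from the benchmark table.
SCORES = {
    "AI2D": 92.3, "ChartQA": 85.5, "DocVQA": 90.1, "GPQA": 46.7,
    "InfographicsQA": 56.8, "MATH": 68.0, "MathVista": 57.3, "MGSM": 86.9,
    "MMLU": 86.0, "MMMU": 60.3, "MMMU-Pro": 45.2, "TextVQA": 73.5,
    "VQAv2": 78.1,
}

# Editorial grouping (an assumption, not from the source): text-only benchmarks.
TEXT_ONLY = {"GPQA", "MATH", "MGSM", "MMLU"}


def split_by_modality(scores: dict[str, float]) -> tuple[dict[str, float], dict[str, float]]:
    """Split scores into (image-text, text-only) dictionaries."""
    vision = {name: s for name, s in scores.items() if name not in TEXT_ONLY}
    text = {name: s for name, s in scores.items() if name in TEXT_ONLY}
    return vision, text


def ranked(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (benchmark, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    vision, text = split_by_modality(SCORES)
    print(ranked(vision)[0])  # ('AI2D', 92.3)
    print(ranked(text))       # [('MGSM', 86.9), ('MMLU', 86.0), ('MATH', 68.0), ('GPQA', 46.7)]
```

The sketch deliberately avoids averaging across benchmarks, because the benchmarks use different task formats and scoring protocols.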