Llama 4 Scout
Llama 4 Scout is a natively multimodal model that processes both text and images. It uses a mixture-of-experts (MoE) architecture with 16 experts and 17 billion active parameters (109 billion total), and supports multimodal tasks such as conversational interaction, image analysis, and code generation. Its context window is 10 million tokens.
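To make the parameter figures concrete, the short sketch below computes what share of the model's weights is active at once. It uses only the numbers quoted above. The constant and function names are illustrative and are not part of any Llama API.

```python
# Figures quoted in the model description; names are illustrative only.
ACTIVE_PARAMS_B = 17   # billions of activated parameters
TOTAL_PARAMS_B = 109   # billions of parameters in total


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of total parameters that are activated."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share: {fraction:.1%}")  # prints "Active share: 15.6%"
```

Roughly one sixth of the weights take part in a given forward pass. This is the usual motivation for an MoE design: compute cost tracks the active parameter count rather than the total.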
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| ChartQA | 88.8% | self-reported llm-stats |
| DocVQA | 94.4% | self-reported llm-stats |
| GPQA | 57.2% | self-reported llm-stats |
| LiveCodeBench | 32.8% | self-reported llm-stats |
| MATH | 50.3% | self-reported llm-stats |
| MathVista | 70.7% | self-reported llm-stats |
| MBPP | 67.8% | self-reported llm-stats |
| MGSM | 90.6% | self-reported llm-stats |
| MMLU | 79.6% | self-reported llm-stats |
| MMLU-Pro | 74.3% | self-reported llm-stats |
| MMMU | 69.4% | self-reported llm-stats |
| TydiQA | 31.5% | self-reported llm-stats |
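For readers who want to work with these figures programmatically, here is a minimal sketch that loads the scores from the table into a dictionary and lists them from highest to lowest. The variable names are illustrative. The benchmarks measure different skills on different scales of difficulty, so the ordering is a reading aid rather than a like-for-like comparison.

```python
# Scores copied from the benchmark table above (percent, self-reported).
scores = {
    "ChartQA": 88.8,
    "DocVQA": 94.4,
    "GPQA": 57.2,
    "LiveCodeBench": 32.8,
    "MATH": 50.3,
    "MathVista": 70.7,
    "MBPP": 67.8,
    "MGSM": 90.6,
    "MMLU": 79.6,
    "MMLU-Pro": 74.3,
    "MMMU": 69.4,
    "TydiQA": 31.5,
}

# Sort benchmark names by score, highest first.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name:<14}{score:5.1f}%")
```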