Llama 4 Scout

Llama 4 Scout is a natively multimodal model that processes both text and images. It uses a mixture-of-experts (MoE) architecture with 16 experts, activating 17 billion of its 109 billion total parameters, and supports multimodal tasks such as conversational interaction, image analysis, and code generation. Its context window is 10 million tokens.
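To make the activated-versus-total parameter figures concrete, the short sketch below computes the share of the model's parameters that are activated. It uses only the two counts quoted in the description; the function name `active_fraction` is a hypothetical helper introduced for illustration, and the calculation says nothing about how experts are selected.

```python
# Parameter counts as quoted in the model description, in billions.
ACTIVE_PARAMS_B = 17
TOTAL_PARAMS_B = 109


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of total parameters that are activated.

    Hypothetical helper for illustration; not part of any Llama tooling.
    """
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Activated share of parameters: {fraction:.1%}")
# prints "Activated share of parameters: 15.6%"
```

Under these figures, roughly one sixth of the parameters are activated, which is the usual motivation for an MoE design: total capacity grows while the activated parameter count stays closer to that of a 17B model.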

Benchmark results

Benchmark       Score    Tags            Source
ChartQA         88.8%    self-reported   llm-stats
DocVQA          94.4%    self-reported   llm-stats
GPQA            57.2%    self-reported   llm-stats
LiveCodeBench   32.8%    self-reported   llm-stats
MATH            50.3%    self-reported   llm-stats
MathVista       70.7%    self-reported   llm-stats
MBPP            67.8%    self-reported   llm-stats
MGSM            90.6%    self-reported   llm-stats
MMLU            79.6%    self-reported   llm-stats
MMLU-Pro        74.3%    self-reported   llm-stats
MMMU            69.4%    self-reported   llm-stats
TydiQA          31.5%    self-reported   llm-stats