Llama 4 Scout

Llama 4 Scout is a natively multimodal model that accepts both text and image input. It uses a mixture-of-experts (MoE) architecture with 16 experts and 109B total parameters, of which 17B are activated per token. It supports a range of multimodal tasks, including conversational interaction, image analysis, and code generation.
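The "activated versus total parameters" distinction comes from sparse routing: a router scores every expert for each token, and only the selected experts run. The sketch below shows generic top-k softmax gating, a common MoE scheme. It is an illustration under stated assumptions, not Meta's implementation: the source does not say how Llama 4 Scout routes tokens or how many experts it activates per token, and the scalar "experts" and the functions `route_top_k` and `moe_forward` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Figures from the model description above.
NUM_EXPERTS = 16
TOTAL_PARAMS_B = 109
ACTIVE_PARAMS_B = 17


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Hypothetical router: keep the k highest-scoring experts and renormalize their weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 1,
) -> float:
    """Run only the selected experts and return their gate-weighted sum."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy experts: expert i multiplies its input by i.
experts = [lambda x, s=i: s * x for i in range(NUM_EXPERTS)]

logits = [0.0] * NUM_EXPERTS
logits[3] = 2.0  # the router strongly prefers expert 3

print(moe_forward(2.0, logits, experts, k=1))      # 6.0: only expert 3 runs
print(round(ACTIVE_PARAMS_B / TOTAL_PARAMS_B, 3))  # 0.156: share of weights active per token
```

Note that the active share (about 15.6%) need not equal k/16: in a typical MoE transformer, attention and other non-expert layers run for every token, so they count toward the activated total regardless of routing.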

Benchmark	Score
DocVQA	94.4%
MGSM	90.6%
ChartQA	88.8%
MMLU	79.6%
MMLU-Pro	74.3%
MathVista	70.7%
MMMU	69.4%
MBPP	67.8%
GPQA	57.2%
MATH	50.3%
LiveCodeBench	32.8%
TydiQA	31.5%

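Pricing, uptime, and speed via OpenRouter (updated Jul 17, 2026, 04:19 AM). Prices are in US dollars per million tokens; speed is reported as median (p50) time to first token (TTFT) and median (p50) throughput in tokens per second.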

Provider	Status	Input ($/M tokens)	Cached input ($/M tokens)	Output ($/M tokens)	Context (tokens)	Max output (tokens)	Uptime	TTFT (p50)	Throughput (p50)	Notes
DeepInfra	available	$0.10	-	$0.30	328K	16K	100.0% 5m 100.0%	444 ms	38 tok/s	fp8
Groq	available	$0.11	$0.06	$0.34	131K	8K	99.9% 5m 100.0%	371 ms	234 tok/s	
Novita	available	$0.18	-	$0.59	131K	131K	100.0% 5m 100.0%	614 ms	37 tok/s	bf16
Google	available	$0.25	-	$0.70	1.3M	8K	99% 5m 100.0%	981 ms	72 tok/s	
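Because per-token prices and speeds pull in different directions, a quick per-request estimate helps when choosing a provider. The sketch below turns the table's figures into an approximate cost and wall time for one request. It is a rough model with stated assumptions: `Offer`, `request_cost_usd`, and `rough_latency_s` are hypothetical names; adding p50 TTFT to output length divided by p50 throughput only approximates typical latency (it is not a measured end-to-end median); Groq's cached-input price and the context and max-output limits are ignored; and the numbers are a snapshot that will drift.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Offer:
    provider: str
    input_usd_per_mtok: float   # $ per million input tokens
    output_usd_per_mtok: float  # $ per million output tokens
    ttft_ms_p50: float          # median time to first token, in ms
    tok_per_s_p50: float        # median throughput, tokens per second


# Snapshot of the provider table above.
OFFERS: List[Offer] = [
    Offer("DeepInfra", 0.10, 0.30, 444, 38),
    Offer("Groq", 0.11, 0.34, 371, 234),
    Offer("Novita", 0.18, 0.59, 614, 37),
    Offer("Google", 0.25, 0.70, 981, 72),
]


def request_cost_usd(o: Offer, input_tokens: int, output_tokens: int) -> float:
    """Per-request cost in USD, ignoring any cached-input discount."""
    return (input_tokens * o.input_usd_per_mtok + output_tokens * o.output_usd_per_mtok) / 1_000_000


def rough_latency_s(o: Offer, output_tokens: int) -> float:
    """Rough wall time: median TTFT plus output tokens at median throughput."""
    return o.ttft_ms_p50 / 1000 + output_tokens / o.tok_per_s_p50


# Example: a 10,000-token prompt with a 1,000-token reply.
for o in OFFERS:
    cost = request_cost_usd(o, 10_000, 1_000)
    secs = rough_latency_s(o, 1_000)
    print(f"{o.provider:<10} ${cost:.5f}  ~{secs:.1f} s")
```

For that example request, DeepInfra comes out cheapest (about $0.0013 but roughly 27 s), while Groq costs slightly more (about $0.00144) and finishes in roughly 4.6 s, so the better choice depends on whether cost or latency dominates.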