Llama 3.2 3B Instruct

Llama 3.2 3B Instruct is a large language model that supports a context length of 128K tokens and is state-of-the-art in its class for on-device use cases such as summarization, instruction following, and rewriting tasks that run locally at the edge.

Benchmark	Score
NIH/Multi-needle	84.7%
ARC-C	78.6%
GSM8k	77.7%
IFEval	77.4%
HellaSwag	69.8%
BFCL v2	67.0%
MMLU	63.4%
InfiniteBench/En.MC	63.3%
MGSM	58.2%
MATH	48.0%
Open-rewrite	40.1%
Nexus	34.3%
GPQA	32.8%
InfiniteBench/En.QA	19.8%
TLDR9+ (test)	19.0%

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Output	Limits	Uptime	Speed	Notes
Parasail	available	$0.05/Mtok	$0.33/Mtok	131K tokens context, 131K tokens max output	100.0% 5m 100.0%	212 ms p50 TTFT, 89 tok/s p50	bf16
Cloudflare	available	$0.05/Mtok	$0.34/Mtok	80K tokens context, 80K tokens max output	99.9% 5m 100.0%	212 ms p50 TTFT, 272 tok/s p50
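
The prices and p50 speed figures in the table above can be turned into a rough per-request estimate. The sketch below assumes that "Mtok" means one million tokens and that end-to-end latency is roughly the time to first token plus the output length divided by throughput. The `Provider` class and its method names are hypothetical, introduced only for illustration, and the results are estimates based on median figures rather than guarantees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """One row of the provider table (hypothetical helper, not an OpenRouter API)."""
    name: str
    input_usd_per_mtok: float   # price per one million input tokens (assumed meaning of Mtok)
    output_usd_per_mtok: float  # price per one million output tokens
    ttft_ms: float              # p50 time to first token, in milliseconds
    tokens_per_s: float         # p50 output throughput

    def cost_usd(self, input_tokens: int, output_tokens: int) -> float:
        """Estimated cost of one request in US dollars."""
        total = (input_tokens * self.input_usd_per_mtok
                 + output_tokens * self.output_usd_per_mtok)
        return total / 1_000_000

    def latency_s(self, output_tokens: int) -> float:
        """Rough median latency: TTFT plus generation time (an assumed model)."""
        return self.ttft_ms / 1000 + output_tokens / self.tokens_per_s


# Values copied from the table above.
PROVIDERS = [
    Provider("Parasail", 0.05, 0.33, 212, 89),
    Provider("Cloudflare", 0.05, 0.34, 212, 272),
]

if __name__ == "__main__":
    # Example request: 10,000 prompt tokens and 500 generated tokens.
    for p in PROVIDERS:
        print(f"{p.name}: ${p.cost_usd(10_000, 500):.6f}, "
              f"about {p.latency_s(500):.2f} s")
```

For a request of this shape the two providers cost nearly the same, so the throughput difference dominates the comparison. Note that the listed context limits differ (131K versus 80K tokens), which the sketch does not check.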