Llama 3.3 70B Instruct

Llama 3.3 is a multilingual large language model optimized for dialogue use cases. It is a pretrained and instruction-tuned generative model with 70 billion parameters, and it outperforms many open-source and closed chat models on common industry benchmarks.

Benchmark	Score
IFEval	92.1%
MGSM	91.1%
HumanEval	88.4%
MBPP EvalPlus	87.6%
MMLU	86.0%
BFCL v2	77.3%
MATH	77.0%
MMLU-Pro	68.9%
GPQA	50.5%

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Output	Context	Max output	Uptime	TTFT p50	Throughput p50	Notes
DeepInfra	available	$0.10/Mtok	$0.32/Mtok	131K tokens	16K tokens	97% 5m 99%	332 ms	12 tok/s	fp8
AkashML	available	$0.13/Mtok	$0.40/Mtok	131K tokens	128K tokens	99.2% 5m 100.0%	539 ms	24 tok/s	fp8
Parasail	available	$0.22/Mtok (cache $0.11/Mtok)	$0.50/Mtok	131K tokens	16K tokens	99.6% 5m 99.9%	581 ms	42 tok/s	fp8
Groq	available	$0.59/Mtok (cache $0.29/Mtok)	$0.79/Mtok	131K tokens	33K tokens	99% 5m 99.5%	279 ms	158 tok/s
WandB	available	$0.71/Mtok (cache $0.71/Mtok)	$0.71/Mtok	128K tokens	128K tokens	100.0% 5m 100.0%	208 ms	69 tok/s	fp16
Google	available	$0.72/Mtok	$0.72/Mtok	128K tokens	8K tokens	100.0%	293 ms	58 tok/s
Nebius	-2	$0.13/Mtok	$0.40/Mtok	131K tokens		94% 5m 99.7%	543 ms	19 tok/s	fp8
Novita	-2	$0.14/Mtok	$0.40/Mtok	6K tokens	120K tokens	90% 5m 94%	645 ms	19 tok/s	bf16
Cloudflare	-2	$0.29/Mtok	$2.25/Mtok	24K tokens	24K tokens	93% 5m 96%	627 ms	18 tok/s	fp8
SambaNova	-2	$0.45/Mtok	$0.90/Mtok	16K tokens	3K tokens	93% 5m 98%	719 ms	56 tok/s	bf16
Together	-2	$1.04/Mtok	$1.04/Mtok	131K tokens	2K tokens	89%	710 ms	16 tok/s	fp8
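
The per-token prices and p50 speed figures in the table can be turned into a rough per-request estimate. The following is a minimal sketch, not an OpenRouter API: it assumes "Mtok" means one million tokens, ignores cached-input pricing, and models latency as time to first token plus output tokens divided by throughput, with no queueing or network overhead. The names `ProviderQuote`, `estimate_cost_usd`, and `estimate_latency_s` are hypothetical, and only two rows of the table are copied in for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderQuote:
    """One row of the provider table (hypothetical helper type)."""
    name: str
    input_usd_per_mtok: float   # assumed: USD per 1,000,000 input tokens
    output_usd_per_mtok: float  # assumed: USD per 1,000,000 output tokens
    ttft_ms_p50: float          # median time to first token, in ms
    tokens_per_s_p50: float     # median output throughput, in tokens/s


def estimate_cost_usd(quote: ProviderQuote, input_tokens: int, output_tokens: int) -> float:
    """Linear cost estimate; cached-input discounts are not modeled."""
    total = (input_tokens * quote.input_usd_per_mtok
             + output_tokens * quote.output_usd_per_mtok)
    return total / 1_000_000


def estimate_latency_s(quote: ProviderQuote, output_tokens: int) -> float:
    """Rough median latency: TTFT plus generation time at p50 throughput."""
    return quote.ttft_ms_p50 / 1000 + output_tokens / quote.tokens_per_s_p50


# Values copied from the DeepInfra and Groq rows of the table.
QUOTES = [
    ProviderQuote("DeepInfra", 0.10, 0.32, 332, 12),
    ProviderQuote("Groq", 0.59, 0.79, 279, 158),
]

if __name__ == "__main__":
    for q in QUOTES:
        cost = estimate_cost_usd(q, input_tokens=2000, output_tokens=500)
        latency = estimate_latency_s(q, output_tokens=500)
        print(f"{q.name}: ${cost:.6f}, about {latency:.1f} s")
```

For a request with 2,000 input tokens and 500 output tokens, this gives roughly $0.00036 and 42 s for DeepInfra versus roughly $0.0016 and 3.4 s for Groq, which illustrates the price-versus-speed trade-off in the table. Because the speed figures are medians, individual requests can be noticeably faster or slower.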