Llama 3.1 70B Instruct

Llama 3.1 70B Instruct is a large language model optimized for multilingual dialogue use cases. It outperforms many available open-source and closed chat models on common industry benchmarks.

Benchmark	Score
GSM-8K (CoT)	95.1%
ARC-C	94.8%
API-Bank	90.0%
IFEval	87.5%
Multilingual MGSM (CoT)	86.9%
MBPP++ base version	86.0%
MMLU (CoT)	86.0%
BFCL	84.8%
MMLU	83.6%
HumanEval	80.5%
DROP	79.6%
MATH (CoT)	68.0%
MMLU-Pro	66.4%
MultiPL-E HumanEval	65.5%
MultiPL-E MBPP	62.0%
Nexus	56.7%
GPQA	41.7%
Gorilla Benchmark API Bench	29.7%

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input price	Output price	Limits	Uptime	Speed	Notes
Amazon Bedrock	available	$0.72/Mtok	$0.72/Mtok	131K tokens context, 8K tokens max output	100.0% 5m 99.9%	419 ms p50 TTFT, 28 tok/s p50
WandB	available	$0.80/Mtok (cache $0.80/Mtok)	$0.80/Mtok	128K tokens context, 128K tokens max output	100.0% 5m 100.0%	310 ms p50 TTFT, 31 tok/s p50	bf16
DeepInfra	-2	$0.40/Mtok	$0.40/Mtok	131K tokens context, 16K tokens max output	95% 5m 86%	346 ms p50 TTFT, 20 tok/s p50	fp8
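The table quotes prices per million tokens (Mtok) and median (p50) latency figures. The following is a minimal Python sketch of how those columns might be turned into a per-request cost and a rough end-to-end latency estimate. The helper names (`request_cost_usd`, `rough_latency_s`, `fits`), the example workload, and the assumption that the context limit covers prompt plus output are all illustrative, not part of any OpenRouter API. The figures are copied from the Jul 17, 2026 snapshot above (with "131K" and similar limits taken literally as rounded values) and will drift.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    input_usd_per_mtok: float   # price per 1,000,000 input tokens
    output_usd_per_mtok: float  # price per 1,000,000 output tokens
    ttft_ms_p50: float          # median time to first token, in milliseconds
    tok_per_s_p50: float        # median decode throughput, tokens per second
    context_tokens: int         # rounded value as shown in the table
    max_output_tokens: int      # rounded value as shown in the table


# Snapshot values from the table (Jul 17, 2026); limits are the rounded "K" figures.
PROVIDERS = [
    Provider("Amazon Bedrock", 0.72, 0.72, 419, 28, 131_000, 8_000),
    Provider("WandB", 0.80, 0.80, 310, 31, 128_000, 128_000),
    Provider("DeepInfra", 0.40, 0.40, 346, 20, 131_000, 16_000),
]


def request_cost_usd(p: Provider, input_tokens: int, output_tokens: int) -> float:
    """Prices are per million tokens, so divide the weighted token count by 1e6."""
    return (input_tokens * p.input_usd_per_mtok
            + output_tokens * p.output_usd_per_mtok) / 1_000_000


def rough_latency_s(p: Provider, output_tokens: int) -> float:
    """Median TTFT plus median decode time; ignores queueing and prompt length."""
    return p.ttft_ms_p50 / 1000 + output_tokens / p.tok_per_s_p50


def fits(p: Provider, input_tokens: int, output_tokens: int) -> bool:
    """Assumption: the context window must hold both the prompt and the output."""
    return (output_tokens <= p.max_output_tokens
            and input_tokens + output_tokens <= p.context_tokens)


if __name__ == "__main__":
    # Hypothetical workload: 2,000 prompt tokens and 500 completion tokens.
    for p in PROVIDERS:
        cost = request_cost_usd(p, 2_000, 500)
        latency = rough_latency_s(p, 500)
        print(f"{p.name}: ${cost:.6f}, ~{latency:.1f} s, fits={fits(p, 2_000, 500)}")
```

Because all three providers charge the same rate for input and output, cost here depends only on total tokens. For long completions the throughput column dominates latency, so the cheapest provider in this snapshot (DeepInfra, 20 tok/s) is also the slowest end to end for a 500-token reply.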