Llama 3.1 8B Instruct

Llama 3.1 8B Instruct is a multilingual large language model optimized for dialogue use cases. It features a 128K context length, state-of-the-art tool use, and strong reasoning capabilities.

GSM-8K (CoT)

84.5%

i
ARC-C

83.4%

i
API-Bank

82.6%

i
IFEval

80.4%

i
BFCL

76.1%

i
MMLU (CoT)

73.0%

i
MBPP EvalPlus (base)

72.8%

i
HumanEval

72.6%

i
MMLU

69.4%

i
Multilingual MGSM (CoT)

68.9%

i
DROP

59.5%

i
Multipl-E MBPP

52.4%

i
MATH (CoT)

51.9%

i
Multipl-E HumanEval

50.8%

i
MMLU-Pro

48.3%

i
Nexus

38.5%

i
GPQA

30.4%

i
Gorilla Benchmark API Bench

8.2%

i

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Output	Limits	Uptime	Speed	Notes
DeepInfra	available	$0.02/Mtok	$0.03/Mtok	131K tokens context 16K tokens max output	99.9% 5m 99.9%	611 ms p50 TTFT 20 tok/s p50	fp8
Novita	available	$0.02/Mtok	$0.05/Mtok	16K tokens context 16K tokens max output	100.0% 5m 100.0%	468 ms p50 TTFT 53 tok/s p50	fp8
Groq	available	$0.05/Mtok cache $0.02/Mtok	$0.08/Mtok	131K tokens context 131K tokens max output	100.0% 5m 100.0%	227 ms p50 TTFT 114 tok/s p50
Cloudflare	available	$0.15/Mtok	$0.29/Mtok	32K tokens context 32K tokens max output	100.0% 5m 100.0%	494 ms p50 TTFT 16 tok/s p50	fp8
WandB	available	$0.22/Mtok cache $0.22/Mtok	$0.22/Mtok	128K tokens context 128K tokens max output	100.0% 5m 100.0%	165 ms p50 TTFT 137 tok/s p50	bf16