GPT OSS 20B

The gpt-oss-20b model (20.9B total parameters) delivers results similar to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it well suited to on-device use cases, local inference, or rapid iteration without costly infrastructure. Its larger sibling, gpt-oss-120b, achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU.
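As a rough check on the 16 GB figure, the sketch below estimates how much memory the weights alone would occupy at the precisions that appear in the provider notes further down (bf16, fp8, fp4). It assumes every one of the 20.9B parameters is stored at the same bit width and ignores the KV cache, activations, and runtime overhead, so the real footprint will differ; the function name is illustrative only.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Assumption: all parameters use the same bit width. KV cache,
    activations, and framework overhead are not counted.
    """
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 20.9e9  # parameter count quoted in the description

for name, bits in [("bf16", 16), ("fp8", 8), ("fp4", 4)]:
    print(f"{name}: {weight_memory_gb(PARAMS, bits):.2f} GB")
```

Under these assumptions only the 4-bit case leaves headroom on a 16 GB device, which is consistent with the description's on-device claim.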

Benchmark	Score
MMLU	85.3%
CodeForces	74.3%
GPQA	71.5%
TAU-bench Retail	54.8%
HealthBench	42.5%
Humanity's Last Exam	10.9%
HealthBench Hard	10.8%

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Output	Limits	Uptime	Speed	Notes
WandB	available	$0.03/Mtok (cache $0.03/Mtok)	$0.13/Mtok	131K tokens context, 131K tokens max output	99% (5m: 99%)	275 ms p50 TTFT, 140 tok/s p50	fp4
DeepInfra	available	$0.03/Mtok	$0.14/Mtok	131K tokens context, 131K tokens max output	100.0% (5m: 100.0%)	268 ms p50 TTFT, 110 tok/s p50	bf16
Novita	available	$0.04/Mtok	$0.15/Mtok	131K tokens context, 33K tokens max output	99.7% (5m: 100.0%)	370 ms p50 TTFT, 121 tok/s p50	fp4
Phala	available	$0.04/Mtok	$0.15/Mtok	131K tokens context, 131K tokens max output	96% (5m: 97%)	631 ms p50 TTFT, 56 tok/s p50
SiliconFlow	available	$0.04/Mtok	$0.18/Mtok	131K tokens context, 8K tokens max output	97% (5m: 100.0%)	1,317 ms p50 TTFT, 50 tok/s p50	fp8
Together	available	$0.05/Mtok	$0.20/Mtok	131K tokens context, 33K tokens max output	—	233 ms p50 TTFT, 103 tok/s p50
Amazon Bedrock	available	$0.07/Mtok	$0.15/Mtok	131K tokens context, 33K tokens max output	100.0% (5m: 100.0%)	525 ms p50 TTFT, 318 tok/s p50
Fireworks	available	$0.07/Mtok (cache $0.04/Mtok)	$0.30/Mtok	131K tokens context, 33K tokens max output	99.5% (5m: 99.3%)	297 ms p50 TTFT, 131 tok/s p50
Groq	available	$0.07/Mtok (cache $0.04/Mtok)	$0.30/Mtok	131K tokens context, 66K tokens max output	99.9% (5m: 99.8%)	415 ms p50 TTFT, 275 tok/s p50
DekaLLM	-2	$0.03/Mtok	$0.14/Mtok	131K tokens context, 33K tokens max output	83% (5m: 86%)	924 ms p50 TTFT, 19 tok/s p50	bf16
Parasail	-5	$0.04/Mtok (cache $0.02/Mtok)	$0.20/Mtok	131K tokens context, 131K tokens max output	56% (5m: 98%)	372 ms p50 TTFT, 237 tok/s p50	fp4
Google	-5	$0.07/Mtok	$0.25/Mtok	131K tokens context, 33K tokens max output	—	—
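
To make the per-Mtok prices concrete, here is a minimal sketch that turns three rows of the table above into a per-request cost and a rough generation time. The workload size (2,000 input tokens, 500 output tokens) is a made-up example, the `Provider` class and `request_cost_usd` helper are hypothetical names, and the calculation ignores cache pricing, time to first token, and any routing fees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    input_usd_per_mtok: float   # price per million input tokens
    output_usd_per_mtok: float  # price per million output tokens
    tok_per_s_p50: float        # median generation speed


def request_cost_usd(p: Provider, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at list prices (no cache discount applied)."""
    total = input_tokens * p.input_usd_per_mtok + output_tokens * p.output_usd_per_mtok
    return total / 1_000_000


# Values copied from the table above.
PROVIDERS = [
    Provider("DeepInfra", 0.03, 0.14, 110),
    Provider("Amazon Bedrock", 0.07, 0.15, 318),
    Provider("Groq", 0.07, 0.30, 275),
]

# Hypothetical workload, chosen only for illustration.
INPUT_TOKENS = 2_000
OUTPUT_TOKENS = 500

# List providers from cheapest to most expensive for this workload.
for p in sorted(PROVIDERS, key=lambda p: request_cost_usd(p, INPUT_TOKENS, OUTPUT_TOKENS)):
    cost = request_cost_usd(p, INPUT_TOKENS, OUTPUT_TOKENS)
    gen_seconds = OUTPUT_TOKENS / p.tok_per_s_p50
    print(f"{p.name}: ${cost:.6f} per request, about {gen_seconds:.1f} s to generate")
```

Sorting by cost alone favors the cheapest list price; a latency-sensitive application might instead sort on `tok_per_s_p50` or filter on uptime first.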