GPT OSS 20B
The gpt-oss-20b model (technically 20.9B parameters, referred to as "20b" for simplicity) delivers similar results to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it well suited to on-device use cases, local inference, or rapid iteration without costly infrastructure. Its larger sibling, gpt-oss-120b, achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU. Both models also perform strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite), and HealthBench, where they even outperform proprietary models like OpenAI o1 and GPT-4o.
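The 16 GB memory figure can be sanity-checked with rough arithmetic. The sketch below estimates the memory needed for the weights alone at a few storage precisions. The bit widths are illustrative assumptions, not a statement of how the model is actually distributed, and the helper name `weight_memory_gb` is hypothetical. The estimate ignores activations, the KV cache, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter count as stated in the description above.
PARAMS = 20.9e9

# Assumed storage precisions, chosen for illustration only.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: about {weight_memory_gb(PARAMS, bits):.1f} GB for weights")
```

Under these assumptions, 16-bit weights alone would exceed 16 GB, so fitting the model on a 16 GB device implies a lower-precision weight format.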
Benchmark results
| Benchmark | Score | Tags | Source |
|---|---|---|---|
| CodeForces | 74.3% | self-reported llm-stats | link → |
| GPQA | 71.5% | self-reported llm-stats | link → |
| HealthBench | 42.5% | self-reported llm-stats | link → |
| HealthBench Hard | 10.8% | self-reported llm-stats | link → |
| Humanity's Last Exam | 10.9% | self-reported llm-stats | link → |
| MMLU | 85.3% | self-reported llm-stats | link → |
| TAU-bench Retail | 54.8% | self-reported llm-stats | link → |