GPT OSS 20B

The gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU. The gpt-oss-20b model delivers similar results to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it well suited to on-device use cases, local inference, or rapid iteration without costly infrastructure. Both models also perform strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite), and HealthBench, where they even outperform proprietary models such as OpenAI o1 and GPT-4o. Note: although referred to as "20b" for simplicity, gpt-oss-20b technically has 20.9B parameters.
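As a rough sanity check on the 16 GB figure, the sketch below estimates how much memory the weights alone would occupy at a few storage precisions. The 20.9B parameter count comes from the description above; the bits-per-parameter values are illustrative assumptions, not the model's documented storage format, and the estimate ignores activations, KV cache, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 20.9e9  # parameter count stated in the description above

# Hypothetical precisions, chosen only to show how the footprint scales.
for bits in (16, 8, 4):
    print(f"{bits:>2} bits/param: {weight_memory_gb(NUM_PARAMS, bits):.2f} GB")
```

Under these assumptions, 16-bit weights would not fit in 16 GB, while a roughly 4-bit representation would leave headroom for the rest of the runtime.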

Benchmark results

Benchmark              Score   Tags           Source
CodeForces             74.3%   self-reported  llm-stats
GPQA                   71.5%   self-reported  llm-stats
HealthBench            42.5%   self-reported  llm-stats
HealthBench Hard       10.8%   self-reported  llm-stats
Humanity's Last Exam   10.9%   self-reported  llm-stats
MMLU                   85.3%   self-reported  llm-stats
TAU-bench Retail       54.8%   self-reported  llm-stats