GPT-5.4
GPT-5.4 is OpenAI's most capable and efficient frontier model for professional work. It combines industry-leading coding capabilities with native computer use, a context window of up to 1M tokens, full-resolution vision processing, tool search for large tool ecosystems, and improved reasoning across spreadsheets, presentations, and documents. It is the most token-efficient reasoning model in the GPT-5 series.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| ARC-AGI | 93.7% | self-reported llm-stats |
| ARC-AGI v2 | 73.3% | self-reported llm-stats |
| BrowseComp | 82.7% | self-reported llm-stats |
| Finance Agent | 56.0% | self-reported llm-stats |
| FrontierMath | 47.6% | self-reported llm-stats |
| GPQA | 92.8% | self-reported llm-stats |
| Graphwalks BFS <128k | 93.0% | self-reported llm-stats |
| Graphwalks BFS >128k | 21.4% | self-reported llm-stats |
| Graphwalks parents <128k | 89.8% | self-reported llm-stats |
| Graphwalks parents >128k | 32.4% | self-reported llm-stats |
| Humanity's Last Exam | 39.8% | self-reported llm-stats |
| MCP Atlas | 67.2% | self-reported llm-stats |
| MMMU-Pro | 81.2% | self-reported llm-stats |
| OmniDocBench 1.5 | 89.1% | self-reported llm-stats |
| OSWorld-Verified | 75.0% | self-reported llm-stats |
| SWE-Bench Pro | 57.7% | self-reported llm-stats |
| Tau2 Telecom | 98.9% | self-reported llm-stats |
| Terminal-Bench 2.0 | 75.1% | self-reported llm-stats |
| Toolathlon | 54.6% | self-reported llm-stats |