GPT-5.4

GPT-5.4 is OpenAI's most capable and efficient frontier model for professional work. It combines industry-leading coding capabilities with native computer use, a context window of up to 1M tokens, full-resolution vision processing, tool search for large tool ecosystems, and improved reasoning across spreadsheets, presentations, and documents. It is the most token-efficient reasoning model in the GPT-5 series.

Benchmark results

Benchmark                 Score   Tags           Source
ARC-AGI                   93.7%   self-reported  llm-stats
ARC-AGI v2                73.3%   self-reported  llm-stats
BrowseComp                82.7%   self-reported  llm-stats
Finance Agent             56.0%   self-reported  llm-stats
FrontierMath              47.6%   self-reported  llm-stats
GPQA                      92.8%   self-reported  llm-stats
Graphwalks BFS <128k      93.0%   self-reported  llm-stats
Graphwalks BFS >128k      21.4%   self-reported  llm-stats
Graphwalks parents <128k  89.8%   self-reported  llm-stats
Graphwalks parents >128k  32.4%   self-reported  llm-stats
Humanity's Last Exam      39.8%   self-reported  llm-stats
MCP Atlas                 67.2%   self-reported  llm-stats
MMMU-Pro                  81.2%   self-reported  llm-stats
OmniDocBench 1.5          89.1%   self-reported  llm-stats
OSWorld-Verified          75.0%   self-reported  llm-stats
SWE-Bench Pro             57.7%   self-reported  llm-stats
Tau2 Telecom              98.9%   self-reported  llm-stats
Terminal-Bench 2.0        75.1%   self-reported  llm-stats
Toolathlon                54.6%   self-reported  llm-stats