GPT-5

GPT-5 is a flagship model from OpenAI, optimized for coding, reasoning, and agentic tasks across domains. It offers higher reasoning capability at medium speed.

Benchmark results

| Benchmark | Score | Tags | Source |
| --- | --- | --- | --- |
| Aider-Polyglot | 88.0% | self-reported | llm-stats |
| AIME 2025 | 94.6% | self-reported | llm-stats |
| BrowseComp | 54.9% | self-reported | llm-stats |
| BrowseComp Long Context 128k | 90.0% | self-reported | llm-stats |
| BrowseComp Long Context 256k | 88.8% | self-reported | llm-stats |
| CharXiv-R | 81.1% | self-reported | llm-stats |
| COLLIE | 99.0% | self-reported | llm-stats |
| ERQA | 65.7% | self-reported | llm-stats |
| FActScore | 1.0% | self-reported | llm-stats |
| FrontierMath | 26.3% | self-reported | llm-stats |
| GPQA | 85.7% | self-reported | llm-stats |
| Graphwalks BFS <128k | 78.3% | self-reported | llm-stats |
| Graphwalks parents <128k | 73.3% | self-reported | llm-stats |
| HealthBench Hard | 1.6% | self-reported | llm-stats |
| HMMT 2025 | 93.3% | self-reported | llm-stats |
| HumanEval | 93.4% | self-reported | llm-stats |
| Humanity's Last Exam | 24.8% | self-reported | llm-stats |
| Internal API instruction following (hard) | 64.0% | self-reported | llm-stats |
| LongFact Concepts | 0.7% | self-reported | llm-stats |
| LongFact Objects | 0.8% | self-reported | llm-stats |
| MATH | 84.7% | self-reported | llm-stats |
| MMLU | 92.5% | self-reported | llm-stats |
| MMMU | 84.2% | self-reported | llm-stats |
| MMMU-Pro | 78.4% | self-reported | llm-stats |
| Multi-Challenge | 69.6% | self-reported | llm-stats |
| MultiChallenge (o3-mini grader) | 69.6% | self-reported | llm-stats |
| OpenAI-MRCR: 2 needle 128k | 95.2% | self-reported | llm-stats |
| OpenAI-MRCR: 2 needle 256k | 86.8% | self-reported | llm-stats |
| Scale MultiChallenge | 69.6% | self-reported | llm-stats |
| SWE-Bench Verified | 74.9% | self-reported | llm-stats |
| SWE-Lancer (IC-Diamond subset) | 100.0% | self-reported | llm-stats |
| Tau2 Airline | 62.6% | self-reported | llm-stats |
| Tau2 Retail | 81.1% | self-reported | llm-stats |
| Tau2 Telecom | 96.7% | self-reported | llm-stats |
| VideoMME w sub. | 86.7% | self-reported | llm-stats |
| VideoMMMU | 84.6% | self-reported | llm-stats |