GPT-5.1

GPT-5.1 is our flagship model for coding and agentic tasks. Its reasoning effort is configurable, including a non-reasoning setting.
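
As an illustration of what "configurable reasoning effort" can look like in practice, here is a minimal sketch that builds a request payload with an effort setting. It makes no network call. The helper `build_request`, the payload field names (`model`, `input`, `reasoning.effort`), the model identifier string, and the list of allowed effort values are all assumptions made for illustration and are not taken from this page; check the official API reference before relying on any of them.

```python
from typing import Any, Dict

# Assumed set of effort levels; "none" stands in for the non-reasoning setting.
# This list is hypothetical and may not match the real API.
ALLOWED_EFFORTS = ("none", "low", "medium", "high")


def build_request(prompt: str, effort: str = "medium") -> Dict[str, Any]:
    """Build a request payload dict with a reasoning-effort setting.

    Only constructs and validates the payload; it does not send anything.
    """
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(
            f"unknown effort {effort!r}; expected one of {ALLOWED_EFFORTS}"
        )
    return {
        "model": "gpt-5.1",               # assumed model identifier
        "input": prompt,                  # assumed field name
        "reasoning": {"effort": effort},  # assumed field name
    }


if __name__ == "__main__":
    # Lower effort for a quick lookup, higher effort for a multi-step coding task.
    print(build_request("What is 2 + 2?", effort="none"))
    print(build_request("Refactor this module to remove the cycle.", effort="high"))
```

Validating the effort value before sending keeps a typo from silently falling back to a default setting.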

Benchmark results

Benchmark                     Score  Tags           Source
AIME 2025                     94.0%  self-reported  llm-stats
BrowseComp Long Context 128k  90.0%  self-reported  llm-stats
FrontierMath                  26.7%  self-reported  llm-stats
GPQA                          88.1%  self-reported  llm-stats
MMMU                          85.4%  self-reported  llm-stats
SWE-Bench Verified            76.3%  self-reported  llm-stats
Tau2 Airline                  67.0%  self-reported  llm-stats
Tau2 Retail                   77.9%  self-reported  llm-stats
Tau2 Telecom                  95.6%  self-reported  llm-stats