GPT-5.1
GPT-5.1 is our flagship model for coding and agentic tasks, with configurable reasoning effort that includes a non-reasoning setting.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 94.0% | self-reported, llm-stats |
| BrowseComp Long Context 128k | 90.0% | self-reported, llm-stats |
| FrontierMath | 26.7% | self-reported, llm-stats |
| GPQA | 88.1% | self-reported, llm-stats |
| MMMU | 85.4% | self-reported, llm-stats |
| SWE-Bench Verified | 76.3% | self-reported, llm-stats |
| Tau2 Airline | 67.0% | self-reported, llm-stats |
| Tau2 Retail | 77.9% | self-reported, llm-stats |
| Tau2 Telecom | 95.6% | self-reported, llm-stats |