GPT-4.5
GPT-4.5 is an OpenAI model positioned as an advance over GPT-4, with improved reasoning, coding, and creative capabilities, faster performance, and longer context handling. It also features enhanced instruction following, fewer hallucinations, and better factual accuracy.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| Aider-Polyglot Edit | 44.9% | self-reported, llm-stats |
| AIME 2024 | 36.7% | self-reported, llm-stats |
| CharXiv-D | 90.0% | self-reported, llm-stats |
| CharXiv-R | 55.4% | self-reported, llm-stats |
| COLLIE | 72.3% | self-reported, llm-stats |
| ComplexFuncBench | 63.0% | self-reported, llm-stats |
| GPQA | 69.5% | self-reported, llm-stats |
| Graphwalks BFS <128k | 72.3% | self-reported, llm-stats |
| Graphwalks parents <128k | 72.6% | self-reported, llm-stats |
| GSM8k | 97.0% | self-reported, llm-stats |
| HumanEval | 88.0% | self-reported, llm-stats |
| IFEval | 88.2% | self-reported, llm-stats |
| Internal API instruction following (hard) | 54.0% | self-reported, llm-stats |
| MathVista | 72.3% | self-reported, llm-stats |
| MMLU | 90.8% | self-reported, llm-stats |
| MMMLU | 85.1% | self-reported, llm-stats |
| MMMU | 75.2% | self-reported, llm-stats |
| Multi-Challenge | 43.8% | self-reported, llm-stats |
| Multi-IF | 70.8% | self-reported, llm-stats |
| MultiChallenge (o3-mini grader) | 50.1% | self-reported, llm-stats |
| OpenAI-MRCR: 2 needle 128k | 38.5% | self-reported, llm-stats |
| SimpleQA | 62.5% | self-reported, llm-stats |
| SWE-Bench Verified | 38.0% | self-reported, llm-stats |
| SWE-Lancer | 37.3% | self-reported, llm-stats |
| SWE-Lancer (IC-Diamond subset) | 17.4% | self-reported, llm-stats |
| TAU-bench Airline | 50.0% | self-reported, llm-stats |
| TAU-bench Retail | 68.4% | self-reported, llm-stats |