GPT-5.3 Codex
GPT-5.3-Codex is OpenAI's most capable coding model, combining frontier agentic coding capabilities, improvements in aesthetics, and context compaction. It sets new state-of-the-art results on Terminal-Bench 2.0 (77.3%), OSWorld-Verified (64.7%), and the SWE-Lancer IC-Diamond subset (81.4%). It is the first model classified as High capability for cybersecurity under OpenAI's Preparedness Framework, and it is available in the Codex app and the API.
Benchmark results
| Benchmark | Score | Tags | Source |
|---|---|---|---|
| Cybersecurity CTFs | 77.6% | self-reported, llm-stats | link → |
| OSWorld-Verified | 64.7% | self-reported, llm-stats | link → |
| SWE-Bench Pro | 56.8% | self-reported, llm-stats | link → |
| SWE-Lancer (IC-Diamond subset) | 81.4% | self-reported, llm-stats | link → |
| Terminal-Bench 2.0 | 77.3% | self-reported, llm-stats | link → |