GPT-5.3-Codex

GPT-5.3-Codex is OpenAI's most capable coding model. It combines frontier agentic coding capabilities, improved aesthetics, and context compaction. It sets new state-of-the-art results on Terminal-Bench 2.0 (77.3%), OSWorld-Verified (64.7%), and SWE-Lancer IC-Diamond (81.4%). It is also the first model classified as High capability for cybersecurity under OpenAI's Preparedness Framework. It is available in the Codex app and through the API.

Benchmark results

Benchmark                        Score   Tags           Source
Cybersecurity CTFs               77.6%   self-reported  llm-stats
OSWorld-Verified                 64.7%   self-reported  llm-stats
SWE-Bench Pro                    56.8%   self-reported  llm-stats
SWE-Lancer (IC-Diamond subset)   81.4%   self-reported  llm-stats
Terminal-Bench 2.0               77.3%   self-reported  llm-stats