GPT-5.2

GPT‑5.2 delivers substantial gains in professional knowledge work, matching or beating experts on GDPval in 70.9% of comparisons (wins or ties). It also sets new highs in coding (SWE‑Bench Pro 55.6%), science (GPQA Diamond ~92–93%), math (AIME 2025: 100%), long‑context accuracy up to 256k tokens, and tool calling (Tau2 Telecom 98.7%). It rolls out in three variants, Instant, Thinking, and Pro, which are described as faster, more structured, and less error‑prone. Pricing is $1.75 per 1M input tokens and $14 per 1M output tokens, and Pro variants support xhigh reasoning for top‑quality, end‑to‑end execution.
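To make the quoted pricing concrete, here is a minimal sketch of a per-request cost estimate. The helper name `estimate_cost_usd` and the example token counts are hypothetical, and the sketch assumes the two list prices apply uniformly to all input and output tokens; any discounts or per-variant differences are not covered by the text above and are ignored here.

```python
# List prices quoted above, in USD per 1M tokens.
INPUT_USD_PER_M = 1.75
OUTPUT_USD_PER_M = 14.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated cost of one request at the quoted list prices.

    Assumes flat per-token pricing with no discounts or surcharges.
    """
    total_per_million = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total_per_million / 1_000_000


# Illustrative request: 200k input tokens and 10k output tokens.
cost = estimate_cost_usd(200_000, 10_000)
print(f"${cost:.2f}")  # prints "$0.49"
```

At these rates, output tokens cost eight times as much as input tokens, so long generated responses tend to dominate the bill even when the prompt is large.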

AIME 2025: 100.0%
HMMT 2025: 99.4%
Tau2 Telecom: 98.7%
Graphwalks BFS <128k: 94.0%
GPQA: 92.4%
BrowseComp Long Context 128k: 92.0%
BrowseComp Long Context 256k: 89.8%
MMMLU: 89.6%
Graphwalks parents <128k: 89.0%
ScreenSpot Pro: 86.3%
ARC-AGI: 86.2%
VideoMMMU: 85.9%
CharXiv-R: 82.1%
Tau2 Retail: 82.0%
SWE-Bench Verified: 80.0%
MMMU-Pro: 79.5%
SWE-Lancer (IC-Diamond subset): 74.6%
BrowseComp: 65.8%
MCP Atlas: 60.6%
ARC-AGI v2: 52.9%
Toolathlon: 46.3%
FrontierMath: 40.3%
Humanity's Last Exam: 34.5%