o1

A research preview model focused on mathematical and logical reasoning. It shows improved performance on tasks that require step-by-step reasoning, mathematical problem-solving, and code generation, with stronger formal reasoning alongside strong general capabilities.

GSM8k: 97.1%
MATH: 96.4%
GPQA Physics: 92.8%
MMLU: 91.8%
MGSM: 89.3%
HumanEval: 88.1%
MMMLU: 87.7%
GPQA: 78.0%
MMMU: 77.6%
AIME 2024: 74.3%
MathVista: 71.8%
TAU-bench Retail: 70.8%
GPQA Biology: 69.2%
LiveBench: 67.0%
GPQA Chemistry: 64.7%
TAU-bench Airline: 50.0%
SimpleQA: 47.0%
SWE-Bench Verified: 41.0%
FrontierMath: 5.5%