o1-preview

A research preview model focused on mathematical and logical reasoning. It shows improved performance on tasks that require step-by-step reasoning, including mathematical problem-solving and code generation, and it strengthens formal reasoning while retaining strong general capabilities.

Benchmark results

Benchmark            Score   Tags            Source
AIME 2024            42.0%   self-reported   llm-stats
GPQA                 73.3%   self-reported   llm-stats
LiveBench            52.3%   self-reported   llm-stats
MATH                 85.5%   self-reported   llm-stats
MGSM                 90.8%   self-reported   llm-stats
MMLU                 90.8%   self-reported   llm-stats
SimpleQA             42.4%   self-reported   llm-stats
SWE-Bench Verified   41.3%   self-reported   llm-stats