o1

Reasoning-tuned model.

Benchmark results

Benchmark      Score   Tags   Source
AIME 2024      83.3%
GPQA Diamond   78.0%
HumanEval      92.4%
MATH           94.8%