o1-preview
A research preview model focused on mathematical and logical reasoning. It shows improved performance on tasks requiring step-by-step reasoning, mathematical problem-solving, and code generation, with stronger formal reasoning while retaining solid general-purpose capabilities.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2024 | 42.0% | self-reported, llm-stats |
| GPQA | 73.3% | self-reported, llm-stats |
| LiveBench | 52.3% | self-reported, llm-stats |
| MATH | 85.5% | self-reported, llm-stats |
| MGSM | 90.8% | self-reported, llm-stats |
| MMLU | 90.8% | self-reported, llm-stats |
| SimpleQA | 42.4% | self-reported, llm-stats |
| SWE-Bench Verified | 41.3% | self-reported, llm-stats |