AIME 2025

This benchmark comprises all 30 problems from the 2025 American Invitational Mathematics Examination (AIME I and AIME II). The problems test olympiad-level mathematical reasoning, and every answer is an integer from 000 to 999. It is used to evaluate how well large language models solve complex mathematical problems that require multi-step logical deduction and structured symbolic reasoning.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: math, reasoning. Language: en. Verified by llm-stats: no.
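
The metadata above gives a maximum score of 1 but does not describe the grading procedure. As a rough illustration only, the following sketch assumes exact-match grading of integer answers; the function names `normalize_answer` and `score` are hypothetical and are not taken from llm-stats.

```python
from typing import List, Optional


def normalize_answer(raw: str) -> Optional[int]:
    """Parse an AIME-style answer such as '073' or ' 73 ' into an int in [0, 999].

    Returns None for anything that is not a plain non-negative integer in range.
    """
    text = raw.strip()
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    return value if 0 <= value <= 999 else None


def score(predictions: List[str], answers: List[int]) -> float:
    """Fraction of exact matches on a 0-to-1 scale (assumed grading rule)."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("need equally sized, non-empty predictions and answers")
    # An unparseable prediction normalizes to None, which never equals an int.
    correct = sum(normalize_answer(p) == a for p, a in zip(predictions, answers))
    return correct / len(answers)
```

Under this assumption a score of 1.0 corresponds to 100.0% on the leaderboard. A single pass over 30 problems can only produce multiples of 1/30, so values such as 99.8% presumably reflect averaging over repeated runs; the source does not confirm this.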

Leaderboard

  1. Grok-4 Heavy self-reported llm-stats
    100.0%
  2. Gemini 3 Pro self-reported llm-stats
    100.0%
  3. GPT-5.2 self-reported llm-stats
    100.0%
  4. GPT-5.2 Pro self-reported llm-stats
    100.0%
  5. Kimi K2-Thinking-0905 self-reported llm-stats
    100.0%
  6. Claude Opus 4.6 self-reported llm-stats
    99.8%
  7. Gemini 3 Flash self-reported llm-stats
    99.7%
  8. GPT-5.1 High self-reported llm-stats
    99.6%
  9. LongCat-Flash-Thinking-2601 self-reported llm-stats
    99.6%
  10. Nemotron 3 Nano (30B A3B) self-reported llm-stats
    99.2%
  11. GPT OSS 20B High self-reported llm-stats
    98.7%
  12. GPT-5.1 Medium self-reported llm-stats
    98.4%
  13. Seed 2.0 Pro self-reported llm-stats
    98.3%
  14. Step-3.5-Flash self-reported llm-stats
    97.3%
  15. MAI-Thinking-1 self-reported llm-stats
    97.0%
  16. GPT-5.1 Codex High self-reported llm-stats
    96.7%
  17. Sarvam-105B self-reported llm-stats
    96.7%
  18. Sarvam-30B self-reported llm-stats
    96.7%
  19. Kimi K2.5 self-reported llm-stats
    96.1%
  20. DeepSeek-V3.2-Speciale self-reported llm-stats
    96.0%