AIME 2024

The 2024 American Invitational Mathematics Examination (AIME), comprising 30 challenging mathematical reasoning problems: 15 each from the AIME I and AIME II competitions. Every answer is an integer from 0 to 999, and the problems test advanced mathematical reasoning across algebra, geometry, combinatorics, and number theory. The set is used as a benchmark for evaluating the mathematical reasoning capabilities of large language models at Olympiad-level difficulty.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: math, reasoning. Language: en. Verified by llm-stats: no.
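
The metadata above does not spell out the scoring rule. A minimal sketch, assuming exact-match grading on the final integer in a model's response and a score equal to the fraction of problems solved (so the maximum is 1, displayed as a percentage on the leaderboard). The helper names `parse_answer` and `score` are hypothetical and not part of any llm-stats tooling.

```python
import re
from typing import Optional, Sequence


def parse_answer(text: str) -> Optional[int]:
    """Return the last standalone 1-3 digit integer in `text`, or None.

    Assumption: the model states its final answer last. AIME answers
    are integers from 0 to 999, so longer digit runs are ignored.
    """
    matches = re.findall(r"(?<!\d)\d{1,3}(?!\d)", text)
    return int(matches[-1]) if matches else None


def score(predictions: Sequence[str], references: Sequence[int]) -> float:
    """Fraction of problems whose parsed answer equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    correct = sum(parse_answer(p) == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy data, not real AIME problems or model outputs.
toy_predictions = ["The answer is 073.", "I think 12", "no idea"]
toy_references = [73, 13, 5]
toy_score = score(toy_predictions, toy_references)

# 28 of 30 correct, formatted the way the leaderboard displays scores.
formatted = f"{28 / 30:.1%}"
```

Under this assumption a single pass over 30 problems can only produce multiples of 1/30, such as 28/30 = 93.3%. Leaderboard values that are not multiples of 1/30 would then have to come from averaging several runs or from a different evaluation protocol, which the metadata does not specify.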

Leaderboard

  1. Grok-3 Mini: 95.8% (self-reported, llm-stats)
  2. o4-mini: 93.4% (self-reported, llm-stats)
  3. Grok-3: 93.3% (self-reported, llm-stats)
  4. LongCat-Flash-Thinking: 93.3% (self-reported, llm-stats)
  5. Gemini 2.5 Pro: 92.0% (self-reported, llm-stats)
  6. o3: 91.6% (self-reported, llm-stats)
  7. DeepSeek-R1-0528: 91.4% (self-reported, llm-stats)
  8. GLM-4.5: 91.0% (self-reported, llm-stats)
  9. GLM-4.5-Air: 89.4% (self-reported, llm-stats)
  10. Gemini 2.5 Flash: 88.0% (self-reported, llm-stats)
  11. o3-mini: 87.3% (self-reported, llm-stats)
  12. DeepSeek R1 Distill Llama 70B: 86.7% (self-reported, llm-stats)
  13. DeepSeek R1 Zero: 86.7% (self-reported, llm-stats)
  14. o1-pro: 86.0% (self-reported, llm-stats)
  15. MiniMax M1 80K: 86.0% (self-reported, llm-stats)
  16. Qwen3 235B A22B: 85.7% (self-reported, llm-stats)
  17. MiniCPM-SALA: 83.8% (self-reported, llm-stats)
  18. 83.3%
  19. DeepSeek R1 Distill Qwen 32B: 83.3% (self-reported, llm-stats)
  20. DeepSeek R1 Distill Qwen 7B: 83.3% (self-reported, llm-stats)