MATH

The MATH dataset contains 12,500 challenging competition mathematics problems drawn from AMC 10, AMC 12, AIME, and other mathematics competitions. Each problem comes with a full step-by-step solution, is rated on a difficulty scale from 1 to 5, and belongs to one of seven subjects: Prealgebra, Algebra, Number Theory, Counting and Probability, Geometry, Intermediate Algebra, and Precalculus.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: math, reasoning. Language: en. Verified by llm-stats: no.
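
The metadata lists a max score of 1, while the leaderboard entries are shown as percentages. The source does not state how one is converted to the other. As a minimal sketch, assuming a score is stored as a fraction of the max score and displayed to one decimal place, the conversion might look like this in Python (the function name `to_percent` is hypothetical and not part of llm-stats):

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score as a one-decimal percentage string.

    Assumption: the score is a fraction of max_score (1 in the metadata),
    and the leaderboard rounds to one decimal place.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return f"{100 * score / max_score:.1f}%"


if __name__ == "__main__":
    # A raw score of 0.979 out of 1 corresponds to a 97.9% leaderboard entry.
    print(to_percent(0.979))
```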

Leaderboard

  1. o3-mini: 97.9% (self-reported, llm-stats)
  2. o1: 96.4% (self-reported, llm-stats)
  3. 94.8%
  4. Mistral Large 3: 90.4% (self-reported, llm-stats)
  5. Gemini 2.0 Flash: 89.7% (self-reported, llm-stats)
  6. Kimi K2 0905: 89.1% (self-reported, llm-stats)
  7. Gemma 3 27B: 89.0% (self-reported, llm-stats)
  8. Gemini 2.0 Flash-Lite: 86.8% (self-reported, llm-stats)
  9. Gemini 1.5 Pro: 86.5% (self-reported, llm-stats)
  10. o1-preview: 85.5% (self-reported, llm-stats)
  11. GPT-5: 84.7% (self-reported, llm-stats)
  12. Gemma 3 12B: 83.8% (self-reported, llm-stats)
  13. Qwen2.5 32B Instruct: 83.1% (self-reported, llm-stats)
  14. Qwen2.5 72B Instruct: 83.1% (self-reported, llm-stats)
  15. Qwen2.5 VL 32B Instruct: 82.2% (self-reported, llm-stats)
  16. Phi 4: 80.4% (self-reported, llm-stats)
  17. Qwen2.5 14B Instruct: 80.0% (self-reported, llm-stats)
  18. Claude 3.5 Sonnet: 78.3% (self-reported, llm-stats)
  19. Gemini 1.5 Flash: 77.9% (self-reported, llm-stats)
  20. 77.0%