MATH-500


MATH-500 is a 500-problem subset of the MATH dataset, made up of challenging competition mathematics problems drawn from AMC 10, AMC 12, AIME, and other mathematics competitions. Each problem comes with a full step-by-step solution. The problems span multiple difficulty levels and seven mathematical subjects: Prealgebra, Algebra, Number Theory, Counting and Probability, Geometry, Intermediate Algebra, and Precalculus.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: math, reasoning. Language: en. Verified by llm-stats: no.
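
The metadata gives a maximum score of 1, while the leaderboard reports percentages. The sketch below shows one way to relate the two. It assumes, which the source does not state, that a score is the fraction of the 500 problems answered correctly. The function names are hypothetical and are not part of llm-stats.

```python
NUM_PROBLEMS = 500  # size of MATH-500, per the description above


def to_percent(score: float) -> float:
    """Convert a normalized score in [0, 1] to a percentage with one decimal."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [0, 1], got {score}")
    return round(score * 100, 1)


def implied_correct(percent: float, n: int = NUM_PROBLEMS) -> float:
    """Problems solved implied by a percentage, ASSUMING plain accuracy over n items."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"percent must lie in [0, 100], got {percent}")
    return round(percent / 100 * n, 1)


if __name__ == "__main__":
    print(to_percent(0.992))      # normalized score -> leaderboard percentage
    print(implied_correct(99.2))  # implied solved count for a 99.2% entry
    print(implied_correct(98.1))  # implied solved count for a 98.1% entry
```

Under this assumption, 99.2% corresponds to 496 of 500 problems, while 98.1% corresponds to 490.5. A fractional count cannot come from a single pass over 500 problems, so some self-reported figures may be averages over several runs or may use a different scoring setup. The source does not say which.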

Leaderboard

  1. LongCat-Flash-Thinking (self-reported, llm-stats): 99.2%
  2. Sarvam-105B (self-reported, llm-stats): 98.6%
  3. GLM-4.5 (self-reported, llm-stats): 98.2%
  4. GLM-4.5-Air (self-reported, llm-stats): 98.1%
  5. Nemotron Nano 9B v2 (self-reported, llm-stats): 97.8%
  6. Kimi K2 Instruct (self-reported, llm-stats): 97.4%
  7. Kimi K2-Instruct-0905 (self-reported, llm-stats): 97.4%
  8. Llama 3.1 Nemotron Ultra 253B v1 (self-reported, llm-stats): 97.0%
  9. Sarvam-30B (self-reported, llm-stats): 97.0%
  10. LongCat-Flash-Lite (self-reported, llm-stats): 96.8%
  11. MiniMax M1 80K (self-reported, llm-stats): 96.8%
  12. Llama-3.3 Nemotron Super 49B v1 (self-reported, llm-stats): 96.6%
  13. LongCat-Flash-Chat (self-reported, llm-stats): 96.4%
  14. Claude 3.7 Sonnet (self-reported, llm-stats): 96.2%
  15. Kimi-k1.5 (self-reported, llm-stats): 96.2%
  16. MiniMax M1 40K (self-reported, llm-stats): 96.0%
  17. DeepSeek R1 Zero (self-reported, llm-stats): 95.9%
  18. Llama 3.1 Nemotron Nano 8B V1 (self-reported, llm-stats): 95.4%
  19. Phi 4 Mini Reasoning (self-reported, llm-stats): 94.6%
  20. DeepSeek R1 Distill Llama 70B (self-reported, llm-stats): 94.5%