FrontierMath

A benchmark of hundreds of original, exceptionally challenging mathematics problems crafted and vetted by expert mathematicians, covering most major branches of modern mathematics from number theory and real analysis to algebraic geometry and category theory.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: math, reasoning. Language: en. Verified by llm-stats: no.
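
As a minimal sketch, the metadata above can be held in a small record and used to relate raw scores to percentages. The class name, field names, and the `to_percent` helper are hypothetical and not part of any llm-stats API. The sketch also assumes that leaderboard percentages are raw scores on the 0-to-1 scale multiplied by 100, which the source does not state explicitly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BenchmarkMetadata:
    # Field names are hypothetical; values are copied from the metadata line.
    modality: str
    max_score: float
    categories: Tuple[str, ...]
    language: str
    verified: bool


FRONTIERMATH = BenchmarkMetadata(
    modality="text",
    max_score=1.0,
    categories=("math", "reasoning"),
    language="en",
    verified=False,
)


def to_percent(raw_score: float, meta: BenchmarkMetadata) -> float:
    """Express a raw score as a percentage of the benchmark's maximum score."""
    # Assumption: raw scores lie between 0 and max_score inclusive.
    if not 0.0 <= raw_score <= meta.max_score:
        raise ValueError(f"score {raw_score} is outside [0, {meta.max_score}]")
    return round(100.0 * raw_score / meta.max_score, 1)
```

Under that assumption, a raw score of 0.476 would correspond to an entry of 47.6% on the leaderboard.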

Leaderboard

  1. GPT-5.4: 47.6% (self-reported, llm-stats)
  2. GPT-5.2: 40.3% (self-reported, llm-stats)
  3. GPT-5.5 Pro: 39.6% (self-reported, llm-stats)
  4. GPT-5.5: 35.4% (self-reported, llm-stats)
  5. GPT-5.1: 26.7% (self-reported, llm-stats)
  6. GPT-5.1 Instant: 26.7% (self-reported, llm-stats)
  7. GPT-5.1 Thinking: 26.7% (self-reported, llm-stats)
  8. GPT-5: 26.3% (self-reported, llm-stats)
  9. GPT-5 mini: 22.1% (self-reported, llm-stats)
  10. o3: 15.8% (self-reported, llm-stats)
  11. GPT-5 nano: 9.6% (self-reported, llm-stats)
  12. o3-mini: 9.2% (self-reported, llm-stats)
  13. MAI-Code-1-Flash: 6.3% (self-reported, llm-stats)
  14. o1: 5.5% (self-reported, llm-stats)