OmniMath


A universal Olympiad-level mathematics benchmark for large language models. It contains 4,428 competition-level problems with rigorous human annotation, categorized into more than 33 sub-domains and spanning more than 10 distinct difficulty levels.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: math, reasoning. Language: en. Verified by llm-stats: no.
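The metadata above gives a maximum score of 1, while leaderboard entries are shown as percentages. As a minimal sketch of how the two could relate, assuming scores are stored as fractions of the maximum score, the following holds the metadata fields in a small record and converts a raw score to a percentage. The names `BenchmarkMeta`, `OMNIMATH`, and `to_percent` are hypothetical and are not part of any llm-stats API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkMeta:
    """Hypothetical container for the metadata fields listed above."""
    name: str
    modality: str
    max_score: float
    categories: tuple
    language: str
    verified: bool


# Values copied from the metadata line; the structure itself is an assumption.
OMNIMATH = BenchmarkMeta(
    name="OmniMath",
    modality="text",
    max_score=1.0,
    categories=("math", "reasoning"),
    language="en",
    verified=False,
)


def to_percent(score: float, meta: BenchmarkMeta) -> float:
    """Convert a raw score on the [0, max_score] scale to a percentage."""
    if not 0 <= score <= meta.max_score:
        raise ValueError(f"score {score} is outside [0, {meta.max_score}]")
    return round(100 * score / meta.max_score, 1)


if __name__ == "__main__":
    # Example input: a raw score of 0.819 on the 0-to-1 scale.
    print(to_percent(0.819, OMNIMATH))
```

Under this assumption, a raw score of 0.819 would be displayed as 81.9%. Rejecting out-of-range scores guards against mixing fractional and percentage inputs.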

Leaderboard

  1. Phi 4 Reasoning Plus: 81.9% (self-reported; source: llm-stats)
  2. Phi 4 Reasoning: 76.6% (self-reported; source: llm-stats)