LiveBench

LiveBench is a challenging LLM benchmark designed to limit test-set contamination: it releases new questions monthly, drawn from recently released datasets, arXiv papers, news articles, and IMDb movie synopses. It comprises tasks across math, coding, reasoning, language, instruction following, and data analysis, each with verifiable, objective ground-truth answers.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, math, reasoning. Language: en. Verified by llm-stats: no.
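The metadata lists a maximum score of 1, while the leaderboard shows percentages. A minimal sketch of that conversion follows, assuming raw scores are stored as fractions between 0 and 1; the helper name `to_percent` and the sample value 0.846 are illustrative and not part of the llm-stats metadata.

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage of max_score, to one decimal place.

    Assumption: raw scores lie on a 0..max_score scale (max_score is 1 here).
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return f"{100 * score / max_score:.1f}%"


# Hypothetical raw value chosen to match the first leaderboard entry.
print(to_percent(0.846))  # prints "84.6%"
```

Rounding to one decimal place matches the precision displayed on the leaderboard; the precision of the underlying stored values is not stated.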

Leaderboard

  1. o3-mini: 84.6% (self-reported, llm-stats)
  2. Qwen3 235B A22B: 77.1% (self-reported, llm-stats)
  3. Kimi K2 Instruct: 76.4% (self-reported, llm-stats)
  4. Kimi K2-Instruct-0905: 76.4% (self-reported, llm-stats)
  5. Qwen3 32B: 74.9% (self-reported, llm-stats)
  6. Qwen3 30B A3B: 74.3% (self-reported, llm-stats)
  7. QwQ-32B: 73.1% (self-reported, llm-stats)
  8. o1: 67.0% (self-reported, llm-stats)
  9. o1-preview: 52.3% (self-reported, llm-stats)
  10. Qwen2.5 72B Instruct: 52.3% (self-reported, llm-stats)
  11. Phi 4: 47.6% (self-reported, llm-stats)
  12. Qwen2.5 7B Instruct: 35.9% (self-reported, llm-stats)
  13. Qwen2.5-Omni-7B: 29.6% (self-reported, llm-stats)