LiveCodeBench v5


LiveCodeBench is a holistic, contamination-free benchmark for evaluating large language models on code. It continuously collects new problems from programming contest platforms (LeetCode, AtCoder, CodeForces) and evaluates models in four scenarios: code generation, self-repair, code execution, and test output prediction. Each problem is annotated with its release date, so a model can be evaluated only on problems released after its training cutoff, which it cannot have seen during training.
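The release-date filtering described above can be illustrated with a minimal sketch. The record layout (`Problem`, `release_date`, `passed`), the helper names, and the sample data are all hypothetical and are not taken from the LiveCodeBench codebase. The score is a plain solved fraction on a 0-to-1 scale, standing in for whatever metric a given report uses.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Problem:
    # Hypothetical record layout; the real dataset's field names may differ.
    problem_id: str
    source: str          # e.g. "LeetCode", "AtCoder", "CodeForces"
    release_date: date
    passed: bool         # whether the model's solution passed all tests


def unseen_problems(problems, training_cutoff):
    """Keep only problems released strictly after the training cutoff."""
    return [p for p in problems if p.release_date > training_cutoff]


def pass_rate(problems):
    """Fraction of problems solved (0 to 1); None if the list is empty."""
    if not problems:
        return None
    return sum(p.passed for p in problems) / len(problems)


# Made-up sample data for illustration only.
problems = [
    Problem("a", "LeetCode", date(2024, 1, 10), True),
    Problem("b", "AtCoder", date(2024, 6, 2), False),
    Problem("c", "CodeForces", date(2024, 9, 15), True),
    Problem("d", "LeetCode", date(2024, 11, 30), True),
]

cutoff = date(2024, 5, 1)  # illustrative cutoff, not any real model's
fresh = unseen_problems(problems, cutoff)
score = pass_rate(fresh)
print(f"{len(fresh)} unseen problems, score {score:.3f} ({score:.1%})")
# prints: 3 unseen problems, score 0.667 (66.7%)
```

Problem "a" predates the cutoff and is excluded, so the score is computed over the three later problems only. Multiplying the 0-to-1 score by 100 gives a percentage of the kind shown on a leaderboard.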

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Gemini 2.5 Pro (self-reported, llm-stats): 75.6%
  2. Gemini 2.5 Flash (self-reported, llm-stats): 63.9%
  3. Qwen3 VL 235B A22B Instruct (self-reported, llm-stats): 61.4%
  4. MiniCPM-SALA (self-reported, llm-stats): 60.5%
  5. Gemini 2.0 Flash-Lite (self-reported, llm-stats): 28.9%
  6. Gemma 3n E4B Instructed (self-reported, llm-stats): 25.7%
  7. 25.7%
  8. Gemma 3n E2B Instructed (self-reported, llm-stats): 18.6%
  9. 18.6%