LiveCodeBench

coding

LiveCodeBench is a holistic, contamination-free benchmark for evaluating large language models on code. It continuously collects new problems from programming contest platforms (LeetCode, AtCoder, Codeforces) and evaluates models in four scenarios: code generation, self-repair, code execution, and test output prediction. Each problem is annotated with its release date, so a model can be evaluated only on problems released after its training cutoff, which it cannot have seen during training.
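The release-date filtering described above can be sketched as follows. This is a minimal illustration, not LiveCodeBench's actual code or data schema: the `Problem` record, its field names, the `unseen_problems` helper, the sample problems, and the cutoff date are all hypothetical and invented for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record; the field names are illustrative only.
    title: str
    source: str
    release_date: date


def unseen_problems(problems, training_cutoff):
    """Keep only problems released strictly after the training cutoff."""
    return [p for p in problems if p.release_date > training_cutoff]


# Invented sample data, used only to exercise the filter.
problems = [
    Problem("two-pointer warmup", "LeetCode", date(2023, 6, 1)),
    Problem("grid paths", "AtCoder", date(2024, 2, 10)),
    Problem("interval merge", "Codeforces", date(2024, 9, 5)),
]

cutoff = date(2024, 1, 1)  # assumed cutoff for an imaginary model
held_out = unseen_problems(problems, cutoff)
print([p.title for p in held_out])
```

Using a strict comparison excludes problems released on the cutoff date itself, which is the conservative choice when the exact cutoff is uncertain.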

Leaderboard

Showing 20 of 74 results

Model                           Score
DeepSeek-V4-Pro-Max             93.5%
DeepSeek-V4-Flash-Max           91.6%
DeepSeek-V3.2 (Thinking)        83.3%
DeepSeek-V3.2                   83.3%
MiniMax M2                      83.0%
LongCat-Flash-Thinking-2601     82.8%
Nemotron 3 Super (120B A12B)    81.2%
Grok-3 Mini                     80.4%
Grok 4 Fast                     80.0%
Grok-3                          79.4%
Grok-4 Heavy                    79.4%
LongCat-Flash-Thinking          79.4%
Grok-4                          79.0%
MiniMax M2.1                    78.0%
Nova 2 Pro                      74.6%
DeepSeek-V3.2-Exp               74.1%
DeepSeek-R1-0528                73.3%
GLM-4.5                         72.9%
Nemotron Nano 9B v2             71.1%
Nova 2 Lite                     71.0%