LiveCodeBench v6

reasoning

LiveCodeBench is a holistic, contamination-free benchmark for evaluating large language models on code. It continuously collects new problems from programming contest platforms (LeetCode, AtCoder, CodeForces) and evaluates models in four scenarios: code generation, self-repair, code execution, and test output prediction. Each problem is annotated with its release date, so a model can be evaluated only on problems released after its training cutoff, which it cannot have seen during training.
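The release-date filtering described above can be illustrated with a minimal Python sketch. The `Problem` record, its field names, the helper functions, and the sample data are all hypothetical and chosen for illustration; they are not LiveCodeBench's actual schema, API, or results, and the solve rate shown here is a simplification that may differ from the metric the benchmark reports.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record; field names are illustrative, not the benchmark's schema.
    problem_id: str
    source: str          # e.g. "LeetCode", "AtCoder", "CodeForces"
    release_date: date


def unseen_problems(problems, training_cutoff):
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.release_date > training_cutoff]


def solve_rate(problems, solved_ids):
    """Fraction of the given problems whose id is in solved_ids (0.0 if empty)."""
    if not problems:
        return 0.0
    return sum(p.problem_id in solved_ids for p in problems) / len(problems)


# Made-up example data, not real benchmark problems or results.
problems = [
    Problem("a", "LeetCode", date(2024, 11, 3)),
    Problem("b", "AtCoder", date(2025, 2, 15)),
    Problem("c", "CodeForces", date(2025, 4, 1)),
]
cutoff = date(2025, 1, 1)   # assumed training cutoff for some model
solved = {"a", "b"}         # assumed set of problems the model solved

fresh = unseen_problems(problems, cutoff)
print([p.problem_id for p in fresh])   # ids of problems released after the cutoff
print(solve_rate(fresh, solved))       # solve rate on the unseen subset only
```

In this sketch, problem "a" is excluded because it predates the assumed cutoff, so the model gets no credit for a problem it may have seen during training.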

Leaderboard

Showing 20 of 50 results

Model                    Score
Qwen3.7 Max              91.6%
Kimi K2.6                89.6%
Seed 2.0 Pro             87.8%
MAI-Thinking-1           87.7%
Qwen3.6 Plus             87.1%
Step-3.5-Flash           86.4%
Kimi K2.5                85.0%
GLM-4.7                  84.9%
Qwen3.6-27B              83.9%
Qwen3.5-397B-A17B        83.6%
Kimi K2-Thinking-0905    83.1%
GLM-4.6                  82.8%
GPT OSS 120B High        81.9%
Seed 2.0 Lite            81.7%
K-EXAONE-236B-A23B       80.7%
Qwen3.5-27B              80.7%
MiMo-V2-Flash            80.6%
Qwen3.6-35B-A3B          80.4%
Gemma 4 31B              80.0%
Qwen3.5-122B-A10B        78.9%