MBPP

reasoning

MBPP (Mostly Basic Python Problems) is a benchmark of 974 crowd-sourced Python programming problems designed to be solvable by entry-level programmers. Each problem consists of a task description, a code solution, and three automated test cases. The problems cover programming fundamentals and standard library functionality.
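To make the problem structure concrete, here is a minimal sketch of how one problem might be represented and how a candidate solution could be checked against its three tests. The field names (`text`, `code`, `test_list`), the example problem, and the helper `passes_all_tests` are illustrative assumptions, not taken from the benchmark itself or from any official evaluation harness.

```python
# Illustrative record in the shape the description suggests. The field names
# and the problem are assumptions for this sketch, not real dataset content.
problem = {
    "text": "Write a function to return the sum of the squares of a list of numbers.",
    "code": "def sum_squares(nums):\n    return sum(n * n for n in nums)\n",
    "test_list": [
        "assert sum_squares([1, 2, 3]) == 14",
        "assert sum_squares([]) == 0",
        "assert sum_squares([-2, 2]) == 8",
    ],
}


def passes_all_tests(candidate_source, tests):
    """Return True if the candidate source runs and satisfies every assert."""
    namespace = {}
    try:
        # Define the candidate's functions in a fresh namespace.
        exec(candidate_source, namespace)
        # Run each test statement against those definitions.
        for test in tests:
            exec(test, namespace)
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as a fail.
        return False
    return True


# The reference solution should pass; a deliberately wrong one should not.
reference_ok = passes_all_tests(problem["code"], problem["test_list"])
buggy_ok = passes_all_tests(
    "def sum_squares(nums):\n    return sum(nums)\n", problem["test_list"]
)
print(reference_ok, buggy_ok)
```

A score on a benchmark of this shape would plausibly be the share of problems for which a model's generated code passes all of its tests, though the exact metric behind the numbers below is not stated here. Because `exec` runs arbitrary code, a real harness would execute model output in a sandbox with a timeout.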

Leaderboard

Showing 20 of 33 results

Model                              Score
Sarvam-30B                         92.7%
Llama-3.3 Nemotron Super 49B v1    91.3%
Qwen2.5-Coder 32B Instruct         90.2%
MiniCPM-SALA                       89.1%
Qwen2.5 72B Instruct               88.2%
Llama 3.1 Nemotron Nano 8B V1      84.6%
Qwen2.5 32B Instruct               84.0%
Qwen2.5 VL 32B Instruct            84.0%
Qwen2.5-Coder 7B Instruct          83.5%
Qwen2.5 14B Instruct               82.0%
Qwen3 235B A22B                    81.4%
Phi-3.5-MoE-instruct               80.8%
Qwen2 72B Instruct                 80.2%
Qwen2.5 7B Instruct                79.2%
Codestral-22B                      78.2%
Llama 4 Maverick                   77.6%
Gemini Diffusion                   76.0%
Mistral Small 3.1 24B Instruct     74.7%
Gemma 3 27B                        74.4%
Qwen2.5-Omni-7B                    73.2%