MBPP EvalPlus (base)

reasoning

MBPP (Mostly Basic Python Problems) is a benchmark of 974 crowd-sourced Python programming problems designed to be solvable by entry-level programmers. EvalPlus extends MBPP with about 35x more test cases, so that LLM-synthesized code is evaluated more rigorously than with the original tests alone.
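To make the value of extra test cases concrete, here is a minimal toy sketch of how a buggy solution can pass a small test suite yet fail a larger one. It is not the EvalPlus API: the problem, the function `candidate_max_of_three`, the helper `passes`, and both test lists are hypothetical and chosen only for illustration.

```python
# Toy illustration only; none of these names come from MBPP or EvalPlus.

def candidate_max_of_three(a, b, c):
    """Hypothetical model-generated solution: return the largest of three numbers."""
    # Bug: when a >= b, the third argument is never considered.
    if a >= b:
        return a
    return max(b, c)


def passes(solution, tests):
    """Return True if the solution matches the expected output on every test."""
    return all(solution(*args) == expected for args, expected in tests)


# Hypothetical small suite: a handful of (arguments, expected) pairs.
base_tests = [((3, 2, 1), 3), ((1, 5, 2), 5), ((1, 2, 9), 9)]

# Hypothetical extended suite: the same tests plus additional inputs.
extended_tests = base_tests + [((4, 1, 7), 7), ((0, 0, 1), 1), ((-1, -2, -3), -1)]

base_ok = passes(candidate_max_of_three, base_tests)
extended_ok = passes(candidate_max_of_three, extended_tests)

print("passes base tests:", base_ok)
print("passes extended tests:", extended_ok)
```

In this sketch the buggy solution is accepted by the small suite and rejected by the extended one, which is the kind of false positive that a larger test suite is meant to catch.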

Leaderboard

Showing 1 of 1 result

Llama 3.1 8B Instruct: 72.8%
