MBPP+


MBPP+ is an enhanced version of MBPP (Mostly Basic Python Problems) that adds about 35 times as many test cases as the original, so model-generated solutions are evaluated more rigorously. MBPP itself is a benchmark of 974 crowd-sourced Python programming problems designed to be solvable by entry-level programmers, covering programming fundamentals and standard library functionality.
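The sketch below shows why extra test cases make evaluation stricter. A solution counts as correct only if it passes every test, so a subtly buggy solution can pass a small test set and still fail a larger one. The task, the buggy solution, and both test lists are hypothetical illustrations, not problems or tests taken from MBPP or MBPP+.

```python
def is_prime_buggy(n: int) -> bool:
    """Hypothetical model output: forgets that numbers below 2 are not prime."""
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


# Hypothetical test sets as (input, expected output) pairs.
BASE_TESTS = [(2, True), (9, False), (13, True)]
EXTENDED_TESTS = BASE_TESTS + [(0, False), (1, False), (25, False), (97, True)]


def passes_all(func, tests) -> bool:
    """Judge a solution correct only if it matches every expected output."""
    return all(func(arg) == expected for arg, expected in tests)


print(passes_all(is_prime_buggy, BASE_TESTS))      # True
print(passes_all(is_prime_buggy, EXTENDED_TESTS))  # False: fails on 0 and 1
```

The bug only surfaces on edge-case inputs (0 and 1) that the small test set never exercises, which is the kind of gap a larger test suite is meant to close.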

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, reasoning. Language: en. Verified by llm-stats: no.
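A max score of 1 is consistent with a score defined as the fraction of problems solved, with the leaderboard percentages being that fraction times 100. The page does not state the exact metric; pass@1 is the figure commonly reported for MBPP+, so the minimal sketch below assumes one pass/fail outcome per problem. The helper name `benchmark_score` and the results list are hypothetical.

```python
def benchmark_score(results: list[bool]) -> float:
    """Fraction of problems solved (assumed metric); always between 0 and 1."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(results) / len(results)


# Hypothetical pass/fail outcomes for five problems.
results = [True, True, False, True, False]
score = benchmark_score(results)
print(f"{score:.3f} -> {score:.1%}")  # 0.600 -> 60.0%
```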

Leaderboard

  1. Qwen2.5 32B Instruct: 67.2% (self-reported, via llm-stats)
  2. Qwen2.5 14B Instruct: 63.2% (self-reported, via llm-stats)
  3. ERNIE 4.5: 40.2% (self-reported, via llm-stats)