MBPP pass@1

reasoning

MBPP (Mostly Basic Python Problems) is a benchmark of 974 crowd-sourced Python programming problems designed to be solvable by entry-level programmers. Each problem consists of a task description, a code solution, and three automated test cases. This variant uses the pass@1 evaluation metric, which measures the percentage of problems solved correctly on the first attempt.
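As a rough illustration of how a pass@1 score could be computed, the sketch below runs one generated solution per problem against that problem's assert-style tests and reports the percentage that pass. The function names, the dictionary keys, and the two toy problems are hypothetical and are not taken from the MBPP dataset or any particular evaluation harness. A real harness would also sandbox execution and enforce timeouts, which this sketch omits.

```python
def passes_all_tests(candidate_code: str, test_cases: list[str]) -> bool:
    """Return True if the candidate defines code that satisfies every test.

    Assumption: each test is a Python `assert` statement given as a string.
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)   # define the candidate's functions
        for test in test_cases:
            exec(test, namespace)         # raises AssertionError on failure
    except Exception:
        # Any syntax error, runtime error, or failed assertion counts as unsolved.
        return False
    return True


def pass_at_1(problems: list[dict]) -> float:
    """Percentage of problems whose single (first) attempt passes all tests."""
    if not problems:
        return 0.0
    solved = sum(passes_all_tests(p["candidate"], p["tests"]) for p in problems)
    return 100.0 * solved / len(problems)


# Hypothetical toy problems, each with one generated attempt and three tests.
problems = [
    {
        "candidate": "def add(a, b):\n    return a + b\n",
        "tests": [
            "assert add(1, 2) == 3",
            "assert add(0, 0) == 0",
            "assert add(-1, 1) == 0",
        ],
    },
    {
        # Deliberately wrong attempt: returns the minimum instead of the maximum.
        "candidate": "def largest(xs):\n    return min(xs)\n",
        "tests": [
            "assert largest([1, 2, 3]) == 3",
            "assert largest([5]) == 5",
            "assert largest([-1, -2]) == -1",
        ],
    },
]

score = pass_at_1(problems)
print(f"pass@1: {score:.1f}%")
```

Because only one attempt per problem is scored, a problem counts as solved only if that single attempt passes all of its tests.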

Leaderboard


Ministral 8B Instruct: 70.0%
