HumanEval Plus

coding

An enhanced version of HumanEval that uses the EvalPlus framework to extend the original test cases by 80x. The larger test suite gives a more rigorous check of the functional correctness of LLM-synthesized code and catches wrong solutions that the original tests let through.
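To illustrate why more tests per problem matter, here is a minimal sketch, not EvalPlus code. It assumes a hypothetical task ("return the sorted distinct elements of a list"), a hypothetical reference solution, and a plausible but wrong candidate. All function names and inputs are invented for illustration. The candidate passes a small base suite and fails once extra inputs are added.

```python
# Hypothetical illustration only: none of these names come from EvalPlus or HumanEval.

def reference_sort_unique(xs):
    """Assumed ground-truth solution: sorted list of distinct elements."""
    return sorted(set(xs))


def candidate_sort_unique(xs):
    """Plausible but wrong solution: drops only adjacent duplicates and never sorts."""
    out = []
    for x in xs:
        if not out or out[-1] != x:
            out.append(x)
    return out


# A small base suite whose inputs all happen to be sorted already.
BASE_INPUTS = [[], [1, 1, 2], [3, 3, 3]]

# Extra inputs of the kind an extended suite might add (unsorted, non-adjacent duplicates).
EXTRA_INPUTS = [[2, 1], [1, 2, 1], [5, 4, 4, 5]]


def failing_inputs(candidate, reference, inputs):
    """Return the inputs on which the candidate's output differs from the reference's."""
    return [xs for xs in inputs if candidate(list(xs)) != reference(list(xs))]


base_failures = failing_inputs(candidate_sort_unique, reference_sort_unique, BASE_INPUTS)
extended_failures = failing_inputs(
    candidate_sort_unique, reference_sort_unique, BASE_INPUTS + EXTRA_INPUTS
)

print("failures on base suite:", base_failures)
print("failures on extended suite:", extended_failures)
```

The candidate looks correct on the base inputs, and every added input exposes the bug. This is the kind of previously undetected wrong code that a larger test suite is meant to surface.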

Leaderboard


Mistral Small 3.2 24B Instruct

92.9%
