MultiPL-E HumanEval


MultiPL-E is a scalable and extensible approach to benchmarking neural code generation that translates unit test-driven code generation benchmarks across multiple programming languages. It extends the HumanEval benchmark to 18 additional programming languages, enabling evaluation of code generation models across diverse programming paradigms and providing insights into how models generalize programming knowledge across language boundaries.
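To make the idea of translating a unit-test-driven benchmark more concrete, here is a minimal sketch of how a single Python test assertion might be rewritten for another language (JavaScript in this case). It is an illustration only and does not reproduce the actual MultiPL-E toolchain: the helper names `to_js_literal` and `translate_assert` are hypothetical, only a small subset of literals is handled, and the choice of Node's `assert.deepEqual` as the target form is an assumption.

```python
import ast
import json


def to_js_literal(node: ast.expr) -> str:
    """Render a small subset of Python literals as JavaScript source text."""
    if isinstance(node, ast.Constant):
        value = node.value
        if value is None:
            return "null"
        # bool must be checked before int, because bool is a subclass of int.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return repr(value)
        if isinstance(value, str):
            return json.dumps(value)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return "-" + to_js_literal(node.operand)
    if isinstance(node, ast.List):
        return "[" + ", ".join(to_js_literal(e) for e in node.elts) + "]"
    raise ValueError(f"unsupported literal: {ast.dump(node)}")


def translate_assert(py_assert: str, func_name: str) -> str:
    """Translate `assert candidate(args...) == expected` into a JS equality check.

    Hypothetical helper: assumes the test has exactly this shape.
    """
    stmt = ast.parse(py_assert).body[0]
    if not (isinstance(stmt, ast.Assert) and isinstance(stmt.test, ast.Compare)):
        raise ValueError("expected an assertion of the form `assert f(...) == value`")
    call = stmt.test.left
    if not isinstance(call, ast.Call):
        raise ValueError("left-hand side of the comparison must be a function call")
    expected = stmt.test.comparators[0]
    args = ", ".join(to_js_literal(a) for a in call.args)
    return f"assert.deepEqual({func_name}({args}), {to_js_literal(expected)});"


if __name__ == "__main__":
    print(translate_assert("assert candidate([1, 2, 3], 2) == True", "has_close_elements"))
```

Because the tests compare plain values, the same test case can in principle be rendered for each target language by swapping only the literal and assertion syntax, which is what makes this style of benchmark comparatively easy to extend.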

Leaderboard


Llama 3.1 405B Instruct: 75.2%
Llama 3.1 70B Instruct: 65.5%
Llama 3.1 8B Instruct: 50.8%