MultiPL-E
language official site →
MultiPL-E is a scalable and extensible system for translating unit test-driven code generation benchmarks to multiple programming languages. It extends HumanEval and MBPP Python benchmarks to 18 additional programming languages, enabling evaluation of neural code generation models across diverse programming paradigms and language features.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, language. Language: en. Verified by llm-stats: no.