MultiPL-E

MultiPL-E is a scalable, extensible system for translating unit-test-driven code generation benchmarks into multiple programming languages. It extends the HumanEval and MBPP Python benchmarks to 18 additional programming languages, enabling evaluation of neural code generation models across diverse programming paradigms and language features.
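
To make the idea of translating a unit-test-driven benchmark concrete, here is a minimal sketch of how a single Python test case might be rendered as a JavaScript assertion. It is not MultiPL-E's actual implementation: the helper names `to_js_literal` and `translate_test_case` are hypothetical, and the sketch assumes test values are limited to JSON-compatible types (numbers, strings, booleans, None, lists, and dicts with string keys). The generated line relies on Node.js's `assert.deepStrictEqual`.

```python
import json


def to_js_literal(value):
    """Render a Python value as a JavaScript literal.

    Assumption: the value is JSON-compatible, so its JSON encoding is
    also valid JavaScript literal syntax (True -> true, None -> null).
    """
    return json.dumps(value)


def translate_test_case(func_name, args, expected):
    """Turn one (args, expected) pair into a JavaScript assertion line."""
    js_args = ", ".join(to_js_literal(a) for a in args)
    js_expected = to_js_literal(expected)
    return f"assert.deepStrictEqual({func_name}({js_args}), {js_expected});"


if __name__ == "__main__":
    # Python original: assert candidate([1, 2, 3], True) == [3, 2, 1]
    print(translate_test_case("candidate", [[1, 2, 3], True], [3, 2, 1]))
```

A real translator also has to handle the function signature, type annotations, and values with no direct equivalent in the target language, which this sketch leaves out.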

Leaderboard

Model                            Score
Qwen3-235B-A22B-Instruct-2507    87.9%
Qwen3-Next-80B-A3B-Instruct      87.8%
Qwen3 VL 235B A22B Instruct      86.1%
Kimi K2 Instruct                 85.7%
Kimi K2-Instruct-0905            85.7%
Qwen2.5 32B Instruct             75.4%
Qwen2.5 72B Instruct             75.1%
Qwen2.5 14B Instruct             72.8%
Qwen2.5 7B Instruct              70.4%
Qwen2 72B Instruct               69.2%
Qwen3 235B A22B                  65.9%
Qwen2.5-Omni-7B                  65.8%
Qwen2 7B Instruct                59.1%