Instruct HumanEval

general

Instruction-based variant of the HumanEval benchmark. It evaluates the code generation capabilities of large language models by measuring functional correctness on programming problems with the pass@k metric.
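For readers unfamiliar with pass@k, the sketch below shows the unbiased estimator commonly used with HumanEval-style benchmarks: for a problem with n generated samples, c of which pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. This page does not state which k or how many samples per problem produced the score listed, so the function names and the example counts here are illustrative assumptions, not this leaderboard's actual evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: samples generated, c: samples that passed all tests, k: sample budget.
    (Hypothetical helper for illustration.)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Made-up counts for two problems, 10 samples each.
    example = [(10, 3), (10, 0)]
    print(f"pass@1 = {mean_pass_at_k(example, k=1):.3f}")
```

With k = 1 the estimator reduces to the fraction of samples that pass, c / n, which is why single-sample scores are often reported as a plain percentage.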

Leaderboard

Showing 1 of 1 result

Llama 3.1 Nemotron 70B Instruct: 73.8%
