CRUXEval-Output-CoT

reasoning

CRUXEval-O (output prediction) with chain-of-thought prompting. CRUXEval is a benchmark of 800 Python functions, each 3 to 13 lines long, designed to evaluate code reasoning, understanding, and execution. In the output prediction task, the model is given a function and a specific input and must predict the function's output; in this variant, the model is prompted to reason step by step before giving its answer.
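To make the task concrete, here is a minimal sketch of how one output-prediction item might be scored. The function `f`, its input, and the helpers `is_correct` and `accuracy` are hypothetical illustrations, not taken from the dataset or from any official evaluation harness; the sketch assumes the model's final answer arrives as a Python literal in a string.

```python
import ast


# Hypothetical sample in the style of the benchmark (not from the dataset).
def f(text):
    # Keep the characters at even positions, then reverse them.
    kept = [ch for i, ch in enumerate(text) if i % 2 == 0]
    return "".join(reversed(kept))


def is_correct(predicted_literal, func, arg):
    """Return True if the predicted literal equals the function's real output.

    Assumption: the prediction is a string holding a Python literal,
    for example "'eca'". Unparseable predictions count as wrong.
    """
    try:
        predicted = ast.literal_eval(predicted_literal)
    except (ValueError, SyntaxError):
        return False
    return predicted == func(arg)


def accuracy(results):
    """Percentage of correct predictions in a list of booleans."""
    return 100.0 * sum(results) / len(results) if results else 0.0


# One correct and one incorrect prediction for the same input.
results = [
    is_correct("'eca'", f, "abcdef"),
    is_correct("'ace'", f, "abcdef"),
]
print(accuracy(results))
```

With chain-of-thought prompting, the model would first trace the function step by step (here: select "a", "c", "e", then reverse) and only the final answer would be compared against the real output.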

Leaderboard


Qwen2.5-Coder 7B Instruct: 56.0%
