CFEval

coding

CFEval is a benchmark for evaluating code generation and problem-solving capabilities.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 10000. Categories: code. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Qwen3-235B-A22B-Thinking-2507 (self-reported, llm-stats): 2,134
  2. Qwen3-Next-80B-A3B-Thinking (self-reported, llm-stats): 2,071