OJBench


OJBench is a benchmark that assesses how well large language models reason about and solve competition-level programming problems. It comprises 232 problems drawn from NOI (National Olympiad in Informatics) and ICPC (International Collegiate Programming Contest), each categorized as Easy, Medium, or Hard. Models are evaluated on their solutions in Python and C++.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: reasoning. Language: en. Verified by llm-stats: no.
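The metadata gives a max score of 1, while the leaderboard shows percentages. A minimal sketch of that conversion follows, assuming the displayed percentage is simply the stored score divided by the max score. The function name `to_percent` and the example value are hypothetical and not part of any llm-stats API.

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage of max_score, one decimal place.

    Assumption: leaderboard percentages are score / max_score * 100.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return f"{100.0 * score / max_score:.1f}%"


if __name__ == "__main__":
    # Hypothetical raw score on the 0-to-1 scale.
    print(to_percent(0.606))
```

Under this assumption, a stored score of 0.606 would be displayed the same way as the top leaderboard entry.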

Leaderboard

All scores below are self-reported and sourced from llm-stats.

  1. Kimi K2.6: 60.6%
  2. Kimi K2-Thinking-0905: 48.7%
  3. Qwen3.5-27B: 40.1%
  4. Qwen3.5-122B-A10B: 39.5%
  5. Qwen3.5-35B-A3B: 36.0%
  6. Qwen3-235B-A22B-Thinking-2507: 32.5%
  7. Qwen3-Next-80B-A3B-Thinking: 29.7%
  8. Kimi K2 Instruct: 27.1%
  9. Kimi K2-Instruct-0905: 27.1%