Natural2Code


NaturalCodeBench (NCB) is a challenging code benchmark designed to mirror the complexity and variety of real-world coding tasks. It comprises 402 high-quality problems in Python and Java, drawn from natural user queries to online coding services and covering 6 different domains.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1 (the leaderboard below shows scores as percentages of this maximum). Categories: general, reasoning. Language: English (en). Verified by llm-stats: no.
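The metadata reports scores on a 0 to 1 scale, while the leaderboard displays them as percentages. The sketch below assumes, hypothetically, that a model's score is the fraction of benchmark problems whose generated solution passes its tests; this page does not state the exact metric, and the helper names `benchmark_score` and `as_percent` are illustrative only.

```python
from typing import Sequence


def benchmark_score(passed: Sequence[bool]) -> float:
    """Return the fraction of problems solved (hypothetical pass/fail scoring).

    Each entry in `passed` is True if the model's solution to that problem
    passed its tests. The result lies in [0, 1], matching "Max score: 1".
    """
    if not passed:
        raise ValueError("need at least one problem result")
    return sum(passed) / len(passed)


def as_percent(score: float) -> str:
    """Format a 0-1 score as a one-decimal percentage, as in the leaderboard."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return f"{score * 100:.1f}%"


if __name__ == "__main__":
    # Hypothetical run: 9 of 10 problems solved.
    results = [True] * 9 + [False]
    print(as_percent(benchmark_score(results)))  # prints "90.0%"
```

Under this assumption, a leaderboard entry such as 92.9% corresponds to a raw score of 0.929 on the 0 to 1 scale.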

Leaderboard

  1. Gemini 2.0 Flash: 92.9% (self-reported, via llm-stats)
  2. Gemini 1.5 Pro: 85.4% (self-reported, via llm-stats)
  3. Gemma 3 27B: 84.5% (self-reported, via llm-stats)
  4. Gemma 3 12B: 80.7% (self-reported, via llm-stats)
  5. Gemini 1.5 Flash: 79.8% (self-reported, via llm-stats)
  6. Gemini 1.5 Flash 8B: 75.5% (self-reported, via llm-stats)
  7. Gemma 3 4B: 70.3% (self-reported, via llm-stats)
  8. Gemma 3 1B: 56.0% (self-reported, via llm-stats)