SciCode


SciCode is a research coding benchmark, curated by scientists, that challenges language models to write code that solves scientific problems. It contains 338 subproblems decomposed from 80 main problems across 16 natural science subfields, covering mathematics, physics, chemistry, biology, and materials science. Solving the problems requires knowledge recall, reasoning, and code synthesis.

Methodology

Imported from the public llm-stats benchmark metadata. Modality: text. Maximum score: 1 (leaderboard entries show the score as a percentage). Categories: biology, chemistry, code, math, physics, reasoning. Language: English. Verified by llm-stats: no.
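As a rough illustration of how a score with a maximum of 1 relates to the percentages in the leaderboard, the sketch below aggregates pass/fail results for subproblems grouped under main problems. The names `MainProblem`, `subproblem_score`, `main_problem_score`, and `as_percent` are hypothetical, the data is made up, and the rule that a main problem counts as solved only when all of its subproblems pass is an assumption; this page does not state which aggregation the reported scores use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MainProblem:
    """A main problem and the pass/fail result of each of its subproblems."""
    name: str
    subproblem_passed: List[bool]


def subproblem_score(problems: List[MainProblem]) -> float:
    """Fraction of all subproblems that passed, in the range 0..1."""
    results = [ok for p in problems for ok in p.subproblem_passed]
    return sum(results) / len(results) if results else 0.0


def main_problem_score(problems: List[MainProblem]) -> float:
    """Fraction of main problems whose subproblems all passed (assumed rule)."""
    if not problems:
        return 0.0
    solved = sum(
        1 for p in problems
        if p.subproblem_passed and all(p.subproblem_passed)
    )
    return solved / len(problems)


def as_percent(score: float) -> str:
    """Render a 0..1 score as a percentage with one decimal place."""
    return f"{score * 100:.1f}%"


# Made-up results for illustration only; not real benchmark data.
toy_results = [
    MainProblem("toy_problem_a", [True, True, True]),
    MainProblem("toy_problem_b", [True, False]),
]
print(as_percent(subproblem_score(toy_results)))
print(as_percent(main_problem_score(toy_results)))
```

The two functions can give quite different numbers for the same run, so it helps to know which level a reported score refers to before comparing entries.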

Leaderboard

  1. Gemini 3.1 Pro: 59.0% (self-reported, llm-stats)
  2. Qwen3.7 Max: 53.5% (self-reported, llm-stats)
  3. Kimi K2.6: 52.2% (self-reported, llm-stats)
  4. Kimi K2.5: 48.7% (self-reported, llm-stats)
  5. Kimi K2-Thinking-0905: 44.8% (self-reported, llm-stats)
  6. Nemotron 3 Super (120B A12B): 42.0% (self-reported, llm-stats)
  7. GLM-4.5: 41.7% (self-reported, llm-stats)
  8. MiniMax M2.1: 39.0% (self-reported, llm-stats)
  9. Mercury 2: 38.0% (self-reported, llm-stats)
  10. GLM-4.5-Air: 37.3% (self-reported, llm-stats)
  11. MiniMax M2: 36.0% (self-reported, llm-stats)
  12. Nemotron 3 Nano (30B A3B): 33.3% (self-reported, llm-stats)