GeneBench

GeneBench is an evaluation focused on multi-stage scientific data analysis in genetics and quantitative biology. Tasks require reasoning about ambiguous or noisy data with minimal supervisory guidance, addressing realistic obstacles such as hidden confounders or QC failures, and correctly implementing and interpreting modern statistical methods.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, reasoning, science. Language: en. Verified by llm-stats: no.

Leaderboard

  1. GPT-5.5 Pro (self-reported; source: llm-stats): 33.2%
  2. GPT-5.5 (self-reported; source: llm-stats): 25.0%
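
The metadata lists a max score of 1, while the leaderboard shows percentages. A minimal sketch of how the two could relate, assuming scores are stored as fractions of the max score (the `Entry` class, `as_percent` helper, and the fractional values 0.332 and 0.250 are illustrative assumptions, not part of the llm-stats data format):

```python
from dataclasses import dataclass

MAX_SCORE = 1.0  # "Max score: 1" from the benchmark metadata


@dataclass(frozen=True)
class Entry:
    """Hypothetical leaderboard record; field names are assumptions."""
    model: str
    score: float          # assumed to be a fraction of MAX_SCORE
    self_reported: bool


def as_percent(score: float, max_score: float = MAX_SCORE) -> str:
    """Format a raw score as a percentage of the max score."""
    return f"{100 * score / max_score:.1f}%"


# Values back-derived from the percentages shown in the leaderboard.
entries = [
    Entry("GPT-5.5", 0.250, True),
    Entry("GPT-5.5 Pro", 0.332, True),
]

# Rank by score, highest first, and render one line per model.
ranked = sorted(entries, key=lambda e: e.score, reverse=True)
lines = [f"{i}. {e.model}: {as_percent(e.score)}" for i, e in enumerate(ranked, 1)]
print("\n".join(lines))
```

Both entries are self-reported and not verified by llm-stats, so any comparison built on them inherits that caveat.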