BixBench

BixBench is a benchmark for real-world bioinformatics and computational biology data analysis. It evaluates AI models on multi-step scientific workflows that require code execution, statistical reasoning, and biological domain knowledge to interpret experimental data.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, reasoning, science. Language: en. Verified by llm-stats: no.
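The metadata above gives a maximum score of 1, while the leaderboard reports percentages. A minimal sketch of how the two could relate, assuming raw scores are fractions of the maximum that are scaled to a percentage for display. The `BenchmarkMeta` record and `as_percent` helper are hypothetical names used for illustration and are not part of any llm-stats API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkMeta:
    """Hypothetical record mirroring the metadata fields listed above."""
    name: str
    modality: str
    max_score: float
    categories: tuple[str, ...]
    language: str
    verified: bool


def as_percent(raw_score: float, meta: BenchmarkMeta) -> float:
    """Scale a raw score to a percentage of the benchmark's maximum score."""
    if not 0 <= raw_score <= meta.max_score:
        raise ValueError(f"score {raw_score} is outside [0, {meta.max_score}]")
    return round(100 * raw_score / meta.max_score, 1)


# Field values copied from the metadata line; the structure is an assumption.
BIXBENCH = BenchmarkMeta(
    name="BixBench",
    modality="text",
    max_score=1.0,
    categories=("agents", "reasoning", "science"),
    language="en",
    verified=False,
)
```

Under that assumption, a raw score of 0.805 passed to `as_percent` with `BIXBENCH` would be displayed as 80.5, and a value above the maximum would raise a `ValueError` rather than be shown as more than 100%.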

Leaderboard

  1. GPT-5.5: 80.5% (self-reported; source: llm-stats)