BixBench
BixBench is a benchmark for real-world bioinformatics and computational biology data analysis. It evaluates AI models on multi-step scientific workflows that require code execution, statistical reasoning, and biological domain knowledge to interpret experimental data.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: agents, reasoning, science. Language: en. Verified by llm-stats: no.
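The metadata fields above follow a simple "Key: value." pattern, so they can be read into a structured record. The following is a minimal sketch, not llm-stats tooling: the function names `parse_metadata` and `is_valid_score` are hypothetical, and treating "Max score: 1" as meaning that reported scores fall between 0 and 1 is an assumption rather than something the listing states.

```python
def parse_metadata(line: str) -> dict[str, str]:
    """Split a 'Key: value. Key: value.' string into a dict (hypothetical helper)."""
    fields: dict[str, str] = {}
    for part in line.split(". "):
        part = part.strip().rstrip(".")
        if ": " in part:
            key, value = part.split(": ", 1)
            fields[key] = value
    return fields


def is_valid_score(score: float, max_score: float) -> bool:
    """Assumption: a reported score lies in the closed range [0, max_score]."""
    return 0.0 <= score <= max_score


# The metadata fields exactly as listed in the entry above.
LINE = (
    "Modality: text. Max score: 1. Categories: agents, reasoning, science. "
    "Language: en. Verified by llm-stats: no."
)

meta = parse_metadata(LINE)
categories = [c.strip() for c in meta["Categories"].split(",")]
max_score = float(meta["Max score"])
verified = meta["Verified by llm-stats"] == "yes"
```

With this record, a score such as 0.17 would pass `is_valid_score(0.17, max_score)`, while a percentage-style value such as 17 would be rejected, which is one way to catch scores entered on the wrong scale.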