BioMysteryBench

reasoning biology science

BioMysteryBench evaluates a model's ability to reason through challenging molecular biology problems, reporting performance on a hard subset and on the subset of problems solved by human experts.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: biology, reasoning, science. Language: en.
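
The metadata fields above can be sketched as a small record, which makes the 0-to-1 score scale explicit. This is only an illustration: the class name `BenchmarkMetadata`, its field names, and the `as_percent` helper are hypothetical and are not an llm-stats schema or API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BenchmarkMetadata:
    # Hypothetical container; field names mirror the listing, not a real schema.
    name: str
    modality: str
    max_score: float
    categories: Tuple[str, ...]
    language: str

    def as_percent(self, score: float) -> float:
        """Convert a raw score in [0, max_score] to a percentage."""
        if not 0 <= score <= self.max_score:
            raise ValueError(f"score {score} is outside [0, {self.max_score}]")
        return 100.0 * score / self.max_score


# Values copied from the metadata line above.
BIOMYSTERYBENCH = BenchmarkMetadata(
    name="BioMysteryBench",
    modality="text",
    max_score=1.0,
    categories=("biology", "reasoning", "science"),
    language="en",
)
```

With a maximum score of 1, a leaderboard entry of 0.5 would correspond to 50 percent under this reading, and any value above 1 would be rejected as out of range.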

Leaderboard