BoolQ

reasoning

BoolQ is a reading comprehension dataset of 15,942 naturally occurring yes/no questions. Each example consists of a question, a passage, and a boolean answer; the questions were written in unprompted, unconstrained settings. Many of them ask for complex, non-factoid information and require entailment-like inference to answer.
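The example format described above can be sketched as a small data structure with an accuracy scorer. This is a minimal illustration, not the official evaluation harness: the names `BoolQExample`, `accuracy`, and `always_yes` are hypothetical, the two toy examples are invented for the sketch and are not taken from the dataset, and it assumes the percentages on the leaderboard are plain accuracy.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class BoolQExample:
    """One BoolQ-style item: a yes/no question about a passage (hypothetical type)."""
    question: str
    passage: str
    answer: bool  # True means "yes", False means "no"


def accuracy(
    examples: Iterable[BoolQExample],
    predict: Callable[[str, str], bool],
) -> float:
    """Return the fraction of examples where predict(question, passage) matches the answer."""
    items = list(examples)
    if not items:
        raise ValueError("no examples to score")
    correct = sum(predict(ex.question, ex.passage) == ex.answer for ex in items)
    return correct / len(items)


def always_yes(question: str, passage: str) -> bool:
    """Trivial baseline that answers "yes" to every question."""
    return True


# Invented toy data for illustration only; not drawn from BoolQ.
toy_examples = [
    BoolQExample(
        question="is water made of hydrogen and oxygen",
        passage="A water molecule contains two hydrogen atoms and one oxygen atom.",
        answer=True,
    ),
    BoolQExample(
        question="is the sun a planet",
        passage="The Sun is the star at the center of the Solar System.",
        answer=False,
    ),
]

toy_accuracy = accuracy(toy_examples, always_yes)
```

A real model would replace `always_yes` with a function that reads the passage and returns its yes/no prediction; the scorer itself does not change.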

Leaderboard


Model                                     Score
Hermes 3 70B                              88.0%
Gemma 2 27B                               84.8%
Phi-3.5-MoE-instruct                      84.6%
Gemma 2 9B                                84.2%
Gemma 3n E4B                              81.6%
Gemma 3n E4B Instructed LiteRT Preview    81.6%
Phi 4 Mini                                81.2%
Phi-3.5-mini-instruct                     78.0%
Gemma 3n E2B                              76.4%
Gemma 3n E2B Instructed LiteRT (Preview)  76.4%