OpenBookQA


OpenBookQA is a question-answering dataset modeled after the open-book exams used to assess human understanding of a subject. It contains 5,957 multiple-choice, elementary-level science questions that probe understanding of 1,326 core science facts and their application to novel situations. Answering a question requires combining an open-book fact with broad common knowledge through multi-hop reasoning.
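To make the task format concrete, the sketch below models one multiple-choice item paired with a core fact and renders it as a prompt. The `Item` class, its field names, the `format_prompt` helper, and the example question are all illustrative assumptions. They are not the dataset's actual schema or content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical container for one multiple-choice science question."""
    stem: str                # the question text
    choices: dict[str, str]  # answer label -> answer text
    answer: str              # label of the correct choice
    fact: str                # the "open book" fact the question probes


def format_prompt(item: Item, include_fact: bool = True) -> str:
    """Render an item as a plain-text prompt, optionally with its fact."""
    lines = []
    if include_fact:
        lines.append(f"Fact: {item.fact}")
    lines.append(f"Question: {item.stem}")
    for label, text in item.choices.items():
        lines.append(f"{label}. {text}")
    lines.append("Answer:")
    return "\n".join(lines)


# Made-up example, written for illustration only (not taken from the dataset).
EXAMPLE = Item(
    stem="Which object would conduct heat best?",
    choices={
        "A": "a wool sock",
        "B": "a steel spoon",
        "C": "a paper cup",
        "D": "a wooden block",
    },
    answer="B",
    fact="Metals are good conductors of heat.",
)

print(format_prompt(EXAMPLE))
```

In this made-up item the fact alone does not settle the answer. The solver also needs the common-knowledge step that steel is a metal, which is the kind of two-hop combination the description refers to.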

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, reasoning. Language: en. Verified by llm-stats: no.
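The metadata gives a maximum score of 1, while the leaderboard shows percentages. A minimal sketch of how the two could relate is shown here, assuming the score is plain multiple-choice accuracy stored as a fraction and displayed as a percentage. The function names `accuracy` and `as_percent` are hypothetical and do not come from llm-stats.

```python
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predicted answer labels that match the gold labels (0 to 1)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set of questions")
    # Compare labels case-insensitively and ignore surrounding whitespace.
    correct = sum(
        p.strip().upper() == g.strip().upper()
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


def as_percent(score: float) -> str:
    """Format a 0-to-1 score as a percentage with one decimal place."""
    return f"{score * 100:.1f}%"


# Three of the four predicted labels match the gold labels.
score = accuracy(["A", "b", "C", "D"], ["A", "B", "C", "A"])
print(score, as_percent(score))
```

Under this assumption, a stored score of 0.896 would be displayed as 89.6%.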

Leaderboard

  1. Phi-3.5-MoE-instruct: 89.6% (self-reported, llm-stats)
  2. Phi-3.5-mini-instruct: 79.2% (self-reported, llm-stats)
  3. Phi 4 Mini: 79.2% (self-reported, llm-stats)
  4. Mistral NeMo Instruct: 60.6% (self-reported, llm-stats)
  5. Hermes 3 70B: 49.4% (self-reported, llm-stats)