PopQA


PopQA is an entity-centric open-domain question-answering dataset of 14,000 QA pairs, designed to evaluate how well language models memorize and recall factual knowledge about entities of varying popularity. It probes the effectiveness of both parametric memory (knowledge stored in model parameters) and non-parametric memory (knowledge retrieved from external sources). The dataset was created by sampling knowledge triples covering 16 diverse relationship types from Wikidata and converting them to natural-language questions using templates, with a focus on long-tail entities, to understand LMs' strengths and limitations in memorizing factual knowledge.
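The template-based conversion described above can be illustrated with a minimal Python sketch. The template strings, the `Triple` class, and the `triple_to_qa` helper are all hypothetical names chosen for illustration; they are not taken from the dataset's actual construction code, and the real templates may be worded differently.

```python
from dataclasses import dataclass

# Hypothetical templates, one per relationship type (assumed wording,
# not the dataset's actual templates).
TEMPLATES = {
    "capital": "What is the capital of {subject}?",
    "author": "Who is the author of {subject}?",
    "occupation": "What is {subject}'s occupation?",
}


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) knowledge triple."""
    subject: str
    relation: str
    obj: str


def triple_to_qa(triple: Triple) -> tuple[str, str]:
    """Fill the relation's template with the subject; the object is the answer."""
    template = TEMPLATES[triple.relation]
    question = template.format(subject=triple.subject)
    return question, triple.obj


# Illustrative triple, not a row from the dataset.
question, answer = triple_to_qa(Triple("France", "capital", "Paris"))
print(question)
print(answer)
```

In this sketch the subject entity fills the template slot and the object entity becomes the gold answer, so each sampled triple yields one QA pair.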

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Granite 3.3 8B Base: 26.2% (self-reported, via llm-stats)
  2. Granite 3.3 8B Instruct: 26.2% (self-reported, via llm-stats)
  3. IBM Granite 4.0 Tiny Preview: 22.9% (self-reported, via llm-stats)