PopQA

reasoning

PopQA is an entity-centric, open-domain question-answering dataset of about 14,000 QA pairs, designed to evaluate how well language models memorize and recall factual knowledge about entities of varying popularity. It probes both parametric memory (knowledge stored in model parameters) and the effectiveness of non-parametric memory (knowledge retrieved from external sources). The questions were created by sampling knowledge triples from Wikidata across 16 diverse relationship types and converting them to natural language with templates. The sampling emphasizes long-tail entities to expose LMs' strengths and limitations in memorizing factual knowledge.
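The template step described above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's actual construction code: the `TEMPLATES` mapping, its question wording, the `triple_to_question` helper, and the sample triple are all assumptions made up for the example.

```python
# Hypothetical templates keyed by relationship type. The wording is an
# illustrative assumption, not the templates actually used to build PopQA.
TEMPLATES = {
    "occupation": "What is {subj}'s occupation?",
    "place of birth": "In what city was {subj} born?",
    "capital": "What is the capital of {subj}?",
}


def triple_to_question(subj: str, relation: str, obj: str) -> dict:
    """Turn a (subject, relation, object) triple into a QA pair.

    The subject fills the template slot and the object becomes the answer.
    Raises KeyError if no template exists for the relation.
    """
    template = TEMPLATES[relation]
    return {"question": template.format(subj=subj), "answer": obj}


# Sample triple chosen for illustration only.
example = triple_to_question("Marie Curie", "occupation", "physicist")
print(example["question"])
print(example["answer"])
```

Keeping one template per relationship type means every question for that relation has the same surface form, so differences in model accuracy can be attributed to the entity rather than to the phrasing.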

Leaderboard

Showing 3 of 3 results

Model                           Score
Granite 3.3 8B Base             26.2%
Granite 3.3 8B Instruct         26.2%
IBM Granite 4.0 Tiny Preview    22.9%