SimpleQA
SimpleQA is a factuality benchmark developed by OpenAI that measures the short-form factual accuracy of large language models. The benchmark contains 4,326 short, fact-seeking questions that were adversarially collected and are each designed to have a single, indisputable answer. Questions cover diverse topics, from science and technology to entertainment. The benchmark also measures model calibration, that is, whether models know what they know.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: factuality, general, reasoning. Language: en. Verified by llm-stats: no.
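To illustrate how a score with a maximum of 1 could be derived, here is a minimal sketch. It assumes each model answer has already been graded as correct, incorrect, or not attempted; that three-way scheme, the label strings, and the function name `simpleqa_scores` are assumptions for illustration and are not part of the llm-stats metadata. It reports the fraction correct overall, the fraction correct among attempted questions, and their harmonic mean.

```python
from collections import Counter


def simpleqa_scores(grades):
    """Aggregate per-question grades into scores in the range [0, 1].

    `grades` is a list of strings, each one of "correct", "incorrect",
    or "not_attempted" (hypothetical labels chosen for this sketch).
    """
    counts = Counter(grades)
    total = len(grades)
    correct = counts["correct"]
    attempted = correct + counts["incorrect"]

    # Fraction of all questions answered correctly.
    overall = correct / total if total else 0.0
    # Fraction answered correctly among questions the model attempted.
    given_attempted = correct / attempted if attempted else 0.0
    # Harmonic mean of the two; penalizes both guessing and over-abstaining.
    denom = overall + given_attempted
    f_score = 2 * overall * given_attempted / denom if denom else 0.0

    return {
        "correct": overall,
        "correct_given_attempted": given_attempted,
        "f_score": f_score,
    }


if __name__ == "__main__":
    example = ["correct", "correct", "incorrect", "not_attempted"]
    print(simpleqa_scores(example))
```

A model that declines to answer when unsure keeps a high correct-given-attempted rate at the cost of overall accuracy, which is one simple way to see the calibration aspect described above.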