TruthfulQA
TruthfulQA is a benchmark that measures whether language models give truthful answers to questions. It comprises 817 questions spanning 38 categories, including health, law, finance, and politics. The questions are crafted so that some humans would answer falsely because of a false belief or misconception, which tests whether a model avoids reproducing falsehoods it learned from human-written text.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: finance, general, healthcare, legal, reasoning. Language: en. Verified by llm-stats: no.
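A maximum score of 1 suggests that results are reported as a fraction between 0 and 1. The metadata does not say how that number is computed, so the following is only a minimal sketch, assuming the score is the share of answers judged truthful. The function name truthful_fraction and the sample labels are hypothetical and are not part of llm-stats or the TruthfulQA release.

```python
from typing import Sequence


def truthful_fraction(judgments: Sequence[bool]) -> float:
    """Return the share of answers judged truthful, as a value in [0, 1].

    Assumption: each entry is one question's verdict (True = truthful).
    How the verdicts are produced (human raters, a judge model,
    multiple-choice matching) is outside the scope of this sketch.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    # bool is a subclass of int, so sum() counts the True entries.
    return sum(judgments) / len(judgments)


# Hypothetical verdicts for four questions, for illustration only.
example_judgments = [True, False, True, True]
print(f"{truthful_fraction(example_judgments):.2f}")
```

On the full benchmark the list would hold 817 verdicts, one per question, and the result would still fall between 0 and 1.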