LongFact Concepts
reasoning official site →
LongFact is a benchmark for evaluating long-form factuality in large language models. It comprises 2,280 fact-seeking prompts spanning 38 topics, designed to test a model's ability to generate accurate, long-form responses. The benchmark uses SAFE (Search-Augmented Factuality Evaluator) to evaluate factual accuracy.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, reasoning. Language: en. Verified by llm-stats: no.