LongFact

LongFact evaluates factual precision over long-form generations containing many individual claims. Each claim is extracted and verified, and the model is scored on claim-level precision, measuring whether extended responses introduce unsupported or false statements.
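As a rough illustration of claim-level precision, the sketch below assumes claims have already been extracted from a response and each one has been labeled as supported or not by some verifier. The `Claim` type, its field names, and the `claim_precision` function are hypothetical names chosen for this example, not part of the benchmark's actual tooling. The extraction and verification steps are out of scope here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Claim:
    """One atomic claim pulled from a long-form response (hypothetical type)."""
    text: str
    supported: bool  # assumed to come from an upstream verification step


def claim_precision(claims: Sequence[Claim]) -> Optional[float]:
    """Fraction of extracted claims that were verified as supported.

    Returns None when no claims were extracted, since precision is
    undefined for an empty set.
    """
    if not claims:
        return None
    supported = sum(1 for c in claims if c.supported)
    return supported / len(claims)


# Toy example with made-up verification labels.
example = [
    Claim("The Eiffel Tower is in Paris.", True),
    Claim("It was completed in 1889.", True),
    Claim("It is made of wrought iron.", True),
    Claim("It is 500 meters tall.", False),
]
print(claim_precision(example))  # 0.75
```

Under this reading, a response with three supported claims out of four scores 0.75 on a 0 to 1 scale, so a single false statement in a long answer lowers the score in proportion to the total number of claims.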

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: factuality, general. Language: en. Verified by llm-stats: no.

Leaderboard

  1. MAI-Thinking-1: 98.0% (self-reported; source: llm-stats)