LongFact

factuality

LongFact evaluates factual precision over long-form generations containing many individual claims. Each claim is extracted and verified, and the model is scored on claim-level precision, measuring whether extended responses introduce unsupported or false statements.
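As a rough illustration of the claim-level scoring described above, the sketch below computes precision as the share of extracted claims judged supported. The `Claim` type, the `supported` flag, and the sample claims are hypothetical; the extraction and verification steps are assumed to have already run, and the benchmark's actual pipeline may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Claim:
    text: str
    supported: bool  # hypothetical verdict from a separate verification step


def claim_precision(claims: List[Claim]) -> Optional[float]:
    """Return the fraction of claims judged supported, or None if there are no claims."""
    if not claims:
        return None
    return sum(c.supported for c in claims) / len(claims)


# Made-up claims standing in for what might be extracted from one long response.
claims = [
    Claim("The Eiffel Tower is in Paris.", True),
    Claim("It was completed in 1889.", True),
    Claim("It is 500 metres tall.", False),
    Claim("Gustave Eiffel's company built it.", True),
]

print(f"{claim_precision(claims):.1%}")  # prints 75.0%
```

Under this reading, a single unsupported claim lowers the score in proportion to the total number of claims, so a longer response is penalised only if the extra text introduces unsupported statements.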

Leaderboard


MAI-Thinking-1: 98.0%
