FACTS Grounding

reasoning

A benchmark that evaluates language models' ability to generate factually accurate, well-grounded responses from long-form input context. It comprises 1,719 examples with documents of up to 32k tokens, each requiring a detailed response that is fully grounded in the provided document.

Leaderboard

Showing 13 of 13 results

Model                         Score
Gemini 2.5 Pro Preview 06-05  87.8%
Gemini 2.5 Flash              85.3%
Gemini 2.5 Flash-Lite         84.1%
Gemini 2.0 Flash              83.6%
Gemini 2.0 Flash-Lite         83.6%
Gemma 3 12B                   75.8%
Gemma 3 27B                   74.9%
Gemini 3 Pro                  70.5%
Gemma 3 4B                    70.1%
Gemini 3 Flash                61.9%
GLM-5V-Turbo                  58.6%
Gemini 3.1 Flash-Lite         40.6%
Gemma 3 1B                    36.4%