MRCR v2 (8-needle)

reasoning

MRCR v2 (8-needle) is a variant of the Multi-Round Coreference Resolution benchmark in which 8 needle items must be retrieved from a long context. It tests a model's ability to track and reason about multiple pieces of information simultaneously across extended conversations.

Leaderboard

Model                           Score
Claude Opus 4.6                 93.0%
GPT-5.5                         74.0%
Gemini 3.1 Flash-Lite           60.1%
GPT-5.4 mini                    33.6%
GPT-5.4 nano                    33.1%
Gemini 3.5 Flash                26.6%
Gemini 3 Pro                    26.3%
Gemini 3.1 Pro                  26.3%
Gemini 3 Flash                  22.1%
Gemini 2.5 Pro Preview 06-05    16.4%