MRCR v2 (8-needle)
MRCR v2 (8-needle) is a variant of the Multi-Round Coreference Resolution (MRCR) benchmark that includes 8 needle items to retrieve from long contexts. It tests a model's ability to track and reason about multiple pieces of information at once across extended conversations.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, long_context, reasoning. Language: en. Verified by llm-stats: no.
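The metadata above gives a maximum score of 1 but does not say how that score is computed. As a rough illustration only, the sketch below shows one way a score on a 0 to 1 scale could be produced: a string-similarity ratio per sample, averaged over all samples. The function names `score_response` and `benchmark_score` are hypothetical, and the choice of similarity metric is an assumption, not something stated in the llm-stats metadata.

```python
from difflib import SequenceMatcher


def score_response(response: str, reference: str) -> float:
    """Hypothetical per-sample score in [0, 1].

    Assumption: similarity is measured with difflib's ratio; the
    benchmark metadata does not specify the actual metric.
    """
    return SequenceMatcher(None, response, reference).ratio()


def benchmark_score(pairs: list[tuple[str, str]]) -> float:
    """Mean of per-sample scores over (response, reference) pairs.

    A value of 1.0 means every response matched its reference exactly,
    which corresponds to the stated maximum score of 1.
    """
    if not pairs:
        return 0.0
    return sum(score_response(r, ref) for r, ref in pairs) / len(pairs)
```

Averaging per-sample ratios keeps the aggregate bounded by 1 regardless of how many samples are evaluated, which is consistent with the "Max score: 1" field.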