MRCR 1M

reasoning

MRCR 1M is a variant of the Multi-Round Coreference Resolution (MRCR) benchmark that tests long-context capability at approximately 1 million tokens of context. It evaluates a model's ability to sustain reasoning and attention across ultra-long conversations.

Leaderboard

Model                    Score
DeepSeek-V4-Pro-Max      83.5%
DeepSeek-V4-Flash-Max    78.7%
Gemini 2.0 Flash-Lite    58.0%