OpenAI-MRCR: 2 needle 256k

reasoning

Multi-Round Co-reference Resolution (MRCR) benchmark that tests long-context reasoning by evaluating a model's ability to distinguish between similar outputs, reason about ordering, and reproduce specific content from multi-turn conversations containing multiple writing requests on overlapping topics at 256k tokens.

Leaderboard

Showing 1 of 1 result

GPT-5

86.8%

i