RULER

reasoning

RULER v1 is a synthetic long-context benchmark that measures how model quality degrades as input length increases. This packaging follows the public standalone NVIDIA RULER implementation and covers its 13 official tasks, which span retrieval, multi-hop tracing, aggregation, and question answering (QA).
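
To make the retrieval category concrete, here is a minimal sketch of how a synthetic needle-in-a-haystack example might be generated at a chosen length and scored by string matching. It is an illustration only, not the NVIDIA RULER code: the function names `make_niah_example` and `string_match_score`, the filler text, and the prompt wording are all assumptions made for this example.

```python
import random

# Hypothetical filler text; the real benchmark's haystack content may differ.
FILLER = "The grass is green. The sky is blue. The sun is yellow."


def make_niah_example(num_filler_sentences, seed=0):
    """Build one illustrative retrieval prompt and its expected answer.

    Input length is controlled by num_filler_sentences, so the same task
    can be regenerated at several context lengths.
    """
    rng = random.Random(seed)
    key = f"key-{rng.randint(1000, 9999)}"
    value = str(rng.randint(1000000, 9999999))
    needle = f"The special magic number for {key} is {value}."

    # Hide the needle at a random position among the filler sentences.
    haystack = [FILLER] * num_filler_sentences
    haystack.insert(rng.randint(0, num_filler_sentences), needle)

    question = f"What is the special magic number for {key}?"
    prompt = " ".join(haystack) + "\n" + question
    return prompt, value


def string_match_score(prediction, references):
    """Return the fraction of reference strings found in the prediction."""
    if not references:
        return 0.0
    lowered = prediction.lower()
    hits = sum(ref.lower() in lowered for ref in references)
    return hits / len(references)
```

Calling `make_niah_example` with increasing `num_filler_sentences` and averaging `string_match_score` over many seeds at each length would give a rough degradation curve of the kind the benchmark is designed to expose.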

Leaderboard

Nemotron 3 Super (120B A12B): 91.8%
Phi-3.5-MoE-instruct: 87.1%
Phi-3.5-mini-instruct: 84.1%