GraphWalks

reasoning

GraphWalks is a synthetic, multi-hop, long-context reasoning benchmark. The model is given a graph as an edge list and must traverse it to find either the neighboring nodes of a given start node (via breadth-first search) or its parent nodes. Performance is reported as the F1 score of the model-predicted answer set against the ground-truth set.
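To make the task and the metric concrete, here is a minimal Python sketch of the two query types and a set-based F1 score. It is an illustration under stated assumptions, not the benchmark's actual harness: the "a -> b" edge format, the function names, the choice to return nodes first reached at exactly the requested depth, and the handling of empty sets are all hypothetical.

```python
from collections import defaultdict


def parse_edges(edge_lines):
    """Parse lines of the assumed form 'a -> b' into (source, target) pairs."""
    edges = []
    for line in edge_lines:
        src, dst = (part.strip() for part in line.split("->"))
        edges.append((src, dst))
    return edges


def bfs_frontier(edges, start, depth):
    """Nodes first reached at exactly `depth` hops from `start` (assumed task variant)."""
    children = defaultdict(set)
    for src, dst in edges:
        children[src].add(dst)
    visited = {start}
    frontier = {start}
    for _ in range(depth):
        next_frontier = set()
        for node in frontier:
            # Only keep nodes that no earlier BFS level has reached.
            next_frontier |= children[node] - visited
        visited |= next_frontier
        frontier = next_frontier
    return frontier


def parents(edges, node):
    """Nodes that have a direct edge pointing at `node`."""
    return {src for src, dst in edges if dst == node}


def set_f1(predicted, truth):
    """F1 between a predicted node set and the ground-truth node set."""
    predicted, truth = set(predicted), set(truth)
    if not predicted and not truth:
        return 1.0  # assumption: two empty sets count as a perfect match
    overlap = len(predicted & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)


edges = parse_edges(["a -> b", "a -> c", "b -> d", "c -> d", "d -> e"])
depth_one = bfs_frontier(edges, "a", 1)   # nodes one hop from "a"
d_parents = parents(edges, "d")           # direct parents of "d"
partial_score = set_f1({"b"}, depth_one)  # model returned only part of the answer
```

In this toy graph, a prediction that names only one of the two correct nodes has perfect precision but half recall, which is why F1 rather than exact match distinguishes partially correct answers.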

Leaderboard


Model            F1
MAI-Thinking-1   90.0%
MiMo-V2.5        87.0%
MiMo-V2.5-Pro    62.0%