GraphWalks
reasoning
GraphWalks is a synthetic multi-hop, long-context reasoning benchmark in which a model is given an edge-list representation of a graph and must traverse it to find either the neighboring nodes (via breadth-first search) or the parent nodes of a given start node. Performance is reported as the F1 score between the model-predicted answer set and the ground-truth set.
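The two task types and the scoring can be sketched in a few lines of Python. This is an illustrative sketch, not the benchmark's reference implementation: the function names (`bfs_at_depth`, `parents`, `set_f1`), the representation of edges as `(source, target)` tuples, the choice to return nodes first reached at exactly the requested depth, and the handling of empty sets in the F1 score are all assumptions made for this example.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to the set of nodes it points to (directed edges assumed)."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
    return adjacency


def bfs_at_depth(edges, start, depth):
    """Return the nodes first reached after exactly `depth` hops from `start`.

    Assumption: the benchmark's exact BFS answer definition may differ.
    """
    adjacency = build_adjacency(edges)
    visited = {start}
    frontier = {start}
    for _ in range(depth):
        next_frontier = set()
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return frontier


def parents(edges, node):
    """Return every node that has a directed edge into `node`."""
    return {source for source, target in edges if target == node}


def set_f1(predicted, truth):
    """F1 between two answer sets.

    Assumption: two empty sets score 1.0, and no overlap scores 0.0.
    """
    predicted, truth = set(predicted), set(truth)
    if not predicted and not truth:
        return 1.0
    overlap = len(predicted & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy graph, far smaller than a real long-context prompt.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "a")]
depth_two = bfs_at_depth(edges, "a", 2)
parents_of_d = parents(edges, "d")
score = set_f1({"b", "c"}, {"b"})
```

Scoring on sets rather than exact string match means a partially correct answer earns partial credit: a prediction that lists every true node plus one extra loses precision but keeps full recall.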
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: long_context, reasoning. Language: en. Verified by llm-stats: no.