Graphwalks BFS <128k
reasoning
A graph reasoning benchmark that evaluates language models' ability to perform breadth-first search (BFS) operations on graphs with context length under 128k tokens, returning nodes reachable at specified depths.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: reasoning, spatial_reasoning. Language: en. Verified by llm-stats: no.