Graphwalks BFS >128k
reasoning
A graph-reasoning benchmark that evaluates whether language models can carry out breadth-first search (BFS) over graphs presented in contexts longer than 128k tokens, testing long-context reasoning.
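For readers unfamiliar with the operation being tested, the following is a minimal sketch of BFS on a directed graph. The edge-list input, the function name `bfs_frontier`, and the "nodes at exactly this depth" query are assumptions made for illustration; this card does not specify the benchmark's actual prompt format or expected answer format.

```python
from collections import deque


def bfs_frontier(edges, start, depth):
    """Return the set of nodes whose shortest distance from `start` is exactly `depth`.

    `edges` is a list of (source, target) pairs for a directed graph.
    This input shape is a hypothetical choice, not the benchmark's format.
    """
    # Build an adjacency list from the edge list.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    # Standard BFS: record the shortest hop count to each reachable node.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # no need to expand beyond the requested depth
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    return {n for n, d in distance.items() if d == depth}


example_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
print(bfs_frontier(example_edges, "a", 2))
```

In a long-context setting, the difficulty comes less from the algorithm itself than from having to track the visited set and frontier across an edge list that spans more than 128k tokens.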
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: long_context, reasoning, spatial_reasoning. Language: en. Verified by llm-stats: no.