Graphwalks parents <128k

reasoning

A graph reasoning benchmark that evaluates language models' ability to find the parent nodes of a given node in a directed graph, where the graph fits in a context of under 128k tokens. Solving it requires tracking the graph's structure and the direction of its edges.
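As a rough illustration of the underlying task, the parents of a node are all nodes that have an edge pointing to it. The sketch below assumes the graph is given as a plain-text edge list with one hypothetical `src -> dst` edge per line; the actual prompt format used by the benchmark may differ, and `parse_edges` and `find_parents` are illustrative helper names, not part of any benchmark code.

```python
def parse_edges(text: str) -> list[tuple[str, str]]:
    """Parse lines of the assumed form 'src -> dst' into (src, dst) pairs."""
    edges = []
    for line in text.strip().splitlines():
        if "->" not in line:
            continue  # skip lines that do not describe an edge
        src, dst = (part.strip() for part in line.split("->", 1))
        edges.append((src, dst))
    return edges


def find_parents(edges: list[tuple[str, str]], target: str) -> set[str]:
    """Return every node that has a directed edge pointing to target."""
    return {src for src, dst in edges if dst == target}


graph = """
a1 -> b2
c3 -> b2
b2 -> d4
c3 -> d4
"""
edges = parse_edges(graph)
print(sorted(find_parents(edges, "b2")))  # ['a1', 'c3']
print(sorted(find_parents(edges, "d4")))  # ['b2', 'c3']
```

For a program this is a single linear scan over the edges; the difficulty for a language model is doing the equivalent lookup reliably when the edge list fills most of a long context.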

Leaderboard


GPT-5.4: 89.8%
GPT-5.2: 89.0%
GPT-5: 73.3%
GPT-4.5: 72.6%
GPT-5.4 mini: 71.5%
GPT-4.1 mini: 60.5%
o3-mini: 58.3%
GPT-4.1: 58.0%
GPT-5.4 nano: 50.8%
GPT-4o: 35.4%
GPT-4.1 nano: 9.4%