ARC-AGI

reasoning

The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) is a benchmark that tests general intelligence and abstract reasoning through visual, grid-based transformation tasks. Each task provides 2 to 5 demonstration pairs, each showing an input grid transformed into an output grid according to an underlying rule; the test-taker must infer that rule and apply it to novel test inputs. Grids are up to 30x30 cells, and each cell takes one of 10 discrete colors/symbols. The benchmark is designed to measure human-like general fluid intelligence and skill-acquisition efficiency while assuming minimal prior knowledge.
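The task structure described above can be made concrete with a small sketch. It assumes a task is stored as a dictionary with "train" (demonstration pairs) and "test" entries, and that colors are encoded as integers 0 to 9. The toy task, the helper names (is_valid_grid, flip_horizontal, rule_fits_demonstrations), and the candidate rule are all hypothetical illustrations, not part of the benchmark or any official tooling.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]

MAX_SIDE = 30    # grids are at most 30x30, per the description above
NUM_COLORS = 10  # assumption: the 10 colors/symbols are encoded as 0..9


def is_valid_grid(grid: Grid) -> bool:
    """Check that a grid is rectangular, within 30x30, and uses only 10 colors."""
    if not grid or not grid[0]:
        return False
    height, width = len(grid), len(grid[0])
    if height > MAX_SIDE or width > MAX_SIDE:
        return False
    return all(
        len(row) == width
        and all(isinstance(cell, int) and 0 <= cell < NUM_COLORS for cell in row)
        for row in grid
    )


def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical candidate rule: mirror each row left to right."""
    return [list(reversed(row)) for row in grid]


def rule_fits_demonstrations(
    rule: Callable[[Grid], Grid], pairs: List[Dict[str, Grid]]
) -> bool:
    """A rule is accepted only if it reproduces every demonstration output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in pairs)


# Toy task invented for illustration: two demonstration pairs and one test input.
toy_task = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[3, 4], [0, 5]], "output": [[4, 3], [5, 0]]},
    ],
    "test": [{"input": [[7, 0, 8]]}],
}

if rule_fits_demonstrations(flip_horizontal, toy_task["train"]):
    prediction = flip_horizontal(toy_task["test"][0]["input"])
    print(prediction)  # [[8, 0, 7]]
```

A real solver has to search for the rule rather than be handed one; the sketch only shows the check that a candidate rule must pass on every demonstration pair before it is applied to the test input.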

Leaderboard

Model                            Score
GPT-5.5                          95.0%
GPT-5.4                          93.7%
GPT-5.2 Pro                      90.5%
o3                               88.0%
GPT-5.2                          86.2%
LongCat-Flash-Thinking           50.3%
Qwen3-235B-A22B-Instruct-2507    41.8%