ARC-AGI


The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) is a benchmark that tests abstract reasoning through visual, grid-based transformation tasks. Each task provides 2-5 demonstration pairs, each showing an input grid and the output grid produced from it by an underlying rule; the test-taker must infer the rule and apply it to novel test inputs. Grids are at most 30x30 and use 10 discrete colors (symbols). The benchmark is intended to measure human-like general fluid intelligence and skill-acquisition efficiency while assuming minimal prior knowledge.
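The task structure described above can be sketched in a few lines of Python. This is only an illustration, not the benchmark's official tooling: the `train`/`test` dictionary layout is an assumption modeled on the publicly released ARC task files, and `toy_task`, `mirror_left_right`, `identity`, and `solve` are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]  # rows of integer color codes

MAX_SIDE = 30    # grids are at most 30x30 (from the description above)
NUM_COLORS = 10  # ten discrete colors/symbols, encoded here as 0-9 (assumption)


def is_valid_grid(grid: Grid) -> bool:
    """Check the size and color constraints stated in the description."""
    if not grid or not grid[0]:
        return False
    height, width = len(grid), len(grid[0])
    if height > MAX_SIDE or width > MAX_SIDE:
        return False
    return all(
        len(row) == width and all(0 <= cell < NUM_COLORS for cell in row)
        for row in grid
    )


def identity(grid: Grid) -> Grid:
    """Hypothetical candidate rule: leave the grid unchanged."""
    return [list(row) for row in grid]


def mirror_left_right(grid: Grid) -> Grid:
    """Hypothetical candidate rule: flip every row horizontally."""
    return [list(reversed(row)) for row in grid]


def fits_demonstrations(rule: Callable[[Grid], Grid], task: Dict) -> bool:
    """A rule is consistent if it reproduces every demonstration output."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


def solve(task: Dict, candidates: List[Callable[[Grid], Grid]]) -> Optional[List[Grid]]:
    """Apply the first rule that explains all demonstrations to the test inputs."""
    for rule in candidates:
        if fits_demonstrations(rule, task):
            return [rule(pair["input"]) for pair in task["test"]]
    return None  # no candidate rule explains the demonstrations


# A toy task invented for illustration; it is not taken from the benchmark.
toy_task = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[3, 4], [0, 5]], "output": [[4, 3], [5, 0]]},
    ],
    "test": [{"input": [[0, 7, 0, 8]]}],
}

predictions = solve(toy_task, [identity, mirror_left_right])
```

Searching a fixed list of candidate rules is a deliberate simplification. It shows the shape of the problem (infer a rule from a handful of pairs, then apply it to an unseen input) rather than a method that would work on real tasks, where the space of possible rules is open-ended.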

Methodology

Imported from llm-stats public benchmark metadata. Modality: image. Max score: 1 (leaderboard scores are displayed as percentages). Categories: reasoning, spatial_reasoning, vision. Language: en. Verified by llm-stats: no.

Leaderboard

  1. GPT-5.5: 95.0% (self-reported, llm-stats)
  2. GPT-5.4: 93.7% (self-reported, llm-stats)
  3. GPT-5.2 Pro: 90.5% (self-reported, llm-stats)
  4. o3: 88.0% (self-reported, llm-stats)
  5. GPT-5.2: 86.2% (self-reported, llm-stats)
  6. LongCat-Flash-Thinking: 50.3% (self-reported, llm-stats)
  7. Qwen3-235B-A22B-Instruct-2507: 41.8% (self-reported, llm-stats)