APEX-Agents


APEX-Agents is a benchmark that evaluates AI agents on long-horizon professional tasks requiring sustained reasoning, planning, and execution across complex, multi-step workflows.

Methodology

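Imported from llm-stats public benchmark metadata.
Modality: text
Max score: 1 (leaderboard scores are shown as percentages of this maximum)
Categories: agents, reasoning
Language: English
Verified by llm-stats: no (the results below have not been independently verified)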

Leaderboard

  1. Gemini 3.1 Pro: 33.5% (self-reported; source: llm-stats)
  2. Kimi K2.6: 27.9% (self-reported; source: llm-stats)
  3. MiniMax M3: 27.7% (self-reported; source: llm-stats)