GDPval-AA

GDPval-AA evaluates AI models on economically valuable knowledge-work tasks across professional domains, including finance and legal, among other sectors. It is run independently by Artificial Analysis and uses Elo scoring to rank models by their performance on real-world work tasks.
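To give a feel for what an Elo gap means, here is a minimal sketch that converts a rating difference into an expected head-to-head score. It assumes the conventional logistic Elo formula with a 400-point scale; the page does not say which Elo variant or scale Artificial Analysis uses, so treat the function `expected_score` and its `scale` parameter as illustrative assumptions rather than the benchmark's actual method. The two ratings are taken from the leaderboard on this page.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional logistic Elo model.

    Assumption: a 400-point scale, as in classic Elo. GDPval-AA's exact
    variant is not stated in the source.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the leaderboard on this page.
claude_opus_4_8 = 1890
gemini_3_5_flash = 1656

p = expected_score(claude_opus_4_8, gemini_3_5_flash)
print(f"Expected score for the higher-rated model: {p:.3f}")
```

Under these assumptions, two equally rated models each get an expected score of 0.5, and the expected scores of the two sides always sum to 1.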

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 3000. Categories: agents, finance, general, legal, reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

All entries are marked self-reported, source llm-stats.

  1. Claude Opus 4.8: 1,890
  2. Gemini 3.5 Flash: 1,656
  3. Claude Sonnet 4.6: 1,633
  4. Claude Opus 4.6: 1,606
  5. DeepSeek-V4-Pro-Max: 1,554
  6. MiniMax M2.7: 1,494
  7. Muse Spark: 1,444
  8. MiMo-V2-Pro: 1,426
  9. MiMo-V2-Omni: 1,410
  10. DeepSeek-V4-Flash-Max: 1,395
  11. Gemini 3.1 Pro: 1,317