GDPval-Rubrics

reasoning

GDPval-Rubrics evaluates AI model performance on economically valuable knowledge work tasks drawn from the public GDPval dataset. It uses pointwise scoring based on public rubrics, with the environment aligned to the GDPval-AA scaffolding.
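
As a rough illustration of what pointwise rubric scoring can look like, the sketch below sums the points of the criteria a grader marks as met and divides by the points available, giving a value between 0 and 1. This is a minimal sketch under stated assumptions, not the benchmark's actual grader: the RubricItem type, the pointwise_score function, the point weights, and the example criteria are all hypothetical, and the real rubric format and aggregation rule may differ.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    """One hypothetical rubric criterion (not the real GDPval schema)."""
    description: str
    points: float  # assumed positive weight for this criterion
    met: bool      # grader's judgment: did the response satisfy it?


def pointwise_score(items: list[RubricItem]) -> float:
    """Return earned points divided by available points, in [0, 1]."""
    total = sum(item.points for item in items)
    if total <= 0:
        raise ValueError("rubric must carry positive total points")
    earned = sum(item.points for item in items if item.met)
    return earned / total


# Illustrative rubric for a single task; the criteria are made up.
rubric = [
    RubricItem("States the correct filing deadline", 2.0, True),
    RubricItem("Cites the governing clause", 1.0, False),
    RubricItem("Output is in the requested format", 1.0, True),
]
print(f"{pointwise_score(rubric):.2f}")  # prints 0.75
```

Under this assumption, a benchmark-level figure would be the mean of per-task scores, which keeps the result on the same 0 to 1 scale as the maximum score of 1 listed under Methodology.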

Methodology

Imported from llm-stats public benchmark metadata.

Modality: text
Max score: 1
Categories: agents, finance, general, legal, reasoning
Language: en
Verified by llm-stats: no

Leaderboard

  1. MiniMax M3 (self-reported, llm-stats): 74.8%