GDPval-Rubrics
reasoning
GDPval-Rubrics evaluates AI model performance on economically valuable knowledge-work tasks drawn from the public GDPval dataset. Each response is scored pointwise against public rubrics, and the evaluation environment is aligned with the GDPval-AA scaffolding.
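As a rough illustration of what pointwise rubric scoring can look like, the sketch below grades a single response against a list of rubric items and normalizes the result to the range 0 to 1, matching the maximum score of 1 reported for this benchmark. The `RubricItem` class, the `pointwise_score` function, the point weights, and the example rubric text are all hypothetical; the source does not specify how GDPval-Rubrics weights or aggregates rubric items, so this should be read as one plausible scheme rather than the benchmark's actual grader.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RubricItem:
    """One rubric criterion (hypothetical structure, not the benchmark's schema)."""
    description: str
    points: float  # weight assigned to this criterion (assumed)
    met: bool      # whether the grader judged the criterion satisfied


def pointwise_score(items: List[RubricItem]) -> float:
    """Return earned points divided by available points, in [0, 1].

    Assumes all point weights are non-negative and that the aggregate is a
    simple weighted fraction; the real benchmark may aggregate differently.
    """
    if any(item.points < 0 for item in items):
        raise ValueError("point weights must be non-negative")
    total = sum(item.points for item in items)
    if total <= 0:
        raise ValueError("rubric must contain positive total points")
    earned = sum(item.points for item in items if item.met)
    return earned / total


# Made-up rubric for a single task response, for illustration only.
example_items = [
    RubricItem("States the correct filing deadline", 2.0, True),
    RubricItem("Cites the relevant contract clause", 1.0, False),
    RubricItem("Presents the summary as a table", 1.0, True),
]
example_score = pointwise_score(example_items)
print(example_score)
```

Under this assumed scheme, a benchmark-level score would typically be an average of per-task scores, which keeps the overall maximum at 1.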
Methodology
Imported from llm-stats public benchmark metadata.
Modality: text
Max score: 1
Categories: agents, finance, general, legal, reasoning
Language: en
Verified by llm-stats: no