MMLU-Redux
math official site →
An improved version of the MMLU benchmark featuring manually re-annotated questions to identify and correct errors in the original dataset. Provides more reliable evaluation metrics for language models by addressing dataset quality issues found in the original MMLU.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, language, math, reasoning. Language: en. Verified by llm-stats: no.