MMLU-redux-2.0

math

A curated version of the MMLU benchmark in which 5,700 questions across 57 subjects were manually re-annotated to identify and correct errors in the original dataset. It addresses the 6.49% error rate found in MMLU and provides a more reliable basis for evaluating language models.
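As a rough illustration of the figures in the description, the sketch below works out how many questions each subject would contribute and how a count of correct answers turns into a percentage score. It assumes the 5,700 questions are split evenly across the 57 subjects and that a score is plain accuracy over the questions evaluated; the function names are hypothetical and are not part of any official evaluation harness.

```python
# Figures quoted in the benchmark description.
TOTAL_QUESTIONS = 5_700
NUM_SUBJECTS = 57


def questions_per_subject(total: int, subjects: int) -> int:
    """Questions per subject, ASSUMING an even split across subjects."""
    if subjects <= 0 or total % subjects != 0:
        raise ValueError("total is not evenly divisible by the number of subjects")
    return total // subjects


def accuracy(num_correct: int, num_total: int) -> float:
    """Plain accuracy as a percentage (assumed scoring rule)."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    return 100.0 * num_correct / num_total


if __name__ == "__main__":
    print(questions_per_subject(TOTAL_QUESTIONS, NUM_SUBJECTS))  # 100
```

Under these assumptions, each additional correct answer moves the overall score by about 0.018 percentage points (1 in 5,700).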

Leaderboard


Kimi K2 Base: 90.2%
