Multilingual MMLU

MMLU-ProX is a comprehensive multilingual benchmark covering 29 typologically diverse languages, building upon MMLU-Pro. Each language version consists of 11,829 identical questions enabling direct cross-linguistic comparisons. The benchmark evaluates large language models' reasoning capabilities across linguistic and cultural boundaries through challenging, reasoning-focused questions with 10 answer choices.
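Because every language version shares the same questions, per-language accuracy can be compared directly. The following is a minimal sketch of that comparison, not the benchmark's official scoring code: the record layout, the `accuracy_by_language` helper, and the toy data are all assumptions made for illustration. The only details taken from the description above are the 10 answer choices per question and the idea of scoring each language on the same question set.

```python
from collections import defaultdict

# Assumed record layout: (language, question_id, gold_choice, predicted_choice).
# Choices are letters A-J, matching the 10 answer options per question.
CHOICES = "ABCDEFGHIJ"
CHANCE_ACCURACY = 1 / len(CHOICES)  # accuracy of uniform random guessing


def accuracy_by_language(records):
    """Return {language: fraction of questions answered correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, _question_id, gold, predicted in records:
        total[language] += 1
        if predicted == gold:
            correct[language] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy data for illustration only, not real benchmark results.
records = [
    ("en", 1, "C", "C"),
    ("en", 2, "J", "J"),
    ("de", 1, "C", "C"),
    ("de", 2, "J", "A"),
]
scores = accuracy_by_language(records)
```

With 10 options, random guessing yields about 10% accuracy, so that is the floor against which any per-language score should be read.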

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, language, reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

  1. o3-mini: 80.7% (self-reported; source: llm-stats)
  2. Ministral 3 (14B Base 2512): 74.2% (self-reported; source: llm-stats)
  3. Phi 4 Mini: 49.3% (self-reported; source: llm-stats)