Global-MMLU-Lite

A lightweight version of the Global MMLU benchmark that evaluates language models across multiple languages while addressing cultural and linguistic biases in multilingual evaluation.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, language, reasoning. Language: en. Verified by llm-stats: no.
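
The metadata lists a maximum score of 1, while the leaderboard shows percentages. A minimal sketch of that conversion follows, assuming the stored score is a fraction between 0 and the maximum score. The function name `to_percent` is hypothetical and not part of llm-stats.

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score on a 0..max_score scale as a percentage string.

    Assumption: the raw score is a fraction of max_score (for example,
    0.892 when max_score is 1), as the metadata above suggests.
    """
    if not 0.0 <= score <= max_score:
        raise ValueError(f"score {score} is outside the range 0..{max_score}")
    # Scale to 0..100 and keep one decimal place, matching the leaderboard.
    return f"{100 * score / max_score:.1f}%"


if __name__ == "__main__":
    print(to_percent(0.892))  # a raw score of 0.892 is displayed as 89.2%
```

Under this assumption, a listed value such as 89.2% corresponds to a raw score of 0.892.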

Leaderboard

  1. Gemini 2.5 Pro Preview 06-05 self-reported llm-stats
    89.2%
  2. Gemini 2.5 Pro self-reported llm-stats
    88.6%
  3. Gemini 2.5 Flash self-reported llm-stats
    88.4%
  4. Gemini 2.5 Flash-Lite self-reported llm-stats
    81.1%
  5. Gemini 2.0 Flash-Lite self-reported llm-stats
    78.2%
  6. Gemma 3 27B self-reported llm-stats
    75.1%
  7. Gemma 3 12B self-reported llm-stats
    69.5%
  8. Gemini Diffusion self-reported llm-stats
    69.1%
  9. Gemma 3n E4B Instructed self-reported llm-stats
    64.5%
  10. 64.5%
  11. Gemma 3n E2B Instructed self-reported llm-stats
    59.0%
  12. 59.0%
  13. Gemma 3 4B self-reported llm-stats
    54.5%
  14. Gemma 3 1B self-reported llm-stats
    34.2%