CMMLU


CMMLU (Chinese Massive Multitask Language Understanding) is a comprehensive Chinese-language benchmark that evaluates the knowledge and reasoning of large language models across 67 subjects. It spans the natural sciences, social sciences, engineering, and humanities, using multiple-choice questions that range from elementary to advanced professional difficulty.
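The multiple-choice format described above can be sketched in Python. This is a minimal illustration, not the official CMMLU harness. It assumes each question has four options labeled A to D, and that a reply is graded by comparing the first standalone option letter it contains with the gold letter. The names `Item`, `format_prompt`, `extract_letter`, and `is_correct`, the subject id, and the prompt layout are all hypothetical. Real evaluation setups may prompt, parse, or score differently (for example, by comparing option log-probabilities).

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption of this sketch: every item has four options labeled A-D.
CHOICE_LETTERS = ("A", "B", "C", "D")

# Matches an option letter that is not part of a longer Latin word,
# so "Answer: C" yields "C" and "答案是B" yields "B".
_LETTER_RE = re.compile(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])")


@dataclass
class Item:
    subject: str
    question: str
    choices: Tuple[str, str, str, str]
    answer: str  # gold option letter, e.g. "B"


def format_prompt(item: Item) -> str:
    """Render the question and its labeled options as a plain-text prompt."""
    lines = [item.question]
    lines += [f"{letter}. {text}" for letter, text in zip(CHOICE_LETTERS, item.choices)]
    lines.append("答案：")  # "Answer:"; a hypothetical prompt suffix
    return "\n".join(lines)


def extract_letter(completion: str) -> Optional[str]:
    """Return the first standalone option letter in the reply, or None."""
    match = _LETTER_RE.search(completion)
    return match.group(1) if match else None


def is_correct(item: Item, completion: str) -> bool:
    """Grade one reply by exact match against the gold letter."""
    return extract_letter(completion) == item.answer


if __name__ == "__main__":
    item = Item(
        subject="high_school_mathematics",  # hypothetical subject id
        question="1 + 1 = ?",
        choices=("1", "2", "3", "4"),
        answer="B",
    )
    print(format_prompt(item))
    print(is_correct(item, "答案是B"))    # True
    print(is_correct(item, "Answer: C"))  # False
```

The regex uses Latin-letter lookarounds rather than `\b` because Python treats Chinese characters as word characters, so `\b` would not separate "是" from "B".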

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1, so the leaderboard percentages below are fractions of 1 (e.g., 90.1% corresponds to 0.901). Categories: general, language, reasoning. Language: en. Verified by llm-stats: no.
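Because the maximum score is 1, a result can be read as a fraction of questions answered correctly, assuming the reported score is accuracy. The sketch below shows two common ways to aggregate per-question results across subjects. One pools every question (micro average); the other averages per-subject accuracies (macro average). This page does not state which aggregation the leaderboard scores use. The `(subject, correct)` result format and the helper names `accuracy_by_subject` and `overall_score` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One graded question: (subject, answered_correctly). Hypothetical format.
Result = Tuple[str, bool]


def accuracy_by_subject(results: Iterable[Result]) -> Dict[str, float]:
    """Fraction of correct answers within each subject."""
    total: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for subject, ok in results:
        total[subject] += 1
        correct[subject] += int(ok)
    return {s: correct[s] / total[s] for s in total}


def overall_score(results: List[Result], macro: bool = False) -> float:
    """Score in [0, 1]: pooled accuracy, or the mean of per-subject accuracies if macro=True."""
    if not results:
        raise ValueError("no results to score")
    if macro:
        per_subject = accuracy_by_subject(results)
        return sum(per_subject.values()) / len(per_subject)
    return sum(ok for _, ok in results) / len(results)


if __name__ == "__main__":
    results = [("math", True), ("math", False), ("law", True)]
    print(f"{overall_score(results):.1%}")              # 66.7%
    print(f"{overall_score(results, macro=True):.1%}")  # 75.0%
```

With unequal subject sizes the two averages can differ noticeably, as the toy example shows, so scores computed under different aggregation rules are not directly comparable.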

Leaderboard

  1. Qwen2 72B Instruct: 90.1% (self-reported)
  2. LongCat-Flash-Chat: 84.3% (self-reported)
  3. LongCat-Flash-Lite: 82.5% (self-reported)
  4. MiniCPM-SALA: 81.5% (self-reported)
  5. ERNIE 4.5: 39.8% (self-reported)