MMLU-STEM


A STEM-focused subset of the Massive Multitask Language Understanding (MMLU) benchmark. It evaluates language models on science, technology, engineering, and mathematics topics, including physics, chemistry, mathematics, and other technical subjects.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: chemistry, math, physics, reasoning. Language: en. Verified by llm-stats: no.
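The metadata gives a maximum score of 1, while the leaderboard reports percentages. The sketch below shows one way such a score could be computed. It assumes plain accuracy over multiple-choice items with a single correct option letter, pooled across all questions; the metadata above does not state how subjects are aggregated. `accuracy` and `as_percent` are hypothetical helpers, and the toy answers are invented for illustration.

```python
# Minimal sketch (assumption): the score is the fraction of multiple-choice
# questions answered correctly, pooled over all items, on a 0-1 scale.
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of predicted option letters matching the gold answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty question set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def as_percent(score: float) -> str:
    """Format a 0-1 score as a percentage with one decimal place."""
    return f"{score * 100:.1f}%"


if __name__ == "__main__":
    # Hypothetical toy data, not real MMLU-STEM items.
    preds = ["A", "C", "B", "D", "A"]
    gold = ["A", "C", "D", "D", "B"]
    score = accuracy(preds, gold)
    print(score, as_percent(score))  # 0.6 60.0%
```

Under this reading, a listed result of 80.9% corresponds to a score of 0.809 on the 0-1 scale.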

Leaderboard

  1. Qwen2.5 32B Instruct: 80.9% (self-reported; source: llm-stats)
  2. Qwen2.5 14B Instruct: 76.4% (self-reported; source: llm-stats)