MMLU Chat

math

Chat-format variant of the Massive Multitask Language Understanding benchmark, evaluating language models across 57 tasks including elementary mathematics, US history, computer science, law, and other professional and academic subjects. This version uses conversational prompting format for model evaluation.

Leaderboard

Showing 1 of 1 result

Llama 3.1 Nemotron 70B Instruct

80.6%

i