MATH (CoT)


The MATH dataset contains 12,500 challenging competition mathematics problems drawn from AMC 10, AMC 12, AIME, and other mathematics competitions. Each problem comes with a full step-by-step solution, is rated on a difficulty scale from 1 to 5, and belongs to one of seven mathematical subjects. This variant uses Chain-of-Thought (CoT) prompting, which asks the model to reason step by step before giving its final answer.
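To make the CoT setup concrete, here is a minimal sketch of how a prompt might be built and a final answer pulled out of a model's reasoning. It assumes the final answer is wrapped in `\boxed{}`, as in the MATH reference solutions. The prompt wording and the names `build_cot_prompt`, `extract_boxed_answer`, and `exact_match_accuracy` are hypothetical, not the harness behind the scores on this page.

```python
from typing import Optional, Sequence


def build_cot_prompt(problem: str) -> str:
    """Wrap a problem in a simple Chain-of-Thought instruction.

    The wording is an assumption for illustration only.
    """
    return (
        "Solve the following problem. Reason step by step, then give the "
        "final answer inside \\boxed{}.\n\n"
        f"Problem: {problem}\n\nSolution:"
    )


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # braces never closed


def exact_match_accuracy(outputs: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of outputs whose boxed answer equals the reference's boxed answer.

    Plain string equality is a simplification: a real grader would usually
    normalize equivalent forms (e.g. 0.5 vs \\frac{1}{2}) before comparing.
    """
    if not outputs:
        return 0.0
    correct = 0
    for out, ref in zip(outputs, references):
        pred = extract_boxed_answer(out)
        gold = extract_boxed_answer(ref)
        if pred is not None and pred == gold:
            correct += 1
    return correct / len(outputs)
```

The brace-depth scan is used instead of a regular expression because boxed answers often contain nested braces, such as `\boxed{\frac{1}{2}}`. The accuracy is returned as a fraction between 0 and 1.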

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: math, reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Llama 3.1 70B Instruct: 68.0% (self-reported; source: llm-stats)
  2. Ministral 3 (14B Base 2512): 67.6% (self-reported; source: llm-stats)
  3. Mistral Large 3: 67.6% (self-reported; source: llm-stats)
  4. Llama 3.1 8B Instruct: 51.9% (self-reported; source: llm-stats)