MGSM

MGSM (Multilingual Grade School Math) is a benchmark for multilingual mathematical reasoning. It contains 250 grade-school math problems from the GSM8K dataset, each manually translated into ten typologically diverse languages: Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, and Telugu.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: math, reasoning. Language: en. Verified by llm-stats: no.
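
The metadata gives a maximum score of 1, while the leaderboard shows percentages. A minimal sketch of how such a score could be computed follows. It assumes that scoring is exact-match accuracy on the final number in each response, averaged per language and then across languages. The metadata above does not confirm that protocol, and the names extract_final_number and mgsm_style_score are hypothetical.

```python
import re
from collections import defaultdict


def extract_final_number(text):
    """Return the last number found in a response as a float, or None."""
    # Assumption: the final answer is the last number mentioned in the text.
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    # Drop thousands separators such as "1,000" before converting.
    return float(matches[-1].replace(",", ""))


def mgsm_style_score(records):
    """Score (language, response, gold_answer) triples.

    Returns per-language accuracy and the unweighted mean across
    languages, both in the range 0 to 1.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, response, gold in records:
        total[lang] += 1
        pred = extract_final_number(response)
        if pred is not None and abs(pred - float(gold)) < 1e-6:
            correct[lang] += 1
    per_lang = {lang: correct[lang] / total[lang] for lang in total}
    if not per_lang:
        return {}, 0.0
    overall = sum(per_lang.values()) / len(per_lang)
    return per_lang, overall


# Illustrative records only; these are not taken from the dataset.
records = [
    ("es", "La respuesta es 18.", 18),
    ("es", "Son 7 manzanas", 8),
    ("de", "Die Antwort ist 1,000", 1000),
]
per_lang, overall = mgsm_style_score(records)
print(per_lang, f"{overall:.1%}")
```

Under this assumption, a leaderboard entry of 92.3% corresponds to a score of 0.923 on the 0 to 1 scale.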

Leaderboard

All scores are self-reported and sourced from llm-stats.

  1. Llama 4 Maverick: 92.3%
  2. o3-mini: 92.0%
  3. Claude 3.5 Sonnet: 91.6%
  4. Claude 3.5 Sonnet: 91.6%
  5. Llama 3.3 70B Instruct: 91.1%
  6. o1-preview: 90.8%
  7. Claude 3 Opus: 90.7%
  8. Llama 4 Scout: 90.6%
  9. GPT-4o: 90.5%
  10. o1: 89.3%
  11. GPT-4 Turbo: 88.5%
  12. Gemini 1.5 Pro: 87.5%
  13. GPT-4o mini: 87.0%
  14. Llama 3.2 90B Instruct: 86.9%
  15. Claude 3.5 Haiku: 85.6%
  16. Qwen3 235B A22B: 83.5%
  17. Claude 3 Sonnet: 83.5%
  18. Gemini 1.5 Flash: 82.6%
  19. Phi 4: 80.6%
  20. Claude 3 Haiku: 75.1%