PolyMath-en


PolyMath is a multilingual mathematical reasoning benchmark covering 18 languages and 4 difficulty levels, from easy to hard. It is designed for broad difficulty coverage, language diversity, and high-quality translation. The benchmark evaluates the mathematical reasoning of large language models across diverse linguistic contexts and is intended to discriminate clearly between models.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: math, reasoning. Language: en. Verified by llm-stats: no.

Leaderboard

  1. Kimi K2 Instruct (self-reported, llm-stats): 65.1%
  2. Kimi K2-Instruct-0905 (self-reported, llm-stats): 65.1%
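
The metadata lists a maximum score of 1, while the leaderboard shows percentages. The following is a minimal sketch of that conversion, assuming the displayed value is the raw score divided by the maximum score; the page does not state how the percentage is derived. The function name `to_percent` and the raw values of 0.651 are hypothetical, chosen only to match the percentages listed above.

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage of the maximum score."""
    return f"{100 * score / max_score:.1f}%"


# Hypothetical raw scores (assumed, not taken from the source page).
entries = [
    ("Kimi K2 Instruct", 0.651),
    ("Kimi K2-Instruct-0905", 0.651),
]

# sorted() is stable, so entries with equal scores keep their input order.
ranked = sorted(entries, key=lambda entry: entry[1], reverse=True)

for rank, (model, score) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {to_percent(score)}")
```

Because the two entries tie, their relative order here depends only on input order, not on any tie-breaking rule from the source.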