WMT24++

WMT24++ is a comprehensive multilingual machine translation benchmark that expands the WMT24 dataset to cover 55 languages and dialects. It includes human-written references and post-edits across four domains (literary, news, social, and speech) to evaluate machine translation systems and large language models across diverse linguistic contexts.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: language. Language: en. Verified by llm-stats: no.
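
The metadata gives a max score of 1, while the leaderboard entries are shown as percentages. A minimal sketch of converting between the two, assuming the displayed percentage is simply the raw score divided by the max score and multiplied by 100 (the source does not confirm this; the helper names `to_percent` and `to_fraction` are hypothetical):

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage of max_score, one decimal place.

    Assumption: leaderboard percentages are score / max_score * 100.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return f"{score / max_score * 100:.1f}%"


def to_fraction(percent: str, max_score: float = 1.0) -> float:
    """Convert a displayed percentage string back to the raw score scale."""
    return float(percent.rstrip("%")) / 100 * max_score


if __name__ == "__main__":
    # Example value taken from the top leaderboard entry.
    print(to_percent(0.867))
    print(to_fraction("86.7%"))
```

Under this assumption, a leaderboard value such as 86.7% would correspond to a raw score of roughly 0.867 on the 0 to 1 scale.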

Leaderboard

  1. Nemotron 3 Super (120B A12B) self-reported llm-stats
    86.7%
  2. Nemotron 3 Nano (30B A3B) self-reported llm-stats
    86.2%
  3. Qwen3.7 Max self-reported llm-stats
    85.8%
  4. Qwen3.6 Plus self-reported llm-stats
    84.3%
  5. Qwen3.5-397B-A17B self-reported llm-stats
    78.9%
  6. Qwen3.5-122B-A10B self-reported llm-stats
    78.3%
  7. Qwen3.5-27B self-reported llm-stats
    77.6%
  8. Qwen3.5-35B-A3B self-reported llm-stats
    76.3%
  9. Gemma 3 27B self-reported llm-stats
    53.4%
  10. Gemma 3 12B self-reported llm-stats
    51.6%
  11. Gemma 3n E4B Instructed self-reported llm-stats
    50.1%
  12. 50.1%
  13. Gemma 3 4B self-reported llm-stats
    46.8%
  14. Gemma 3n E2B Instructed self-reported llm-stats
    42.7%
  15. 42.7%
  16. Gemma 3 1B self-reported llm-stats
    35.9%