HLE

Humanity's Last Exam (HLE) is a multimodal academic benchmark of 2,500 questions spanning mathematics, the humanities, and the natural sciences. It is designed to test LLM capabilities at the frontier of human knowledge, using questions with unambiguous, verifiable solutions.

Methodology

Imported from llm-stats public benchmark metadata. Modality: multimodal. Max score: 1. Categories: reasoning, math. Language: en. Multilingual: no. Verified by llm-stats: no.
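
The metadata gives a maximum score of 1, while the leaderboard shows percentages. A minimal sketch of how the two scales might relate, assuming raw scores are stored as fractions of the maximum (the function name `to_percent` and the 0-to-1 storage convention are illustrative assumptions, not part of the llm-stats metadata):

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage of the maximum score.

    Assumption: raw scores lie in [0, max_score]; this convention is
    hypothetical and not confirmed by the llm-stats metadata.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    # One decimal place, matching the precision shown on the leaderboard.
    return f"{score / max_score * 100:.1f}%"


if __name__ == "__main__":
    # Under this assumption, a raw score of 0.2 would display as 20.0%.
    print(to_percent(0.2))
```

Under this reading, a leaderboard entry of 17.2% would correspond to a raw score of 0.172 on the 0-to-1 scale.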

Leaderboard

  1. Grok 4 Fast: 20.0% (self-reported, source: llm-stats)
  2. GLM-4.5: 17.2% (self-reported, source: llm-stats)
  3. GLM-4.6: 17.2% (self-reported, source: llm-stats)
  4. GLM-4.5-Air: 10.6% (self-reported, source: llm-stats)
  5. Kimi K2-Instruct-0905: 4.7% (self-reported, source: llm-stats)