LiveBench 20241125


LiveBench is a challenging LLM benchmark designed to limit test-set contamination: it releases new questions monthly, drawing on recently released datasets, arXiv papers, news articles, and IMDb movie synopses. Its tasks span math, coding, reasoning, language, instruction following, and data analysis, and each has a verifiable, objective ground-truth answer.

Methodology

Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: general, math, reasoning. Language: en. Verified by llm-stats: no.
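The metadata lists a max score of 1, while the leaderboard shows percentages. A minimal sketch of that conversion follows, assuming the stored score is a fraction of the max score; the function name `to_percent` and the raw value 0.796 are illustrative assumptions, not part of the llm-stats metadata.

```python
def to_percent(score: float, max_score: float = 1.0) -> str:
    """Format a raw score as a percentage of max_score, one decimal place.

    Assumption: scores are stored on a 0..max_score scale (max_score = 1 here).
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return f"{100 * score / max_score:.1f}%"


# Hypothetical raw value for illustration only.
print(to_percent(0.796))
```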

Leaderboard

  1. Qwen3 VL 235B A22B Thinking: 79.6% (self-reported, llm-stats)
  2. Qwen3-235B-A22B-Thinking-2507: 78.4% (self-reported, llm-stats)
  3. Qwen3-Next-80B-A3B-Thinking: 76.6% (self-reported, llm-stats)
  4. Qwen3-Next-80B-A3B-Instruct: 75.8% (self-reported, llm-stats)
  5. Qwen3-235B-A22B-Instruct-2507: 75.4% (self-reported, llm-stats)
  6. Qwen3 VL 235B A22B Instruct: 74.8% (self-reported, llm-stats)
  7. Qwen3 VL 32B Thinking: 74.7% (self-reported, llm-stats)
  8. Qwen3 VL 32B Instruct: 72.2% (self-reported, llm-stats)
  9. Qwen3 VL 30B A3B Thinking: 72.1% (self-reported, llm-stats)
  10. Qwen3 VL 8B Thinking: 69.8% (self-reported, llm-stats)
  11. Qwen3 VL 4B Thinking: 68.4% (self-reported, llm-stats)
  12. Qwen3 VL 30B A3B Instruct: 65.4% (self-reported, llm-stats)
  13. Qwen3 VL 8B Instruct: 62.0% (self-reported, llm-stats)
  14. Qwen3 VL 4B Instruct: 60.9% (self-reported, llm-stats)