LiveBench 20241125

math

LiveBench is a challenging, contamination-limited LLM benchmark. It reduces test-set contamination by releasing new questions monthly, drawn from recently released datasets, arXiv papers, news articles, and IMDb movie synopses. Its tasks span six categories (math, coding, reasoning, language, instruction following, and data analysis), and every question has a verifiable, objective ground-truth answer.
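Because every question has an objective ground-truth answer, scoring can be fully automated. The sketch below illustrates the idea under two assumptions: answers are checked by a simple normalized exact match, and a category score is the mean of per-task mean scores. The function names (`normalize`, `score_answer`, `category_score`) and the task names are hypothetical, and LiveBench itself may use different, task-specific checks.

```python
from statistics import mean


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.strip().lower().split())


def score_answer(model_answer: str, ground_truth: str) -> float:
    """Hypothetical exact-match scorer: 1.0 if the normalized answers match, else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(ground_truth) else 0.0


def category_score(results: dict[str, list[tuple[str, str]]]) -> float:
    """Average per-question scores within each task, then average the task scores.

    `results` maps a task name to a list of (model_answer, ground_truth) pairs.
    Returns the category score as a percentage.
    """
    task_scores = [
        mean(score_answer(answer, truth) for answer, truth in pairs)
        for pairs in results.values()
    ]
    return 100 * mean(task_scores)


if __name__ == "__main__":
    # Hypothetical tasks: task_a scores 1 of 2, task_b scores 3 of 4.
    example = {
        "task_a": [("42", "42"), ("7", "8")],
        "task_b": [("x = 3", "X = 3"), ("1/2", "1/2"), ("0", "1"), ("5", "5")],
    }
    print(f"{category_score(example):.1f}%")  # prints 62.5%
```

Averaging within tasks before averaging across them keeps a task with many questions from dominating the category score; whether LiveBench weights tasks this way is part of the assumption above.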

Leaderboard


Model                             Math score
Qwen3 VL 235B A22B Thinking       79.6%
Qwen3-235B-A22B-Thinking-2507     78.4%
Qwen3-Next-80B-A3B-Thinking       76.6%
Qwen3-Next-80B-A3B-Instruct       75.8%
Qwen3-235B-A22B-Instruct-2507     75.4%
Qwen3 VL 235B A22B Instruct       74.8%
Qwen3 VL 32B Thinking             74.7%
Qwen3 VL 32B Instruct             72.2%
Qwen3 VL 30B A3B Thinking         72.1%
Qwen3 VL 8B Thinking              69.8%
Qwen3 VL 4B Thinking              68.4%
Qwen3 VL 30B A3B Instruct         65.4%
Qwen3 VL 8B Instruct              62.0%
Qwen3 VL 4B Instruct              60.9%
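The fourteen entries form seven Thinking/Instruct pairs of the same base model. The short script below computes the score gap within each pair, using only the scores listed in the leaderboard. The pairing rule (a model's Instruct counterpart is its name with "Thinking" replaced by "Instruct") and the function name `thinking_gaps` are assumptions made for illustration.

```python
# Math-category scores (%) copied from the leaderboard above.
SCORES = {
    "Qwen3 VL 235B A22B Thinking": 79.6,
    "Qwen3-235B-A22B-Thinking-2507": 78.4,
    "Qwen3-Next-80B-A3B-Thinking": 76.6,
    "Qwen3-Next-80B-A3B-Instruct": 75.8,
    "Qwen3-235B-A22B-Instruct-2507": 75.4,
    "Qwen3 VL 235B A22B Instruct": 74.8,
    "Qwen3 VL 32B Thinking": 74.7,
    "Qwen3 VL 32B Instruct": 72.2,
    "Qwen3 VL 30B A3B Thinking": 72.1,
    "Qwen3 VL 8B Thinking": 69.8,
    "Qwen3 VL 4B Thinking": 68.4,
    "Qwen3 VL 30B A3B Instruct": 65.4,
    "Qwen3 VL 8B Instruct": 62.0,
    "Qwen3 VL 4B Instruct": 60.9,
}


def thinking_gaps(scores: dict[str, float]) -> dict[str, float]:
    """Map each Thinking model to its score minus its Instruct counterpart's score.

    Assumes the counterpart's name is the Thinking name with "Thinking" replaced
    by "Instruct"; Thinking models without a counterpart are skipped.
    """
    gaps = {}
    for name, score in scores.items():
        if "Thinking" not in name:
            continue
        partner = name.replace("Thinking", "Instruct")
        if partner in scores:
            # Round to one decimal to hide floating-point noise (e.g. 7.799999...).
            gaps[name] = round(score - scores[partner], 1)
    return gaps


if __name__ == "__main__":
    ranked = sorted(thinking_gaps(SCORES).items(), key=lambda kv: kv[1], reverse=True)
    for name, gap in ranked:
        print(f"{name:32} {gap:+.1f} points")
```

On these scores every Thinking variant outscores its Instruct counterpart, with gaps ranging from 0.8 points (Qwen3-Next-80B-A3B) to 7.8 points (Qwen3 VL 8B).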