SuperGPQA

math

SuperGPQA is a benchmark that evaluates large language models across 285 graduate-level academic disciplines. It contains 25,957 questions spanning 13 broad disciplinary areas, including Engineering, Medicine, Science, and Law, and extends to specialized fields such as light industry, agriculture, and service-oriented domains. Questions are built through a Human-LLM collaborative filtering process involving more than 80 expert annotators and are designed to test graduate-level knowledge and reasoning.

Leaderboard

Showing 20 of 31 results

Model                           Score
Qwen3.7 Max                     73.6%
Qwen3.6 Plus                    71.6%
Qwen3.5-397B-A17B               70.4%
Qwen3.5-122B-A10B               67.1%
Qwen3.6-27B                     66.0%
Qwen3.5-27B                     65.6%
Qwen3 Max                       65.1%
Qwen3-235B-A22B-Thinking-2507   64.9%
Qwen3.6-35B-A3B                 64.7%
Qwen3 VL 235B A22B Thinking     64.3%
Qwen3.5-35B-A3B                 63.4%
Qwen3-235B-A22B-Instruct-2507   62.6%
Qwen3-Next-80B-A3B-Thinking     60.8%
Qwen3 VL 235B A22B Instruct     60.4%
Qwen3 VL 32B Thinking           59.0%
Qwen3-Next-80B-A3B-Instruct     58.8%
Qwen3.5-9B                      58.2%
Kimi K2 Instruct                57.2%
Kimi K2-Instruct-0905           57.2%
Qwen3 VL 30B A3B Thinking       56.4%