SuperGPQA
SuperGPQA is a comprehensive benchmark that evaluates large language models across 285 graduate-level academic disciplines. The benchmark contains 25,957 questions covering 13 broad disciplinary areas including Engineering, Medicine, Science, and Law, with specialized fields in light industry, agriculture, and service-oriented domains. It employs a Human-LLM collaborative filtering mechanism with over 80 expert annotators to create challenging questions that assess graduate-level knowledge and reasoning capabilities.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: chemistry, economics, finance, general, healthcare, legal, math, physics, reasoning. Language: en. Verified by llm-stats: no.
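A maximum score of 1 suggests the reported result is a fraction between 0 and 1, such as accuracy. As a minimal sketch, assuming each question has a single correct option label and that a prediction counts only on an exact, case-insensitive match (neither assumption is stated in the metadata above, and the function name `accuracy` is hypothetical), such a score could be computed like this:

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of predictions that match the reference answers.

    Assumption: each item is a single option label such as "A" or "B".
    The result lies in [0, 1], consistent with a max score of 1.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty question set")

    # Compare labels after trimming whitespace and normalizing case.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


if __name__ == "__main__":
    # Three of four hypothetical predictions match the reference labels.
    print(accuracy(["A", "b", "C", "D"], ["A", "B", "D", "D"]))
```

Under this reading, a model that answers every question correctly would score exactly 1, and per-category scores could be obtained by calling the same function on each category's subset of questions.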