BIG-Bench
Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark of more than 200 tasks designed to probe large language models and extrapolate their future capabilities. It covers diverse domains, including linguistics, mathematics, common-sense reasoning, biology, physics, social bias, and software development. The benchmark focuses on tasks believed to be beyond the capabilities of current language models, and it includes tasks in English as well as in several other languages.
Methodology
Imported from llm-stats public benchmark metadata. Modality: text. Max score: 1. Categories: language, math, reasoning. Language: en. Verified by llm-stats: no.
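As a rough illustration of how these metadata fields might be consumed, the sketch below stores them in a small record and rescales a raw score to a percentage. The class name `BenchmarkMetadata`, the helper `as_percentage`, and the assumption that scores lie between 0 and the stated maximum are hypothetical; they are not part of llm-stats or BIG-bench.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BenchmarkMetadata:
    """Hypothetical container for the metadata fields listed above."""
    name: str
    modality: str
    max_score: float
    categories: Tuple[str, ...]
    language: str
    verified: bool


# Values copied from the metadata line; the structure itself is an assumption.
BIG_BENCH = BenchmarkMetadata(
    name="BIG-Bench",
    modality="text",
    max_score=1.0,
    categories=("language", "math", "reasoning"),
    language="en",
    verified=False,
)


def as_percentage(score: float, meta: BenchmarkMetadata) -> float:
    """Rescale a raw score to 0-100, assuming it lies in [0, max_score]."""
    if not 0.0 <= score <= meta.max_score:
        raise ValueError(f"score {score} is outside [0, {meta.max_score}]")
    return 100.0 * score / meta.max_score
```

With a maximum score of 1, a raw score is simply multiplied by 100, so `as_percentage(0.5, BIG_BENCH)` gives `50.0`.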