BIG-bench

math

Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark of more than 200 tasks designed to probe large language models and extrapolate their future capabilities. It covers diverse domains, including linguistics, mathematics, common-sense reasoning, biology, physics, social bias, and software development. The benchmark focuses on tasks believed to be beyond the capabilities of current language models and includes tasks in English and in other languages.

Leaderboard

Model             Score
Gemini 1.0 Pro    75.0%
Gemma 2 27B       74.9%
Gemma 2 9B        68.2%