Benchmarks

Benchmark                      Category   Count
AA-Index                       general    3
AA-LCR                         reasoning  8
ACEBench                       reasoning  2
ActivityNet                    vision     1
AdvancedIF                     reasoning  2
AGIEval                        math       8
AI2 Reasoning Challenge (ARC)  reasoning  1
AI2D                           reasoning  31
Aider                          coding     4
Aider-Polyglot                 coding     21
Aider-Polyglot Edit            coding     10
AIME                           math       1
AIME 2024                      math       51
AIME 2025                      math       108
AIME 2026                      math       12
AIR-Bench                      safety     1
AITZ_EM                        reasoning  3
AlignBench                     math       4
AlpacaEval 2.0                 reasoning  4
AMC_2022_23                    math       2