Catalogue

Latest models

Model             Provider   Released      Context  Weights
U2                Unisound   Jun 5, 2026   -        proprietary
MAI-Code-1-Flash  Microsoft  Jun 2, 2026   -        proprietary
MAI-Thinking-1    Microsoft  Jun 2, 2026   -        proprietary
MiniMax M3        MiniMax    Jun 1, 2026   -        open
Claude Opus 4.8   Anthropic  May 28, 2026  -        proprietary
Gemini 3.5 Flash  Google     May 19, 2026  -        proprietary

Featured leaderboards

AA-LCR

Rank  Model              Score
1     Kimi K2.5          70.0%
2     Qwen3.5-397B-A17B  68.7%
3     Qwen3.6 Plus       68.3%
4     Qwen3.5-122B-A10B  66.9%
5     Qwen3.5-27B        66.1%
Rank  Model                        Score
1     Mistral Small 3 24B Base     65.8%
2     Ministral 3 (14B Base 2512)  64.8%
3     Hermes 3 70B                 56.2%
4     Gemma 2 27B                  55.1%
5     Gemma 2 9B                   52.8%

AI2D

Rank  Model              Score
1     Claude 3.5 Sonnet  94.7%
2     Qwen3.6 Plus       94.4%
3     GPT-4o             94.2%
4     Pixtral Large      93.8%
5     Qwen3.5-122B-A10B  93.3%
Rank  Model                         Score
1     GPT-5                         88.0%
2     Gemini 2.5 Pro Preview 06-05  82.2%
3     o3                            81.3%
4     Gemini 2.5 Pro                76.5%
5     DeepSeek-V3.2-Exp             74.5%