Gemma 4 31B
Gemma 4 31B is Google DeepMind's flagship dense multimodal model, with 31 billion parameters and a 256K-token context window. It ranks #3 among open models on Arena AI and has an estimated LMArena text score of 1452. Built from the same research as Gemini 3, it features Per-Layer Embeddings, Shared KV Cache, alternating sliding-window and global attention, and variable-aspect-ratio vision encoding.
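To make "alternating sliding-window and global attention" concrete, here is a minimal pure-Python sketch of the two causal mask types and one possible way to interleave them across layers. The function names, the window size, and the "every third layer is global" schedule are hypothetical values chosen for illustration. They are not taken from Gemma 4's published configuration.

```python
# Illustrative only: the window size and layer schedule are made-up values,
# not Gemma 4's actual configuration.

def causal_mask(seq_len, window=None):
    """Return mask[i][j] == True if query position i may attend to key j.

    window=None gives a global causal mask (every earlier token is visible).
    An integer window gives a sliding-window causal mask in which each token
    sees at most `window` of the most recent tokens, itself included.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never look at future tokens
            if window is not None:
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask


def layer_windows(num_layers, window, global_every):
    """Hypothetical schedule: every `global_every`-th layer is global (None),
    all other layers use a sliding window of size `window`."""
    return [
        None if (layer + 1) % global_every == 0 else window
        for layer in range(num_layers)
    ]


# Six layers, local window of 3 tokens, every third layer global.
schedule = layer_windows(num_layers=6, window=3, global_every=3)
print(schedule)

# Show which keys the last of 5 tokens can see under each mask type.
for w in (3, None):
    kind = "global" if w is None else f"window={w}"
    print(kind, causal_mask(5, window=w)[4])
```

Sliding-window layers only need to keep the most recent `window` keys and values per token, while the global layers retain access to the full context, which is the usual motivation for mixing the two.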
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2026 | 89.2% | self-reported, llm-stats |
| BIG-Bench Extra Hard | 74.4% | self-reported, llm-stats |
| GPQA | 84.3% | self-reported, llm-stats |
| Humanity's Last Exam | 26.5% | self-reported, llm-stats |
| LiveCodeBench v6 | 80.0% | self-reported, llm-stats |
| MathVision | 85.6% | self-reported, llm-stats |
| MedXpertQA | 61.3% | self-reported, llm-stats |
| MMLU-Pro | 85.2% | self-reported, llm-stats |
| MMMLU | 88.4% | self-reported, llm-stats |
| MMMU-Pro | 76.9% | self-reported, llm-stats |
| MRCR v2 | 66.4% | self-reported, llm-stats |
| t2-bench | 86.4% | self-reported, llm-stats |