Gemini 3 Pro
Gemini 3 Pro is the first model in the new Gemini 3 series. It is best suited to complex tasks that require broad world knowledge and advanced reasoning across modalities. Gemini 3 Pro uses dynamic thinking by default to reason through prompts, and it offers a 1 million-token input context window with up to 64k output tokens.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 100.0% | self-reported llm-stats |
| ARC-AGI v2 | 31.1% | self-reported llm-stats |
| CharXiv-R | 81.4% | self-reported llm-stats |
| FACTS Grounding | 70.5% | self-reported llm-stats |
| Global PIQA | 93.4% | self-reported llm-stats |
| GPQA | 91.9% | self-reported llm-stats |
| Humanity's Last Exam | 45.8% | self-reported llm-stats |
| LiveCodeBench Pro | 2,439 (Elo rating) | self-reported llm-stats |
| MathArena Apex | 23.4% | self-reported llm-stats |
| MMMLU | 91.8% | self-reported llm-stats |
| MMMU-Pro | 81.0% | self-reported llm-stats |
| MRCR v2 (8-needle) | 26.3% | self-reported llm-stats |
| OmniDocBench 1.5 | 11.5% | self-reported llm-stats |
| ScreenSpot Pro | 72.7% | self-reported llm-stats |
| SimpleQA | 72.1% | self-reported llm-stats |
| SWE-Bench Verified | 76.2% | self-reported llm-stats |
| t2-bench | 85.4% | self-reported llm-stats |
| Terminal-Bench 2.0 | 54.2% | self-reported llm-stats |
| Vending-Bench 2 | $5,478.16 (mean net worth) | self-reported llm-stats |
| VideoMMMU | 87.6% | self-reported llm-stats |