Nova 2 Pro
Amazon Nova 2 Pro is Amazon's highest-intelligence model for complex workloads. It processes text, documents, images, video, and audio simultaneously, enabling large-scale multimodal reasoning. It offers hybrid reasoning with configurable extended thinking and excels at high-accuracy work, multi-step planning, long-document analysis, advanced math, and autonomous agentic and software-engineering tasks. It supports a context window of up to 1M tokens, and Amazon reports that it matches or exceeds Claude Sonnet 4.5, GPT-5/GPT-5.1, and Gemini 2.5 Pro/Gemini 3 Pro across a broad set of benchmarks.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 92.3% | self-reported llm-stats |
| BFCL-V4 | 61.6% | self-reported llm-stats |
| GPQA | 81.4% | self-reported llm-stats |
| IFBench | 80.2% | self-reported llm-stats |
| LiveCodeBench | 74.6% | self-reported llm-stats |
| LongCodeBench | 84.0% | self-reported llm-stats |
| MMLU-Pro | 81.6% | self-reported llm-stats |
| MMMU-Pro | 63.5% | self-reported llm-stats |
| Multi-Challenge | 77.7% | self-reported llm-stats |
| OCRBench_V2 | 64.5% | self-reported llm-stats |
| QVHighlights | 76.7% | self-reported llm-stats |
| RealKIE-FCC | 67.0% | self-reported llm-stats |
| ScreenSpot | 88.1% | self-reported llm-stats |
| SWE-Bench Verified | 70.0% | self-reported llm-stats |
| Tau2 Airline | 65.2% | self-reported llm-stats |
| Tau2 Retail | 77.7% | self-reported llm-stats |
| Tau2 Telecom | 92.7% | self-reported llm-stats |
| Terminal-Bench | 41.3% | self-reported llm-stats |
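For readers who want to query these results programmatically, the sketch below encodes the self-reported scores from the table as a Python dict and shows two simple operations: ranking benchmarks by score and taking an unweighted mean over the three Tau2 domains. The helper name `top_n` and the Tau2 averaging are illustrative assumptions, not an official aggregate published by Amazon or by the Tau2 authors; averaging scores across different benchmarks or domains is only a rough summary.

```python
from statistics import mean

# Self-reported Nova 2 Pro scores (percent), copied from the benchmark table.
SCORES = {
    "AIME 2025": 92.3,
    "BFCL-V4": 61.6,
    "GPQA": 81.4,
    "IFBench": 80.2,
    "LiveCodeBench": 74.6,
    "LongCodeBench": 84.0,
    "MMLU-Pro": 81.6,
    "MMMU-Pro": 63.5,
    "Multi-Challenge": 77.7,
    "OCRBench_V2": 64.5,
    "QVHighlights": 76.7,
    "RealKIE-FCC": 67.0,
    "ScreenSpot": 88.1,
    "SWE-Bench Verified": 70.0,
    "Tau2 Airline": 65.2,
    "Tau2 Retail": 77.7,
    "Tau2 Telecom": 92.7,
    "Terminal-Bench": 41.3,
}


def top_n(scores, n):
    """Hypothetical helper: return the n highest-scoring benchmarks as (name, score) pairs, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Unweighted mean over the Tau2 domains (illustrative only, not an official Tau2 figure).
tau2_scores = [score for name, score in SCORES.items() if name.startswith("Tau2")]
tau2_mean = round(mean(tau2_scores), 1)

print(top_n(SCORES, 3))
print(tau2_mean)
```

Running this prints the three highest scores (Tau2 Telecom, AIME 2025, ScreenSpot) followed by a Tau2 domain mean of 78.5.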