Nova 2 Omni
Amazon Nova 2 Omni is Amazon's first unified multimodal reasoning model. A single model accepts text, document, image, video, and audio inputs and generates both text and images, which removes the need to coordinate several specialized models. It delivers strong multimodal perception, core reasoning, agentic tool use, and high-quality image generation and editing, with configurable extended thinking. It supports a 1M-token context window, more than 200 languages for text, and 10 languages for speech input.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 92.1% | self-reported, llm-stats |
| BFCL-V4 | 58.3% | self-reported, llm-stats |
| CoVoST2 | 40.7% | self-reported, llm-stats |
| IFBench | 68.7% | self-reported, llm-stats |
| MAVERIX | 66.6% | self-reported, llm-stats |
| MMAU | 75.3% | self-reported, llm-stats |
| MMLU-Pro | 80.7% | self-reported, llm-stats |
| MMMU-Pro | 61.4% | self-reported, llm-stats |
| Multi-Challenge | 75.5% | self-reported, llm-stats |
| OCRBench_V2 | 58.2% | self-reported, llm-stats |
| QVHighlights | 76.7% | self-reported, llm-stats |
| RealKIE-FCC | 59.8% | self-reported, llm-stats |
| RefCOCOg | 86.3% | self-reported, llm-stats |
| ScreenSpot | 85.4% | self-reported, llm-stats |
| Tau2 Airline | 68.8% | self-reported, llm-stats |
| Tau2 Retail | 78.3% | self-reported, llm-stats |
| Tau2 Telecom | 80.0% | self-reported, llm-stats |
| Video-MME | 77.9% | self-reported, llm-stats |