Nova 2 Omni

Amazon Nova 2 Omni is Amazon's first unified multimodal reasoning model. It accepts text, documents, images, video, and audio as input and generates both text and images from a single model, which removes the need to coordinate multiple models. It delivers strong multimodal perception, core reasoning, agentic tool use, and high-quality image generation and editing, with configurable extended thinking. It supports a 1M-token context window, 200+ languages for text, and 10 languages for speech input.

Benchmark results

Benchmark        Score   Tags           Source
AIME 2025        92.1%   self-reported  llm-stats
BFCL-V4          58.3%   self-reported  llm-stats
CoVoST2          40.7%   self-reported  llm-stats
IFBench          68.7%   self-reported  llm-stats
MAVERIX          66.6%   self-reported  llm-stats
MMAU             75.3%   self-reported  llm-stats
MMLU-Pro         80.7%   self-reported  llm-stats
MMMU-Pro         61.4%   self-reported  llm-stats
Multi-Challenge  75.5%   self-reported  llm-stats
OCRBench_V2      58.2%   self-reported  llm-stats
QVHighlights     76.7%   self-reported  llm-stats
RealKIE-FCC      59.8%   self-reported  llm-stats
RefCOCOg         86.3%   self-reported  llm-stats
ScreenSpot       85.4%   self-reported  llm-stats
Tau2 Airline     68.8%   self-reported  llm-stats
Tau2 Retail      78.3%   self-reported  llm-stats
Tau2 Telecom     80.0%   self-reported  llm-stats
Video-MME        77.9%   self-reported  llm-stats