Nemotron 3 Super (120B A12B)

Nemotron 3 Super is a hybrid Mamba-Attention Mixture-of-Experts model with 120B total and 12B active parameters, optimized for agentic reasoning, coding, planning, tool calling, and long-context analysis. It introduces LatentMoE (which projects tokens into a compressed latent space for expert routing, enabling 4x more experts at the same inference cost), Multi-Token Prediction for native speculative decoding (up to 3x faster generation), and native NVFP4 pretraining on Blackwell. The hybrid architecture interleaves Mamba-2 layers for linear-time sequence processing with strategically placed Transformer attention layers that act as global anchors, and it supports a 1M-token context window. The model was pre-trained on 25 trillion tokens and post-trained with multi-environment RL across 21 configurations using NeMo Gym/RL, with 1.2 million rollouts. It achieves up to 5x higher throughput than the previous Nemotron Super and 2.2x higher throughput than GPT-OSS-120B while maintaining comparable accuracy.
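To make the speculative-decoding idea above more concrete, here is a minimal sketch of generic greedy speculative decoding in plain Python. It is not Nemotron's implementation: the names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, and the toy "models" are simple functions over integer tokens. In the real model the drafts would presumably come from the Multi-Token Prediction heads and verification would be a single batched forward pass. The sketch only shows why drafting several tokens and verifying them can cut the number of decoding steps without changing the greedy output.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(context: List[Token], target_next: NextTokenFn,
                     draft_next: NextTokenFn, k: int) -> List[Token]:
    """Draft k tokens cheaply, keep the prefix the target agrees with."""
    # Draft phase: the cheap drafter proposes k tokens autoregressively.
    drafted: List[Token] = []
    for _ in range(k):
        drafted.append(draft_next(context + drafted))

    # Verify phase: a real system scores all k positions in one target
    # forward pass; here the target is called per position for clarity.
    accepted: List[Token] = []
    for tok in drafted:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: substitute the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every draft was accepted, so the target pass yields one bonus token.
    accepted.append(target_next(context + accepted))
    return accepted


def generate(prompt: List[Token], target_next: NextTokenFn,
             draft_next: NextTokenFn, k: int,
             max_new_tokens: int) -> Tuple[List[Token], int]:
    """Run speculative steps until max_new_tokens tokens are produced."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, target_next, draft_next, k))
        steps += 1
    return out[:len(prompt) + max_new_tokens], steps


# Hypothetical toy models: the target counts upward modulo 10, and the
# drafter makes one deliberate mistake (it proposes 0 where 5 is correct).
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    nxt = (ctx[-1] + 1) % 10
    return 0 if nxt == 5 else nxt


tokens, steps = generate([0], toy_target, toy_draft, k=3, max_new_tokens=8)
print(tokens, steps)  # [0, 1, 2, 3, 4, 5, 6, 7, 8] 3
```

In this toy run the eight new tokens take three verification steps rather than eight target-only steps, and the output matches what the target alone would generate greedily. The speedup in practice depends on how often drafts are accepted, which is consistent with the "up to 3x" wording above.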

Benchmark results

Benchmark               Score   Tags           Source
AA-LCR                  58.3%   self-reported  llm-stats
AIME 2025               90.2%   self-reported  llm-stats
Arena-Hard v2           73.9%   self-reported  llm-stats
Bird-SQL (dev)          41.8%   self-reported  llm-stats
BrowseComp              31.3%   self-reported  llm-stats
GPQA                    82.7%   self-reported  llm-stats
HMMT 2025               94.7%   self-reported  llm-stats
Humanity's Last Exam    22.8%   self-reported  llm-stats
IFBench                 72.6%   self-reported  llm-stats
LiveCodeBench           81.2%   self-reported  llm-stats
MMLU-Pro                83.7%   self-reported  llm-stats
MMLU-ProX               79.4%   self-reported  llm-stats
Multi-Challenge         55.2%   self-reported  llm-stats
RULER                   91.8%   self-reported  llm-stats
SciCode                 42.0%   self-reported  llm-stats
SWE-bench Multilingual  45.8%   self-reported  llm-stats
SWE-bench Verified      53.7%   self-reported  llm-stats
Tau2 Airline            56.3%   self-reported  llm-stats
Tau2 Retail             62.8%   self-reported  llm-stats
Tau2 Telecom            64.4%   self-reported  llm-stats
Terminal-Bench          25.8%   self-reported  llm-stats
Terminal-Bench 2.0      31.0%   self-reported  llm-stats
WMT24++                 86.7%   self-reported  llm-stats