Nemotron 3 Super (120B A12B)

Nemotron 3 Super is a hybrid Mamba-Attention Mixture-of-Experts model with 120B total and 12B active parameters, optimized for agentic reasoning, coding, planning, tool calling, and long-context analysis. It introduces three features: LatentMoE, which projects tokens into a compressed latent space for expert routing and enables 4x more experts at the same inference cost; Multi-Token Prediction for native speculative decoding (up to 3x faster generation); and native NVFP4 pretraining on Blackwell.
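To illustrate the general idea behind speculative decoding, here is a minimal sketch of a greedy draft-then-verify step. It is a generic toy, not Nemotron's Multi-Token Prediction implementation: `verify_draft`, `target_next`, and the counting "model" are hypothetical names invented for this example, and a real system would score all draft positions in a single forward pass rather than one call per token.

```python
from typing import Callable, List, Sequence


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Accept draft tokens while they match the target model's greedy choice.

    On the first mismatch, the target's own token is emitted and the rest of
    the draft is discarded. If every draft token is accepted, one extra token
    from the target is appended, so each step yields at least one new token.
    """
    out = list(prefix)
    for tok in draft:
        expected = target_next(out)  # hypothetical greedy next-token function
        if tok != expected:
            out.append(expected)     # correct the draft and stop
            return out
        out.append(tok)              # draft token confirmed
    out.append(target_next(out))     # bonus token after a fully accepted draft
    return out


def toy_target(tokens: List[int]) -> int:
    """Toy stand-in for the target model: always counts upward modulo 10."""
    return (tokens[-1] + 1) % 10


if __name__ == "__main__":
    print(verify_draft([1, 2], [3, 4, 9], toy_target))
    print(verify_draft([1, 2], [3, 4, 5], toy_target))
```

The speedup comes from accepted draft tokens: when the draft agrees with the target, several tokens are committed per verification step instead of one.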

Benchmark	Score
HMMT 2025	94.7%
RULER	91.8%
AIME 2025	90.2%
WMT24++	86.7%
MMLU-Pro	83.7%
GPQA	82.7%
LiveCodeBench	81.2%
MMLU-ProX	79.4%
Arena-Hard v2	73.9%
IFBench	72.6%
Tau2 Telecom	64.4%
Tau2 Retail	62.8%
AA-LCR	58.3%
Tau2 Airline	56.3%
Multi-Challenge	55.2%
SWE-Bench Verified	53.7%
SWE-bench Multilingual	45.8%
SciCode	42.0%
Bird-SQL (dev)	41.8%
BrowseComp	31.3%
Terminal-Bench 2.0	31.0%
Terminal-Bench	25.8%
Humanity's Last Exam	22.8%

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Cached input	Output	Context	Max output	Uptime	Uptime (5m)	TTFT (p50)	Speed (p50)	Notes
DigitalOcean	available	$0.21/Mtok	$0.06/Mtok	$0.45/Mtok	1.0M tokens	262K tokens	95%	99%	8,787 ms	5.0 tok/s	
Nebius	available	$0.30/Mtok	-	$0.90/Mtok	262K tokens	262K tokens	99.9%	100.0%	841 ms	215 tok/s	fp4
DekaLLM	-2	$0.08/Mtok	-	$0.45/Mtok	262K tokens	262K tokens	87%	88%	17,104 ms	2.0 tok/s	fp8
DeepInfra	-2	$0.08/Mtok	-	$0.40/Mtok	262K tokens	16K tokens	90%	97%	5,574 ms	16 tok/s	bf16
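
As a rough way to compare the providers listed above, the following sketch estimates per-request cost and wall-clock time from the listed prices, p50 time to first token, and p50 throughput. The figures are copied from the table as of its timestamp; the `Provider` class and the helper names are hypothetical, cached-input pricing is ignored, and the latency formula (TTFT plus output tokens divided by throughput) is a simplifying assumption rather than a measured result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens
    ttft_ms: float          # p50 time to first token, in milliseconds
    tok_per_s: float        # p50 generation speed, in tokens per second


# Snapshot of the provider table (Jul 17, 2026); values will drift over time.
PROVIDERS = {
    "DigitalOcean": Provider(0.21, 0.45, 8787, 5.0),
    "Nebius": Provider(0.30, 0.90, 841, 215),
    "DekaLLM": Provider(0.08, 0.45, 17104, 2.0),
    "DeepInfra": Provider(0.08, 0.40, 5574, 16),
}


def request_cost(name: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request, ignoring cache discounts."""
    p = PROVIDERS[name]
    total = input_tokens * p.input_per_mtok + output_tokens * p.output_per_mtok
    return total / 1_000_000


def request_seconds(name: str, output_tokens: int) -> float:
    """Estimated wall-clock seconds: p50 TTFT plus p50 generation time."""
    p = PROVIDERS[name]
    return p.ttft_ms / 1000 + output_tokens / p.tok_per_s


if __name__ == "__main__":
    for name in PROVIDERS:
        cost = request_cost(name, 100_000, 10_000)
        secs = request_seconds(name, 10_000)
        print(f"{name}: ${cost:.4f}, about {secs:.0f} s")
```

Under these assumptions the cheapest provider is not the fastest, so the choice depends on whether cost or latency matters more for the workload.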