Nemotron 3 Nano (30B A3B)

Nemotron 3 Nano is a 31.6B-parameter hybrid MoE model optimized for fast, long‑context agentic reasoning. It combines Mamba‑2 and Transformer layers with a sparse MoE router (~3.6B active parameters per token) to deliver up to 4× higher throughput than Nemotron 2 and strong accuracy on math, coding, and tool-use benchmarks.

Benchmark	Score
AIME 2025	99.2%
WMT24++	86.2%
MMLU-Pro	78.3%
GPQA	75.0%
LiveCodeBench v6	68.3%
Arena-Hard v2	67.7%
MMLU-ProX	59.5%
Tau2 Retail	56.9%
Tau2 Airline	48.0%
Tau2 Telecom	42.2%
SWE-Bench Verified	38.8%
Multi-Challenge	38.5%
SciCode	33.3%
Humanity's Last Exam	15.5%
Terminal-Bench	8.5%

Pricing, uptime, and speed via OpenRouter — updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Output	Limits	Uptime	Speed	Notes
DeepInfra	available	$0.05/Mtok	$0.20/Mtok	262K tokens context; 228K tokens max output	99.2%; 5m: 99.1%	1,247 ms p50 TTFT; 67 tok/s p50	fp4
Novita	available	$0.05/Mtok	$0.20/Mtok	262K tokens context; 33K tokens max output	100.0%; 5m: 100.0%	545 ms p50 TTFT; 170 tok/s p50	fp4
Nebius	available	$0.06/Mtok	$0.24/Mtok	262K tokens context	96%; 5m: 100.0%	299 ms p50 TTFT; 134 tok/s p50	fp8
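
To compare providers for a given workload, the prices and p50 speed figures in the table can be combined into a rough per-request estimate. The sketch below is illustrative only: the `Provider` class and `estimate` function are hypothetical names, not part of any provider's API. It assumes cost scales linearly with token counts and that end-to-end latency is roughly p50 TTFT plus output tokens divided by p50 throughput, which ignores queueing, rate limits, and variance around the median.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """One row of the provider table (hypothetical helper type)."""
    name: str
    input_usd_per_mtok: float   # price per million input tokens
    output_usd_per_mtok: float  # price per million output tokens
    ttft_ms_p50: float          # median time to first token, in ms
    tok_per_s_p50: float        # median output throughput, in tokens/s


# Values copied from the table above.
PROVIDERS = [
    Provider("DeepInfra", 0.05, 0.20, 1247, 67),
    Provider("Novita", 0.05, 0.20, 545, 170),
    Provider("Nebius", 0.06, 0.24, 299, 134),
]


def estimate(p: Provider, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, approximate latency in seconds) for one request.

    Assumption: latency ~= p50 TTFT + output_tokens / p50 throughput.
    """
    cost = (
        input_tokens * p.input_usd_per_mtok
        + output_tokens * p.output_usd_per_mtok
    ) / 1_000_000
    seconds = p.ttft_ms_p50 / 1000 + output_tokens / p.tok_per_s_p50
    return cost, seconds


if __name__ == "__main__":
    # Example workload: 20K input tokens, 1K output tokens.
    for p in PROVIDERS:
        cost, seconds = estimate(p, 20_000, 1_000)
        print(f"{p.name}: ${cost:.5f}, ~{seconds:.1f} s")
```

Under these assumptions, throughput dominates latency for long outputs, while TTFT matters more for short replies, so the fastest provider can differ by workload. Any reasoning tokens the model emits would presumably count toward output tokens and should be included in `output_tokens`.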