Mercury 2

Mercury 2 is the fastest reasoning LLM, built on a diffusion-based language model (dLLM) architecture. Instead of generating text token by token, it refines multiple text blocks simultaneously, reaching over 1,000 tokens per second on Nvidia Blackwell GPUs, 5x faster than leading speed-optimized LLMs.
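To put the throughput figures in concrete terms, the sketch below converts tokens per second into wall-clock generation time. The 1,000 tok/s value is the lower bound quoted above; the 200 tok/s baseline is only what the "5x faster" claim implies, not a measured number, and the function name is hypothetical.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decode rate (ignores time to first token)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Lower bound quoted in the text: "over 1,000 tokens per second".
MERCURY_TOK_S = 1000.0
# Assumption: "5x faster" means the baseline decodes at one fifth of that rate.
SPEEDUP = 5.0
BASELINE_TOK_S = MERCURY_TOK_S / SPEEDUP

if __name__ == "__main__":
    for n in (500, 2000, 8000):
        fast = generation_seconds(n, MERCURY_TOK_S)
        slow = generation_seconds(n, BASELINE_TOK_S)
        print(f"{n} tokens: {fast:.1f} s at quoted rate vs {slow:.1f} s at implied baseline")
```

Real latency also depends on time to first token and on serving conditions, so these numbers are a rough upper-level comparison only.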

Benchmark	Score
AIME 2025	91.1%
GPQA	74.0%
IFBench	71.0%
LiveCodeBench	67.0%
Tau2 Airline	53.0%
SciCode	38.0%

Pricing, uptime, and speed via OpenRouter, updated Jul 17, 2026, 04:19 AM.

Provider	Status	Input	Output	Limits	Uptime	Speed	Notes
Inception	available	$0.25/Mtok; cache $0.02/Mtok	$0.75/Mtok	128K tokens context; 50K tokens max output	100.0% 5m 100.0%	639 ms p50 TTFT; 352 tok/s p50	
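The listed prices and speed figures can be turned into a rough per-request estimate. The sketch below is illustrative only: the class and function names are hypothetical, it assumes the "cache" price applies to input tokens served from a prompt cache in place of the regular input rate, and it treats the p50 TTFT and p50 throughput as if they held for every request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens, copied from the Inception row above."""
    input_per_mtok: float = 0.25
    cached_input_per_mtok: float = 0.02  # assumption: billed instead of the input rate
    output_per_mtok: float = 0.75


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    pricing: Pricing = Pricing(),
) -> float:
    """Estimate the cost of one request in USD."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * pricing.input_per_mtok
        + cached_input_tokens * pricing.cached_input_per_mtok
        + output_tokens * pricing.output_per_mtok
    )
    return total / 1_000_000


def estimate_latency_s(
    output_tokens: int, ttft_ms: float = 639.0, tok_per_s: float = 352.0
) -> float:
    """Median-case latency: time to first token plus steady streaming time."""
    return ttft_ms / 1000.0 + output_tokens / tok_per_s


if __name__ == "__main__":
    cost = estimate_cost_usd(input_tokens=20_000, output_tokens=2_000, cached_input_tokens=15_000)
    latency = estimate_latency_s(output_tokens=2_000)
    print(f"estimated cost: ${cost:.6f}, estimated latency: {latency:.2f} s")
```

Because prices and speeds on the listing change over time, the defaults are best passed in explicitly from whatever snapshot is current rather than relied on as constants.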