Mercury 2

Mercury 2 is billed as the fastest reasoning LLM and is built on a diffusion-based language model (dLLM) architecture. Instead of generating text one token at a time, it refines multiple blocks of text in parallel. This lets it reach over 1,000 tokens per second on Nvidia Blackwell GPUs, which is 5x faster than leading speed-optimized LLMs. It supports tool use and JSON output and has a 128K-token context window.
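The difference between token-by-token generation and parallel refinement can be illustrated with a toy sketch. This is not Mercury 2's actual decoding algorithm. `toy_denoiser` is a hypothetical stand-in for a trained dLLM that simply "knows" the target text and assigns random confidences. The commit-the-most-confident-tokens schedule is one common masked-diffusion strategy, assumed here for illustration. The sketch shows only that parallel refinement needs far fewer model passes than one pass per token.

```python
import random

MASK = "<mask>"

def toy_denoiser(seq, target, rng):
    """Hypothetical stand-in for a trained dLLM: proposes a token and a
    confidence score for every masked position in a single parallel pass."""
    return {i: (target[i], rng.random()) for i, tok in enumerate(seq) if tok == MASK}

def diffusion_decode(target, steps, seed=0):
    """Fill a fully masked sequence using `steps` parallel refinement passes."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    per_step = -(-len(target) // steps)  # ceiling division: tokens committed per pass
    passes = 0
    while MASK in seq:
        proposals = toy_denoiser(seq, target, rng)
        # Commit the most confident proposals; the rest stay masked for the next pass.
        best = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)[:per_step]
        for i, (tok, _conf) in best:
            seq[i] = tok
        passes += 1
    return seq, passes

def autoregressive_decode(target):
    """Token-by-token baseline: one model pass per generated token."""
    seq, passes = [], 0
    for tok in target:
        seq.append(tok)
        passes += 1
    return seq, passes

target = "the quick brown fox jumps over the lazy dog".split()
ar_seq, ar_passes = autoregressive_decode(target)
dl_seq, dl_passes = diffusion_decode(target, steps=3)
print(ar_passes, dl_passes)  # 9 3
```

Here nine tokens take nine autoregressive passes but only three refinement passes. In a real dLLM each pass is a full network evaluation over the whole block, so the wall-clock gain depends on hardware and the number of refinement steps, not on pass count alone.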

Benchmark results

Benchmark      Score  Tags           Source
AIME 2025      91.1%  self-reported  llm-stats
GPQA           74.0%  self-reported  llm-stats
IFBench        71.0%  self-reported  llm-stats
LiveCodeBench  67.0%  self-reported  llm-stats
SciCode        38.0%  self-reported  llm-stats
Tau2 Airline   53.0%  self-reported  llm-stats