Mercury 2
Mercury 2 is the fastest reasoning LLM, built on a diffusion-based language model (dLLM) architecture. Instead of generating text token by token, it refines multiple text blocks simultaneously, achieving over 1,000 tokens per second on Nvidia Blackwell GPUs, which is 5x faster than leading speed-optimized LLMs. It supports tool use and JSON output and has a 128K-token context window.
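To put the throughput figures above in concrete terms, here is a rough back-of-the-envelope sketch. It treats "over 1,000 tokens per second" as a lower bound of exactly 1,000, derives the implied baseline rate (1,000 / 5 = 200 tokens per second) from the stated 5x speedup, and uses a hypothetical 4,000-token response. The function and constant names are illustrative only, and the estimate ignores prompt processing, network latency, and batching effects.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decode rate.

    Ignores prompt processing and network latency, so real
    end-to-end times will be longer.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Figures taken from the description above.
MERCURY_TPS = 1000.0   # "over 1,000 tokens per second", used here as a lower bound
SPEEDUP = 5.0          # "5x faster than leading speed-optimized LLMs"

# Assumption: the implied baseline rate is simply Mercury's rate divided by the speedup.
BASELINE_TPS = MERCURY_TPS / SPEEDUP

# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 4000

mercury_s = generation_seconds(RESPONSE_TOKENS, MERCURY_TPS)
baseline_s = generation_seconds(RESPONSE_TOKENS, BASELINE_TPS)

print(f"Mercury 2 (>= {MERCURY_TPS:.0f} tok/s): about {mercury_s:.1f} s or less")
print(f"Implied baseline ({BASELINE_TPS:.0f} tok/s): about {baseline_s:.1f} s")
```

Under these assumptions, a long reasoning trace that would take roughly 20 seconds at the implied baseline rate finishes in about 4 seconds or less, which is where the speed difference matters most for interactive use.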
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 91.1% | self-reported, llm-stats |
| GPQA | 74.0% | self-reported, llm-stats |
| IFBench | 71.0% | self-reported, llm-stats |
| LiveCodeBench | 67.0% | self-reported, llm-stats |
| SciCode | 38.0% | self-reported, llm-stats |
| Tau2 Airline | 53.0% | self-reported, llm-stats |