Grok 4
Grok 4, announced by xAI in summer 2025, was described at launch as 'the smartest AI in the world.' It is built on version 6 of xAI's foundation model and was trained using 200,000 GPUs, with 100x more training compute than Grok 2 and 10x more reinforcement learning compute than Grok 3. Unlike Grok 3, which relied on generalization to use tools, Grok 4 has tool use built into its training process. The model is reported to achieve PhD-level performance across all academic disciplines simultaneously, with perfect scores on standardized tests such as the SAT and near-perfect scores on graduate exams such as the GRE. It excels at complex reasoning, mathematical problem-solving, and coding tasks. Its acknowledged weakness is multimodal capability, which is being addressed in the next version.
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| AIME 2025 | 91.7% | self-reported, llm-stats |
| ARC-AGI v2 | 15.9% | self-reported, llm-stats |
| GPQA | 87.5% | self-reported, llm-stats |
| HMMT25 | 90.0% | self-reported, llm-stats |
| Humanity's Last Exam | 40.0% | self-reported, llm-stats |
| LiveCodeBench | 79.0% | self-reported, llm-stats |
| USAMO25 | 37.5% | self-reported, llm-stats |