Claude Mythos Preview
Claude Mythos Preview is an unreleased general-purpose frontier model from Anthropic that sits in a new tier above Opus (internal codename 'Capybara'). As part of Project Glasswing, a cross-industry cybersecurity initiative with 12 partners including AWS, Apple, Microsoft, and Google, it identified thousands of zero-day vulnerabilities across every major operating system and web browser. The model is state-of-the-art on SWE-bench Verified (93.9%), GPQA Diamond (94.6%), USAMO (97.6%), Terminal-Bench 2.0 (82.0%), CyberGym (83.1%), and Cybench (100% pass@1, saturated). It represents a 4.3x increase over the previous trendline for model performance. It is deployed under the ASL-3 Standard. Anthropic's risk report describes it as the best-aligned Claude model to date, and it received the first-ever 24-hour internal alignment review before deployment. The model is not planned for general availability. Pricing for participants is $25/$125 per million tokens (input/output). The system card runs to 244 pages.
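To make the participant pricing concrete, here is a minimal sketch that estimates the cost of one request from the $25/$125 per-million-token rates quoted above. The function name `estimate_cost_usd` and the token counts are hypothetical. The sketch also assumes flat per-token billing with no caching or batch discounts, which the description does not confirm.

```python
from decimal import Decimal

# Rates quoted in the description, in USD per million tokens.
INPUT_USD_PER_MTOK = Decimal("25")
OUTPUT_USD_PER_MTOK = Decimal("125")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in USD (hypothetical helper).

    Assumes flat per-token billing; any discounts or surcharges are ignored.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_USD_PER_MTOK
    )
    return total / TOKENS_PER_MILLION


# Hypothetical request: 200,000 input tokens and 20,000 output tokens.
cost = estimate_cost_usd(200_000, 20_000)
print(f"${cost:.2f}")  # prints "$7.50"
```

At these rates an output token costs five times as much as an input token, so long generations dominate the bill even when the prompt is much larger.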
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| BrowseComp | 86.9% | self-reported, llm-stats |
| CharXiv-R | 93.2% | self-reported, llm-stats |
| Cybench | 100.0% | self-reported, llm-stats |
| CyberGym | 83.1% | self-reported, llm-stats |
| FigQA | 89.0% | self-reported, llm-stats |
| GPQA | 94.6% | self-reported, llm-stats |
| Graphwalks BFS >128k | 80.0% | self-reported, llm-stats |
| Humanity's Last Exam | 64.7% | self-reported, llm-stats |
| MMMLU | 92.7% | self-reported, llm-stats |
| OSWorld-Verified | 79.6% | self-reported, llm-stats |
| SWE-bench Multilingual | 87.3% | self-reported, llm-stats |
| SWE-bench Multimodal | 59.0% | self-reported, llm-stats |
| SWE-bench Pro | 77.8% | self-reported, llm-stats |
| SWE-bench Verified | 93.9% | self-reported, llm-stats |
| Terminal-Bench 2.0 | 82.0% | self-reported, llm-stats |
| USAMO25 | 97.6% | self-reported, llm-stats |