Claude Opus 4.7
Claude Opus 4.7 is Anthropic's latest Opus-class model, a direct upgrade to Opus 4.6 with notable improvements in advanced software engineering, particularly on the most difficult tasks. It handles complex, long-running agentic workflows with rigor and consistency, follows instructions more literally and precisely, and verifies its own outputs before reporting back. Substantially improved vision supports high-resolution images up to 2,576 pixels on the long edge (~3.75 megapixels, more than 3x that of prior Claude models), unlocking dense screenshot reading, complex diagram extraction, and pixel-perfect references. Better file system-based memory enables coherent multi-session work.

The model introduces a new 'xhigh' effort level between 'high' and 'max' for finer control over the reasoning/latency tradeoff, and ships with task budgets (public beta) on the Claude Platform. It uses an updated tokenizer, so the same input may map to roughly 1.0-1.35x as many tokens as with Opus 4.6. It is released with automated safeguards that detect and block prohibited or high-risk cybersecurity uses. It is available across Claude products, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing is $5 per million input tokens and $25 per million output tokens, unchanged from Opus 4.6.
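Because per-token pricing is unchanged while the tokenizer may produce more tokens for the same input, the cost of an identical workload can still rise. The sketch below works through that arithmetic using only the figures quoted above ($5/$25 per million tokens and a 1.0-1.35x multiplier). The function names are hypothetical, and applying the multiplier to input tokens only is an assumption based on the wording above; output token counts may shift as well.

```python
# Illustrative cost arithmetic; names are hypothetical, not part of any SDK.
INPUT_USD_PER_MTOK = 5.00    # quoted price per million input tokens
OUTPUT_USD_PER_MTOK = 25.00  # quoted price per million output tokens
TOKENIZER_MULTIPLIER_RANGE = (1.0, 1.35)  # quoted range relative to Opus 4.6


def estimate_cost_usd(input_tokens: float, output_tokens: float) -> float:
    """Cost in USD for the given token counts at the quoted prices."""
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


def cost_range_usd(opus_46_input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Low and high cost estimates for a workload measured in Opus 4.6 tokens.

    Assumption: only the input token count is scaled by the tokenizer
    multiplier; the output token count is taken as given.
    """
    low, high = TOKENIZER_MULTIPLIER_RANGE
    return (
        estimate_cost_usd(opus_46_input_tokens * low, output_tokens),
        estimate_cost_usd(opus_46_input_tokens * high, output_tokens),
    )


if __name__ == "__main__":
    low_cost, high_cost = cost_range_usd(1_000_000, 100_000)
    print(f"Estimated cost: ${low_cost:.2f} to ${high_cost:.2f}")
```

For a workload that was 1,000,000 input tokens on Opus 4.6 and produces 100,000 output tokens, this gives a range of about $7.50 to $9.25. Measuring real token counts on the new model is more reliable than any multiplier estimate.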
Benchmark results
| Benchmark | Score | Tags |
|---|---|---|
| BrowseComp | 79.3% | self-reported, llm-stats |
| CharXiv-R | 91.0% | self-reported, llm-stats |
| CyberGym | 73.1% | self-reported, llm-stats |
| Finance Agent | 64.4% | self-reported, llm-stats |
| GPQA | 94.2% | self-reported, llm-stats |
| GPQA Diamond | 83.3% | |
| HumanEval | 95.0% | |
| Humanity's Last Exam | 54.7% | self-reported, llm-stats |
| MCP Atlas | 77.3% | self-reported, llm-stats |
| MMMLU | 91.5% | self-reported, llm-stats |
| MMMU | 76.1% | |
| OSWorld-Verified | 78.0% | self-reported, llm-stats |
| SWE-Bench Pro | 64.3% | self-reported, llm-stats |
| SWE-Bench Verified | 87.6% | self-reported, llm-stats |
| Terminal-Bench 2.0 | 69.4% | self-reported, llm-stats |