Claude Opus 4.7

Claude Opus 4.7 is Anthropic's latest Opus-class model, a direct upgrade to Opus 4.6 with notable improvements in advanced software engineering, particularly on the most difficult tasks. It handles complex, long-running agentic workflows with rigor and consistency, follows instructions more literally and precisely, and verifies its own outputs before reporting back. Substantially improved vision supports high-resolution images up to 2,576 pixels on the long edge (~3.75 megapixels, more than 3x what prior Claude models supported), unlocking dense screenshot reading, complex diagram extraction, and pixel-perfect references. Better file-system-based memory enables coherent multi-session work. The model introduces a new 'xhigh' effort level between 'high' and 'max' for finer control over the reasoning/latency tradeoff, and ships with task budgets (public beta) on the Claude Platform. It uses an updated tokenizer, so the same input may map to roughly 1.0-1.35x as many tokens as on Opus 4.6. It was released with automated safeguards that detect and block prohibited or high-risk cybersecurity uses. It is available across Claude products, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing is $5/$25 per million tokens (input/output), unchanged from Opus 4.6.
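Because per-token pricing is unchanged while the tokenizer may produce more input tokens, the cost of an identical prompt can still rise. The sketch below is plain arithmetic over the figures quoted above ($5/$25 per million tokens, an input multiplier of about 1.0-1.35x). The function names are hypothetical, and it assumes the output token count is held fixed, since the description only states the multiplier for inputs.

```python
# Figures taken from the description above; everything else is illustrative.
INPUT_USD_PER_MTOK = 5.00     # USD per million input tokens
OUTPUT_USD_PER_MTOK = 25.00   # USD per million output tokens
TOKENIZER_MULTIPLIER_RANGE = (1.0, 1.35)  # input tokens relative to Opus 4.6


def estimate_cost_usd(input_tokens: float, output_tokens: float) -> float:
    """Cost in USD of one workload at the listed per-million-token prices."""
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


def cost_range_usd(opus_46_input_tokens: float, output_tokens: float) -> tuple[float, float]:
    """Best- and worst-case cost for an input measured with the Opus 4.6 tokenizer.

    Assumption: only the input count is scaled by the tokenizer multiplier;
    the output token count is treated as fixed.
    """
    low_mult, high_mult = TOKENIZER_MULTIPLIER_RANGE
    low = estimate_cost_usd(opus_46_input_tokens * low_mult, output_tokens)
    high = estimate_cost_usd(opus_46_input_tokens * high_mult, output_tokens)
    return low, high


if __name__ == "__main__":
    low, high = cost_range_usd(1_000_000, 100_000)
    print(f"Estimated cost: ${low:.2f} to ${high:.2f}")
```

For a workload that measured 1,000,000 input tokens on Opus 4.6 and produces 100,000 output tokens, this gives a range of about $7.50 to $9.25. Actual token counts depend on the content, so measured usage should replace the multiplier once it is available.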

Benchmark results

Benchmark             Score   Tags           Source
BrowseComp            79.3%   self-reported  llm-stats
CharXiv-R             91.0%   self-reported  llm-stats
CyberGym              73.1%   self-reported  llm-stats
Finance Agent         64.4%   self-reported  llm-stats
GPQA                  94.2%   self-reported  llm-stats
GPQA Diamond          83.3%
HumanEval             95.0%
Humanity's Last Exam  54.7%   self-reported  llm-stats
MCP Atlas             77.3%   self-reported  llm-stats
MMMLU                 91.5%   self-reported  llm-stats
MMMU                  76.1%
OSWorld-Verified      78.0%   self-reported  llm-stats
SWE-Bench Pro         64.3%   self-reported  llm-stats
SWE-Bench Verified    87.6%   self-reported  llm-stats
Terminal-Bench 2.0    69.4%   self-reported  llm-stats