Claude Opus 4.7 underperforms Opus 4.6 and Mythos Preview in cybersecurity vulnerability reproduction.
Source Videos (1)
19 Claude Opus 4.7 Insights You Wouldn’t Get From the Headlines
AI Explained
Related Claims
Claude Mythos autonomously found and chained together several vulnerabilities in the Linux kernel, allowing an attacker to escalate from an ordinary user to complete control of the machine.
Claude Opus 4.7 scores worse than Opus 4.6 on SimpleBench, a benchmark designed to test common-sense reasoning.
On SWE-bench Pro, Claude Mythos scored 78%, compared with Opus's previous score of 53% and GPT-5.4's 57.7%.
Claude Opus 4.7 clearly improves on Opus 4.6 in some long-context reasoning measures but regresses in others, such as finding the fourth poem across 1 million tokens.
Anthropic's testing found that Mythos Preview can identify and exploit zero-day vulnerabilities in every major operating system and web browser when directed by a user.