Claude Opus 4.7 scores worse than Opus 4.6 on SimpleBench, a benchmark designed to test common sense.
other
Videos: 1
Confidence: 90%
First Seen: 4/18/2026
Last Seen: 4/18/2026
Source Videos (1)
19 Claude Opus 4.7 Insights You Wouldn’t Get From the Headlines
AI Explained
1:07
Related Claims
Claude Mythos scored 82% on Terminal-Bench, up from the previous 65%.
other (1 video)
On SWE-Bench Pro, Claude Mythos scored 78%, while Opus previously scored 53% and GPT-5.4 scored 57.7%.
other (1 video)
Claude Opus 4.7 shows clear improvement over Opus 4.6 in some long context reasoning measures but is a regression in others, such as finding the fourth poem across 1 million tokens.
other (1 video)
Claude Opus 4.7 underperforms Opus 4.6 and Mythos Preview in cybersecurity vulnerability reproduction.
tech (1 video)
Anthropic intentionally reduced Claude Opus 4.7's cybersecurity vulnerability finding capabilities during training, as stated on page 48 of its system card.
tech (1 video)