Claude Opus 4.7 scores lower than Opus 4.6 on SimpleBench, a benchmark designed to test common-sense reasoning.

tech

Confidence: 90%

First Seen: 4/18/2026

Last Seen: 4/18/2026

Source Videos (1)

19 Claude Opus 4.7 Insights You Wouldn’t Get From the Headlines

AI Explained

1:07

Related Claims

Claude Mythos scored 82% on Terminal-Bench, up from the previous 65%.

tech · 1 video

On SWE-bench Pro, Claude Mythos scored 78%, while Opus had previously scored 53% and GPT-5.4 scored 57.7%.

tech · 1 video

Claude Opus 4.7 clearly improves on Opus 4.6 in some long-context reasoning measures but regresses in others, such as finding the fourth poem across 1 million tokens.

tech · 1 video

Claude Opus 4.7 underperforms Opus 4.6 and Mythos Preview in cybersecurity vulnerability reproduction.

tech · 1 video

Anthropic intentionally reduced Claude Opus 4.7's ability to find cybersecurity vulnerabilities during training, as stated on page 48 of its system card.

tech · 1 video