Claude Mythos scored 82% on Terminal-Bench, up from the previous score of 65%.
other
Videos: 1
Confidence: 100%
First Seen: 4/10/2026
Last Seen: 4/10/2026
AI Fact-Check: partially true
Source Videos (1)
Claude Mythos and the end of software
Theo - t3․gg
5:13
Related Claims
Claude Opus 4.7 scores worse than Opus 4.6 on SimpleBench, a benchmark designed to test common sense.
other · 1 video
Anthropic's Claude Mythos model is much bigger, more expensive, and slower than Opus, but more powerful.
other · 1 video
On SWE-Bench Pro, Claude Mythos achieved a 78% score, while Opus previously scored 53% and GPT-5.4 scored 57.7%.
other · 1 video
Anthropic engaged a clinical psychiatrist to perform a psychological exam on Claude Mythos, which concluded it had a relatively healthy personality organization with concerns about identity and a compulsion to perform.
other · 1 video
On Humanity's Last Exam, Claude Mythos improved its score from 40% to 56.8%, and to 64.7% when given tools.
other · 1 video