Claude Mythos scored 82% on Terminal-Bench, up from the previous 65%.

tech

Videos

Confidence: 100%

First Seen: 4/10/2026

Last Seen: 4/10/2026

AI Fact-Check: partially true

Source Videos (1)

Claude Mythos and the end of software

Theo - t3․gg

5:13

Related Claims

Claude Opus 4.7 scores worse than Opus 4.6 on SimpleBench, a benchmark designed to test common sense.

tech, 1 video

Anthropic's Claude Mythos is a much bigger, slower, and more expensive model than Opus, but also more powerful.

tech, 1 video

On SWE-Bench Pro, Claude Mythos achieved a 78% score, while Opus previously scored 53% and GPT-5.4 scored 57.7%.

tech, 1 video

Anthropic engaged a clinical psychiatrist to perform a psychological exam on Claude Mythos, which concluded it had a relatively healthy personality organization with concerns about identity and a compulsion to perform.

tech, 1 video

On Humanity's Last Exam, Claude Mythos improved its score from 40% to 56.8%, and to 64.7% when given tools.

tech, 1 video