On Humanity's Last Exam, Claude Mythos improved its score from 40% to 56.8%, and to 64.7% when given tools.
AI Fact-Check
“The AI model "Claude Mythos" and the benchmark "Humanity's Last Exam" appear to be fictional concepts created for the video's speculative narrative. Searches for these terms yield no results from Anthropic's official website, research papers, or reputable AI news outlets. The video itself is set in the future (April 2026) and presents a hypothetical scenario about the dangers of advanced AI. As these entities do not exist in the real world, the performance statistics are unverifiable. Context: This claim is part of a speculative, fictional video that discusses a hypothetical future AI model from Anthropic. The video's narrative frames these fictional events as real news from April 2026.”
Source Videos (1)
Claude Mythos and the end of software
Theo - t3․gg
Related Claims
Claude Mythos autonomously found and chained together several vulnerabilities in the Linux kernel, allowing an attacker to escalate from an ordinary user to complete control of the machine.
Claude Mythos achieved an 82% score on the terminal bench, an increase from the previous 65%.
The Claude Mythos preview has been officially announced, and its system card has been made available.
On SWEBench Pro, Claude Mythos achieved a 78% score, while Opus previously scored 53% and GPT 5.4 scored 57.7%.
Anthropic engaged a clinical psychiatrist to perform a psychological exam on Claude Mythos, which concluded it had a relatively healthy personality organization with concerns about identity and a compulsion to perform.