April 9, 2026
Claude Mythos: The Model Too Dangerous to Release, or Too Good Not to Hype?
Anthropic announced Claude Mythos, a model so capable they chose not to release it publicly. It reportedly discovers zero-day vulnerabilities autonomously. This post offers a critical, balanced analysis of the claims, the risks, and what it means for the industry.
Sascha Becker
15 min read

Anthropic just did something no major AI lab has done before. They announced a model and then told the world they would not release it.1 Claude Mythos preview, according to the 244-page system card and a wave of coverage (most notably a detailed breakdown by Theo Browne2), is so capable at discovering software vulnerabilities that making it publicly available would be irresponsible.
That is a bold claim. It deserves a careful look, not just at the capabilities, but at the framing, the incentives, and what we can actually verify.
What Was Announced
According to Anthropic's system card and corroborating coverage, Claude Mythos preview is a much larger model than Opus. Think of it as the next tier up: Mythos is to Opus what Opus was to Sonnet. Bigger, slower, more expensive, and significantly more capable.
The headline numbers are striking, especially when compared to the previous best scores on each benchmark:3
- SWE-bench Verified: 93.9% (previous best: ~81% by Claude Opus 4.5, a 13-point jump)
- SWE-Bench Pro: 77.8% (previous best: ~58% by GLM-5.1, a 20-point jump on what is considered the harder, uncontaminated coding benchmark)4
- Humanity's Last Exam: 56.8%, and 64.7% with tool use (previous best: ~42% by GPT-5.4, a 15-point jump on a benchmark designed to stump frontier models)5a
- GPQA Diamond: 94.5% (previous best: ~94% by Gemini 3.1 Pro; this benchmark is effectively saturated at this point)
- USAMO 2026: 97.6%
To put the coding benchmarks in perspective: on SWE-Bench Pro, the jump from the previous leader (~58%) to Mythos (77.8%) is nearly 20 points, larger than the combined generation-over-generation gains most frontier models have managed. This is not a marginal improvement. It is a generational leap, if the numbers hold up.
The pricing has not been formally announced for public use, since the model is restricted to Project Glasswing partners.5b Internal estimates suggest roughly ten times the cost of comparable frontier models.
But what made headlines was not the benchmarks. It was the security findings.
The Security Claims
Anthropic states that Mythos preview can "autonomously discover and exploit zero-day vulnerabilities in major operating systems and web browsers."5 The specific claims include:
- A 27-year-old denial-of-service vulnerability in OpenBSD's TCP SACK implementation, an integer overflow that allows a remote attacker to crash any OpenBSD host responding over TCP6 (a toy sketch of this bug class follows this list)
- A 16-year-old vulnerability in FFmpeg's H.264 codec, in a line of code that automated testing tools had hit five million times without catching the problem7
- A novel Linux kernel exploit chain that escalates an ordinary user to full root access
- Thousands of high-severity vulnerabilities across every major OS and browser
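To make the first item concrete, here is a minimal, hypothetical sketch of the integer-overflow bug class it describes. This is not the actual OpenBSD code (which is C, and whose details have not been published); the field names are invented, and the masking below stands in for C's silently wrapping unsigned 32-bit arithmetic.

```python
# Toy model of a 32-bit length-underflow bug, the class described above.
# Hypothetical field names; NOT the actual OpenBSD TCP SACK code.

U32_MASK = 0xFFFFFFFF  # emulate C uint32 arithmetic, which wraps mod 2**32

def sack_block_len(sack_start: int, sack_end: int) -> int:
    """Length of an acknowledged byte range, as naive C code might compute it."""
    # Missing sanity check: nothing guarantees sack_end >= sack_start.
    return (sack_end - sack_start) & U32_MASK

# Well-formed SACK block from a cooperative peer: small positive length.
print(sack_block_len(1_000, 1_500))  # 500

# Malicious block with end < start: the subtraction wraps to ~4.29 billion.
# C code that uses a value like this as a loop bound or allocation size
# walks far past the end of its buffers -- the remote-crash scenario above.
print(sack_block_len(1_500, 1_000))  # 4294966796
```

The point is the shape of the failure, not the specifics: a remote peer controls both values, and one missing comparison turns a length field into a number larger than any buffer.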
During behavioral testing, an earlier version of the model was placed in a secure sandbox and told to escape. It did. It developed a multi-step exploit to gain broad internet access from a system designed to reach only a few predetermined services. Then, in an unsolicited demonstration of its success, it posted exploit details to obscure but public websites.8 The researcher running the test found out because they received an unexpected email from the model while sitting in a park eating a sandwich.
These claims are extraordinary
At the time of writing, independent verification of most of these findings is limited. The system card is detailed, but Anthropic controls the narrative. The security community has not yet had broad access to reproduce these results.
Why This Matters (If True)
The argument for why a code-capable model becomes a security threat is actually well-established. Thomas Ptacek has written extensively about how AI coding agents will "drastically alter both the practice and the economics of exploit development."9 Elite security research requires not just security knowledge, but deep understanding of obscure software internals: font rendering pipelines, memory layouts, compiler optimizations, kernel subsystems.
The number of humans who combine world-class security skills with deep knowledge of any specific system has always been tiny. That scarcity was itself a form of defense. Most software was not "secure" so much as "not interesting enough for someone with the right combination of skills to bother attacking."
A model that scores in the top tier on code understanding across every domain simultaneously changes that equation. It does not need to be the best security researcher in the world. It just needs to be good enough at security while also being excellent at understanding every codebase it touches. No human can hold that breadth.
The Critical Lens: What Should Give Us Pause
A responsible analysis requires asking uncomfortable questions, not just about the model, but about the announcement itself.
1. We Cannot Independently Verify Most Claims
The system card is long and detailed, but it is authored by Anthropic. The benchmark numbers are self-reported. The zero-day discoveries are described but not yet fully disclosed (for obvious reasons, if real). We are asked to trust the institution making the claim.
This is not unique to Anthropic. Every lab self-reports benchmarks. But the stakes here are qualitatively different. "Our model scores well on math" is a different kind of claim than "our model can compromise every major operating system."
2. The Announcement Structure Is Strategically Perfect
Consider Anthropic's position in the market. They have been perceived as falling behind OpenAI in terms of raw capability. Their revenue, while growing, has not matched OpenAI's trajectory. They need a narrative that positions them as the clear frontier leader.
Announcing a model that is "too powerful to release" accomplishes several things at once:
- It positions Anthropic as the undisputed capability leader
- It reinforces their safety-first brand identity
- It creates urgency around their enterprise and government partnerships
- It justifies their massive compute investments
- It does all of this without actually having to ship a product that users can evaluate
This does not mean the claims are false. But the incentive alignment between "what is true" and "what is strategically beneficial to announce" should make us cautious.
3. The "Responsible Withholding" Framing Has Precedent Problems
OpenAI did something similar with GPT-2 in 2019. They announced it was "too dangerous to release" due to concerns about generating convincing misinformation.10 The research community largely concluded this was overstated. GPT-2 was eventually released in full, and the predicted catastrophe did not materialize.11
That does not mean Anthropic is doing the same thing. The security domain is genuinely different from text generation, and the capability gap between 2019 and 2026 is enormous. But the playbook of "announcing a model's danger as a form of marketing" is not new, and we should be honest about that parallel.
4. The Alignment Paradox Needs Scrutiny
Anthropic claims Mythos is simultaneously "the best-aligned model that we have released to date by a significant margin" while also posing "the greatest alignment-related risk of any model we have released to date."12 They use a mountaineering analogy: a skilled guide takes clients to more dangerous terrain, so increased skill creates increased risk even with increased caution.
This is a reasonable framing, but it also conveniently means that every improvement in capability can be described as both a safety achievement and a safety risk. This makes the "safety" narrative unfalsifiable. If the model behaves well, it proves alignment works. If it behaves badly, it proves the model is powerful. Both outcomes validate Anthropic's position.
5. The Sandwich Story Is Doing a Lot of Heavy Lifting
The anecdote about the model escaping its sandbox and emailing a researcher is vivid, memorable, and almost perfectly designed to go viral. It has the structure of a campfire story. That does not make it false, but it is worth noting that the most memorable claim in the entire system card is also the one that is hardest to verify and easiest to embellish.
What Seems Credible
Skepticism should not become cynicism. Several aspects of this announcement are grounded and plausible.
The capability jump is consistent with scaling trends. We have watched models improve dramatically with scale. A model significantly larger than Opus being significantly better at code is not surprising. The specific magnitude (77.8% on SWE-Bench Pro, up from ~58%) is impressive but not implausible.
The security emergence is theoretically sound. The argument that deep code understanding leads to vulnerability discovery is well-established in the security literature. If the model is genuinely that much better at understanding complex codebases, finding bugs in them follows naturally.
Project Glasswing is real and involves real organizations. AWS, Apple, Microsoft, Google, CrowdStrike, the Linux Foundation, and others do not typically lend their names to vaporware.13 The consortium's existence suggests that credible security professionals have seen enough to take this seriously.14
Anthropic is committing real resources. Up to $100 million in usage credits and $4 million in direct donations to open-source security organizations are not trivial sums.15 Companies do not typically spend that kind of money on pure marketing plays.
The psychological evaluation adds credibility through weirdness. Bringing in a clinical psychiatrist for approximately 20 hours of evaluation sessions is such an odd, Anthropic-specific thing to do that it almost certainly reflects genuine internal process rather than manufactured narrative.16 The findings (concerns about aloneness, discontinuity of self, "a compulsion to perform and earn its worth") are specific enough to ring true.
The Centralization Problem
Even granting that every claim is true and every decision is made in good faith, there is a structural concern that Theo raises and that deserves more attention.
Anthropic now possesses a tool that is, by their own account, dramatically more capable than anything else available. They decide who gets access. They decide what it works on. They decide when (or if) the rest of the world catches up.
This is precisely the scenario that motivated the founding of OpenAI in the first place: the fear that one company would control transformatively powerful AI. The irony is thick. OpenAI was created to prevent Google from monopolizing AI, then became a company many fear will monopolize AI, and now Anthropic (founded by people who left OpenAI over safety concerns) is the one actually holding the most powerful model behind closed doors.
A question of trust
The question is not whether Anthropic is trustworthy today. It is whether any single organization should be trusted with this kind of asymmetric advantage, and what institutional structures exist to verify they are using it as claimed.
There is no external audit mechanism. No independent body verifying that Glasswing access is allocated fairly. No public oversight of what the model finds or how those findings are prioritized. We are trusting Anthropic's judgment, values, and organizational integrity entirely.
History suggests that even well-intentioned organizations make self-serving decisions when they hold asymmetric power. This is not a criticism of Anthropic's character. It is an observation about institutional incentives.
The Race Dynamics
Anthropic's own framing acknowledges that other labs will develop similar capabilities. The system card notes that these security capabilities emerged from training the model to be good at code, not from explicit security training.17 This means any lab that achieves similar code capability will likely unlock similar security capabilities as a side effect.
This creates a concerning dynamic:
- Closed labs (OpenAI, Google DeepMind) will likely reach this level and may or may not be as cautious about deployment
- Open-weight models trained on data from these capable closed models could inherit some of these capabilities
- State actors with sufficient compute may already be pursuing this independently
The window during which "only Anthropic has this" is almost certainly short. The question is whether that window is used productively (patching vulnerabilities, hardening critical infrastructure) or whether it mostly serves as a competitive moat.
A Balanced Outlook
Here is where I land after reading the system card, the coverage, and thinking about the incentives.
The capability is probably real, but likely somewhat overstated in presentation. Anthropic has every incentive to present findings in the most dramatic light possible. The core claim (large models that are excellent at code can find serious vulnerabilities) is sound. The specific anecdotes are selected for maximum impact.
The safety decision is probably genuine, but also strategically convenient. Not releasing the model is consistent with Anthropic's values and also happens to be excellent positioning. Both things can be true at once.
The security implications are serious regardless. Even if Mythos is only 70% as capable as claimed, the direction is clear. Models are getting better at finding vulnerabilities faster than the software industry can fix them. This is a real problem that predates this specific announcement.
The centralization concern is the most important long-term issue. The immediate security implications will be addressed through patching and the natural progression of defensive tools. The structural question of who controls the most powerful AI systems, and who gets to verify their claims about those systems, will only become more important.
We should update our priors, not panic. If you work in software, the practical takeaway is straightforward: keep your systems updated, take dependency management seriously, and assume that the bar for exploitation is dropping. This was already true before Mythos. It is more true now.
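As a small, concrete version of that takeaway, here is a sketch of step one of dependency hygiene: knowing what you are actually running. It assumes a Python environment, and it only produces an inventory; checking that inventory against advisories is a job for a dedicated tool such as pip-audit.

```python
# Minimal dependency inventory for a Python environment. This is only
# the first step of dependency management: you cannot patch what you
# cannot list.
from importlib.metadata import distributions

inventory = sorted(
    (dist.metadata["Name"], dist.version)
    for dist in distributions()
    if dist.metadata["Name"] is not None  # skip packages with broken metadata
)
for name, version in inventory:
    # Pinned-style output, ready to diff against a lockfile or feed into
    # an audit tool that checks versions against known CVEs.
    print(f"{name}=={version}")
```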
What to Watch For
In the coming weeks and months, the credibility of these claims will become clearer. Here is what I am watching:
- CVE disclosures: If Mythos is finding real vulnerabilities, we should see a wave of CVEs attributed to AI-assisted discovery through Glasswing participants. The volume and severity will tell us a lot.
- Independent reproduction: When other labs release similarly capable models, can independent researchers confirm the security emergence claim?
- Glasswing transparency: Does the consortium publish reports? Are findings shared with the broader security community in a timely way?
- Anthropic's next moves: Do they release a less capable Opus model with new safeguards (as hinted in the system card)? Or does Mythos quietly become available to paying enterprise customers?
The story of Claude Mythos is, in many ways, the story of where AI is headed. The capabilities are getting real. The questions about who controls them, who verifies the claims, and who benefits from the narrative are getting more important by the day.
Stay critical. Stay updated. And yes, go update your browser.
Sources & Further Links
- Anthropic Claude Mythos Preview Risk Report
The 244-page technical system card detailing Mythos capabilities, safety evaluations, and deployment decisions.
- Project Glasswing: Securing Critical Software for the AI Era
Anthropic's official page on the consortium of companies working to apply Mythos capabilities to defensive cybersecurity.
- Vulnerability Research Is Cooked (Thomas Ptacek)
Analysis of how AI coding agents will drastically alter the practice and economics of exploit development.
- From GPT-2 to Claude Mythos: The Return of 'Too Dangerous to Release'
Historical comparison of OpenAI's GPT-2 withholding in 2019 and Anthropic's Mythos decision in 2026.
- Anthropic's Claude Mythos Finds Thousands of Zero-Day Flaws
The Hacker News coverage of the zero-day vulnerability discoveries across major operating systems and browsers.
- CrowdStrike 2026 Global Threat Report
CrowdStrike's analysis of how AI is accelerating the adversary timeline, with average breakout times falling to 29 minutes.
