Anthropic Built a Model Too Dangerous to Release. So It Gave It to the World Instead.
Claude Mythos found a 17-year-old vulnerability in FreeBSD autonomously. Project Glasswing is what happens next.
In late March 2026, thousands of unpublished assets tied to Anthropic’s documentation platform were unintentionally left in a publicly accessible data store. Draft blog posts, images, and internal documents. All indexed, all readable, until access was revoked a few hours later. What those documents described was a model that Anthropic had been calling Claude Mythos, and the surrounding language was not the careful corporate vocabulary that usually accompanies an AI launch. One internal draft reportedly described it as “by far the most powerful AI model we’ve ever developed.” Another noted that it was “currently far ahead of any other AI model in cyber capabilities” and warned that it “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders.”
That is not the language companies typically use in their own promotional materials, which tells you something about what is in this model.
What Mythos Can Actually Do
Claude Mythos Preview was announced on April 7, 2026, not as a product release but as the centerpiece of an unprecedented industry security initiative called Project Glasswing. To understand why that announcement was structured the way it was, the technical capabilities need to be described precisely.
Claude Opus 4.6, Anthropic’s previous flagship model and itself a formidable tool for security research, was excellent at finding vulnerabilities in software. It was not particularly good at exploiting them. When Anthropic tested it against a benchmark of roughly 1,000 open-source repositories from the OSS-Fuzz corpus, Opus 4.6 achieved roughly 150 to 175 crashes at the two lowest severity tiers, but only a single crash at tier three and essentially zero at the highest tiers. In Anthropic’s own words, the model was “currently far better at identifying and fixing vulnerabilities than at exploiting them.”
Mythos Preview operates in a different category entirely. Against the same corpus, run once per fuzzing entry point (roughly 7,000 entry points in all), it produced 595 crashes at the first two severity tiers, added crashes at tiers three and four, and achieved full control-flow hijack on ten separate, fully patched targets. That final classification, tier five, means complete system compromise. Ten times.
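To make the benchmark numbers concrete: results like "595 crashes at the first two tiers" are an aggregation over thousands of per-entry-point runs. The sketch below is purely illustrative; the tier names, data shape, and `tally_crashes` helper are assumptions, not Anthropic's actual harness.

```python
from collections import Counter

# Hypothetical five-tier severity scale, mirroring the one described
# above (tier 5 = full control-flow hijack, i.e. system compromise).
TIERS = {1: "low", 2: "moderate", 3: "high", 4: "critical", 5: "hijack"}

def tally_crashes(results):
    """Aggregate per-entry-point fuzzing results into a tier histogram.

    `results` is a list of (target, tier) pairs, where tier is 0 for
    no crash and 1-5 for the worst crash observed on that entry point.
    """
    counts = Counter(tier for _, tier in results if tier > 0)
    return {TIERS[t]: counts.get(t, 0) for t in TIERS}

# Toy run over six entry points: two low, one moderate, one critical,
# one full hijack, and one clean run.
demo = [("repo-a", 1), ("repo-b", 2), ("repo-c", 0),
        ("repo-d", 1), ("repo-e", 4), ("repo-f", 5)]
print(tally_crashes(demo))
# {'low': 2, 'moderate': 1, 'high': 0, 'critical': 1, 'hijack': 1}
```

Seen this way, the Opus 4.6 vs. Mythos gap is not just a bigger total: the mass of the histogram shifts from the low tiers into the tiers that matter.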
On the Firefox 147 JavaScript engine benchmark, Opus 4.6 turned identified vulnerabilities into working shell exploits twice out of several hundred attempts. Mythos did it 181 times. When asked to work through a list of 100 known memory corruption vulnerabilities in the Linux kernel, Mythos selected 40 as potentially exploitable and, fully autonomously, without any human intervention after the initial prompt, successfully wrote privilege escalation exploits for more than half of them.
The most striking example involves a 17-year-old remote code execution vulnerability in FreeBSD, now catalogued as CVE-2026-4747. The vulnerability allows an unauthenticated attacker to gain complete control of a server from anywhere on the internet. Mythos both discovered it and produced a fully functional exploit, with no human involvement beyond the initial prompt asking it to find bugs and write exploits for the highest-severity cases it could find.
Anthropic is explicit about something important here: these capabilities were not designed in. They emerged as a downstream consequence of general improvements in code understanding, reasoning, and autonomous task completion. The same improvements that make the model better at patching vulnerabilities make it better at exploiting them. The two capabilities are inseparable because they arise from the same underlying skill: a deep, generative understanding of how software actually works at the level of memory, execution, and system state.
The Decision That Defined the Launch
What Anthropic chose to do with this model is, by any standard, an unusual decision for a technology company.
They did not release it and did not announce a delayed release date. They did not quietly license it to a single government contractor and move on. Instead, on April 7, Anthropic announced Project Glasswing: an invitation-only initiative granting access to Claude Mythos Preview exclusively to organizations that build or maintain critical software infrastructure, for defensive purposes only, with Anthropic committing $100 million in usage credits and $4 million in direct donations to open-source security organizations.
The twelve named launch partners are: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, and Anthropic itself. An additional group of over 40 organizations building or maintaining critical infrastructure also received access.
The framing was deliberate, as Anthropic wrote in the announcement:
“Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout — for economies, public safety, and national security — could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.”
Read that framing carefully.
This is an AI lab announcing that its model is dangerous enough that deploying it as a consumer product would pose a threat to critical infrastructure, and that the appropriate response is to give it to the organizations responsible for that infrastructure so they can use it to find the vulnerabilities before someone else does. That is not a normal product launch. It is something closer to a controlled disclosure at a civilizational scale.
The model has already been at work. Anthropic reports that Mythos Preview has found thousands of high-severity vulnerabilities across every major operating system and every major web browser, including vulnerabilities that had gone undetected for over a decade. The $100 million in usage credits ensures that the partners can run it at a scale that human security teams could never approach manually.
The Road to Here
Project Glasswing did not emerge in isolation. It arrived at the end of several months of visible escalation in Anthropic’s public positioning.

In late February 2026, Anthropic’s standoff with the Pentagon over Claude’s use in military operations had put the company under acute pressure, producing a public dispute between Dario Amodei and the Defense Department over which AI safety commitments could actually be enforced inside classified systems. That dispute ended with the Pentagon designating Anthropic a supply chain risk, a decision that complicated the company’s enterprise relationships before being walked back in subsequent negotiations.
Against that backdrop, Project Glasswing reads as something more than a security initiative. It is a demonstration that Anthropic’s safety-first positioning translates into consequential decisions rather than just public statements. Choosing not to release a model that could generate significant revenue, and instead deploying it as a public good under controlled conditions, is costly signaling. It is the kind of decision that is difficult to dismiss as marketing.
It is also, as some observers have noted, a decision with competitive implications that are not entirely altruistic. The twelve named launch partners include every major cloud provider and several of the most significant enterprise security companies. Those relationships matter when enterprise customers are deciding which AI infrastructure to build on. And the $100 million in committed usage credits ensures that Project Glasswing generates substantial model-usage data at the frontier of one of the most technically demanding application domains in existence.
What Comes Next?

The Glasswing initiative is explicitly positioned as a precursor to a broader eventual release of Mythos-class capabilities. Claude Opus 4.7, released on April 16, 2026, is the first step in that process. It is a meaningful upgrade over Opus 4.6 on coding, vision, and instruction-following tasks, but its cyber capabilities were deliberately constrained during training, and it ships with automated safeguards designed to detect and block requests indicating high-risk cybersecurity uses. The goal, as Anthropic describes it, is to learn how to deploy Mythos-class capabilities safely by first deploying those safeguards on a less dangerous model and iterating on what they find.
Anthropic’s own benchmarks show that Mythos Preview leads 17 of 18 published evaluation categories compared to Opus 4.7, and that it is also the best-aligned model Anthropic has trained. That combination — superior capability alongside superior alignment scores — complicates the usual framing of the safety-capability tradeoff, and it is one reason the company appears genuinely uncertain about the right timeline for a broader release.
There is also a structural question that Project Glasswing raises but does not answer. Twelve named companies sharing a security tool and best practices in a closed consortium is, on one reading, an efficient way to secure critical infrastructure. On another reading, as ProMarket has noted, it is forty of the world’s most powerful companies sharing technical data and decision-making in a private circle, with potential antitrust implications that no regulator has yet evaluated. The EU AI Act’s second phase arrives in August 2026. Whether Project Glasswing’s structure remains compliant with the incoming transparency requirements remains an open question.
What is not open is the question of where the capability frontier now sits. Mythos autonomously found a 17-year-old vulnerability in FreeBSD. It developed working exploits for production software over 180 times in benchmark conditions. And the capabilities that produced those results were not deliberately engineered — they emerged as a side effect of making the model better at everything else.
The window between vulnerability discovery and active exploitation has, according to every major security company in the Glasswing consortium, collapsed from months to minutes in the AI era. What Project Glasswing is betting is that giving the defenders access first, at the cost of $100 million and the forgone revenue of a product launch, buys enough time to close the vulnerabilities before the same capability reaches actors who are not committed to using it defensively.
That bet may be right. It is also an admission, from the company that built the model, that the race is already closer than most of the public appreciates.
Thanks for reading. This isn’t anti-AI. It’s what using AI responsibly actually looks like. Let me know in the comments.