Is Anthropic’s New Claude Model a Cybersecurity Disaster?

Source Domain: nymag.com

In recent months, AI models have become much better at writing, debugging, and testing code. This has scrambled the AI race, making the software-focused Anthropic its leader and sending everyone else in pursuit of a new, more focused goal: improving coding tools and racking up as many excited and frightened enterprise customers as possible.

As these tools have become more capable at writing software, though, they’ve also gotten a lot better at figuring out how to break it. AI coding is rapidly automating parts of software development, but it is automating hacking at least as quickly, and perhaps even more quickly. Malicious actors have been using AI to help speed up and extend their capabilities for well over a year now with limited but real success; at the same time, software companies and open-source projects have been using AI tools to harden their software and find new vulnerabilities. The models’ recent ramp-up in coding capabilities has clarified the situation. If a software developer can now oversee a fleet of coding agents to knock out more features in less time, so too can someone whose objective is finding exploitable flaws in other people’s software in order to exfiltrate data, shut down systems, or hold them hostage for ransom.

This all gets particularly dicey when the technology takes a massive, sudden leap forward, which happens to describe how AI frontier model releases, which come from a tiny group of massive firms, can work. Anthropic says its next model is one such case:

Earlier today we announced Claude Mythos Preview, a new general-purpose language model. This model performs strongly across the board, but it is strikingly capable at computer security tasks. In response, we have launched Project Glasswing, an effort to use Mythos Preview to help secure the world’s most critical software, and to prepare the industry for the practices we all will need to adopt to keep ahead of cyberattackers.

By “strikingly capable at computer security tasks,” Anthropic’s testers mean, among other things, that the new model makes it much, much easier to find exploits in existing software. Testers using this version of Claude were “capable of identifying and then exploiting zero-day vulnerabilities in every major operating system and every major web browser,” while non-experts were able to use it to “find and exploit sophisticated vulnerabilities.” The programmer Simon Willison took a look at some of the examples and reports from Glasswing partners and found Anthropic’s claims compelling. “There’s enough smoke here that I believe there’s a fire,” he wrote. “It’s not surprising to find vulnerabilities in decades-old software, especially given that they’re mostly written in C, but what’s new is that coding agents run by the latest frontier LLMs are proving tirelessly capable at digging up these issues.”

This creates a strange situation in the short term, where one company, in the process of creating a general-purpose AI model with a particular focus on coding, has effectively gained the exclusive ability to exploit countless thousands of pieces of software deployed by competitors, governments, and institutions around the world. (The same company, it should be noted, that the U.S. government recently punished and attempted to designate as a supply-chain risk for raising a different set of safety concerns.) Anthropic’s absurd level of latent power will persist until competitors catch up, and/or until its tools, or tools like them, can systematically protect and secure software against empowered hackers. Anthropic hasn’t released this model to the public yet, but it won’t be long, and alternative tools adapted for malicious use will follow closely behind.

So for now, the most obvious way to prepare for security risks created by new models is to use those same models to test and patch your own software, ahead of time if possible. Anthropic’s decision to donate “$100M in usage credits” to “defenders” suggests that they’re aware of how strange their position is and how uneasy it might make everyone else. Given that this next model is also likely to be far more capable at straightforwardly productive coding tasks, Anthropic’s business is set to have a very, very good year, as it continues to raise alarms about … well, its own success.

anthropic: “we have finished training the ultimate god model exposing zero-day vulnerabilities in all software including linux and ffmpeg and we also made ten billion dollars last month”
openai: “we have acquired TBPN”
“paula” (@paularambles), April 8, 2026

Anthropic is doing everything it can here to communicate that it didn’t set out to semi-blackmail the entire world — sorry, it just happened! And, in contrast with the massive freak-out about Mythos in the AI and cybersecurity worlds, the company comes across in its release as surprisingly sanguine about long-term implications of automated high-level hacking. While outlining a staggering range of vulnerabilities that the model was able to help uncover in often surprising ways, Anthropic’s own security testers write that most “security tooling has historically benefitted defenders more than attackers,” and that they “believe the same will hold true here too—eventually,” admitting that the interim period “may be tumultuous regardless.”

This is an interesting posture coming from a safety-focused lab that spends a lot of time talking about severe AI risks. Now that one of them is materializing, it’s signaling both a need for urgency and a general sense that things could, through cooperation and planning and market mechanisms, eventually work out, and that widely distributed new capabilities will result in a new “equilibrium” — a word you don’t hear so often from labs that tend to talk in accelerative terms about runaway capabilities. Hackers have new tools, but so do the people building software, who will be able to use them to root out problems and identify lingering security problems, ultimately resulting in a situation not unlike where we are now (which, admittedly, most cybersecurity professionals will tell you has been dire, but not quite apocalyptic, for as long as they can remember).

This is at least spiritually similar to a recent rhetorical pivot among some tech elites away from projections of imminent and rapid labor replacement and toward a more familiar story of “just capitalism, but with greater productivity,” which doesn’t so much dispute those claims as dismiss them as short-term distractions:

The “AI job loss” narratives are all fake. AI = massive ramp in productivity = massive ramp in demand = massive jobs boom. Watch. https://t.co/TXR2KtaICm
Marc Andreessen 🇺🇸 (@pmarca), April 5, 2026

How the next year in cybersecurity unfolds could provide some valuable data about the bigger picture here. On one hand, earlier AI capabilities — generating plausible speech for a variety of contexts, realistic images, videos, etc. — have quickly disrupted, broken, or destabilized systems and markets where they’ve been deployed, sending a shock through education, for example, and oversupplying social networks with slop and other inauthentic material. They’ve mattered. On the other, they’ve made it a bit easier to think, at least, about how those systems might need to change, or their participants might need to adapt, in order to function going forward. One important question is how “tumultuous” each system’s interim period will be, and whether, in the process of achieving a new equilibrium or status quo, it will end up leaving some participants behind. Another is how long it will last, or if it will ever actually end.

As far as systems go (and I use that word loosely here), the cybersecurity status quo seems unusually vulnerable to near-term “tumult,” as countless poorly maintained software deployments quickly become accessible targets to a growing pool of less-skilled hackers. Maybe, as Anthropic suggests, thousands of different actors will snap into action, coordinate, and collectively level up software security to new heights with new tools and everyone will get on with their lives. That would be a nice story to tell about AI. Or maybe a group of hackers with a jailbroken model will soon be able to hold entire countries hostage from a laptop. Guess we’ll find out!
