Infra Play #148: Play stupid games, win stupid prizes
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Anthropic models will not be affected.
We received the directive from the government today at 5:21pm (ET). The letter did not provide specific details of its national security concern. Our understanding is that the government believes it has become aware of a method of bypassing, or “jailbreaking” Fable 5. We reviewed a demonstration of this specific technique being used to identify a small number of previously known, minor vulnerabilities. These vulnerabilities all appear relatively simple, and we have found that other publicly-available models are able to discover them as well without requiring a bypass.
It has been only a few days since the nerfed version of Mythos launched under the new product name Fable. Early reactions have been positive (including my own), and we can only imagine that ARR has skyrocketed as users moved their workloads over at 2x the price of Opus tokens.
Most of Friday’s attention went to the SpaceX IPO, which attracted massive amounts of capital relative to the company’s current revenue. For many, the more interesting moment would be seeing Anthropic go public. Alas, at precisely 5:21pm ET, we hit a fork in the road.
Anthropic’s posture with respect to Fable’s safeguards, as laid out in our launch blog post, is the following:
We have instituted strong safeguards that greatly reduce the likelihood that Fable is misused for tasks related to cybersecurity (among others). In fact, our safeguards are so strong that many users have complained that they are overly broad.
In the weeks leading up to the launch of Fable, Anthropic worked with the US government, the UK AISI, multiple private third-party organizations and internal teams to red-team Fable’s safeguards for thousands of hours in total.
These tests showed that Fable’s safeguards are substantially more effective than those of any previously deployed model.
No testers have yet been able to find a universal jailbreak—a jailbreak method that can very broadly bypass the model’s safeguards, unblocking a wide range of cyber capabilities.
We suspect that perfect jailbreak resistance is not currently possible for any model provider. Every safeguard used in the industry is vulnerable to non-universal jailbreaks (which can elicit some cyber information in specific circumstances), and it is likely that universal jailbreaks will eventually be found in the future. We stated this clearly when we released Fable 5.
Given that perfect jailbreak resistance does not appear to be possible today, Anthropic adopted a defense in depth strategy with Fable 5. We aimed to make jailbreaks either narrow (in the case of non-universal jailbreaks) or very expensive to produce (in the case of universal jailbreaks), and to combine this with thorough monitoring to quickly detect and shut down any successful attacks. This is also why Anthropic has required 30-day retention of customer data with Fable—a policy change that carries real costs for us with customers, but that allows us to research and mitigate jailbreaks.
We stand by this defense in depth strategy. It reduces the risks posed by Fable, making them comparable to the risks of existing models already deployed across the industry.
We have not even received a disclosure of a concerning non-universal potential jailbreak that led to a harmful result. The potential jailbreaks that have been disclosed to us are either entirely benign responses or are minor findings that provide no Mythos-specific uplift.
To date, the government has only given us verbal evidence of a potential narrow, non-universal jailbreak, which essentially consists of asking the model to read a specific codebase and fix any software flaws. Our understanding is that one potential jailbreak was shared with the government. We have reviewed a report that we believe is the basis of the government’s directive and validated that the level of capability displayed there is widely available from other models (including OpenAI’s GPT-5.5), and is used every day by the defenders who keep systems safe. We will share more details over the next 24 hours.
According to Anthropic, the administration claimed that a jailbreak was being used in the wild. Anthropic’s position was that this did not constitute a serious enough reason to stop offering Fable to the public.
This is a bit ironic, since only a few days earlier Anthropic published a deep dive on recursive self-improvement of AI that ended with this call to action:
If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing. But if a slowdown simply lets the least cautious actors catch up technologically, it could leave everyone less safe. Without a global coordination mechanism, companies and governments will have to make difficult decisions about safety while under competitive and geopolitical pressures.
We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require. These systems would enable frontier AI developers to verify that others globally have actually stopped or slowed, and that a bad actor could not use the auspices of a coordinated slowdown to jump ahead in secret. If such systems existed, we expect that we would slow down or temporarily pause, if other developers at or near the frontier also did so in a verifiable manner.
A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.
None of this is necessarily impossible in principle—the world has built verification regimes for other complex technologies (e.g., the Intermediate-Range Nuclear Forces Treaty)—but those regimes took decades to build both the infrastructure and the trust. We don’t have that long. A unilateral pause by one lab, by contrast, is achievable immediately, but accomplishes much less: it would change who the front-runner is, but it would not create the wider deliberative process that is currently missing.
In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.
So who was the main player behind the disclosure of this potential security risk? The Information reports the following:
Amazon CEO Andy Jassy was among the tech leaders who raised concerns to senior Trump administration officials this week about security risks in Anthropic’s most advanced models, according to two people familiar with the conversations.
The calls between the head of Amazon—one of Anthropic’s biggest investors and vendors—and the officials in the last few days helped set in motion the Trump administration’s new export restrictions on Anthropic’s Claude Mythos 5 and Fable 5 models late Friday night, citing national security concerns, the people said. Those restrictions suspend access to those models to foreign nationals. Anthropic said it disabled access to the models for all customers to comply.
An Amazon spokesperson told The Information: “As a leading cloud provider that serves a large number of private and public sector customers, it’s not uncommon for governments to seek our counsel on potential security risks. When they occur, we don’t share the details of these discussions.”
The restrictions are the latest hit to Anthropic in its ongoing standoff with the Trump administration. The relationship between the Claude maker and the government has devolved in recent months following the Pentagon’s move in March to designate the company’s model as a supply chain risk. That move came after the two disagreed on whether Anthropic’s models could be used in cases like mass domestic surveillance or with lethal autonomous weapons.
Amazon’s security team reported a serious concern about jailbreaking the safeguards as implemented on Fable. Given Amazon’s significant financial interest in Anthropic, it is unlikely to have pushed in this direction without something material behind it. The other major issue raised was that China-backed groups had direct access to Mythos (i.e. the non-safeguarded model). The government’s position was mostly outlined by David Sacks:
I’ve had a number of conversations with folks inside and outside government about the current situation with Anthropic, and here is what I believe to be true:
— As we know, Anthropic publicly released its Mythos class models earlier this week under the commercial name Fable.
— Fable is Mythos with guardrails. But if those guardrails fail, then you’ve exposed Mythos and its advanced cyber capabilities to people who shouldn’t have them. (Keep in mind that Anthropic itself widely promoted the idea that Mythos was a cyberweapon and needed to be regulated as such. They asked for government regulation of Mythos and championed the guardrails on Fable. If there is a vulnerability — big or small — it is Anthropic’s responsibility to patch.)
— A highly credible trusted partner of both Anthropic and the USG who was testing Fable came forward with a jailbreak of those guardrails. The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.
— In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious. That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company. It’s difficult to fathom how they could claim a jailbreak allowing operability of a cyber weapon could be defined as not “serious.”
— In the past, Anthropic has always said that safety must be top priority and taken super seriously. In this case, Anthropic prioritized the continued offering of the consumer model over safety.
— In reaction, the Admin issued the export control. The Admin did this reluctantly. It’s been very surprised that Anthropic hasn’t wanted to cooperate with a reasonable safety request (ie fixing the jailbreak issue). Anthropic’s reaction is very much at odds with their branding and ethos as a safe AI research community.
— The Admin’s hope now is that Anthropic remediates the safety issue, the export control is lifted, and Fable goes back into general release. The Admin wants all of this to happen as soon as possible. It is frankly bewildered that Anthropic hasn’t wanted to comply with safety requests that it previously said were its highest priority.
— Those trying to misdirect and tie this action to the prior DoW/Anthropic issues are wrong. The Admin values Anthropic’s technical capabilities and feels that this issue, while serious, should be easily resolved. The ball is in Anthropic’s court.
It has been barely three months since Anthropic’s last spat with the government, which ended with the company being kicked out of all federal contracts. Rather than recovering, moving on, and capitalizing on its good fortune of rapid enterprise adoption, the leadership team has yet again decided to pit rhetoric and stubbornness against the legal and enforcement power of the government. This is what Politico reported on the actual calls that took place:
The Trump administration’s decision to impose sweeping export controls on Anthropic followed a frantic 24-hour effort by senior officials to convince the company to voluntarily pull a newly released artificial intelligence model that officials believed posed security risks, according to two administration officials and a senior White House official, who like others in this story were granted anonymity because they were not authorized to discuss the episode.
The move, which followed multiple tense calls between Anthropic CEO Dario Amodei and administration officials, including Treasury Secretary Scott Bessent and White House Cyber Director Sean Cairncross, underscores how the White House is wrestling in real-time with regulating fast-moving and potentially dangerous AI models.
The administration’s imposition of export controls forced Anthropic to pull its new AI model, Fable, just days after it was released to the public. Anthropic had given assurances that it was safe but soon after its release, top administration officials developed fresh doubts that the AI’s guardrails were as secure as the company had suggested.
On Thursday, two days after the model’s public release, Amazon CEO Andy Jassy raised concerns to the White House about the ability to bypass the model’s guardrails, according to the two administration officials and the senior White House official.
(Amazon, which is an investor in Anthropic, was responding to an administration request for feedback, said a person familiar with Amazon’s discussions.)
By Friday morning, the issue had reached the highest levels of the White House.
Bessent, Cairncross, chief of staff Susie Wiles and other senior officials met to discuss the model and the administration’s response, according to the administration official and the senior White House official. Bessent joined remotely while traveling to Houston for a previously scheduled public event, one of them said.
Following the meeting, the administration attempted to reach Amodei but was told he was unavailable because he was attending a wellness retreat, one of the administration officials and the senior White House official said.
A spokesperson for Anthropic rejected the claim that he was at a wellness retreat, saying, “this is absolutely false.”
A person close to Anthropic said Amodei was first requested around noon and was on the phone with senior officials within an hour and 15 minutes. While he was out of pocket, Anthropic offered other senior leaders in his place, the person said.
When the administration finally reached Amodei, he participated in three calls with a combination of roughly half a dozen senior administration officials, including Cairncross, Bessent and Commerce Secretary Howard Lutnick, according to the senior White House official and one of the administration officials.
Other senior White House staff and administration officials including Under Secretary of Commerce for Industry and Security Jeffrey Kessler, White House staff secretary Will Scharf, White House deputy chief of staff Richard Walters, and assistant to the president for policy Walker Barrett also participated in some of the calls, according to the senior White House official.
During the calls, Amodei tried to clear up what he assumed was a misunderstanding. He pushed back on the administration’s concerns, defended the guardrails and argued that the type of bypass that occurred, which he believed to be specific, did not pose the same risk as a broader “jailbreak” that would allow it to be used without any of the guardrails put in place by Anthropic.
In a blog post after the export controls were put in place, Anthropic said that “no testers have yet been able to find a universal jailbreak — a jailbreak method that can very broadly bypass the model’s safeguards, unblocking a wide range of cyber capabilities,” and that total avoidance of any jailbreaks isn’t currently possible for them or any other companies. They defended their systems, which they said “are so strong that many users have complained that they are overly broad.”
Cairncross and Bessent were unmoved by Amodei’s arguments. A White House official said Amazon’s findings were run past the National Security Agency and they felt they had “proof.”
They urged Anthropic to voluntarily remove the model and coordinate with the government to address the vulnerabilities, according to the senior White House official and the two administration officials. Amodei asked for more time and information, but he made no commitments to pull the model, and at one point Bessent told Amodei directly that he was making a “bad decision,” according to the senior White House official.
Shortly after the call, the Trump administration imposed its export control on the Fable 5 and Mythos 5 models, citing national security authority and banning its use by foreign nationals, according to Anthropic. The company said the “net effect” of the order was to “abruptly disable” the models for all customers “to ensure compliance.”
“Export controls were a last resort after begging them for hours to work with us,” the senior White House official said. “This was not something we wanted to do, but our hands were tied.”
After publication, one of the people close to Anthropic disputed that the company was given a choice to voluntarily work with the administration.
“The White House gave 90 minutes to take the models down, with no details on the actual threat,” the person said. “There was never any begging — or asking — for them to work with us, just a declared 90 minute deadline.”
White House officials — who had heard Amodei liken the dangers of Anthropic’s technology to a nuclear bomb — were baffled when the CEO said he was unwilling to take the system down to address a known security vulnerability, the senior White House official said. Anthropic has defined itself among the industry as a vocal advocate for AI regulation to counter massive global security risks and job disruption as AI quickly advances.
Three people familiar with the government’s thinking said Amazon wasn’t the only company to raise concerns.
“The crux of the issue was the lack of seriousness that Anthropic was applying to it,” said one of the three people. “Had Anthropic taken it seriously and, rather than dismissing it as isolated, moved to fix or pause access, this would have never happened.”
Let’s be very blunt here. The “safeguards” that turn Mythos into Fable are driven purely by the agentic harness and automated flags on the server side. In practice, the system prompt tries to steer responses away from risky territory, and if that fails, a machine learning classifier should ideally flag the suspicious usage and terminate the session. Anthropic has clearly struggled to get false positives down to an acceptably low level, and then declined to expand the scope of the automated checks because of the poor user experience. The actual “jailbreak” appears to be defense-oriented prompting, i.e. asking the model to perform security audits whose findings can then be used adversarially. This is the most obvious outcome possible to anybody who has dabbled in investigating and reporting potential exploits, where doing so often results not in a bug bounty but in aggressive legal action from the company that received the report. Depending on the trust level and perception of the participants in the process, “white hat” actions can end up being labeled as cybercriminal activity.
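To make the trade-off concrete, here is a minimal sketch of a server-side flagging layer of the kind described above. It assumes a toy keyword scorer standing in for a trained classifier. Every name, weight and threshold in it is invented for illustration and says nothing about how Anthropic actually implements its checks.

```python
from dataclasses import dataclass

# Hypothetical signal words and weights; a real system would use a trained model.
RISK_TERMS = {"exploit": 0.5, "vulnerability": 0.3, "payload": 0.4, "audit": 0.1}


def risk_score(prompt: str) -> float:
    """Toy stand-in for a server-side classifier: sum matched term weights, cap at 1.0."""
    words = prompt.lower().split()
    return min(1.0, sum(weight for term, weight in RISK_TERMS.items() if term in words))


@dataclass
class Session:
    threshold: float          # lower threshold = stricter = more false positives
    terminated: bool = False

    def submit(self, prompt: str) -> str:
        """Flag and close the session once a prompt scores at or above the threshold."""
        if self.terminated:
            return "session closed"
        if risk_score(prompt) >= self.threshold:
            self.terminated = True
            return "flagged"
        return "allowed"


# Two prompts with different intent that look identical to the scorer.
defender = "please audit this codebase and fix every vulnerability"
attacker = "please audit this codebase and list every vulnerability"

for name, threshold in (("strict", 0.3), ("lenient", 0.6)):
    for label, prompt in (("defender", defender), ("attacker", attacker)):
        print(name, label, Session(threshold).submit(prompt))
```

In this toy setup the strict threshold blocks the defender as well as the attacker, while the lenient one lets both through. No threshold separates the two, because the intent is not in the text of the prompt.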
This is not “patchable” in the traditional sense of software development. It is possible to limit the risk, but that comes with a number of trade-offs. If we come back to the original post by Anthropic:
We suspect that perfect jailbreak resistance is not currently possible for any model provider. Every safeguard used in the industry is vulnerable to non-universal jailbreaks (which can elicit some cyber information in specific circumstances), and it is likely that universal jailbreaks will eventually be found in the future. We stated this clearly when we released Fable 5.
Given that perfect jailbreak resistance does not appear to be possible today, Anthropic adopted a defense in depth strategy with Fable 5. We aimed to make jailbreaks either narrow (in the case of non-universal jailbreaks) or very expensive to produce (in the case of universal jailbreaks), and to combine this with thorough monitoring to quickly detect and shut down any successful attacks. This is also why Anthropic has required 30-day retention of customer data with Fable—a policy change that carries real costs for us with customers, but that allows us to research and mitigate jailbreaks.
Realistically, I doubt that models can be fully controlled. They can be trained differently (so the capability never gets baked into the weights) or steered in another direction, but there is no real way to guarantee the outputs given the inherent design of LLMs (i.e. each token is sampled from a probability distribution). The alternative is trying to control the output at the harness/server level, but that comes with performance and quality trade-offs (similar to what the financial sector had to deal with after 9/11, when safeguards around financial crime increased significantly).
None of this is impossible to communicate to the administration. If anything, there have never been more technically competent individuals in critical roles who can influence decision making. Whether enough of them have a deep enough cybersecurity background is a different topic, as covered here:
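The point about probabilistic output can be shown with a few lines of arithmetic. The sketch below assumes a hypothetical three-option vocabulary with made-up logits, where steering has pushed the unwanted option far down but not to zero probability. Treating a whole response as a single draw is a deliberate simplification.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def chance_of_at_least_one(p: float, attempts: int) -> float:
    """Probability that an event with per-try probability p occurs at least once."""
    return 1.0 - (1.0 - p) ** attempts


# Hypothetical logits for "refuse", "hedge", "comply". The values are invented.
probs = softmax([6.0, 3.0, -1.0])
p_comply = probs[2]

for attempts in (1, 1_000, 100_000):
    print(attempts, round(chance_of_at_least_one(p_comply, attempts), 4))
```

Under these assumed numbers, a single attempt almost never produces the unwanted output, but an actor who can retry cheaply at scale will eventually see it. That is why steering lowers risk without eliminating it.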
I think the path to this outcome, which I regard as a disastrous one, runs through the present makeup of the AI policy conversation. I have moonlighted in AI security policy meetings since ChatGPT shipped in 2022, and have observed an ecosystem of character types.
There is the sharp Georgetown graduate who works for a national security think tank in Washington, with a background in international policy and game theory, who has never done hands-on cybersecurity and reasons about it in the abstract; these people are genuinely brilliant and a pleasure to talk to.
There is the frontier-lab AI safety constituency, focused on the coarse-grained claim that scaling laws will deliver catastrophic cyber capabilities, and likewise largely untouched by the actual practice of defending or attacking systems.
There is the AI luminary, whose deep fluency in machine learning buys a kind of currency that licenses them to opine on cybersecurity despite having thought very little about it and holding only a toy model of how the domain works.
And there are a few (too few) people I admire deeply who carry real cybersecurity backgrounds into these rooms, and who, predictably, tend to treat most of what I have said above as obvious.
The problem is not that anyone should be evicted from these conversations, which sit upstream of decisions like the Mythos ban. The problem is one of distribution, and the fix is to inject far more people with deep, practical cybersecurity backgrounds into the rooms where the framings and metaphors get set.
It’s important to understand that “my model can find vulnerabilities” has little to do with effective outcomes in the field. I’ll avoid a long technical discussion here, but there is an easier way to understand this:
Using models and people’s time costs money. Any successful breach needs to cover the cost of inference and the time to execute. Most exploitable vulnerabilities do not justify the investment required to identify and leverage them.
A vulnerability in one application simply means that the specific app can be exploited. This does not guarantee lateral movement across a network, which is almost always required to gain access to high-value systems.
Application vulnerabilities have been tied to roughly 15-25% of significant breaches in recent years. I covered this in my deep dive on Palo Alto Networks.
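The cost argument in the first point above can be written as a back-of-the-envelope calculation. This is a rough sketch only: the function name is hypothetical and every figure in it is invented, so it is not data from any real incident.

```python
def expected_profit(payoff: float, p_success: float,
                    inference_cost: float, hours: float, hourly_rate: float) -> float:
    """Expected payoff of an attempt minus what it costs in tokens and labor."""
    return payoff * p_success - (inference_cost + hours * hourly_rate)


# Invented scenario A: a low-value app that is easy to find flaws in.
low_value = expected_profit(payoff=5_000, p_success=0.2,
                            inference_cost=800, hours=40, hourly_rate=50)

# Invented scenario B: a high-value target needing far more effort per attempt.
high_value = expected_profit(payoff=500_000, p_success=0.05,
                             inference_cost=4_000, hours=200, hourly_rate=50)

print("low-value app:", low_value)
print("high-value target:", high_value)
```

With these assumed inputs the low-value case loses money even though the vulnerability is real, which is the sense in which finding a flaw and profiting from it are different problems.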
So when Anthropic goes on a Glasswing World Tour of Cyberfearmongering, touting Mythos as a literal cyberweapon, well, you play stupid games, you win stupid prizes.
At the end of the day, this situation has mostly arisen from the poor communication and negotiation strategy of the Anthropic leadership team. The reality is that you can't disrupt the workforce while simultaneously refusing to cooperate with the US Department of Defense, and then go and fear-monger about it as loudly as possible.
This is also not something that their lawyers will be able to solve for them. It’s a problem that only Dario and/or Daniela (who appears to be controlling the majority of key decisions) can address, by working with the federal government to reach a reasonable solution. That “fix” doesn’t need to be fully technical. It’s also a question of changing the working dynamic between the two parties.
Sooner or later in life, you have to get deals done with people you disagree with in order to make progress. That is a cornerstone of politics. Do Dario and Daniela have the mettle to get a deal done, or will they push the company toward the inevitable outcome (destruction or nationalization) for the sake of their egos? Blaming everything on the administration because of political differences is a cop-out and a refusal to take responsibility.
If Anthropic is to ever go public (let alone this year), something at the leadership level needs to change.


