Cybersecurity Experts Are Unhappy With Anthropic’s New AI


Written by: Lolade


This week, Anthropic dropped its newest model, Fable, a public version of Mythos.

Cybersecurity professionals across the internet quickly discovered that Fable’s guardrails are unnecessarily strict.

Tasks that have nothing to do with hacking? Blocked. Asking the model to read a blog post? Blocked. Writing secure code? Also blocked.

Fable

Fable is Anthropic’s public-facing version of Mythos, a powerful AI model the company built with cybersecurity in mind.

Anthropic originally launched Mythos back in April under a program called Project Glasswing.

The release was tightly controlled; only a small group of trusted companies and organizations got access.

Last week, Anthropic expanded Mythos access to hundreds of organizations across 15 countries. But Fable is the version that the general public can now get their hands on.

In practice, though, the tool keeps flagging its own users.

Guardrails

When Fable hits one of its restrictions, it pauses the conversation. It then tells the user that its “safety measures flagged this message for cybersecurity or biology topics.”

Valentina “Chompie” Palmiotti, a well-known security researcher at IBM X-Force, put it plainly on X.

She said Fable “rejects any request that could be tangentially cyber-related. Even innocuous tasks like reading a blog post.”

Researchers and professionals across Reddit and X have posted similar experiences. One researcher said that even asking for a code review sets off the guardrails.

Others noted that common security vocabulary alone seems to be enough to trigger a block.

Matt Suiche, a cybersecurity veteran and member of the technical staff at Tolmo, an AI cybersecurity startup, said that the problem feels keyword-driven.

“If you ask it to write secure code, it assumes it is cybersecurity-related work instead of software engineering best practices, and you get downgraded.”

When Fable hits a guardrail, it falls back to Claude Opus 4.8. Users don't get a hard stop; they just get a less capable model, without always knowing why.

“It seems to be keyword-based, so anything in the lexical field of ‘cybersecurity’ triggers the guardrails,” Suiche added.
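To see why purely lexical matching over-blocks, here is a minimal, hypothetical sketch of a keyword-based router. It is not how Fable works. The keyword list, the model labels, and the `route` function are made up for this illustration. Any prompt containing a flagged word is sent to a fallback model, regardless of what the user actually wants.

```python
# Hypothetical illustration only: a naive keyword-based router.
# This is NOT Anthropic's implementation; the keyword list and model
# labels below are invented to show why lexical matching over-blocks.

FLAGGED_TERMS = {"exploit", "malware", "vulnerability", "secure", "security", "payload"}

PRIMARY_MODEL = "primary-model"    # hypothetical stand-in for the main model
FALLBACK_MODEL = "fallback-model"  # hypothetical stand-in for the fallback model


def tokenize(text: str) -> list[str]:
    # Lowercase and replace punctuation with spaces so "Secure," matches "secure".
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def route(prompt: str) -> tuple[str, list[str]]:
    """Return the model a prompt would be sent to and the words that triggered it."""
    hits = [word for word in tokenize(prompt) if word in FLAGGED_TERMS]
    return (FALLBACK_MODEL if hits else PRIMARY_MODEL), hits


if __name__ == "__main__":
    prompts = [
        "Write secure code for a login form",
        "Summarize this blog post about web security",
        "Refactor this function to use a dictionary",
    ]
    for p in prompts:
        model, hits = route(p)
        print(f"{model:15} {hits} <- {p!r}")
```

Because the filter sees words rather than intent, the secure-coding request and the blog-summary request are both rerouted, while the unrelated refactoring request passes. This only demonstrates the failure mode the researchers describe; it makes no claim about how Fable's checks are actually built.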

AI Safety

Anthropic has long been concerned about AI being used to build malware or exploit software vulnerabilities.

The biology restrictions follow similar logic; the company has written publicly about the risks of AI being used to help develop biological weapons.

But there’s a big gap between “preventing AI-assisted cyberattacks” and “blocking a security researcher from reading a blog post.”

Right now, Fable lands somewhere awkward. It's restrictive enough to frustrate legitimate professionals, and its rules are blunt enough to catch all the wrong things.

Expert Opinion

Suiche, however, sees a rationale for the cautious rollout. "It's better to catch more people than not enough when you do such a release and to relax the guardrails over time," he said.

He also acknowledged that the field is still young. “I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies.”

Workaround

Anthropic does have a path for cybersecurity professionals who want fewer restrictions.

It’s called the Cyber Verification Program. Applicants who get approved face fewer limitations when using Claude for security-related work.

OpenAI runs a similar program, Trusted Access for Cyber, for its own models. The catch, of course, is that verification takes time.

And not everyone working in security will know to apply, or want to jump through hoops just to do their job.

Tags:

AI technology, cybersecurity, Fable
