Perplexity, a rising star in the AI world, is under fire for allegedly ignoring clear rules set by websites that don’t want to be scraped.
According to Cloudflare, a major player in internet infrastructure, Perplexity has been sidestepping standard web protocols and sneaking around digital “do not enter” signs.
Cloudflare says it spotted Perplexity gathering data from tens of thousands of websites, even sites that had put up virtual fences using tools like robots.txt.
That file acts like a polite “please don’t crawl here” note to bots and search engines.
But Perplexity? According to Cloudflare, they just walked right in anyway.
Why Does This Matter?
If you’ve ever posted an article, a blog, or even just product descriptions on your site, you probably want to control how that content is used.
But some AI companies rely on scooping up huge amounts of data to teach their models to respond like a human, and not all of them ask nicely first.
What’s more troubling is how Perplexity allegedly covered its tracks.
Cloudflare researchers say the AI company changed its user agent, the identifying text a browser or bot sends with each request to tell websites what kind of visitor it is, and even used tricks to look like a regular browser.
Think of it like someone knocking on your door wearing a delivery uniform just to sneak in through the back window.
Key Claims From Cloudflare
Cloudflare’s blog post highlighted several things:
- Perplexity allegedly used fake browser identities, including ones that mimicked Google Chrome on macOS
- They disguised which network their traffic came from by switching ASNs, the numbers that identify networks on the internet (like changing license plates on a getaway car)
- They ignored blocks that were clearly set to keep them out
- These activities happened at scale: millions of requests per day across tens of thousands of sites
Cloudflare said it used machine learning and network forensics to track the behavior and ultimately removed Perplexity’s bots from its list of approved crawlers.
What Did Perplexity Say?
Jesse Dwyer, a spokesperson for Perplexity, wasn’t having it.
He brushed off the accusations as part of a “sales pitch” by Cloudflare. He also claimed the screenshots shared in the blog post didn’t show any real content being accessed.
Dwyer even said the bot Cloudflare identified didn’t belong to Perplexity.
However, Cloudflare doubled down, saying it ran its own tests and clearly saw Perplexity dodging digital roadblocks.
These weren’t just customer complaints but verifiable patterns, the company claimed.
This Isn’t Perplexity’s First Controversy
Last year, Wired and other media outlets accused Perplexity of lifting their content without proper credit.
When asked to define plagiarism, Perplexity’s CEO, Aravind Srinivas, stumbled.
That awkward moment didn’t go unnoticed, and it added fuel to growing concerns about how AI companies use or misuse public content.
Are AI Scrapers a Threat to Publishers?
Cloudflare thinks so.
The company has been vocal about protecting publishers from unwanted scraping. In June, it launched a marketplace where publishers can charge AI companies for crawling their content.
That way, content creators still get paid when their work is used to train AI.
Last year, Cloudflare also rolled out a free tool to block AI bots entirely.
CEO Matthew Prince put it bluntly: AI might be breaking the internet’s business model, especially for news sites and small publishers that rely on traffic and ad revenue.
A Closer Look: robots.txt and What It Does
Here’s the thing: robots.txt isn’t the law. It’s more like a request. Well-behaved bots such as Googlebot follow it. Bad bots, or bots that don’t care, just ignore it.
| Bot type | Respects robots.txt? | Notes |
|---|---|---|
| Googlebot | Yes | Indexes pages for search |
| Bad scraper bots | No | Steals content silently |
| Perplexity’s crawlers | Questionable | Cloudflare says no |
This loophole is what many website owners are trying to fix. But until there’s a legal requirement, it’s an honor system, and some players are clearly not playing fair.
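To make the “honor system” concrete, here is a minimal Python sketch of how a well-behaved crawler checks robots.txt before fetching a page, using the standard library’s `urllib.robotparser`. The robots.txt contents, the example.com URLs, and the `is_allowed` helper are hypothetical, and so is the assumption that Perplexity’s declared crawler announces itself as `PerplexityBot`. Note that the website enforces nothing here: the crawler runs the check itself, under whatever name it chooses to announce.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the crawler named "PerplexityBot" (an assumed
# name) from the whole site, and keep every other bot out of /drafts/ only.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""

# Example Chrome-on-macOS user-agent string, standing in for a disguised crawler.
CHROME_MAC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agents = {
        "PerplexityBot": "PerplexityBot",
        "Googlebot": "Googlebot",
        "Chrome on macOS": CHROME_MAC_UA,  # matches no named group, so "*" applies
    }
    for label, agent in agents.items():
        for url in ("https://example.com/news/story.html",
                    "https://example.com/drafts/post.html"):
            verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
            print(f"{label:16} {url:38} {verdict}")
```

Run as a script, it reports PerplexityBot blocked on both URLs and Googlebot allowed on the news page but blocked from /drafts/. The Chrome-style identity is treated exactly like Googlebot, because it matches no named group and falls through to the `*` rules. That last case is the gap Cloudflare describes: rules keyed to a crawler’s name do nothing against a crawler that claims a different name.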
Final Thoughts
Cloudflare is cracking down. Perplexity says it’s innocent. Meanwhile, the internet watches closely.
For everyday users and creators, the big question is: Who controls your content in the age of AI?
While AI tools are exciting and often helpful, the rules around data use, scraping, and credit are still fuzzy, and that gray area is getting messier by the day.
If you’re a business owner, publisher, or blogger, stay alert. Use tools to protect your content. Keep tabs on who’s crawling your site. And when in doubt, speak up.
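For the advice to keep tabs on who’s crawling your site, here is a minimal Python sketch that tallies requests in a web server access log by the crawler name found in each user-agent string. It assumes the common Combined Log Format; the sample lines are invented, the bot list is illustrative rather than complete, and `label` and `summarize` are hypothetical helpers. Because user-agent strings are self-reported, a crawler posing as Chrome, as Cloudflare alleges Perplexity did, lands in the “browser/other” bucket; spotting it takes the kind of network-level analysis Cloudflare describes.

```python
import re
from collections import Counter

# Combined Log Format, a common Apache/nginx access-log layout:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Substrings of self-declared crawler names (illustrative, not exhaustive).
KNOWN_BOT_MARKERS = ("Googlebot", "bingbot", "PerplexityBot", "GPTBot")

# Made-up sample lines; the IPs come from reserved documentation ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2025:10:00:01 +0000] "GET /news/a.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:10:00:02 +0000] "GET /news/a.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [01/Jan/2025:10:00:03 +0000] "GET /news/a.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"',
    '203.0.113.9 - - [01/Jan/2025:10:00:04 +0000] "GET /news/b.html HTTP/1.1" 200 4096 "-" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"',
    "not a valid log line",
]


def label(agent: str) -> str:
    """Return the first bot marker found in the user-agent, else 'browser/other'."""
    lowered = agent.lower()
    for marker in KNOWN_BOT_MARKERS:
        if marker.lower() in lowered:
            return marker
    return "browser/other"


def summarize(lines) -> Counter:
    """Count parsed requests per label, silently skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[label(match.group("agent"))] += 1
    return counts


if __name__ == "__main__":
    for name, n in summarize(SAMPLE_LOG).most_common():
        print(f"{name}: {n} request(s)")
```

On the sample lines, the malformed entry is skipped and both Chrome-style requests are counted as ordinary browser traffic, even though nothing in the log proves a human sent them.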
For everyone else, this is a reminder that the shiny new world of AI comes with trade-offs. The tech is powerful. But so is the responsibility.