Cloudflare’s New Block on AI Web Crawlers
Cloudflare announced that a setting to block AI web crawlers would be enabled by default for its customers. This announcement comes almost two years after the setting was originally introduced to protect content creators and prioritize important traffic.
At first, the setting only worked for web crawlers that respect robots.txt, which many AI web crawlers are starting to ignore (see below). Some AI web crawlers also disguise themselves as search engine crawlers. Because of these issues, Cloudflare later updated the feature to apply to any detected AI web crawler and made it possible to ban them entirely.
Alongside making blocking the default, Cloudflare introduced a pay-per-crawl program in private beta, which will allow domain owners to charge AI web crawlers based on which AI company the web crawler comes from and what its purpose is. Using the largely unused HTTP status code 402 Payment Required, Cloudflare becomes an intermediary (legally, a Merchant of Record) for a domain that wants to charge AI companies for web crawling.
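To make the 402 idea concrete, here is a minimal sketch of how a site might decide which status code to return. It is not Cloudflare's implementation: the `respond_to_request` function, the `has_paid` flag, and the list of user-agent tokens are all hypothetical, and a real setup would rely on Cloudflare's own crawler detection rather than simple user-agent matching.

```python
from http import HTTPStatus

# Hypothetical examples of AI crawler user-agent tokens (an assumption,
# not a complete or authoritative list).
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def respond_to_request(user_agent: str, has_paid: bool) -> HTTPStatus:
    """Return the status a hypothetical pay-per-crawl gate might send."""
    agent = user_agent.lower()
    is_ai_crawler = any(token.lower() in agent for token in AI_CRAWLER_TOKENS)
    if is_ai_crawler and not has_paid:
        # Ask the crawler to pay before it receives the page.
        return HTTPStatus.PAYMENT_REQUIRED  # 402
    # Regular visitors and paying crawlers get the page as usual.
    return HTTPStatus.OK  # 200
```

In this sketch, an unpaid request from an AI web crawler gets a 402, while human visitors and other web crawlers are served normally.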
The problem with AI web crawlers
Web crawling by AI companies has surged; one study found that total AI web crawling reached close to a third of Googlebot’s billions of requests. The presence of these AI web crawlers and their scraping of internet content has become controversial for several reasons.
First, AI web crawlers are driving up hosting costs. DIY repair manual website iFixit.com racked up a $5,000 bandwidth bill in one day, and Read the Docs incurred an even higher cost from serving terabytes of HTML downloads.
Second, AI web crawlers don’t bring in much human traffic. Cloudflare found that most AI web crawlers have very few referrals compared to total HTML page requests. Referrals indicate user requests to view a page. In other words, most of the requests to view a page from an AI web crawler are not specifically tied to a user.
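The comparison between page requests and referrals can be expressed as a simple ratio. The sketch below is an illustration only: the `crawl_to_referral_ratio` function is a hypothetical helper, and the numbers are made up rather than taken from Cloudflare's data.

```python
def crawl_to_referral_ratio(page_requests: int, referrals: int) -> float:
    """HTML page requests made by a crawler for each referral it sends back.

    A higher ratio means the crawler takes far more than it returns
    in human visits.
    """
    if referrals == 0:
        # No referrals at all: the crawler sends no visitors back.
        return float("inf")
    return page_requests / referrals


# Hypothetical figures for illustration only.
print(crawl_to_referral_ratio(10_000, 4))  # 2500.0
```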
But if the web crawlers aren’t coming on behalf of a user, then what are they doing? Cloudflare data indicates most of the AI web crawlers are being used for training. These web crawlers scrape data off the internet—that is, they request as many webpages as possible and grab the content to improve AI models. They can also end up requesting 404 and other dead-end pages, using up valuable bandwidth without gathering any information or directing a user to a website’s existing content.
Third, AI companies are in hot water for scraping content without permission or financial compensation. In fact, according to Copyright Alliance, more than 30 lawsuits had been filed in U.S. federal courts over the practice.
Lastly, even when websites try to block AI web crawlers, the crawlers sometimes ignore explicit block instructions outright. For example, Cloudflare has accused the AI company Perplexity of attempting to bypass network blocks and ignoring robots.txt.
Anthropic, another AI company, has guidelines outlining how ClaudeBot respects blocking measures and how the company’s data collection “should be transparent.” Perplexity states explicitly that it ignores robots.txt for requests made on behalf of a user. Overall, though, AI companies remain largely opaque about their practices and standards regarding web crawling.
Does your website suffer from AI web crawlers?
With AI web crawlers surging in number and AI content scraping still prevalent, it’s important to know if AI web crawling is directly impacting your business.
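One rough way to check is to look for AI web crawlers in your server's access logs. The sketch below assumes logs in the common "combined" format and a short list of user-agent tokens. Both are assumptions, so check your own server's log format and each AI company's documentation. The `summarize_ai_crawlers` function is a hypothetical helper, not part of any tool.

```python
import re
from collections import Counter

# Assumed examples of AI crawler user-agent tokens; not a complete list.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

# Pulls the status code, response size, and user agent out of a
# combined-format access log line (assumed log format).
LOG_PATTERN = re.compile(
    r'" (?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_ai_crawlers(log_lines):
    """Count requests, bytes sent, and 404 responses per AI crawler token."""
    requests, bytes_sent, not_found = Counter(), Counter(), Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # Skip lines that do not look like combined-format entries.
        agent = match["agent"].lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in agent:
                requests[token] += 1
                if match["size"] != "-":
                    bytes_sent[token] += int(match["size"])
                if match["status"] == "404":
                    not_found[token] += 1
                break
    return requests, bytes_sent, not_found
```

Passing the lines of a log file to `summarize_ai_crawlers` gives three counters: how often each crawler visited, how much data it was served, and how many of its requests hit dead-end pages. Keep in mind that this only catches web crawlers that identify themselves honestly in the user agent.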
If your domain already uses Cloudflare, this new update takes effect automatically. It’s always possible to re-enable access for certain AI web crawlers, or even apply for the pay-per-crawl beta. Cloudflare identifies AI web crawlers through machine learning and other techniques, so other kinds of web crawlers won’t be affected by the block.
Unfortunately, without a service like Cloudflare, the main way to block AI web crawlers manually is through the robots.txt file, which only works for web crawlers that choose to respect it. AI companies publish the robots.txt user-agent names needed to block their web crawlers. The robots.txt file can be viewed by appending /robots.txt to the domain name, and in WordPress it can be edited through common plugins such as Yoast SEO or Rank Math SEO.
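As a rough illustration, the sketch below shows what blocking rules for two AI web crawlers might look like and checks them with Python's standard `urllib.robotparser` module. The user-agent names, the `example.com` URL, and the `is_allowed` helper are assumptions chosen for the example; always confirm the exact names in each AI company's documentation.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content: block two AI crawlers, allow everyone else.
# The user-agent names here are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, url: str = "https://example.com/blog/") -> bool:
    """Report whether the rules above permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("GPTBot"))     # False
print(is_allowed("Googlebot"))  # True
```

The text inside `ROBOTS_TXT` is what would go in the robots.txt file itself. The check only confirms what the rules say; it cannot force a web crawler to follow them.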
Conclusion
So far, AI companies have not publicly responded to Cloudflare’s update or announced changes to how they crawl the web and scrape content. More broadly, it’s difficult to tell how the situation will develop.
The rapidly growing AI industry could collaborate to build a standardized pricing mechanism and compensate all websites, or Cloudflare may end up being the only website hosting provider serving as an example for change. What is certain is that the current trend is unsustainable.
Get in touch
It’s crucial to prepare for changes in the SEO landscape as they come, especially as AI becomes more prominent. Flashpoint Marketing can help you get the best results.