Cloudflare is changing how it identifies and blocks AI crawlers, and the change may result in Googlebot being blocked on sites that block AI training. The company announced the update as part of its second Content Independence Day.
The new controls let websites manage automated traffic based on three behaviors rather than a single “block AI bots” switch. They are live now for all customers, including the free tier. A separate set of default changes takes effect September 15.
Three Ways To Sort AI Crawlers
Cloudflare now sorts crawlers by what they do on a site rather than whether they count as “AI.” The company splits the AI use cases into three categories:
- Search, crawling that indexes a site to answer questions later. Cloudflare ties this behavior to referral traffic.
- Agent, real-time bots acting for a person, such as ChatGPT-User or browser agents like Gemini or Claude operating Chrome.
- Training, crawling that pulls content to train or fine-tune a model.
Cloudflare says bot operators should run separate crawlers for each behavior so that websites can see why a bot is visiting and decide whether to allow or block it.
What Changes On September 15
Two default changes take effect on September 15. First, for new customers and for new sites that existing customers add, Training and Agent crawlers will be blocked by default on pages that display ads, while Search stays allowed. Cloudflare’s press release also says existing free customers who have not changed their settings by September 15 will be moved to these defaults.
The second change goes further. Cloudflare will start evaluating multi-purpose crawlers on everything they do and applying the strictest rule that matches. For example, a crawler that performs both Search and Training will be blocked if a site blocks Training. Cloudflare uses Googlebot, Applebot, and Bingbot as examples, since each crawls for both search and AI training. Sites that have already enabled the older “Block AI bots” setting will be covered by this new rule.
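The strictest-rule behavior can be illustrated with a small sketch. This is not Cloudflare's implementation; the `Behavior` enum and the `is_allowed` function are hypothetical names, and the only assumption is the rule as described here: a crawler is blocked if any one of its behaviors is blocked by the site.

```python
from enum import Enum


class Behavior(Enum):
    # The three categories named in the article.
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


def is_allowed(crawler_behaviors, blocked_behaviors):
    """Return True only if none of the crawler's behaviors is blocked.

    Hypothetical helper: a multi-purpose crawler inherits the strictest
    rule that applies to any of its behaviors.
    """
    return not (set(crawler_behaviors) & set(blocked_behaviors))


# Hypothetical site policy: Training blocked, Search and Agent allowed.
site_blocks = {Behavior.TRAINING}

search_only = {Behavior.SEARCH}
search_and_training = {Behavior.SEARCH, Behavior.TRAINING}

print(is_allowed(search_only, site_blocks))          # True
print(is_allowed(search_and_training, site_blocks))  # False
```

Under this model, a crawler that combines Search and Training keeps access only on sites that allow both, which is why the article singles out Googlebot, Applebot, and Bingbot.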
If you want those crawlers to keep access to your site, you can review or change these settings in your Cloudflare dashboard any time before September 15. Cloudflare says it will continue to notify customers ahead of the date.
New Signals For How Bots Use Content
Cloudflare is also testing a content-use signal that extends Content Signals in robots.txt. It carries three values, from most to least restrictive: immediate, which stores nothing; reference, which indexes and links back and is the new default; and full, which summarizes and reproduces. Cloudflare says these state a preference and do not block on their own.
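One way to picture the ordering of the three values is as an ordered enumeration. The sketch below is only a model of the description above, not the robots.txt syntax, which the article does not give. `ContentUse` and `within_preference` are hypothetical names, and the idea that a less permissive use always fits inside a more permissive preference is an assumption.

```python
from enum import IntEnum


class ContentUse(IntEnum):
    # Lower number = more restrictive, following the article's ordering.
    IMMEDIATE = 0  # stores nothing
    REFERENCE = 1  # indexes and links back (described as the new default)
    FULL = 2       # summarizes and reproduces


# The article describes "reference" as the new default.
DEFAULT_CONTENT_USE = ContentUse.REFERENCE


def within_preference(requested, preferred=DEFAULT_CONTENT_USE):
    """Return True if the requested use is no more permissive than the
    site's stated preference (assumed nesting; this checks a preference
    and does not block anything)."""
    return requested <= preferred


print(within_preference(ContentUse.IMMEDIATE))  # True
print(within_preference(ContentUse.FULL))       # False
```

Because the signal states a preference rather than enforcing one, a check like this would be something a bot operator runs voluntarily, not something the site can rely on.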
The company has revised the definition of “Verified” for bots. Now, a verified bot isn’t automatically permitted everywhere; instead, its access depends on its category. Additionally, bots that replicate content in its entirety are ineligible for verification. Cloudflare introduced a searchable directory, BotBase, for Enterprise Bot Management users, which…