Anthropic’s Claude Bots Make Robots.txt Decisions More Granular

Anthropic updated its crawler documentation this week with a formal breakdown of its three web crawlers and their individual purposes.

The page now lists ClaudeBot (training data collection), Claude-User (fetching pages when Claude users ask questions), and Claude-SearchBot (indexing content for search results) as separate bots, each with its own robots.txt user-agent string.

Each bot gets a “What happens when you disable it” explanation. For Claude-SearchBot, Anthropic wrote that blocking it “prevents our system from indexing your content for search optimization, which may reduce your site’s visibility and accuracy in user search results.”

For Claude-User, the language is similar. Blocking it “prevents our system from retrieving your content in response to a user query, which may reduce your site’s visibility for user-directed web search.”

The update formalizes a pattern that’s becoming more common among AI search products. OpenAI runs the same three-tier structure with GPTBot, OAI-SearchBot, and ChatGPT-User. Perplexity operates a two-tier version with PerplexityBot for indexing and Perplexity-User for retrieval.

Anthropic says all three of its bots honor robots.txt, including Claude-User. OpenAI and Perplexity draw a sharper line for user-initiated fetchers, warning that robots.txt rules may not apply to ChatGPT-User and generally don’t apply to Perplexity-User. For Anthropic and OpenAI, blocking the training bot does not block the search bot or the user-requested fetcher.
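Because each crawler answers to its own user-agent token, a site can state a separate rule for each one. The snippet below is a minimal sketch of that idea, not anything taken from Anthropic's documentation: `build_robots_txt` and the `policy` mapping are hypothetical names, and the choice to block training while allowing search and user-initiated fetches is only an illustration.

```python
def build_robots_txt(policy):
    """Render one robots.txt group per crawler.

    `policy` maps a user-agent token to True (allow) or False (block).
    The helper and its policy format are illustrative assumptions.
    """
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


# Example policy (an assumption): opt out of training, stay visible elsewhere.
policy = {
    "ClaudeBot": False,         # training data collection
    "Claude-SearchBot": True,   # search indexing
    "Claude-User": True,        # user-initiated fetches
}

robots_txt = build_robots_txt(policy)
print(robots_txt)
```

Writing an explicit group for every bot keeps the intent visible in the file, rather than leaving it to whatever a wildcard rule happens to imply.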

What Changed From The Old Page

The previous version of Anthropic’s crawler page referenced only ClaudeBot and used broader language about data collection for model development. Before ClaudeBot, Anthropic operated under the Claude-Web and Anthropic-AI user agents, both now deprecated.

The move from one listed crawler to three mirrors what OpenAI did in late 2024 when it separated GPTBot from OAI-SearchBot and ChatGPT-User. OpenAI updated that documentation again in December, adding a note that GPTBot and OAI-SearchBot share information to avoid duplicate crawling when both are allowed.

OpenAI also noted in that December update that ChatGPT-User, which handles user-initiated browsing, may not be governed by robots.txt in the same way as its automated crawlers. Anthropic’s documentation does not make a similar distinction for Claude-User.

Why This Matters

The blanket “block AI crawlers” strategy that many sites adopted in 2024 no longer works the way it did. Blocking ClaudeBot stops training data collection but does nothing about Claude-SearchBot or Claude-User. The same is true on OpenAI’s side.
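One way to see the gap is to check what a 2024-style blocklist actually says about each bot. The sketch below uses Python's standard `urllib.robotparser`, which only approximates how a real crawler reads the file. The sample rules, the `CRAWLER_ROLES` grouping, and the `audit` helper are assumptions made for illustration. The result also reflects only what the file states: as noted above, OpenAI warns that robots.txt rules may not apply to ChatGPT-User.

```python
from urllib.robotparser import RobotFileParser

# Roles as the article describes them; the mapping itself is illustrative.
CRAWLER_ROLES = {
    "ClaudeBot": "training",
    "Claude-SearchBot": "search",
    "Claude-User": "user-fetch",
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "user-fetch",
}

# Hypothetical blocklist that names only the training bots.
SAMPLE_ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: GPTBot
Disallow: /
"""


def audit(robots_txt, url="https://example.com/"):
    """Return {agent: (role, allowed)} for every crawler in CRAWLER_ROLES."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: (role, parser.can_fetch(agent, url))
        for agent, role in CRAWLER_ROLES.items()
    }


report = audit(SAMPLE_ROBOTS_TXT)
for agent, (role, allowed) in report.items():
    print(f"{agent:18} {role:11} {'allowed' if allowed else 'blocked'}")
```

With this sample file, only the two training bots come back as blocked. The search and user-fetch agents are still allowed, because no rule names them.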

A BuzzStream study we covered in January found that 79% of top news sites block at least one AI training bot. But 71% also block at least one retrieval or search bot, potentially removing themselves from AI-powered search citations in the process.

That matters more now than it did a year ago. Hostinger’s analysis of 66.7 billion bot requests…
