I launched CitationIQ.com recently. Over the last two weeks, my logs claimed 33 AI assistants visited, a little better than two a day. That number is a lie. The real number? Six.
Googlebot looked worse. Of 799 requests carrying its name, only 107 were real, though we all know scammers love to spoof Googlebot. And some of those fake AI visits, while wearing ChatGPT’s name, asked my server to hand over its secrets file.
I run this brand-new platform, and I have spent zero dollars promoting it so far, so traffic remains modest. I went looking for a quiet, accurate read of which robots and crawlers were visiting (Google Analytics 4 handles the rest), expecting small numbers, and I got them. What I did not expect was that even most of these modest numbers were lies. Here is what happened, how I checked, how I chased the stubborn cases to proof, and why the most useful thing you can do this week is run the same check on your own logs.
The Thing Nobody Checks
When a bot fetches your page, it announces a name. ChatGPT-User. Claude-User. Googlebot. CCBot, or whoever they say they are. Your server writes that name into the log, your analytics counts it, and you draw conclusions from it.
The name is self-reported, merely a string in the request header, and anyone can put anything they like there. Claiming to be Googlebot costs nothing and proves nothing. It is a stranger at your door in a delivery uniform, and the uniform is easy to fake.
The real check is not complicated. The major operators publish the actual IP addresses their bots use, as plain files you can open right now, and a request is legitimate only if the name matches and the address sits inside the published list. The name is the claim. The IP is the proof.
- ChatGPT-User https://openai.com/chatgpt-user.json
- Claude (all bots) https://claude.com/crawling/bots.json
- Perplexity-User https://www.perplexity.com/perplexity-user.json
- Googlebot https://developers.google.com/static/crawling/ipranges/common-crawlers.json
- CCBot https://index.commoncrawl.org/ccbot.json
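Before any IP can be checked, the claimed name has to be tied to the right published list. A minimal sketch of that lookup follows, using the URLs listed above. The dictionary name `PUBLISHED_LISTS`, the function `claimed_operator`, and the case-insensitive substring matching are illustrative assumptions, not the author's code.

```python
# Hypothetical lookup table: user-agent token -> published IP list.
# The URLs are the ones listed in the article; the tokens are assumed to
# appear somewhere in the User-Agent header.
PUBLISHED_LISTS = {
    "ChatGPT-User": "https://openai.com/chatgpt-user.json",
    "Claude": "https://claude.com/crawling/bots.json",
    "Perplexity-User": "https://www.perplexity.com/perplexity-user.json",
    "Googlebot": "https://developers.google.com/static/crawling/ipranges/common-crawlers.json",
    "CCBot": "https://index.commoncrawl.org/ccbot.json",
}


def claimed_operator(user_agent):
    """Return (token, list_url) for the bot a request claims to be, or None.

    This only reads the claim. It proves nothing until the request IP is
    checked against the list at list_url.
    """
    ua = user_agent.lower()
    for token, url in PUBLISHED_LISTS.items():
        if token.lower() in ua:
            return token, url
    return None
```

A request whose user agent matches none of the tokens returns `None` and is simply outside the scope of this check.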
I built my check with three outcomes, not two. Verified means the IP is in the published range. Spoofed means the ranges loaded, and the IP is not in them. Unverifiable means I could not determine it, because a list failed to load or a record was missing. I never call something fake just because I failed to confirm it, and later that restraint is exactly what kept one investigation honest long enough to reach the truth.
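The three outcomes described above can be sketched as a small function. This is an illustration under stated assumptions rather than the author's implementation: the names `Verdict` and `classify` are hypothetical, and a failed download is represented by passing `None` (or an empty list) for the ranges.

```python
import ipaddress
from enum import Enum


class Verdict(Enum):
    VERIFIED = "verified"          # IP is inside a published range
    SPOOFED = "spoofed"            # ranges loaded, IP is not in them
    UNVERIFIABLE = "unverifiable"  # could not determine either way


def classify(ip_text, networks):
    """Classify one request IP against an operator's published ranges.

    networks is a list of ipaddress network objects, or None/empty when
    the published list failed to load (an assumption of this sketch).
    """
    if not networks:
        # No usable list: refuse to call the request fake.
        return Verdict.UNVERIFIABLE
    try:
        addr = ipaddress.ip_address(ip_text)
    except ValueError:
        # Missing or malformed address in the log record.
        return Verdict.UNVERIFIABLE
    if any(addr in net for net in networks):
        return Verdict.VERIFIED
    return Verdict.SPOOFED
```

The point of the design is that `SPOOFED` is only reachable when both the list and the address were readable, so a failure to confirm never turns into an accusation.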
The check is about 15 lines of Python using only the standard library, because deciding whether an address sits inside a network range is a solved problem.
import ipaddress, json, urllib.request
# A vendor’s published list of the IPs its bot really uses.
url = "https://openai.com/chatgpt-user.json"
data = json.loads(urllib.request.urlopen(url).read())
# Pull every address range out of the file.
nets = []
def collect(node):
    if isinstance(node, dict):
        for v in...
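As a rough, independent sketch of the same idea (not the author's exact code), the range-collection step can be written as a recursive walk that keeps every string the standard library accepts as a network. The function name `collect_networks` and the sample document are hypothetical. The walk deliberately does not depend on any vendor's key names, which is an assumption about how varied the published files are.

```python
import ipaddress


def collect_networks(node, found=None):
    """Walk parsed JSON and return every value that parses as an IP network.

    Works on any nesting of dicts and lists. Strings that are not valid
    addresses or CIDR ranges (timestamps, labels) are skipped.
    """
    if found is None:
        found = []
    if isinstance(node, dict):
        for value in node.values():
            collect_networks(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_networks(item, found)
    elif isinstance(node, str):
        try:
            # strict=False tolerates entries with host bits set.
            found.append(ipaddress.ip_network(node, strict=False))
        except ValueError:
            pass  # not an address range; ignore
    return found


# Hypothetical sample shaped loosely like a published range file.
sample = {
    "creationTime": "2025-01-01T00:00:00",
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ],
}
nets = collect_networks(sample)
in_range = any(ipaddress.ip_address("192.0.2.10") in net for net in nets)
```

In real use, `sample` would be replaced by the parsed JSON of a downloaded list, and an empty result should be treated as a failed load rather than as proof that every visitor is fake.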