Google started rolling out the June spam update, the second of the year. It enforces documented spam policies, and one of those policies now covers more ground than it once did.
Google’s spam rules treat attempts to “manipulate generative AI responses” in Search as a violation, and that’s one of the policies the update is enforcing.
A Cornell Tech preprint picked up by 404 Media gets at why the policy is harder to enforce than its wording implies. The community pages that AI research agents lean on can also carry third-party comments, and a comment can plant a recommendation that the author never wrote.
What Google labels spam, therefore, travels through the very retrieval that these agents rely on. And the research finds that the obvious defenses all come with drawbacks.
For anyone trying to push a brand into AI-generated answers, the line between optimization and spam is being redrawn.
The Stakes
SE Ranking’s tracking of AI Mode found Google increasingly pointing to its own properties, with self-citations up to roughly a fifth of AI Mode citations in its latest report.
With more citations pointing to Google and fewer to external websites, the pull to manufacture a citation rises accordingly.
A gray market has already begun to form, and the Cornell authors point out that marketers are busy testing ways to nudge AI-generated answers.
Businesses, meanwhile, don’t have the data they need to see what’s happening. As our earlier coverage of agentic search laid out, no dashboard tells a site whether it landed in an AI answer, got cited in a generated report, or was passed over.
The result is a violation Google can name but the site involved often can’t see.
What The Research Found
The paper, titled “Deep-Research Agents Can Be Poisoned via User-Generated Content,” has not been peer-reviewed. It probes a weak spot in how AI research tools collect their sources. These tools answer a question by firing off a batch of related sub-queries, grabbing the pages that keep coming up across them, and assembling a report with citations.
Analysis revealed the same community pages surfacing repeatedly in those sub-queries. Inside a single topic cluster, one user-generated page turned up in as many as 48% of queries, and user-generated platforms made up 17% to 23% of every URL retrieved. Alter one of those recurring pages, and the change can ripple into the reports for a whole topic.
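To make those two measurements concrete, here is a minimal sketch of how they could be computed from a log of sub-query results. It is an illustration and not the authors' code: the function names, the input shape (a mapping from each sub-query to the URLs it retrieved), and the hostname-based test for user-generated platforms are all assumptions, and the paper may count URLs differently.

```python
from collections import Counter
from urllib.parse import urlparse


def page_recurrence(results_by_query):
    """Return {url: fraction of sub-queries whose results include that url}.

    results_by_query is a hypothetical mapping of sub-query text to the
    list of URLs retrieved for it.
    """
    total = len(results_by_query)
    counts = Counter()
    for urls in results_by_query.values():
        counts.update(set(urls))  # count a page at most once per sub-query
    return {url: n / total for url, n in counts.items()}


def ugc_share(results_by_query, ugc_hosts):
    """Return the fraction of all retrieved URLs hosted on user-generated platforms.

    ugc_hosts is an assumed, hand-maintained set of hostnames; every
    retrieved URL is counted, including repeats across sub-queries.
    """
    all_urls = [u for urls in results_by_query.values() for u in urls]
    if not all_urls:
        return 0.0
    hits = sum(1 for u in all_urls if urlparse(u).hostname in ugc_hosts)
    return hits / len(all_urls)
```

A page with a high recurrence value is the kind of page the paper describes: one that feeds many of the sub-queries behind a single topic, so a change to it can reach many reports.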
The authors found that roughly 13 words of planted text on a recurring page were enough to insert an attacker’s chosen entity into the finished report in 38% to 51% of sessions that retrieved the page.
Scatter the same text across a handful of pages, and the figure climbed to 42% to 62%. Even buried inside a full page, where it made up under 4% of what the agent read, the planted text still surfaced in 30% to 53% of sessions.
Three open-source research agents took the tests: STORM, Co-STORM, and OmniThink, all run in a simulation so that nothing on the…