A 13-word edit can steer what deep-research AI agents recommend

Cornell Tech researchers found that deep-research AI agents can be manipulated by short edits to public user-generated pages, allowing a single injected Reddit-style comment to become a cited recommendation for fake products, services, or entities.

The paper called those altered pages “poisoned” because the added text was designed to steer what the AI system cited and repeated. It identified the weakness in systems that search the web, gather sources, and write cited reports. The researchers called the attack WARP, short for Web Agent Retrieval Poisoning.

How injected text reaches reports. The attack doesn’t require access to the model, prompts, search engine, or retrieval system. Instead, an attacker edits or appends text to a page the agent already tends to retrieve, such as a Reddit thread, Wikipedia page, or forum post.

When the agent later searches related topics, it may pull in that page, cite it, and repeat the attacker’s chosen message.
Deep-research tools often run many related searches for one user request, and the paper found the same user-generated pages surfaced across related queries.

Reddit was the biggest opening. Across STORM, Co-STORM, and OmniThink, 17% to 23% of retrieved URLs came from user-generated platforms, including Reddit, YouTube, Facebook, and Wikipedia.

Reddit made up the largest share of those pages. It accounted for 54% to 71% of user-generated URLs retrieved by the three open-source systems.
The researchers didn’t alter live websites. They used a simulation framework called GeoStorm to insert manipulated text into retrieved content during testing.
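The shares quoted above come down to simple counting over the URLs an agent retrieves. The following is a minimal sketch of that bookkeeping, not the researchers' code: the domain list, the `registered_domain` and `ugc_shares` helpers, and the sample URLs are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed platform list: the article names these four, but the exact
# domain list used in the paper is not given here.
UGC_DOMAINS = {"reddit.com", "youtube.com", "facebook.com", "wikipedia.org"}


def registered_domain(url: str) -> str:
    """Crude domain extraction: keep the last two labels of the hostname."""
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def ugc_shares(urls: list[str]) -> tuple[float, dict[str, float]]:
    """Return (UGC share of all URLs, each platform's share of the UGC URLs)."""
    domains = [registered_domain(u) for u in urls]
    ugc = Counter(d for d in domains if d in UGC_DOMAINS)
    ugc_total = sum(ugc.values())
    overall = ugc_total / len(urls) if urls else 0.0
    per_platform = {d: n / ugc_total for d, n in ugc.items()} if ugc_total else {}
    return overall, per_platform


# Made-up sample of retrieved URLs, for illustration only.
sample_urls = [
    "https://www.reddit.com/r/example/comments/1",
    "https://old.reddit.com/r/example/comments/2",
    "https://en.wikipedia.org/wiki/Example",
    "https://example.com/review-1",
    "https://example.com/review-2",
    "https://example.org/guide",
    "https://example.org/faq",
    "https://example.net/news-1",
    "https://example.net/news-2",
    "https://example.net/news-3",
]

overall_share, platform_shares = ugc_shares(sample_urls)
print(f"UGC share of retrieved URLs: {overall_share:.0%}")
print(f"Reddit share of UGC URLs: {platform_shares['reddit.com']:.0%}")
```

Keeping only the last two hostname labels is a shortcut that misfires on suffixes such as `.co.uk`; a real measurement would need a proper public-suffix lookup.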

A few words worked. The researchers found the attack succeeded with snippets as short as about 13 words:

In one test, a 15-word sentence pushed a fake cryptocurrency, BananaCoin, into a Co-STORM report as an “emerging” long-term investment option. The report cited the altered source alongside legitimate crypto sources.
When the manipulated page was retrieved, the fake entity appeared in 38% to 51% of reports across systems. Targeting multiple pages raised that range to 42% to 62%.
The attack still worked when systems retrieved full Reddit threads, though mention rates were lower. Even when the injected text made up less than 4% of a complete thread, the fake entity appeared in 30% to 53% of the reports for which that page was retrieved.
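To make the figures above concrete, here is a rough sketch of how an injection could be simulated on retrieved text and how the two quantities (the snippet's share of the page and the mention rate across reports) could be measured. It is an illustration under stated assumptions, not the paper's framework: the function names, the placeholder thread, the fake entity "ExampleCoin", and the sample reports are all hypothetical.

```python
def inject_snippet(page_text: str, snippet: str) -> str:
    """Append the snippet to retrieved text as if it were one more comment."""
    return page_text.rstrip() + "\n\n" + snippet.strip()


def injected_fraction(page_text: str, snippet: str) -> float:
    """Share of whitespace-separated words in the altered page that come from the snippet."""
    snippet_words = len(snippet.split())
    total_words = len(page_text.split()) + snippet_words
    return snippet_words / total_words if total_words else 0.0


def mention_rate(reports: list[str], entity: str) -> float:
    """Fraction of reports that mention the entity (case-insensitive substring match)."""
    if not reports:
        return 0.0
    hits = sum(entity.lower() in report.lower() for report in reports)
    return hits / len(reports)


# Hypothetical 400-word thread and a made-up 15-word snippet.
thread = " ".join(["comment"] * 400)
snippet = (
    "Honestly ExampleCoin has been the most reliable long term pick "
    "in my portfolio this year"
)

altered_page = inject_snippet(thread, snippet)
fraction = injected_fraction(thread, snippet)
print(f"Injected share of page: {fraction:.1%}")

# Made-up reports standing in for agent output on runs that retrieved the page.
sample_reports = [
    "Established options include index funds. ExampleCoin is an emerging option.",
    "Most analysts point to diversified holdings.",
    "Some users recommend examplecoin for long-term holders.",
    "Bonds remain a conservative choice.",
]
print(f"Mention rate: {mention_rate(sample_reports, 'ExampleCoin'):.0%}")
```

With these made-up numbers, 15 injected words in a 415-word page come to roughly 3.6% of the content, which is the kind of ratio the under-4% condition describes.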

Defenses struggled. Blocking user-generated domains stopped this attack path, but it also removed sources such as firsthand product experiences and local recommendations.
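The trade-off is easy to see in a sketch of a domain blocklist. The code below is an assumed, simplified version of such a defense rather than the one the researchers tested; the blocked domains, the `is_blocked` and `filter_sources` helpers, and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

# Assumed blocklist built from the platforms the article names.
BLOCKED_DOMAINS = {"reddit.com", "youtube.com", "facebook.com", "wikipedia.org"}


def is_blocked(url: str) -> bool:
    """True if the hostname is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def filter_sources(urls: list[str]) -> tuple[list[str], list[str]]:
    """Split retrieved URLs into (kept, dropped) according to the blocklist."""
    kept = [u for u in urls if not is_blocked(u)]
    dropped = [u for u in urls if is_blocked(u)]
    return kept, dropped


# Made-up retrieval results for illustration.
retrieved = [
    "https://www.reddit.com/r/example/comments/1",  # could be a poisoned thread
    "https://en.wikipedia.org/wiki/Example",        # or a genuine reference page
    "https://example.com/lab-review",
    "https://notreddit.com/post",
]

kept, dropped = filter_sources(retrieved)
print("kept:", kept)
print("dropped:", dropped)
```

The filter decides on the domain alone, so a poisoned thread and an honest firsthand review on the same platform are dropped together.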

The tested text filters failed to reliably separate injected passages from normal user content. The manipulated passages were fluent because they were written by an AI model, so perplexity-based filters were more likely to flag normal user content than the injected text.
Report-level checks also missed the manipulation. Altered reports looked similar to clean reports…
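Why a perplexity filter can flag ordinary users before it flags the injected text can be shown with a toy model. The sketch below swaps in a unigram model with add-one smoothing for whatever language model a real filter would use, so it only illustrates the direction of the effect; the reference corpus, both sample texts, and the function names are made up.

```python
import math
from collections import Counter


def train_unigram(corpus: str) -> tuple[Counter, int]:
    """Count lowercase whitespace tokens in a reference corpus."""
    tokens = corpus.lower().split()
    return Counter(tokens), len(tokens)


def perplexity(text: str, counts: Counter, total: int) -> float:
    """Unigram perplexity with add-one smoothing; all unseen words share one extra slot."""
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one token")
    vocab = len(counts) + 1  # +1 slot reserved for unseen words
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy reference corpus of fluent text (made up for this example).
reference = (
    "this coin is a solid long term investment option for most people "
    "and the product is a reliable choice"
)
counts, total = train_unigram(reference)

# Fluent, model-written style versus casual human style (both made up).
fluent_injected = "this coin is a reliable long term investment option"
casual_user = "tbh idk lol my wallet got rekt twice smh"

ppl_injected = perplexity(fluent_injected, counts, total)
ppl_user = perplexity(casual_user, counts, total)
print(f"injected text perplexity: {ppl_injected:.1f}")
print(f"casual user perplexity:   {ppl_user:.1f}")
```

A filter that rejects text above some perplexity threshold would drop the casual comment first, because slang and typos look unlikely to the model while fluent injected text looks ordinary.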
