Google researchers published a paper describing a new way to catch spammers who use generative AI to flood Google’s platform with spam and overwhelm its quality filters. While the research focuses on identifying video content spam, the techniques it describes give an idea of the methods Google could use against web content spam. In fact, the research paper discusses a text-based generative AI identification system.
The new system, called the Scalable Cluster Termination System (S-CTS), is described as a “highly accurate defense” against coordinated generative AI spam, so something like it could conceivably be in use.
Can This System Be Used For AI-Generated Text Spam?
The system succeeds because, instead of evaluating isolated videos one by one, it looks for the organizational structure of an attack: the mass reuse of a specific semantic narrative template.
The research paper also describes the use of text embeddings, salient terms, and templated narratives as part of the system’s content classifier. If a high percentage of accounts in an infrastructure cluster are identified as using the same AI-generated text/media templates, the entire cluster is terminated.
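The cluster-level decision described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name, the input format, and the 0.75 threshold are all hypothetical, since the article only says "a high percentage" of accounts.

```python
from collections import defaultdict

# Hypothetical threshold: the article only says "a high percentage".
TERMINATION_THRESHOLD = 0.75


def clusters_to_terminate(accounts, threshold=TERMINATION_THRESHOLD):
    """Return the ids of clusters whose share of flagged accounts meets the threshold.

    accounts: iterable of (account_id, cluster_id, flagged) tuples, where
    flagged is True if a classifier marked the account as using a shared
    AI-generated template (assumed input format).
    """
    totals = defaultdict(int)
    flagged_counts = defaultdict(int)
    for _account_id, cluster_id, flagged in accounts:
        totals[cluster_id] += 1
        if flagged:
            flagged_counts[cluster_id] += 1
    return {
        cluster_id
        for cluster_id, total in totals.items()
        if flagged_counts[cluster_id] / total >= threshold
    }


# Illustrative data only: cluster-A has 4 of 5 accounts flagged, cluster-B has 1 of 3.
accounts = [
    ("a1", "cluster-A", True),
    ("a2", "cluster-A", True),
    ("a3", "cluster-A", True),
    ("a4", "cluster-A", True),
    ("a5", "cluster-A", False),
    ("b1", "cluster-B", True),
    ("b2", "cluster-B", False),
    ("b3", "cluster-B", False),
]
print(clusters_to_terminate(accounts))  # {'cluster-A'}
```

The point of the sketch is that the unflagged account a5 is still swept up with cluster-A, because the decision is made per cluster and not per account.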
Quickly Adapting To New Kinds Of AI Spam
The paper says that when attackers adopt new generative models, Google can adapt its synthetic spam detection system faster by using Low-Rank Adaptation (LoRA) and Automatic Prompt Optimization (APO) instead of retraining a massive AI model.
They write:
“The Stage 2 Classifier is specialized for synthetic trend detection using Parameter-Efficient Fine-Tuning (PEFT) techniques, specifically Low-Rank Adaptation (LoRA) and Automatic Prompt Optimization (APO).
…This approach allows for the efficient adaptation of the large proprietary LLM (e.g., Gemini 2.0 Flash) without the prohibitive computational cost of full fine-tuning. Specifically, LoRA significantly reduces the number of trainable parameters and substantially decreases the memory footprint, allowing for rapid, cost-effective execution and parallelized inference on scalable TPU infrastructure.
…APO allows us to engineer prompts that adapt to new “Slop” trends faster than retraining a dense model. We can retrain a LoRA adapter rapidly when a new GenAI model (like Sora or Kling) is released by attackers.”
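To give a rough sense of why LoRA reduces the number of trainable parameters, here is a small arithmetic sketch. LoRA freezes a weight matrix of size d_out × d_in and trains two low-rank factors of sizes d_out × r and r × d_in. The layer dimensions and the rank below are illustrative assumptions, not figures from the paper.

```python
def full_finetune_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters for the two low-rank factors (d_out x r and r x d_in)."""
    return rank * (d_in + d_out)


# Illustrative values only; the paper does not state layer sizes or rank.
d_in = d_out = 4096
rank = 8

full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

Under these assumed sizes, the adapter trains well under one percent of the parameters in that single matrix, which is consistent with the paper's claim that an adapter can be retrained quickly.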
Sentence-BERT (S-BERT) For Identifying AI-Generated Text
What will probably be of most interest is that the researchers acknowledge the use of Sentence-BERT (S-BERT) as a way to identify semantically similar sentences.
They cite Sentence-BERT to validate a core assumption of their paper: that automated, AI-generated text leaves a distinct mathematical footprint (“text embeddings”) that can be detected.
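The idea of comparing texts by their embeddings can be sketched with cosine similarity. To keep the example self-contained, a toy bag-of-words vector stands in for a real S-BERT embedding; that substitution, the helper names, and the sample sentences are all assumptions made for illustration. A real sentence embedding would also capture paraphrases that share no words, which this stand-in cannot.

```python
import math
import re
from collections import Counter


def toy_embedding(text):
    """Bag-of-words term counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts: two fill-in-the-blank variants of one template
# and one unrelated sentence.
template_1 = "You won't believe what this doctor found about sleep"
template_2 = "You won't believe what this chef found about bread"
unrelated = "Quarterly rainfall totals for the northern region"

sim_templated = cosine_similarity(toy_embedding(template_1), toy_embedding(template_2))
sim_unrelated = cosine_similarity(toy_embedding(template_1), toy_embedding(unrelated))
print(round(sim_templated, 2), round(sim_unrelated, 2))
```

The two templated sentences score far higher than the unrelated pair, which is the kind of signal that template reuse would leave in embedding space.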
They then pivot from S-BERT to highlight why their system (S-CTS) is an advancement: because it doesn’t stop at text embedding matching. It scales up to a multimodal, two-stage LLM architecture that evaluates these text patterns…