Google is working toward a future where it understands what you want before you ever type a search.
Now Google is pushing that thinking onto the device itself, using small AI models that perform nearly as well as much larger ones.
What’s happening. In a research paper presented at EMNLP 2025, Google researchers show that a simple shift makes this possible: break “intent understanding” into smaller steps. When they do, small multimodal LLMs (MLLMs) become powerful enough to match systems like Gemini 1.5 Pro — while running faster, costing less, and keeping data on the device.
The future is intent extraction. Large AI models can already infer intent from user behavior, but they usually run in the cloud. That creates three problems. They’re slower. They’re more expensive. And they raise privacy concerns, because user actions can be sensitive.
Google’s solution is to split the task into two simple steps that small, on-device models can handle well.
- Step one: Each screen interaction is summarized separately. The system records what was on the screen, what the user did, and a tentative guess about why they did it.
- Step two: Another small model reviews only the factual parts of those summaries. It ignores the guesses and produces one short statement that explains the user’s overall goal for the session.
By keeping each step focused, the system avoids a common failure mode of small models: breaking down when asked to reason over long, messy histories all at once.
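The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the researchers' implementation: `InteractionSummary`, `extract_intent`, and the two stub functions are hypothetical names, and the stubs stand in for the small on-device models, which the article does not describe at the API level.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class InteractionSummary:
    screen_context: str  # what was on the screen (factual)
    user_action: str     # what the user did (factual)
    speculation: str     # tentative guess about why (not passed to step two)


# Hypothetical model interfaces: any callables with these shapes would do.
Summarizer = Callable[[str, str], InteractionSummary]
Aggregator = Callable[[List[str]], str]


def extract_intent(
    interactions: Sequence[Tuple[str, str]],
    summarize: Summarizer,
    aggregate: Aggregator,
) -> str:
    # Step one: summarize each screen interaction on its own.
    summaries = [summarize(screen, action) for screen, action in interactions]
    # Step two: keep only the factual fields, drop the guesses,
    # and ask a second model for one session-level intent statement.
    facts = [f"{s.screen_context} | {s.user_action}" for s in summaries]
    return aggregate(facts)


# Stub "models" for demonstration only (assumptions, not real model calls).
def stub_summarize(screen: str, action: str) -> InteractionSummary:
    return InteractionSummary(screen, action, speculation=f"maybe wants {screen}")


def stub_aggregate(facts: List[str]) -> str:
    return "User goal based on: " + "; ".join(facts)


session = [
    ("Flight search results", "tapped cheapest fare"),
    ("Checkout page", "entered passenger name"),
]
intent = extract_intent(session, stub_summarize, stub_aggregate)
print(intent)
```

The point of the design is that the second stage never sees the `speculation` field, so a wrong guess made on one screen cannot leak into the final intent statement.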
How the researchers measure success. Instead of asking whether an intent summary “looks similar” to the right answer, they use a method called Bi-Fact. Using its main quality metric, an F1 score, small models with the step-by-step approach consistently outperform other small-model methods:
- Gemini 1.5 Flash, an 8B model, matches the performance of Gemini 1.5 Pro on mobile behavior data.
- Hallucinations drop because speculative guesses are stripped out before the final intent is written.
- Even with extra steps, the system runs faster and cheaper than cloud-based large models.
How it works. Bi-Fact breaks an intent into small pieces of information, or facts. The researchers then measure which facts are missing and which ones were invented. This:
- Shows how intent understanding fails, not just that it fails.
- Reveals where systems tend to hallucinate meaning versus where they drop important details.
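A fact-level score of this kind can be sketched as below. The article does not give Bi-Fact's exact procedure, so this assumes the standard definitions of precision, recall, and F1. It also uses exact string matching between facts as a simple stand-in for however the real method decides that two facts agree. The function name `fact_f1` and the example facts are hypothetical.

```python
from typing import Dict, Set


def fact_f1(reference: Set[str], predicted: Set[str]) -> Dict[str, object]:
    """Compare two sets of facts and report what was missed or invented."""
    matched = reference & predicted
    missing = reference - predicted    # dropped details (hurts recall)
    invented = predicted - reference   # hallucinated facts (hurts precision)

    precision = len(matched) / len(predicted) if predicted else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "missing": missing,
        "invented": invented,
    }


# Made-up example facts, for illustration only.
reference_facts = {"book a flight", "destination is Paris", "travel in May"}
predicted_facts = {"book a flight", "destination is Paris", "wants a hotel"}

result = fact_f1(reference_facts, predicted_facts)
print(result["missing"])   # facts the prediction dropped
print(result["invented"])  # facts the prediction made up
```

Reporting the missing and invented sets separately, rather than a single score, is what lets this kind of evaluation show how a prediction fails and not only that it fails.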
The paper also shows that messy training data hurts large, end-to-end models more than it hurts this step-by-step approach. When labels are noisy — which is common with real user behavior — the decomposed system holds up better.
Why we care. If Google wants agents that suggest actions or answers before people search, it needs to understand intent from user behavior (how people move through apps, browsers, and screens). This research moves this idea closer to reality. Keywords will still matter, but the query will be just…