One thing that I thought we’d see more of in 2025 was how Gemini could control your Android phone. There was the May demo and other underlying work, but we don’t have Google’s complete vision yet.

At I/O 2025 in May, Google demoed the latest research prototype of Project Astra that could retrieve content from the web/Chrome, search and play YouTube videos, search through your emails, make calls on your behalf, and place orders.

The nearly 2-minute demo showed Gemini scrolling a PDF in Chrome for Android, as well as opening the YouTube app to the search results page, scrolling, and then selecting/tapping a video. Google is working to bring these capabilities to Gemini Live. 

In October, Google made a Computer Use model available to developers in preview. It lets Gemini interact with user interfaces the way humans do, by scrolling, clicking, and typing. What’s currently available is “optimized for web browsers,” but Google noted “strong promise for mobile UI control tasks.”


Google described these capabilities as a “crucial next step in building powerful, general-purpose agents” since “many digital tasks still require direct interaction with graphical user interfaces.”

A future version of Siri will let you “take action in and across apps” using your voice. The vision Apple pitched in 2024 is that tasks that would have required you to jump through multiple apps “could be addressed in a matter of seconds” through a series of voice prompts. Apple has detailed what app developers must do to support this. So far, we’ve heard nothing from Google, specifically the Android team, about whether a similar system or approach is coming.

…Siri can take actions across apps, so after you ask Siri to enhance a photo for you by saying “Make this photo pop,” you can ask Siri to drop it in a specific note in the Notes app — without lifting a finger.

Instead, what Google has shown is very generalized and does not seem to require any prior integrations. In many ways, it’s the pragmatic approach, especially if Android developers don’t rush to add support in their apps.

This is not the first time Google has worked towards this. The premise of the new Google Assistant in 2019 was that on-device voice processing — a breakthrough at the time — would make “tapping to use your phone… seem slow.”

This next-generation Assistant will let you instantly operate your phone with your voice, multitask across apps, and complete complex actions, all with nearly zero latency.

This did not really take off in 2019 and never dropped its Pixel exclusivity. It suffered from the same issues as the previous era of assistants, like regimented voice commands.

LLMs should let you…



Last Update: December 29, 2025