NVIDIA and Google infrastructure cuts AI inference costs

At the Google Cloud Next conference, Google and NVIDIA outlined their hardware roadmap designed to address the cost of AI inference at scale.

The companies detailed the new A5X bare-metal instances, which run on NVIDIA Vera Rubin NVL72 rack-scale systems. Through hardware and software codesign, this architecture aims to deliver up to ten times lower inference cost per token compared to previous generations, while concurrently achieving ten times higher token throughput per megawatt.

Connecting thousands of processors requires massive bandwidth to prevent processing delays. The A5X instances address this hardware challenge by pairing NVIDIA ConnectX-9 SuperNICs with Google Virgo networking technology.

This configuration scales to 80,000 NVIDIA Rubin GPUs within a single site cluster, and up to 960,000 GPUs across a multisite deployment. Operating at this scale requires sophisticated workload management, as routing data across nearly a million parallel processors demands exact synchronisation to avoid idle compute time.
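To put those figures in perspective, the following is a rough back-of-the-envelope sketch rather than anything published by Google or NVIDIA. It assumes that each NVL72 rack holds 72 GPUs and that every site in a multisite deployment is filled to the 80,000-GPU limit; the variable names are illustrative only.

```python
import math

# Figures quoted in the article.
GPUS_PER_SITE = 80_000      # single-site cluster limit
GPUS_MULTISITE = 960_000    # multisite deployment limit

# Assumption (not stated in the article): one NVL72 rack holds 72 GPUs.
GPUS_PER_RACK = 72

# Minimum number of sites needed to reach the multisite limit,
# assuming each site is filled to its single-site maximum.
min_sites = math.ceil(GPUS_MULTISITE / GPUS_PER_SITE)

# Approximate number of racks needed for one full site.
racks_per_site = math.ceil(GPUS_PER_SITE / GPUS_PER_RACK)

print(f"Minimum sites for the multisite limit: {min_sites}")
print(f"Approximate racks per full site: {racks_per_site}")
```

Under these assumptions, the multisite ceiling corresponds to at least a dozen fully populated sites, each with more than a thousand racks.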

Mark Lohmeyer, VP and GM of AI and Computing Infrastructure at Google Cloud, said: “At Google Cloud, we believe the next decade of AI will be shaped by customers’ ability to run their most demanding workloads on a truly integrated, AI‑optimised infrastructure stack.

“By combining Google Cloud’s scalable infrastructure and managed AI services with NVIDIA’s industry‑leading platforms, systems and software, we’re giving customers flexibility to train, tune, and serve everything from frontier and open models to agentic and physical AI workloads—while optimising for performance, cost, and sustainability.”

Sovereign data governance and cloud security requirements

Beyond raw processing capabilities, data governance remains a primary issue for enterprise deployments. Highly regulated sectors, including finance and healthcare, often stall machine learning initiatives due to data sovereignty requirements and the risks of exposing proprietary information.

To address these compliance mandates, Google Gemini models running on NVIDIA Blackwell and Blackwell Ultra GPUs are entering preview on Google Distributed Cloud. This deployment method allows organisations to retain frontier models entirely within their controlled environments, alongside their most sensitive data stores.

The architecture incorporates NVIDIA Confidential Computing. This hardware-level security feature ensures that models run within a protected environment where prompts and fine-tuning data remain encrypted. The encryption prevents unauthorised parties, including the cloud infrastructure operators themselves, from viewing or altering the underlying data.

For multi-tenant public cloud environments, a preview of Confidential G4 VMs equipped with NVIDIA RTX PRO 6000 Blackwell GPUs introduces these same cryptographic protections, giving regulated industries access to high-performance hardware without violating data privacy standards. This release…
