“My primary focus for the India AI Mission would be to put all the money, first of all, into collecting data,” said one of the speakers at a discussion organised by the Software Freedom Law Centre at IIIT Hyderabad on January 31, 2026.

Several speakers said the shortage of high-quality, India-specific datasets for training Large Language Models (LLMs) is holding back India’s AI growth. They argued that openness must go beyond releasing model weights to making training data genuinely accessible, rather than settling for “openwashing”. They also outlined steps the government could take to support large-scale, credible data collection.

The discussion was held under the Chatham House Rule. This news report includes only information shared during the discussion and does not identify the speakers or attendees.

Why Is India’s AI Mission Not Enough?

The speaker was referring to the India AI Mission (2024), which allocated Rs 10,372 crore over five years to “encourage the development of artificial intelligence in India.” In the recent Union Budget (2026), the government further reduced this year’s allocation to Rs 1,000 crore from Rs 2,000 crore the previous year, as the India AI Mission reportedly utilised only Rs 800 crore last year.

“It’s not lakhs of crores, by the way. It’s Rs 10,000 crore. It’s not even a drop in the ocean, compared to megacorps in the US and China,” he said, commenting on the overall budget allocation for the mission.

He also pointed to the lengthy selection process and the rapid pace of AI development.

“When it was announced, you first had to go through a major selection process. By the time that process was completed, the GPUs had already changed. You make a proposal based on the current GPUs, the current costs, and everything else, but by the time the proposal is approved, the whole situation has shifted. Then you either go through the cycle again or end up using older GPUs,” he commented.

“I think from the beginning, our approach, at least at Viswam.ai and Swecha, has been that the focus should be on data rather than compute,” he added, referring to Viswam AI, a joint initiative of Swecha and IIIT Hyderabad.

The Lack of High-Quality Data in Indian Datasets

Emphasising the need for high-quality datasets, one speaker said, “Everybody is starved of data. And the way out they’re taking is using synthetic data,” which, he argued, results in “bad models.”

“They are generating their own synthetic data because of what they got with the grants, the computing power. What do you do with the computer? You generate data. But how do you generate data? You generate data with a bad model for our language. So what do you get? You get bad data. You end up with a bad model. There is no way out. And I can say without a doubt that it will not be a state-of-the-art model,” he explained, describing the loop created by the lack of quality…





Last Update: February 4, 2026