How Will DPDP Rules Affect AI Models Collecting, Retaining Data?

India’s new data protection regime is set to reshape how AI developers collect data, train on it, and retain it. The Digital Personal Data Protection (DPDP) Act, 2023, together with the DPDP Rules, 2025, introduces a consent-centric framework that applies to all digital personal data processed in India.

It requires Data Fiduciaries to obtain free, specific, and informed consent for each specified purpose, and to present notices that clearly itemise the personal data collected and the exact purpose behind its use. Consequently, AI companies building training datasets must explain why they are collecting each data field, how they will process it, and how users can withdraw consent.

Moreover, the Act mandates that data be erased once consent is withdrawn or when the specified purpose is no longer served. This requirement could force developers to design systems capable of selectively removing data from training pipelines and logs, especially for models updated continuously.
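As a rough illustration of what selective removal might look like at the dataset level, here is a minimal Python sketch. The `TrainingStore` class, its field names, and the purpose labels are all hypothetical and not drawn from the Act or the Rules. It only covers stored records; removing a user's influence from an already trained model is a separate and harder problem that this sketch does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingStore:
    """Hypothetical in-memory store of training records tagged by user and purpose."""

    # Each record is a (user_id, purpose, payload) tuple.
    records: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, user_id: str, purpose: str, payload: str) -> None:
        self.records.append((user_id, purpose, payload))

    def erase(self, user_id: str, purpose: str) -> int:
        """Drop every record held for this user under this purpose.

        Returns the number of records removed.
        """
        before = len(self.records)
        self.records = [
            r for r in self.records if not (r[0] == user_id and r[1] == purpose)
        ]
        return before - len(self.records)


store = TrainingStore()
store.add("user-1", "model_training", "chat transcript A")
store.add("user-1", "billing", "invoice 17")
store.add("user-2", "model_training", "chat transcript B")

# user-1 withdraws consent for model training only; billing data is untouched.
removed = store.erase("user-1", "model_training")
```

Tagging each record with both a user and a purpose at ingestion is the design choice that makes this kind of targeted erasure possible; data stored without those tags cannot be filtered this way later.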

However, the law also draws important boundaries. It expressly excludes personal data made publicly available by the individual or under a legal obligation, offering AI firms some flexibility in sourcing publicly posted information.

Additionally, the DPDP Rules introduce a research exemption, which covers processing necessary for research, archiving, or statistical purposes, provided that such work complies with standards in the Second Schedule. Importantly, this carve-out may ease constraints on non-commercial and academic AI work.

How Should Companies Collect Personal Data?

India’s DPDP regime forces AI firms to rework their basic personal data collection flows so that consent and purpose are specified to users. Dhruv Garg, a technology lawyer with the Indian Governance And Policy Project research group, framed the law’s core aim, saying, “The idea of this Act is to create a regime where the users know what all they have said ‘yes’ to in terms of processing of their personal data and access, and what they have said ‘no’ to, and what their rights are and how they can access those rights.”

Consequently, companies must map services to purposes, and then map purposes to the precise data they collect and generate. As Garg explained, firms will “map all their services, the purposes for those services, what data they collect for those services, and what data they generate through those services”.
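One way to picture the mapping Garg describes is as a nested lookup from service to purpose to data fields. The Python sketch below is illustrative only: the service names, purposes, and field names are invented for the example, and a real inventory would likely live in a data catalogue rather than a dictionary.

```python
# Hypothetical data map: service -> purpose -> fields collected and generated.
DATA_MAP = {
    "chat_assistant": {
        "answer_user_queries": {
            "collected": ["prompt_text"],
            "generated": ["response_text"],
        },
        "improve_model": {
            "collected": ["prompt_text", "feedback_rating"],
            "generated": ["quality_score"],
        },
    },
    "account": {
        "authenticate_user": {
            "collected": ["email", "phone"],
            "generated": ["session_id"],
        },
    },
}


def purposes_for_field(data_map, field_name):
    """Return sorted (service, purpose) pairs that collect or generate a field."""
    hits = []
    for service, purposes in data_map.items():
        for purpose, fields in purposes.items():
            if field_name in fields["collected"] or field_name in fields["generated"]:
                hits.append((service, purpose))
    return sorted(hits)


def unmapped_fields(data_map, observed_fields):
    """Return observed fields that no service or purpose in the map accounts for."""
    mapped = set()
    for purposes in data_map.values():
        for fields in purposes.values():
            mapped.update(fields["collected"])
            mapped.update(fields["generated"])
    return set(observed_fields) - mapped


prompt_uses = purposes_for_field(DATA_MAP, "prompt_text")
orphans = unmapped_fields(DATA_MAP, {"email", "device_id"})
```

Here `prompt_uses` lists every purpose that touches `prompt_text`, which is the information a notice would need to itemise, while `orphans` flags fields such as `device_id` that are being handled without any declared purpose.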

Meanwhile, Nikhil Jhanji, Senior Product Manager at IDfy, pushed the operational implications further, arguing that “traceability and explainability is now non-negotiable”. He also recommended that teams “embed logging directly into the data pipeline so every ingestion, preprocessing, or training event leaves a verifiable trail”.
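A minimal sketch of such a trail, assuming a hash-chained log is an acceptable way to make entries verifiable, might look like the following Python. The `AuditTrail` class, the stage names, and the dataset identifier are hypothetical, and this is one possible technique rather than anything Jhanji or the Rules prescribe. Timestamps are left out to keep the example deterministic.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class AuditTrail:
    """Hypothetical append-only log in which each entry commits to the previous one."""

    def __init__(self):
        self.entries = []

    def record(self, stage, dataset_id, detail):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {
            "stage": stage,
            "dataset_id": dataset_id,
            "detail": detail,
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute every hash and check each entry links to its predecessor."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


trail = AuditTrail()
trail.record("ingestion", "dataset-001", "loaded consented chat logs")
trail.record("preprocessing", "dataset-001", "removed phone numbers")
trail.record("training", "dataset-001", "fine-tuning run started")
intact = trail.verify()
```

Because each entry's hash covers the previous entry's hash, editing or deleting an earlier event changes every later hash, so `verify()` returns `False` after tampering. The log records what happened to the data without revealing anything about the model itself.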

Additionally, Jhanji noted that robust provenance “builds trust with regulators without exposing proprietary model design and allows for clear, transparent, and accessible communication…
