Multiple authors filed a class-action copyright infringement lawsuit against NVIDIA, the US-based GPU chipmaker, alleging that it used their books without permission to train its AI models. The complaint also accuses NVIDIA of paying Anna’s Archive, a shadow library that often hosts illegally obtained large datasets, for high-speed access to pirated datasets containing approximately 500 terabytes of data.

In 2021, NVIDIA began developing its own large language models (LLMs), and it recently announced plans to offer end-to-end AI infrastructure services, expanding beyond its traditional role as a GPU (graphics processing unit) provider.

The Details of the Case Filed Against NVIDIA:

The authors listed below filed a complaint in the US District Court for the Northern District of California, accusing NVIDIA of using pirated shadow libraries, including Anna’s Archive, to download copyrighted books and to copy, store and use that material to train its LLMs, including NeMo Megatron, without authorisation from the copyright owners.

The plaintiffs in the class-action lawsuit are:

  • Abdi Nazemian, author of ‘Like a Love Story’
  • Brian Keene, author of ‘Ghost Walk’
  • Stewart O’Nan, author of ‘Last Night at the Lobster’
  • Andre Dubus III, author of ‘The Garden of Last Days’, ‘The Cage Keeper’ and ‘Townie: A Memoir’
  • Susan Orlean, author of ‘The Orchid Thief’ and ‘The Library Book’

In addition to the books listed above, the plaintiffs submitted a non-exhaustive list of registered copyrights they own.

“NVIDIA unlawfully copied copyrighted material from illegal pirate ‘shadow libraries.’ NVIDIA collated and stored this material in centralized servers which its engineers (and other employees) could access for any purpose. NVIDIA and its employees subsequently made additional unlawful copies of this illegally-obtained copyrighted material during the LLM development process,” reads the complaint.

What data did NVIDIA allegedly access?

The complaint claims NVIDIA admitted to copying, storing and using copyrighted material to develop its AI models. It refers to The Pile, a now-deleted dataset from Hugging Face, a GitHub-like platform for AI developers that can host LLMs. The “model cards” for NVIDIA’s models allegedly stated, “The model was trained on ‘The Pile’ dataset prepared by EleutherAI.” A model card is an accessible ReadMe-type file that provides essential information to users.

NVIDIA allegedly accessed the Books3 dataset, sourced from the Bibliotik private tracker, which contains a large collection of fiction and nonfiction books and is nearly an order of magnitude larger than the next-largest book dataset, BookCorpus2. EleutherAI said it included Bibliotik because it is “invaluable for long-range context modelling research and coherent storytelling,” according to the paper published by EleutherAI, a non-profit AI research…


Last Update: January 22, 2026