Two book authors are asking for the destruction of artificial intelligence (AI) models that tech giant Apple trained on copyrighted material, as part of a lawsuit in the US District Court for the Northern District of California.

The proposed class action lawsuit accuses the tech company of training Apple Intelligence, the term Apple uses for its AI tools across iPhones, MacBooks, iPads, and other devices, on copyrighted works without consent or compensation.

The complaint argues that by copying and retaining copyrighted works for AI training, Apple infringed upon the authors’ exclusive rights and thus violated the US Copyright Act.

In addition to the destruction of Apple’s allegedly infringing models and datasets, the plaintiffs are asking the court to award them statutory damages and to permanently prohibit Apple from using copyrighted material for AI training.

What Do The Plaintiffs Allege?

The complaint alleges that Apple’s OpenELM models, designed for on-device use, were trained on Books3, a dataset that allegedly contains copyrighted works, including those owned by the plaintiffs.

Apple released the OpenELM models in April 2024 in four sizes, with 270 million, 450 million, 1.1 billion, and 3 billion parameters (the weights a model learns from its training dataset).

Notably, the pre-training dataset of these models contained a subset of RedPajama, which is a much bigger open-source dataset.

A paper about OpenELM from Apple specified this subset to be the ‘Books’ component of RedPajama, which is actually a copy of the Books3 dataset.

The plaintiffs claim their copyrighted works are part of this dataset, which Apple used to train the OpenELM models. Based on this lineage of datasets, the complaint argues that Apple effectively admitted to training its models on pirated copyrighted material.

Additionally, the complaint levels the same charge against the Foundational Language Models (FLMs) that power Apple Intelligence.

The complaint states that Apple revealed three training data sources for its FLMs. First, limited high-quality data licensed from publishers, used during “continued pre-training” rather than “core pre-training”. Second, “publicly-available or open-sourced datasets” which Apple does not specifically identify. Third, web pages crawled by Applebot since approximately mid-2015.

The lawsuit also alleges that Apple’s terms “publicly available” and “open source” misleadingly describe pirated works. The complaint suggests Apple likely used Books3 as part of its “publicly-available or open-sourced datasets”, since the tech giant already possessed copies from OpenELM training and other AI companies had previously described Books3 using similar terminology.

The plaintiffs also claim that Apple retained copies of all data scraped by Applebot before major publishers could opt out. Allegedly, the web crawler had been active since 2015, even though Apple did not acknowledge that…



Last Update: September 9, 2025