
Meta wins AI copyright case against authors

  • Staff Writer
  • Jun 26
  • 3 min read

Tech firms accused of using copyrighted books to train AI models got a reprieve this week after US federal courts ruled that their actions were legal and fell within the fair use doctrine of copyright law.

On Wednesday, a US federal judge sided with Meta in a class-action suit filed in July 2023 by 13 authors, including Sarah Silverman, Junot Díaz, and Andrew Sean Greer, over the use of their books as training material for Llama without their consent. The authors also claimed in the lawsuit that Llama’s output mimicked their writing styles.


The federal judge, Vince Chhabria, ruled that Meta’s use of the copyrighted books to train AI models to generate new text was transformative and was not meant to replicate them. 


However, Chhabria noted that the ruling does not mean that all use of copyrighted work for AI training is legal. He said that the plaintiffs in this case failed to provide “meaningful evidence” that Meta’s copying of books harmed the market for the authors’ works.


“In cases involving uses like Meta’s, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant’s use,” Chhabria added. 


Books are considered high-quality data for training AI models because they have a coherent structure and are generally better written than the text available on the internet or social media.

Generative AI models use this training data to learn patterns of how words are used together in different contexts and then apply them to provide text-based answers to specific user queries. However, they are trained not to memorize and reproduce copyrighted material from their training data in any of their responses. 

An expert witness, testifying for Meta, told the judge that the model could not generate more than 50 words and punctuation marks from the plaintiffs’ books. 


In another copyright case this week, a US federal judge, William Alsup, ruled in favour of AI startup Anthropic in a lawsuit filed in August 2024 by three authors, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who claimed that Anthropic copied their books from pirated and purchased sources to create a central library for training AI models. 

The federal judge in this case also said that Anthropic’s use of the training data amounted to fair use. However, the court will hold another trial to look into the potential damages caused by the central library’s creation.


Anthropic acknowledged that it knowingly downloaded millions of books from known pirate libraries to avoid “legal/practice/business slog.” It acquired 196,640 books from Books3 in January 2021, 5 million books from Library Genesis (LibGen) in June 2021, and 2 million books from Pirate Library Mirror (PiLiMi) in July 2022.


Copyright lawsuits are turning out to be a major concern for tech firms, which need large volumes of data to train large language models (LLMs). The firms have defended their actions on the grounds that acquiring licenses is time-consuming and challenging.


For instance, Meta claims that it initially tried to acquire books by negotiating licensing deals with publishers and was willing to pay $100 million for them. However, the process proved to be difficult and time-consuming as the rights to license books for AI training were held by individual authors. After failing to acquire book licenses, Meta decided in 2023 to use books downloaded from shadow libraries such as LibGen and Anna’s Archive to train Llama. Meta is believed to have downloaded more than 80 terabytes of pirated books from these libraries.


The authors claim that Meta never approached them for permission to use their books, but they are now open to signing licensing deals with the company for training AI on their books. 


Several tech firms are facing similar copyright lawsuits across the US. In March 2024, Nvidia was sued by authors Brian Keene, Abdi Nazemian, and Stewart O’Nan for using their copyrighted books to train its AI framework NeMo without their permission.

In April 2025, 12 separate copyright cases against OpenAI and Microsoft were merged into one lawsuit by order of a US judicial panel.

The merged lawsuit includes cases filed in California by authors Ta-Nehisi Coates, Michael Chabon, Junot Díaz, and Sarah Silverman; and cases filed in New York by the New York Times and authors John Grisham, George Saunders, Jonathan Franzen, and Jodi Picoult.



Image credit: Pexels
