In a new legal battle in the AI space, Meta and CEO Mark Zuckerberg have been sued by five publishers and author Scott Turow, who allege the tech company illegally copied millions of books, articles and other works to train Meta’s artificial-intelligence systems.
“In their effort to win the AI ‘arms race’ and build a functional generative AI model, Defendants Meta and Zuckerberg followed their well-known motto: ‘move fast and break things,’” the plaintiffs say in their lawsuit. “They first illegally torrented millions of copyrighted books and journal articles from notorious pirate sites and downloaded unauthorized web scrapes of virtually the entire internet. They then copied those stolen fruits many times over to train Meta’s multibillion-dollar generative AI system called Llama. In doing so, Defendants engaged in one of the most massive infringements of copyrighted materials in history.”
The suit was filed Tuesday (May 5) in the U.S. District Court for the Southern District of New York by five publishers (Hachette, Macmillan, McGraw Hill, Elsevier and Cengage) and Turow individually. The proposed class-action suit seeks unspecified monetary damages for the alleged copyright infringement. A copy of the lawsuit is available at this link.
Asked for comment, a Meta spokesperson said, “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use. We will fight this lawsuit aggressively.”
Authors have sued AI companies for copyright infringement before — and lost.
For example, in June 2025, a federal judge rejected a claim brought by 13 authors, including Sarah Silverman and Junot Díaz, that Meta violated their copyrights by training its AI model on their books. Judge Vincent Chhabria ruled that Meta had engaged in “fair use” when it used a data set of nearly 200,000 books to train its Llama language model for generative AI.
But the latest lawsuit alleges that Meta and Zuckerberg deliberately circumvented copyright-protection mechanisms — and had considered paying to license the works before abandoning that strategy at “Zuckerberg’s personal instruction.” The suit essentially argues that the conduct described falls outside protections afforded by fair-use provisions of the U.S. copyright code.
“Meta — at Zuckerberg’s direction — copied millions of books, journal articles, and other written works without authorization, including those owned or controlled by Plaintiffs and the Class, and then made additional copies of those works to train Llama,” the suit says. “Zuckerberg himself personally authorized and actively encouraged the infringement. Meta also stripped [copyright management information] from the copyrighted works it stole. It did this to conceal its training sources and facilitate their unauthorized use.”
According to the lawsuit, after the release of Llama 1, Meta briefly considered entering into licensing deals with major publishers. From January to April 2023, Meta discussed increasing the company’s “dataset licensing” budget to as much as $200 million, per the complaint.
But then in early April 2023, “Meta abruptly stopped its licensing strategy,” according to the lawsuit. “The question of whether to license or pirate [copyrighted material] moving forward was ‘escalated’ to Zuckerberg. After this escalation to Zuckerberg, Meta’s business development team received verbal instructions to stop licensing efforts. One Meta employee presciently described the rationale: ‘if we license once [sic] single book, we won’t be able to lean into the fair use strategy.’”
According to the lawsuit, Meta and Zuckerberg “are well aware of the market for licensing AI training materials.” Meta signed four licenses in 2022 with African-language book publishers for “a limited training set, and it subsequently reached licensing agreements with major news publishers including Fox News, CNN and USA Today,” the suit says.
On Dec. 13, 2023, Meta employees internally circulated a memo concerning the legal risks of using LibGen, a repository of copyrighted material. The memo described LibGen as “a dataset we know to be pirated” and added that “we would not disclose use of Libgen datasets used to train,” per the suit. “Ultimately, however, those concerns went unheeded. Zuckerberg and other Meta executives authorized and directed the torrenting of over 267 TB of pirated material — equivalent to hundreds of millions of publications and many times the size of the entire print collection of the Library of Congress,” according to the lawsuit.
As a result of the alleged infringement, Meta’s AI system “readily generates, at speed and scale, substitutes for Plaintiffs’ and the Class’s works on which it was trained,” the lawsuit states. “Those substitutes take multiple forms, including verbatim and near-verbatim copies, replacement chapters of academic textbooks, summaries and alternative versions of famous novels and journal articles, inferior knockoffs that copy creative elements of original works, and derivative works exclusively reserved to rights holders. Llama even tailors outputs to mimic the expressive elements and creative choices of specific authors.”