News, 25 March 2025

Revealed: Meta pirated millions of books to train its AI engine

A new investigation has revealed Meta used millions of pirated publications to train its new AI systems.
David Burton
Meta is now battling multiple court cases, defending itself against claims of mass theft. Image: Julio Lopez on Unsplash.

An investigation by The Atlantic has revealed that Meta, the company that owns Facebook, Instagram and WhatsApp, used millions of pirated books to train its hyped AI software, Llama. The revelation comes as publishers worldwide struggle to determine what counts as fair use of published works in training AI models.

In the ongoing development of its AI engine, Meta software engineers experimented with different legal strategies to obtain book data. Court documents show that the Meta team was frustrated by the potential expense and extended wait time of obtaining book data legally. Employees then turned to LibGen (Library Genesis), one of the largest libraries of pirated books available online, containing over 7.5 million books and 81 million research papers. The Meta developer team eventually got permission from ‘MZ’ – allegedly Meta CEO Mark Zuckerberg – to download and use the data.

Several authors, including Sarah Silverman and Junot Diaz, have now brought a copyright-infringement lawsuit against Meta. Meta argues that its use of the data is ‘fair use’ because the large language models (LLMs) that fuel AI ‘transform’ the original material into new work. Neither the courts nor the industry is close to resolving whether this argument is reasonable.

Regardless, it’s clear that the authors whose work was used are furious. “None of these authors were compensated for the use of their work, nor asked permission for its use,” Australian author Jay Kristoff posted on Instagram (a Meta-owned platform). “I’m not being hyperbolic when I say that LLM (aka AI) technology represents a clear and present danger to the human artistic endeavour… It is a plagiarism machine, trained on the stolen works of hundreds of thousands of human artists, developed solely to supplant the very artists from which it was stolen.”

The court case is ongoing. Meanwhile, Meta’s AI engine continues to underpin many of the company’s services across its platforms.

David Burton is a writer from Meanjin, Brisbane. David also works as a playwright, director and author. He is the playwright of over 30 professionally produced plays. He holds a Doctorate in the Creative Industries.

