Revealed: Meta pirated millions of books to train its AI engine

A new investigation has revealed Meta used millions of pirated publications to train its new AI systems.
Meta is now battling multiple court cases, defending itself against claims of mass theft. Image: Julio Lopez on Unsplash.

An investigation by The Atlantic has revealed that Meta, the company that owns Facebook, Instagram and WhatsApp, used millions of pirated books to train its hyped AI software, Llama. The revelation comes as publishers worldwide struggle to determine what counts as fair use of published works in training AI models.

In the ongoing development of its AI engine, Meta software engineers experimented with different legal strategies to obtain book data. Court documents show that the Meta team was frustrated by the potential expense and extended wait time of obtaining book data legally. Employees then turned to LibGen, or Library Genesis, one of the largest libraries of pirated books available online, containing over 7.5 million books and 81 million research papers. The Meta developer team eventually got permission from ‘MZ’ – allegedly Meta CEO Mark Zuckerberg – to download and use the data.

David Burton is a writer from Meanjin, Brisbane. David also works as a playwright, director and author. He is the playwright of over 30 professionally produced plays. He holds a Doctorate in the Creative Industries.