An investigation by The Atlantic has revealed that Meta, the company that owns Facebook, Instagram and WhatsApp, used millions of pirated books to train its much-hyped AI software, Llama. The revelation comes as publishers worldwide struggle to determine what counts as fair use of published works in training AI models.
In the ongoing development of its AI engine, Meta software engineers experimented with different legal strategies to obtain book data. Court documents show that the Meta team was frustrated by the potential expense and extended wait time of obtaining book data legally. Employees turned to LibGen, or Library Genesis, one of the largest libraries of pirated books available online, containing over 7.5 million books and 81 million research papers. The Meta developer team eventually got permission from ‘MZ’ – allegedly Meta CEO Mark Zuckerberg – to download and use the data.