Meta’s Training on ‘Pirated’ Books Sparks Legal Debate

Keywords: Meta, Pirated Books, Legal Controversy

News content:
Social media giant Meta Platforms Inc. has admitted to using unauthorized book data in its artificial intelligence research and development, a practice that is now fueling a copyright dispute. According to the latest court filings, Meta used Books3, an open-source book dataset, to train its Llama 1 and Llama 2 large language models. The dataset contains the plain text of nearly 200,000 books, about 37 GB in total.

Although Books3 is a widely used open-source resource, many of the books Meta drew on to train its models remain under copyright. Meta contends that, under the “fair use” doctrine of U.S. copyright law, it does not need to pay for these copyrighted materials or obtain licenses to use them. The company’s position is that it is performing “machine learning” for artificial intelligence rather than directly copying or distributing the books’ contents, and that its use therefore does not constitute infringement.

This argument has sparked heated debate in copyright law circles. Legal experts note that while fair use leaves room for academic research and certain kinds of creative work, Meta’s approach (using copyrighted material in commercial products without the consent of authors or publishers) may fall outside its scope.

Some copyright holders and industry figures have objected, arguing that Meta’s conduct harms the legitimate rights of content creators. They worry that if Meta’s defense succeeds, it could set a dangerous precedent for other companies that use copyrighted material, further weakening copyright protection for authors and publishers.

The case has also drawn widespread public attention, with many people weighing in on social media. Some argue that Meta’s conduct shows disregard for copyright and should face legal penalties; others believe Meta may be using these resources within reasonable bounds and should not be judged too harshly.

The copyright status of the books in Books3 is complex: the works span different eras and authors, so securing authorization for each one individually could be a major challenge. As the case unfolds, Meta’s approach could prompt new debate over the scope of “fair use” in artificial intelligence and may even lead to updates to the relevant laws and regulations.

Source: https://www.techspot.com/news/101507-meta-admits-using-pirated-books-train-ai-but.html