Meta Admits, Amid Controversy, to Training AI on Pirated Books Without Authorization

Keywords: Copyright Controversy, AI Training, Open Source Dataset

Social media giant Meta Platforms Inc. has admitted in a lawsuit that it trained its artificial intelligence models on book data that may constitute "pirated" material, without the consent of the copyright holders. Meta reportedly used a dataset called Books3 to train its Llama 1 and Llama 2 large language models. The dataset contains the plain text of nearly 200,000 books, about 37GB in total.

In the lawsuit, Meta argues that using copyrighted works to train AI models is "fair use," so it does not need copyright holders' consent or a license and does not owe them payment. The claim has stirred wide debate among copyright lawyers and AI ethicists. Fair use is a legal doctrine that generally permits the use of copyrighted works without permission for purposes such as commentary, news reporting, teaching, and academic research. Whether training AI models on large volumes of unauthorized copyrighted material qualifies as fair use remains unsettled in law.

Books3 is a well-known open-source collection of book text. It offers a large volume of text, but many of the works it contains are still under copyright. Meta's use of it may therefore affect the rights of many publishers, who have taken notice and raised objections.

Meta's conduct has caused an uproar among its competitors and across the publishing industry. Critics argue that it infringes the legitimate rights and interests of authors and publishers and could harm the publishing industry as a whole. Some experts, meanwhile, see Meta's position as a challenge by AI companies to how copyright law will be applied, raising the question of how to balance copyright protection against the needs of technological development as AI advances rapidly.

The case merits close attention as it develops, because the outcome bears not only on Meta's own future but may also shape AI research and development and the interpretation of copyright law.

Source: https://www.techspot.com/news/101507-meta-admits-using-pirated-books-train-ai-but.html