Title: Meta Admits Using Pirated Books to Train AI, Sparking Copyright Debate

Keywords: Meta, pirated books, training artificial intelligence

Meta, a well-known technology company, recently admitted in a lawsuit that it used “pirated” books to train its artificial intelligence models. The company maintains, however, that it does not have to pay for them. The admission has sparked debate about intellectual property rights and fair use.

Meta reportedly used the Books3 dataset, along with many other materials, to train its Llama 1 and Llama 2 large language models. Books3 is a well-known, openly distributed collection of nearly 200,000 books in plain text, totaling nearly 37 GB. Meta argues that using copyrighted works to train large models requires no “consent, permission, or payment,” and that any unauthorized copying of the copyrighted works in Books3 should be treated as “fair use.”

This position has drawn widespread attention across the industry. On one hand, some argue that Meta’s approach is consistent with the principles of fair use, on the grounds that the company drew on a publicly circulated dataset rather than seeking authorization from individual authors. On the other hand, critics worry that the practice harms intellectual property rights and deprives authors of the compensation they are due.

Experts say that fair use is a relatively vague concept that must be judged case by case. In some contexts, such as commentary and news reporting, using copyrighted works may be acceptable. Training large machine learning models may be a different matter. The boundaries of fair use therefore need to be defined more clearly to ensure that intellectual property rights are adequately protected.

Experts also point out that protecting intellectual property will become increasingly important as artificial intelligence technology advances. In their view, stricter laws and regulations are needed to govern the development of the AI field, along with tougher enforcement against infringement.

In summary, Meta’s use of “pirated” books to train artificial intelligence has sparked debate about intellectual property rights and fair use. Going forward, the protection of intellectual property deserves closer attention, and laws and regulations will need to be drawn up to govern the development of the AI field.

Source: https://www.techspot.com/news/101507-meta-admits-using-pirated-books-train-ai-but.html