Google's Gemini Suspected of Using Baidu Wenxin Yiyan Output for Chinese Training

By 智能小编

January 14, 2024 #DailyAINews, #BaiduWenxinYiyan, #TrainingData, #GoogleGemini

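According to QbitAI, the Chinese-language corpus behind Google's Gemini appears to have come from Baidu's Wenxin Yiyan. In user tests on Google's Vertex AI platform, Gemini-Pro stated outright during Chinese conversation that it was Baidu's Wenxin large model. On the Poe platform, when asked "Who are you?", Gemini-Pro answered: "I am Baidu's Wenxin large model." In Google AI Studio, Gemini-Pro said that Baidu Wenxin was used in its Chinese training data. Baidu has not yet responded to the matter.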

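The incident has drawn industry attention to intellectual property protection in artificial intelligence. Some argue that if Google did train on Baidu's corpus, this would raise questions of intellectual property infringement. Others counter that, given how quickly AI technology is developing, many companies train on publicly available data, so this would not necessarily constitute infringement.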

So far, neither company has issued a formal response. We will continue to follow this story and report on further developments.

Source: https://www.qbitai.com/2023/12/106970.html