Google Releases Gemini Large Model; Multimodal Capabilities Draw Attention

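Google recently released its Gemini large model, whose strong multimodal capabilities have drawn wide attention across the industry. Gemini was developed by Google DeepMind, and its main strength is handling multimodal problems rather than text alone.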

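Gemini reportedly performed strongly on the MMLU (Massive Multitask Language Understanding) benchmark, even scoring above human experts. Its ability to comment on, and poke fun at, a person's doodles and hand gestures in real time also left a strong impression.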

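The release also drew skepticism, however. Critics pointed out that the testing standards were skewed and that the demo video appeared to have been edited. The scale of the original chart was also somewhat unfair: Gemini's 90.0% and the human baseline of 89.8% differ by only 0.2 percentage points, yet they are drawn far apart on the y-axis.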
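To put the chart complaint in numbers, the short Python sketch below computes the gap between the two quoted scores and compares how tall the two bars would be drawn on a full axis versus a truncated one. The axis start of 89.0 and the helper name `bar_height_ratio` are illustrative assumptions; the article does not state the actual axis range of the chart.

```python
def bar_height_ratio(a: float, b: float, axis_min: float) -> float:
    """Ratio of the drawn bar heights for values a and b
    when the y-axis starts at axis_min instead of 0."""
    if axis_min >= min(a, b):
        raise ValueError("axis_min must be below both values")
    return (a - axis_min) / (b - axis_min)


gemini, human = 90.0, 89.8          # MMLU scores quoted in the article

gap = gemini - human                # difference in percentage points
relative_gap = gap / human * 100    # difference relative to the human score, in %

full = bar_height_ratio(gemini, human, 0.0)        # honest axis starting at 0
truncated = bar_height_ratio(gemini, human, 89.0)  # hypothetical truncated axis

print(f"gap: {gap:.1f} percentage points ({relative_gap:.2f}% relative)")
print(f"bar height ratio, axis from 0:    {full:.3f}")
print(f"bar height ratio, axis from 89.0: {truncated:.3f}")
```

On an axis starting at 0 the two bars are almost the same height, while the assumed truncated axis makes the Gemini bar look about a quarter taller, even though the underlying gap is unchanged.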

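Even so, Gemini's release has given other teams a good deal of confidence: GPT-4 is no longer a one-of-a-kind, out-of-reach model. As Aravind Srinivas, founder of the AI search product PerplexityAI, summed up: 1. Gemini shows that a team outside OpenAI can build a model that surpasses GPT-4. 2. A well-trained dense model can surpass GPT-4's sparse model architecture. Corollary: distilling small dense models from a large teacher model will become the trend, giving the best combination of efficiency and capability.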
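For readers unfamiliar with distillation, the sketch below shows the commonly used soft-label formulation: the student is trained to match the teacher's temperature-softened output distribution. The article does not say how any particular model is or would be distilled, so the function names, the temperature of 2.0, and the toy logits are all illustrative assumptions, not a description of Gemini's or GPT-4's training.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)   # soft targets from the teacher
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Toy logits over three classes (hypothetical values).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # disagrees with the teacher

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close student loss: {loss_close:.4f}")
print(f"far student loss:   {loss_far:.4f}")
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what lets a small dense student inherit behavior from a larger teacher.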

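Gemini's multimodal capabilities are also a focus of attention. For the video that opens with a duckling being drawn, Gemini suggested many dishes that could be cooked, each with a picture and a link to a tutorial. Gemini also supports multiple languages, including Chinese, English, French, and Japanese.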

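Overall, Gemini's release has sparked lively discussion, and its multimodal performance is impressive. Still, many details remain to be filled in, such as the training data and parameter count.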

English translation:

Headline: Google releases Gemini large model, drawing attention to its multimodal capabilities
Keywords: Gemini, large model, multimodal, attention

News content:

Google has recently released Gemini, a large model developed by Google DeepMind. The model has shown remarkable capabilities in handling multimodal problems, which has attracted a lot of attention in the field. Gemini can also handle natural language processing tasks such as text classification, named entity recognition, and machine translation.

Critics have noted that the testing standards were skewed and that the demo video appears to have been edited. The scale of the original chart is also somewhat misleading: Gemini's 90.0% and the human benchmark of 89.8% differ by only 0.2 percentage points, yet the y-axis places them far apart.

Despite these issues, the release of Gemini has given other teams a lot of confidence. GPT-4 is no longer a unique, out-of-reach model. As Aravind Srinivas, founder of the AI search product PerplexityAI, summarizes: 1. Gemini demonstrates that teams outside OpenAI can create models that surpass GPT-4. 2. Well-trained dense models can surpass GPT-4’s sparse model architecture. Corollary: distilling small dense models from large teacher models will become the trend, offering the best combination of efficiency and capability.

Gemini’s multimodal capabilities have also attracted attention. For example, for the video that opens with a duckling being drawn, Gemini suggested many dishes that could be cooked, each with an image and a link to a tutorial. Additionally, Gemini supports multilingual processing and can handle Chinese, English, French, Japanese, and other languages.

Overall, the release of Gemini has caused a lot of buzz, and its multimodal capabilities have impressed people. However, some details still need to be filled in, such as the training data and parameter count.

[Source] https://www.ithome.com/0/737/691.htm