
News Title: Next-Generation Multimodal Foundation Model Emu2 Unveiled
Keywords: Multimodal, Foundation Model, Performance Breakthrough

News Content:

The Beijing Academy of Artificial Intelligence (BAAI, 智源研究院) recently announced the open-source release of Emu2, its next-generation multimodal foundation model. Through large-scale autoregressive generative multimodal pre-training, the model achieves a significant breakthrough in multimodal in-context learning. Notably, Emu2 substantially outperforms mainstream multimodal pre-trained models such as Flamingo-80B and IDEFICS-80B on few-shot multimodal understanding tasks.

Emu2 also achieves state-of-the-art performance on a range of benchmarks, including VQAv2, OKVQA, MSVD, MM-Vet, and TouchStone, spanning few-shot understanding, visual question answering, and subject-driven image generation. The result demonstrates China's research strength in artificial intelligence and opens up new possibilities for multimodal research.

[Source] https://mp.weixin.qq.com/s/Xf4xBzYwubVd8Lpw68ikDA
