News Title: Alibaba Cloud Upgrades Its Multimodal Large Model Qwen-VL, Rivaling GPT-4V and Gemini Ultra
Keywords: Alibaba Cloud, Qwen-VL, Multimodal Large Model, GPT-4V, Gemini
News Content:
Alibaba Cloud recently announced a significant advance in multimodal large models: its Tongyi Qianwen visual understanding model Qwen-VL has been upgraded again, with a new Max version. The upgraded model has stronger visual reasoning and Chinese comprehension capabilities and can identify people in images, answer questions, create content, and write code from image input. In multiple authoritative evaluations, Qwen-VL-Plus and Qwen-VL-Max performed strongly, with overall performance comparable to GPT-4V and Gemini Ultra.
On benchmarks such as MMMU and MathVista, Qwen-VL-Plus and Qwen-VL-Max far outperform all open-source models in the industry. On document analysis (DocVQA) and Chinese image-related tasks (MM-Bench-CN), Qwen-VL even surpasses GPT-4V, reaching world-leading performance.
The release signals that China's research on multimodal large models has joined the world's front ranks. The Qwen-VL upgrade opens up more possibilities for applying artificial intelligence to visual understanding and Chinese-language processing.
Source: https://news.mydrivers.com/1/960/960575.htm