Alibaba Cloud has released Qwen-VL Max, the latest version of its Tongyi Qianwen (Qwen) multimodal large model, with significant upgrades to visual reasoning and Chinese-language understanding. According to a report from Kuai Technology (快科技), Qwen-VL Max not only excels at visual understanding but can also identify people in images, answer questions, create content, and write code based on a picture. In multiple authoritative evaluations, the performance of Qwen-VL Max surpasses that of GPT-4V and Google's Gemini Ultra.
In evaluations such as MMMU and MathVista, Qwen-VL Max leads all open-source models in the industry by a wide margin. On document analysis (DocVQA) and Chinese image-related tasks (MM-Bench-CN), Qwen-VL Max also exceeds GPT-4V and reaches the best level worldwide. This progress further demonstrates China's strong capabilities in the field of artificial intelligence.
Source: https://news.mydrivers.com/1/960/960575.htm