**Headline: Alibaba Cloud’s Tongyi Qianwen Large Model Upgraded, Performance Comparable to GPT-4V**
**Keywords:** Large model, visual understanding, surpassing GPT
**News Content:**
Today, Alibaba Cloud announced another upgrade to its multimodal large model Tongyi Qianwen (Qwen-VL), launching the Max version. The upgraded model shows significant improvements in visual reasoning and Chinese comprehension, with performance comparable to OpenAI’s GPT-4V and Google’s Gemini Ultra.
Qwen-VL-Max far outperforms all open-source models on authoritative benchmarks such as MMMU and MathVista, and surpasses GPT-4V on tasks such as document analysis (DocVQA) and Chinese image understanding (MM-Bench-CN), reaching world-leading levels.
Qwen-VL-Max has stronger visual reasoning capabilities and can recognize people, answer questions, create content, and write code based on images. For example, given a portrait, it can identify the person’s identity, occupation, and personality traits; given a scene image, it can answer questions such as what is happening and who is doing what; and given a product image, it can generate a product description and usage instructions.
In addition, Qwen-VL-Max has excellent Chinese comprehension capabilities. It can understand complex Chinese text, answer open-ended questions, and generate fluent, natural Chinese articles. For example, it can summarize the main points of a news report, write a character analysis based on a novel, and generate an argumentative essay on a given topic.
Alibaba Cloud said that the upgrade of Qwen-VL-Max benefits from its advanced algorithms and massive Chinese training data. The model uses a Transformer neural network architecture and has undergone large-scale pre-training. The training data includes billions of images, hundreds of billions of Chinese words, and millions of Chinese documents.
The launch of Qwen-VL-Max marks a major breakthrough for Alibaba Cloud in the field of multimodal large models. The model will be widely applied in image recognition, natural language processing, code generation, and other fields, supporting digital transformation across industries.
Source: https://news.mydrivers.com/1/960/960575.htm