On July 5, at the 2024 World Artificial Intelligence Conference forum "Trusted Large Models Empowering Industrial Innovation and Development," Ant Group announced the latest progress of its self-developed large model: the Bailing model now has native multimodal capabilities. This breakthrough marks a major advance in AI's ability to understand and perceive.
Ant Group Vice President Xu Peng introduced this new capability of the Bailing model. He said that multimodal technology lets a large model perceive and interact the way a person does, helping AI better understand the complex information of the human world and align more closely with human interaction habits. Only a handful of large-model vendors in China have achieved this.
The Bailing model's native multimodal capability has already been deployed in the "Alipay Intelligent Assistant" and will support the upgrade of more intelligent agents on Alipay in the future. It allows the model to directly understand and be trained on multimodal data such as audio, video, images, and text, to support applications at scale, and to play a major role in downstream tasks such as AIGC, image-text dialogue, video understanding, and digital humans.
At the conference, the Bailing model's multimodal capability was reported to have reached GPT-4o level on the MMBench-CN benchmark for Chinese image-text understanding, and to have earned the top "excellent" rating in the multimodal safety capability evaluation conducted by CAICT (China Academy of Information and Communications Technology). This breakthrough opens up great potential for AI applications in areas such as intelligent customer service, autonomous driving, and medical diagnosis.
Ant Group has a wide range of application scenarios, and the Bailing model's multimodal capability will take its applications in life services, search and recommendation, and interactive experiences to the next level. This breakthrough will undoubtedly open a new chapter in the development of artificial intelligence.
The English version follows:
News Title: Ant Group's Bailing Large Model Ushers in the Multimodal Era: A New Stage of Human-like Understanding and Perception
Keywords: Ant Bailing model, multimodal capabilities, application upgrade
News Content:
Title: Ant Group's Bailing Model Demonstrates New Multimodal Capabilities, Leading the Next Wave of AI Development
On July 5th, at the 2024 World Artificial Intelligence Conference forum on "Trusted Large Models Empowering Industrial Innovation and Development," Ant Group announced the latest progress of its self-developed large model: the Bailing model has acquired native multimodal capabilities. This technological breakthrough marks a significant advance in artificial intelligence's understanding and perception abilities.
Ant Group Vice President Xu Peng introduced this new capability of the Bailing model. He stated that multimodal technology enables a large model to perceive and interact like a human, allowing AI to better understand complex information in the human world and align more closely with human interaction habits. Only a few large-model vendors in China have achieved this.
The native multimodal capability of the Bailing model has been applied to the "Alipay Intelligent Assistant" and will support upgrades of more intelligent agents on Alipay in the future. This capability enables the model to directly understand and be trained on multimodal data such as audio, video, images, and text, to support applications at scale, and to play a significant role in downstream tasks such as AIGC, image-text dialogue, video understanding, and digital humans.
At the conference, the Bailing model's multimodal capability was reported to have reached GPT-4o level on the MMBench-CN benchmark for Chinese image-text understanding, and to have earned an "excellent" rating (the highest) in the multimodal safety capability evaluation conducted by CAICT (China Academy of Information and Communications Technology). This technological breakthrough offers enormous potential for AI applications in areas such as intelligent customer service, autonomous driving, and medical diagnosis.
Ant Group has a wide range of application scenarios, and the Bailing model's multimodal capability will propel its applications in areas such as life services, search and recommendation, and interactive experiences to the next level. This technological breakthrough will undoubtedly open a new chapter in the development of artificial intelligence.
[Source] https://www.jiqizhixin.com/articles/2024-07-05