News Title: “China’s On-Device AI Model Surpasses GPT-4V, Bringing Instant Video Understanding to Smartphones”

Keywords: Chinese On-Device Model, SOTA Breakthrough, Multimodal Understanding

News Content:

China’s latest on-device model: ModelBest (面壁) has released MiniCPM-V 2.6, nicknamed the “Little Steel Cannon” (小钢炮). With just 8 billion parameters, it breaks through the multimodal-understanding barrier for models under 20 billion parameters and surpasses GPT-4V across the board, making it a leader in on-device multimodal understanding. The model achieves world-leading results in single-image, multi-image, and video understanding, and it also substantially speeds up on-device inference, marking a major breakthrough in the AI field.

MiniCPM-V 2.6 uses int4 quantization, which allows it to run with 6 GB of memory and reach inference speeds of up to 18 tokens/s, 33% faster than its predecessor. It supports multiple languages and offers cutting-edge capabilities such as real-time video understanding and joint understanding across multiple images, greatly improving both the efficiency of on-device AI sensing and the user experience.
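
The memory and speed figures above can be sanity-checked with simple arithmetic. The Python sketch below takes only the 8 billion parameters, int4 weights, 18 tokens/s, and 33% speed-up from the article; the function names are hypothetical, and the quantizer is a generic textbook symmetric int4 scheme shown for illustration, not necessarily the method used for MiniCPM-V 2.6.

```python
# Back-of-the-envelope checks. From the article: 8 billion parameters, int4
# weights, 18 tokens/s, 33% faster than the previous version.
# Everything else here (function names, the quantization scheme) is hypothetical.


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, the KV cache, runtime overhead and quantization
    metadata such as scale factors.
    """
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int4_symmetric(weights: list[float]) -> tuple[list[int], float]:
    """Generic symmetric int4 quantization of one group of weights.

    Every value is mapped to an integer in [-8, 7] using one shared scale.
    Textbook scheme for illustration, not MiniCPM-V 2.6's actual method.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int4(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int4 codes and their scale."""
    return [v * scale for v in q]


params = 8e9
print(f"fp16 weights: {weight_memory_gb(params, 16):.1f} GB")  # 16.0 GB
print(f"int4 weights: {weight_memory_gb(params, 4):.1f} GB")   # 4.0 GB

# Reading "33% faster" as new = old * 1.33 gives the implied old speed.
print(f"implied predecessor speed: {18 / 1.33:.1f} tokens/s")  # 13.5

codes, scale = quantize_int4_symmetric([0.12, -0.5, 0.33, 0.07])
print(codes)  # [2, -7, 5, 1]
print([round(x, 3) for x in dequantize_int4(codes, scale)])
```

At int4 the weights alone come to about 4 GB, a quarter of the fp16 size, which is plausibly consistent with a 6 GB footprint once activations, the KV cache, and runtime overhead are added; the article does not give the exact breakdown.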

The model also uses fewer visual tokens than comparable models, meaning it encodes image information more efficiently and achieves a markedly higher knowledge compression rate. On the widely cited OpenCompass, Mantis-Eval, and Video-MME evaluations, MiniCPM-V 2.6 achieved leading results. Its OCR performance is especially strong: it reaches the state of the art (SOTA) among both open-source and closed-source models, underscoring its strength in on-device OCR.
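
One common way to express visual-token efficiency is token density: the number of image pixels each visual token stands for. The Python sketch below assumes that definition; the function name and the example inputs (a 1344×1344 image encoded into 640 visual tokens, against a hypothetical encoder that needs four times as many) are illustrative assumptions, not figures from the article.

```python
def pixels_per_visual_token(width: int, height: int, num_visual_tokens: int) -> float:
    """Pixels represented by each visual token; higher means a more compact encoding."""
    if num_visual_tokens <= 0:
        raise ValueError("num_visual_tokens must be positive")
    return width * height / num_visual_tokens


# Illustrative, assumed inputs: the same image through two hypothetical encoders.
compact = pixels_per_visual_token(1344, 1344, 640)
baseline = pixels_per_visual_token(1344, 1344, 2560)
print(f"compact encoder:  {compact:.1f} pixels/token")   # 2822.4
print(f"baseline encoder: {baseline:.1f} pixels/token")  # 705.6
```

Fewer visual tokens per image means a shorter input sequence for the language model, which reduces compute and memory per image; this is a plausible reason the metric matters on phones, though the article does not spell out the mechanism.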

MiniCPM-V 2.6 also performs well in hallucination evaluations, with hallucination rates far below those of other commercial models, indicating that it understands real-world scenes more accurately and reliably.

The release of MiniCPM-V 2.6 marks a solid step forward for Chinese on-device models, bringing more efficient multimodal understanding to phones and other smart devices and giving users more convenient, intelligent services. As the technology continues to advance, on-device AI will become more deeply integrated into people’s daily lives, bringing new momentum to the development of human society.

Source: https://zhidx.com/p/436844.html