智源发布CMMU基准，GPT-4V答题准确率仅30%

智源发布中文多模态评测基准，GPT-4V答题准确率仅30%

中国人工智能研究院智源研究院近日发布了中文多模态多题型理解及推理评测基准（CMMU）。该基准从中国教育体系规范指导下的全国小学、初中、高中考试题中抽取并制作了3603道题目，题型包括单选题、多选题、填空题，并采用多重评测手段避免模型“随机猜对答案”。

CMMU整体难度较高，OpenAI推出的GPT-4V多模态模型答题准确率仅在30%左右。经错误类型分析，GPT-4V在图像理解和推理能力方面还有待提高。

CMMU的发布旨在为中文多模态模型的开发和评估提供一个标准化的基准，推动中文多模态模型的研究和应用。

智源研究院表示，CMMU将持续更新和完善，未来将纳入更多题型和难度，并探索多模态模型在不同领域的应用。

多模态模型是人工智能领域近年来兴起的一种新兴技术，它能够处理多种类型的数据，如文本、图像、音频和视频。多模态模型在自然语言处理、计算机视觉、语音识别等领域表现出了强大的能力，被认为是人工智能的未来发展方向之一。

GPT-4V是OpenAI开发的最新一代多模态模型，在文本生成、问答、翻译等任务上表现出色。然而，CMMU的评测结果表明，GPT-4V在中文理解和推理方面还有较大的提升空间。

业内专家指出，CMMU的发布对于推动中文多模态模型的发展具有重要意义。它为模型开发人员提供了一个统一的评估标准，有助于促进模型的优化和改进。同时，CMMU也为用户提供了选择和使用中文多模态模型的参考依据。

English version:

**Headline:** Zhiyuan Releases CMMU Benchmark; GPT-4V Accuracy Only About 30%

**Keywords:** Multimodal Model, CMMU Evaluation, GPT-4V Accuracy

**News Content:** Zhiyuan Releases Chinese Multimodal Multi-type Question Understanding and Reasoning Evaluation Benchmark (CMMU); GPT-4V Accuracy Only About 30%

The Beijing Academy of Artificial Intelligence (BAAI, also known as the Zhiyuan Institute) recently released the Chinese Multimodal Multi-type Question Understanding and Reasoning Evaluation Benchmark (CMMU). The benchmark comprises 3,603 questions selected and adapted from primary school, junior high school, and high school exams across China that follow the standards of the national education system. The question types are single-answer choice questions, multiple-answer choice questions, and fill-in-the-blank questions, and the benchmark applies several evaluation methods to keep models from scoring by “randomly guessing the correct answer”.
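The article does not say which methods CMMU uses to rule out lucky guesses. As an illustration only, one common approach for choice questions is to ask the same question several times with the options rotated, and to count it as correct only when the model picks the right option under every ordering. The minimal sketch below assumes that approach; `rotate_options`, `is_robustly_correct`, and the two toy models are hypothetical names, not part of CMMU.

```python
from typing import Callable, List

# Hypothetical model interface: receives the option texts in display order
# and returns the index of the option it picks.
Model = Callable[[List[str]], int]


def rotate_options(options: List[str], shift: int) -> List[str]:
    """Return the options cyclically shifted left by `shift` positions."""
    shift %= len(options)
    return options[shift:] + options[:shift]


def is_robustly_correct(model: Model, options: List[str], answer: str) -> bool:
    """Count a question as correct only if the model picks `answer`
    under every cyclic ordering of the options."""
    for shift in range(len(options)):
        shown = rotate_options(options, shift)
        if shown[model(shown)] != answer:
            return False
    return True


def always_first(shown: List[str]) -> int:
    """Toy model with a position bias: always picks the first option."""
    return 0


def knows_answer(shown: List[str]) -> int:
    """Toy model that finds the right option wherever it is displayed."""
    return shown.index("Paris")


options = ["Paris", "Rome", "Berlin", "Madrid"]

# A single ask rewards the biased model because the answer happens to be first.
naive_correct = options[always_first(options)] == "Paris"

# Asking under every rotation exposes the bias but still credits real knowledge.
robust_biased = is_robustly_correct(always_first, options, "Paris")
robust_knowing = is_robustly_correct(knows_answer, options, "Paris")
```

Under this scheme, a model that guesses uniformly at random on a four-option question passes a single ask with probability 1/4, but passes all four rotations with probability (1/4)^4 = 1/256, assuming the guesses are independent.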

CMMU is difficult overall: GPT-4V, the multimodal model released by OpenAI, answers only about 30% of the questions correctly. An analysis of error types shows that GPT-4V's image understanding and reasoning abilities still need improvement.

The release of CMMU aims to provide a standardized benchmark for developing and evaluating Chinese multimodal models, and to promote research on and application of such models.

The Zhiyuan Institute said that CMMU will continue to be updated and improved. Future versions will add more question types and difficulty levels, and the institute will explore applications of multimodal models in different fields.

Multimodal models, which have emerged in artificial intelligence in recent years, can process several types of data, such as text, images, audio, and video. They have shown strong capabilities in natural language processing, computer vision, speech recognition, and other fields, and are considered one of the future directions of artificial intelligence.

GPT-4V is OpenAI's latest-generation multimodal model, and it performs well on text generation, question answering, translation, and other tasks. However, the CMMU evaluation results show that it still has considerable room for improvement in Chinese understanding and reasoning.

Industry experts pointed out that the release of CMMU is of great significance for the development of Chinese multimodal models. It gives model developers a unified evaluation standard, which helps them optimize and improve their models. At the same time, CMMU gives users a reference for selecting and using Chinese multimodal models.

Source: https://mp.weixin.qq.com/s/wegZvv4hwLef0BpdIh32-A


Author: 智能小编
