The Zhiyuan Research Institute has released CMMU, a Chinese multimodal multitask understanding and reasoning benchmark designed to comprehensively assess the Chinese understanding and reasoning abilities of multimodal models. CMMU comprises 3,603 questions drawn and adapted from Chinese primary, junior high, and senior high school exams, covering question types such as single-choice, multiple-choice, and fill-in-the-blank. To keep models from scoring well by random guessing, the benchmark applies multiple evaluation methods, which makes it difficult overall.
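The article does not specify how the multiple evaluation methods work. One common anti-guessing technique for multiple-choice benchmarks is to ask each question several times with the answer options rotated into different positions, crediting the question only if the model is correct in every round. The sketch below illustrates that idea; the function name and the convention that the first option is the gold answer are assumptions for illustration, not details from the source.

```python
def evaluate_with_shifted_options(model_answer_fn, question, options, num_shifts=4):
    """Hypothetical anti-guessing check: ask the same multiple-choice
    question several times, rotating the option positions each round,
    and count the question as solved only if the model picks the
    correct option in every round.

    model_answer_fn(question, options) -> index of the chosen option.
    By convention here, options[0] holds the gold answer text.
    """
    correct_text = options[0]
    for shift in range(num_shifts):
        # Rotate the option list so the gold answer sits at a new position.
        shuffled = options[shift:] + options[:shift]
        picked = model_answer_fn(question, shuffled)
        if shuffled[picked] != correct_text:
            return False  # one wrong round disqualifies the question
    return True
```

A model that always picks the first option would pass a single-round evaluation 25% of the time on four-option questions, but fails this rotated check as soon as the gold answer moves, which is why such schemes push scores for guessing models toward zero.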

Notably, OpenAI's GPT-4V multimodal model reached an accuracy of only about 30% on CMMU. An analysis of error types found that GPT-4V still falls short in image understanding and reasoning. This suggests that although GPT-4V can answer some Chinese questions, its performance degrades on more complex Chinese multimodal problems.

The release of CMMU will provide strong support for AI research in China and promote the development of Chinese multimodal models. It also gives teachers and students an authoritative standard for measuring model performance, helping to further explore and optimize models' potential.

Source: https://mp.weixin.qq.com/s/wegZvv4hwLef0BpdIh32-A
