BAAI Releases CMMU Benchmark; GPT-4V Reaches Only About 30% Accuracy on Chinese Multimodal Exam Questions

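Beijing: The Beijing Academy of Artificial Intelligence (BAAI, 智源研究院) recently released CMMU (Chinese Multimodal Multiple-Task Understanding and Reasoning), a Chinese-language benchmark for evaluating how well multimodal models understand and reason in a Chinese context. CMMU v0.1 contains 3,603 questions selected and adapted from primary, middle, and high school exams across China. It covers single-choice, multiple-choice, and fill-in-the-blank questions in order to test a broad range of model abilities.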

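According to BAAI, CMMU was designed to follow the standards of China's education system so that the test is both rigorous and educationally relevant. To keep models from earning credit through random guessing, the benchmark uses multiple evaluation methods intended to make scoring fairer and more accurate.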
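The announcement does not say which evaluation methods CMMU uses. As one illustration of how guessing can be discounted on single-choice questions, the sketch below re-asks a question with the options cyclically rotated and gives credit only if the model picks the right option every time. This is an assumed technique for illustration, not a description of CMMU's scoring; `rotate`, `passes_all_rotations`, and the two toy "models" are hypothetical names.

```python
from typing import Callable, List, Sequence

# A "model" here is any function that takes a question and a list of options
# and returns the index of the option it chooses (hypothetical interface).
Model = Callable[[str, List[str]], int]


def rotate(options: Sequence[str], k: int) -> List[str]:
    """Cyclically shift the options left by k positions."""
    return list(options[k:]) + list(options[:k])


def passes_all_rotations(ask: Model, question: str,
                         options: Sequence[str], correct_index: int) -> bool:
    """Return True only if `ask` is correct under every cyclic rotation."""
    n = len(options)
    for k in range(n):
        # After shifting left by k, the correct option sits at (correct_index - k) mod n.
        expected = (correct_index - k) % n
        if ask(question, rotate(options, k)) != expected:
            return False
    return True


# Toy models: one always picks the first option, one actually finds the answer.
def always_first(question: str, options: List[str]) -> int:
    return 0


def knows_answer(question: str, options: List[str]) -> int:
    return options.index("5")


question = "2 + 3 = ?"
options = ["4", "5", "6", "7"]

print(passes_all_rotations(knows_answer, question, options, 1))   # True
print(passes_all_rotations(always_first, question, options, 1))   # False
```

A model that always answers with the same position is right on exactly one rotation, so it earns no credit under this rule, while a model that tracks the answer's content is unaffected by the reordering.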

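On CMMU, OpenAI's GPT-4V multimodal model showed clear limitations, answering only about 30% of the questions correctly. The result suggests that, despite GPT-4V's strong natural language processing abilities, its image understanding and reasoning still have room to improve. BAAI's evaluation offers a useful reference point for future multimodal model development and highlights how difficult it remains for current models to handle complex, multi-dimensional information.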

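As AI technology advances, the standards used to evaluate models are evolving as well. The release of CMMU supports progress in Chinese multimodal research and gives researchers worldwide another yardstick for measuring and improving model performance. Going forward, BAAI will continue to build more complex and comprehensive evaluation suites to advance the use of AI in education, research, and other fields.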

Source: https://mp.weixin.qq.com/s/wegZvv4hwLef0BpdIh32-A