BAAI Releases CMMU, a Chinese Multimodal Evaluation Benchmark

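The Beijing Academy of Artificial Intelligence (BAAI) recently released CMMU, a Chinese benchmark for evaluating multimodal understanding and reasoning across multiple question types, with the aim of advancing the use of artificial intelligence in education. CMMU v0.1 contains 3,603 questions covering the exam question formats of primary, middle, and high school, including single-choice, multiple-choice, and fill-in-the-blank questions. All of the questions are drawn from nationwide exams set under the guidance of China's education system, which lends the evaluation authority and practical relevance.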

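In developing CMMU, BAAI used several evaluation methods to keep models from getting credit for answers reached by guessing alone, which makes the benchmark both harder and more accurate. According to BAAI's test results, GPT-4V, the multimodal model from OpenAI, answers only about 30% of CMMU questions correctly. The result suggests that, despite GPT-4V's notable progress among multimodal models, its image understanding and reasoning abilities still have considerable room to improve.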
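The article does not say which methods were used to rule out lucky guesses. One common technique for choice questions is to present the same question several times with the options rotated, and to count it as correct only if the model picks the right option every time. The Python sketch below illustrates that idea only; it is not confirmed to be CMMU's procedure. The `Model` interface and the names `cyclic_variants`, `passes_all_rotations`, and `strict_accuracy` are hypothetical, and the image input that a multimodal question would carry is left out for brevity.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: takes the option texts in display order and
# returns the index of the option it selects.
Model = Callable[[Sequence[str]], int]
Question = Tuple[Sequence[str], int]  # (options, index of the correct option)


def cyclic_variants(options: Sequence[str], answer: int) -> List[Tuple[List[str], int]]:
    """Return every cyclic rotation of the options with the shifted answer index."""
    n = len(options)
    variants = []
    for shift in range(n):
        # The option originally at position p moves to position (p + shift) % n.
        rotated = [options[(i - shift) % n] for i in range(n)]
        variants.append((rotated, (answer + shift) % n))
    return variants


def passes_all_rotations(model: Model, options: Sequence[str], answer: int) -> bool:
    """A question counts only if the model is right under every rotation."""
    return all(model(rotated) == idx for rotated, idx in cyclic_variants(options, answer))


def strict_accuracy(model: Model, questions: Sequence[Question]) -> float:
    """Fraction of questions answered correctly under all rotations."""
    if not questions:
        return 0.0
    passed = sum(passes_all_rotations(model, opts, ans) for opts, ans in questions)
    return passed / len(questions)


# Toy data (made up for illustration, not taken from CMMU).
questions = [
    (["3", "4", "5", "6"], 1),
    (["red", "green", "blue", "black"], 2),
]


def always_first(options: Sequence[str]) -> int:
    # A position-biased guesser that always picks the first option.
    return 0


def knows_answers(options: Sequence[str]) -> int:
    # A stand-in for a model that recognizes the correct option by content.
    for target in ("4", "blue"):
        if target in options:
            return list(options).index(target)
    return 0
```

Under this scheme a guesser that favors one position is right on only one rotation of each question, so it earns no credit, while a model that identifies the correct option by its content is unaffected by the reordering.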

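BAAI's work gives multimodal model development a new benchmark and offers a useful reference for applying artificial intelligence in education. With CMMU, researchers can assess and improve model performance more accurately, and in turn advance the use of AI technology in education.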

Title: BAAI Launches Chinese Multimodal Evaluation Benchmark CMMU
Keywords: Multimodal Evaluation, GPT-4V, Education System
News content:
The Beijing Academy of Artificial Intelligence (BAAI) recently released CMMU, a Chinese benchmark for multimodal understanding and reasoning across multiple question types, aiming to promote the application of artificial intelligence in education. CMMU v0.1 contains 3,603 questions covering exam question formats from primary school through high school, including single-choice, multiple-choice, and fill-in-the-blank questions. The questions are all drawn from nationwide examinations set under the guidance of the Chinese education system, which supports the authority and practical relevance of the evaluation.

During the development of CMMU, BAAI adopted multiple evaluation methods so that a model cannot score simply by guessing the correct answer. This approach increases both the difficulty and the accuracy of the evaluation. According to BAAI's test results, OpenAI's multimodal model GPT-4V reaches an accuracy of only about 30% on CMMU. This result indicates that although GPT-4V has made significant progress among multimodal models, its image understanding and reasoning abilities still have much room for improvement.

BAAI's benchmark not only provides a new yardstick for the development of multimodal models but also offers an important reference for applying artificial intelligence in education. By evaluating on CMMU, researchers can assess and improve model performance more accurately, and so advance the use of artificial intelligence technology in education.

Source: https://mp.weixin.qq.com/s/wegZvv4hwLef0BpdIh32-A