English Title: BAAI Releases Chinese Multimodal Evaluation Benchmark CMMU; GPT-4V Accuracy Only About 30%
Keywords: BAAI release, multimodal evaluation, GPT-4V

News content:
The Beijing Academy of Artificial Intelligence (BAAI) recently released CMMU, a Chinese multimodal benchmark for multi-question-type understanding and reasoning. CMMU v0.1 contains 3,603 questions drawn and adapted from Chinese national primary, junior high, and senior high school exams, covering single-choice, multiple-choice, and fill-in-the-blank question types. The benchmark applies multiple evaluation strategies to prevent models from scoring by randomly guessing the correct answer.
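The post does not describe CMMU's exact anti-guessing protocol, but a common strategy for choice questions is to re-ask the same question with the options shuffled and only credit consistent correct answers. The sketch below illustrates that idea under stated assumptions: `query_model` is a hypothetical stand-in for any multimodal model API, and the three-round shuffling scheme is illustrative rather than CMMU's documented method.

```python
import random
from typing import Callable, Dict


def evaluate_choice_question(
    question: str,
    options: Dict[str, str],             # e.g. {"A": "...", "B": "...", "C": "...", "D": "..."}
    correct_label: str,                  # label of the correct option in the original ordering
    query_model: Callable[[str], str],   # hypothetical: returns the model's chosen label, e.g. "B"
    n_rounds: int = 3,
    seed: int = 0,
) -> bool:
    """Ask the same question several times with the option order shuffled.

    The question only counts as correct if the model picks the right option
    text in every round, so a single lucky positional guess does not score.
    """
    rng = random.Random(seed)
    correct_text = options[correct_label]
    labels = sorted(options)

    for _ in range(n_rounds):
        texts = list(options.values())
        rng.shuffle(texts)
        relabeled = dict(zip(labels, texts))  # reassign shuffled texts to A/B/C/D
        prompt = question + "\n" + "\n".join(f"{k}. {v}" for k, v in relabeled.items())
        answer = query_model(prompt).strip().upper()
        if relabeled.get(answer) != correct_text:
            return False  # a wrong (or unparseable) round fails the question
    return True
```

Under this sketch, a model that guesses a fixed position on a four-option question passes all three rounds with probability (1/4)^3, roughly 1.6%, compared with 25% for a single-pass evaluation.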

Reportedly, the overall difficulty of CMMU is high: even OpenAI's GPT-4V multimodal model achieves an accuracy of only about 30%. Error-type analysis shows that GPT-4V still has room for improvement in image understanding and reasoning.

The release of this benchmark aims to provide more comprehensive testing and evaluation of Chinese multimodal models and to further advance Chinese natural language processing technology.

【Source】https://mp.weixin.qq.com/s/wegZvv4hwLef0BpdIh32-A
