News Title: BAAI Releases the CMMU Evaluation Benchmark, Revealing GPT-4V's Answering Ability
Keywords: BAAI, Multimodal, GPT-4V, Accuracy
News Content:
Recently, the Beijing Academy of Artificial Intelligence (BAAI, 智源研究院) released CMMU, a Chinese-language evaluation benchmark for multimodal models. The benchmark assesses how well a model understands and reasons over multiple Chinese question formats. Its 3,603 questions were drawn and adapted from primary, middle, and high school exams across China, covering single-choice, multiple-choice, fill-in-the-blank, and other question types. To keep models from scoring points by random guessing, CMMU applies multiple evaluation strategies, which raises the overall difficulty.
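The article does not spell out which evaluation strategies CMMU uses to rule out lucky guesses. One common way to discount random guessing on multiple-choice items is to re-ask each question with the answer options rotated and to award credit only when the model is correct under every rotation. The sketch below illustrates that idea; the query_model callable and the function name are hypothetical and not taken from CMMU.

```python
import string

def passes_shifted_multiple_choice(question, options, correct_text, query_model):
    """Ask the same multiple-choice question once per rotation of its options,
    and count it correct only if the model picks the right option every time.

    query_model(prompt) is a hypothetical callable that returns the model's
    chosen option letter, e.g. "B".
    """
    letters = string.ascii_uppercase[:len(options)]

    for shift in range(len(options)):
        rotated = options[shift:] + options[:shift]            # rotate option order
        prompt = question + "\n" + "\n".join(
            f"{letter}. {text}" for letter, text in zip(letters, rotated)
        )
        predicted = query_model(prompt).strip().upper()[:1]
        correct_letter = letters[rotated.index(correct_text)]  # where the answer moved to
        if predicted != correct_letter:
            return False   # one miss under any rotation means no credit
    return True
```

With four options, a model guessing uniformly at random clears all four rotations with probability (1/4)^4 ≈ 0.4%, versus 25% for a single ask, so lucky guesses contribute far less to the reported accuracy.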
Notably, OpenAI's GPT-4V multimodal model reached an answer accuracy of only about 30% on this benchmark. An analysis of its error types shows that the model still falls short in image understanding and reasoning. In other words, although GPT-4V can answer some Chinese questions, there remains substantial room for improvement.
The release of CMMU gives the Chinese multimodal research community a rigorous evaluation tool and points the way for the development of related models. As the technology continues to advance, more models that excel at Chinese multimodal tasks can be expected to emerge.
Source: https://mp.weixin.qq.com/s/wegZvv4hwLef0BpdIh32-A