Title: BAAI Releases Chinese Multimodal Evaluation Benchmark CMMU; GPT-4V Accuracy Only About 30%
Keywords: BAAI release, multimodal evaluation, GPT-4V
News content:
Recently, the Beijing Academy of Artificial Intelligence (BAAI) released CMMU, a Chinese multimodal, multi-question-type understanding and reasoning evaluation benchmark. The benchmark comprises 3,603 questions drawn and adapted from primary, junior high, and senior high school exams across China, covering question types such as single-choice, multiple-choice, and fill-in-the-blank. To ensure accurate evaluation, CMMU applies multiple evaluation strategies that avoid the possibility of a model "randomly guessing the right answer".
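The post does not spell out what these anti-guessing checks look like. One common approach for choice questions is to ask the same item several times with the answer options shuffled into different positions and only credit the model when it is consistently correct. The Python sketch below is a hypothetical illustration of that general idea, not CMMU's actual implementation; model_answer_fn is an assumed callable standing in for the model under test.

import random

def consistent_multiple_choice(model_answer_fn, question, options, correct_text,
                               n_passes=3, seed=0):
    # Ask the same single-choice question several times, shuffling the option
    # order on every pass, and count it correct only if the model picks the
    # right option every time; a lucky one-off guess then rarely survives.
    #
    # model_answer_fn(question, labelled_options) is a hypothetical callable
    # returning the label ("A", "B", ...) chosen by the model under test.
    rng = random.Random(seed)
    texts = list(options)

    for _ in range(n_passes):
        rng.shuffle(texts)                                  # new option order each pass
        labelled = {chr(ord("A") + i): t for i, t in enumerate(texts)}
        picked = model_answer_fn(question, labelled)
        if labelled.get(picked) != correct_text:            # any miss counts as wrong
            return False
    return True

# Example: a model that always answers "A" is credited only if "A" happens
# to hold the correct option on every shuffled pass.
always_a = lambda q, opts: "A"
print(consistent_multiple_choice(always_a, "2+2=?", ["3", "4", "5", "22"], "4"))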
CMMU is nevertheless demanding: OpenAI's multimodal model GPT-4V answers only about 30% of its questions correctly. Error-type analysis shows that GPT-4V still falls short in both image understanding and reasoning. The result indicates that although artificial intelligence has made real progress in multimodal understanding, substantial room for improvement remains.
[Source] https://mp.weixin.qq.com/s/wegZvv4hwLef0BpdIh32-A