The Beijing Academy of Artificial Intelligence (BAAI) recently released CMMU, a Chinese benchmark for evaluating multimodal understanding and reasoning across multiple question types. The benchmark contains 3,603 questions drawn and adapted from primary, junior high, and high school exams across China, covering single-answer multiple-choice, multiple-answer multiple-choice, and fill-in-the-blank questions. To keep the evaluation fair and accurate, CMMU uses several evaluation methods designed to keep models from getting credit for answers they "randomly guessed right".

On this benchmark, OpenAI's GPT-4V multimodal model answered only about 30% of the questions correctly. An analysis of error types indicates that GPT-4V's image understanding and reasoning abilities still need improvement. The result suggests that although artificial intelligence has made some progress in multimodal understanding, substantial room for improvement remains.

CMMU's release provides a new standard for evaluating Chinese multimodal understanding and reasoning, which matters for research and development in artificial intelligence. As the technology continues to advance, there is reason to believe CMMU will play a larger role.

Source: https://mp.weixin.qq.com/s/wegZvv4hwLef0BpdIh32-A
