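The Zhiyuan Institute recently released CMMU, a Chinese multimodal, multi-type question understanding and reasoning benchmark designed to evaluate how well AI models understand Chinese. CMMU v0.1 contains 3,603 questions of three types: single-choice, multiple-choice, and fill-in-the-blank. All of them are drawn from national primary, middle, and high school exams set under the standards of China's education system. The Zhiyuan Institute applies multiple evaluation methods so that models cannot earn credit by guessing answers at random. In the evaluation, OpenAI's GPT-4V multimodal model answered about 30% of the questions correctly, which indicates substantial room for improvement in image understanding and reasoning.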
English title: Zhiyuan CMMU Evaluates Chinese Understanding Abilities
English keywords: Zhiyuan CMMU, GPT-4V, Multimodal Models
English news content:
The Zhiyuan Institute has recently released CMMU (a Chinese Multimodal Multi-type Question Understanding and Reasoning benchmark), which is designed to assess how well AI models understand Chinese. CMMU v0.1 consists of 3,603 questions drawn from national primary, middle, and high school exams in China. The questions cover single-choice, multiple-choice, and fill-in-the-blank types. The Zhiyuan Institute uses multiple evaluation methods so that models cannot score by guessing answers at random. In the results, OpenAI's GPT-4V multimodal model achieves an accuracy of about 30% on CMMU, which suggests significant room for improvement in its image understanding and reasoning abilities.
Source: https://mp.weixin.qq.com/s/wegZvv4hwLef0BpdIh32-A