News Title: Zhiyuan Research Institute Releases CMMU Benchmark, Revealing GPT-4V's Chinese Answering Ability
Keywords: Zhiyuan, Multimodal, GPT-4V, Accuracy

News Content:

Zhiyuan Research Institute recently released CMMU, a Chinese-language evaluation benchmark for multimodal models. The benchmark is designed to assess how well models understand and reason across multiple question types in Chinese. It comprises 3,603 questions drawn and adapted from primary, middle, and high school examinations across China, covering single-choice, multiple-choice, fill-in-the-blank, and other question types. To keep models from scoring by guessing at random, CMMU applies several evaluation methods, which makes the benchmark difficult overall.

Notably, OpenAI's multimodal model GPT-4V answered only about 30% of the questions correctly in this evaluation. An analysis of error types found that the model's image understanding and reasoning abilities still need improvement. This suggests that although GPT-4V can answer some questions in Chinese, it has substantial room to improve.

The release of CMMU gives the Chinese multimodal research community a strong evaluation tool and indicates a direction for the development of related models. As the technology advances, more models that perform well on Chinese multimodal tasks can be expected.

Source: https://mp.weixin.qq.com/s/wegZvv4hwLef0BpdIh32-A
