


**News Title:** “BAAI Releases CMMU Benchmark, GPT-4V Scores Only About 30% Accuracy on Chinese Multimodal Assessments”

**Keywords:** Multimodal models, CMMU benchmark, GPT-4V evaluation

**News Content:**

**Beijing** — The Beijing Academy of Artificial Intelligence (BAAI, 智源研究院) recently unveiled CMMU (Chinese Multimodal Multiple-Task Understanding and Reasoning), a new benchmark for evaluating the understanding and reasoning capabilities of multimodal models in a Chinese-language context. CMMU v0.1 consists of 3,603 questions carefully selected and adapted from primary, junior high, and senior high school exams across China, spanning question types such as single-answer multiple-choice, multiple-select, and fill-in-the-blank to comprehensively assess a model’s overall capabilities.
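
The article does not publish CMMU’s data schema, so the following Python sketch is only a hypothetical illustration of how items of these three question types might be represented and graded with exact-match accuracy. Every name here (`CMMUItem`, `qtype`, `grade`, the field names) is an assumption for illustration, not BAAI’s actual format.

```python
# Illustrative sketch only: the real CMMU schema and scoring rules are
# defined by BAAI's release; the fields and grading below are assumptions.
from dataclasses import dataclass, field

@dataclass
class CMMUItem:
    question: str                      # question text (may reference an image)
    image_path: str                    # path to the associated figure/diagram
    qtype: str                         # "single-choice" | "multi-select" | "fill-in-blank"
    options: list[str] = field(default_factory=list)  # empty for fill-in-blank
    answer: set[str] = field(default_factory=set)     # gold answer label(s) or text

def grade(item: CMMUItem, prediction: set[str]) -> bool:
    """Exact-match grading: multi-select answers must match completely."""
    return prediction == item.answer

# Toy usage with two items and two model predictions.
items = [
    CMMUItem("Which force acts downward?", "fig1.png", "single-choice",
             ["A. Gravity", "B. Buoyancy"], {"A"}),
    CMMUItem("Select all prime numbers shown.", "fig2.png", "multi-select",
             ["A. 2", "B. 4", "C. 7"], {"A", "C"}),
]
predictions = [{"A"}, {"A"}]  # second prediction misses option C
accuracy = sum(grade(i, p) for i, p in zip(items, predictions)) / len(items)
print(f"accuracy = {accuracy:.0%}")  # -> accuracy = 50%
```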

According to BAAI, the CMMU benchmark adheres strictly to the norms of the Chinese education system, ensuring that the tests are rigorous and educationally relevant. To prevent models from earning credit through random guessing, the benchmark employs multiple evaluation methods to guarantee fairness and accuracy.
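
The article does not specify which anti-guessing methods CMMU uses. As one hedged illustration, a common technique in multiple-choice evaluation is to re-ask each question with the answer options reordered and to credit the model only when it is correct under every ordering; the sketch below (with the hypothetical helpers `rotations` and `robust_correct`) shows the idea, not BAAI’s actual procedure.

```python
# Sketch of one common anti-guessing check (an assumption here, not
# necessarily CMMU's method): rotate the option order and only count the
# question correct if the model tracks content, not a fixed letter position.
from collections import deque

def rotations(options: list[str]) -> list[list[str]]:
    """All cyclic rotations of the option list."""
    d = deque(options)
    out = []
    for _ in range(len(options)):
        out.append(list(d))
        d.rotate(1)
    return out

def robust_correct(options, gold_text, model) -> bool:
    """Correct only if the model picks the gold option under every ordering."""
    for order in rotations(options):
        letter = model(order)                    # model returns "A"/"B"/...
        chosen = order[ord(letter) - ord("A")]   # map letter back to content
        if chosen != gold_text:
            return False
    return True

# A fake "model" that always answers "A" scores 0 under this check, even
# though it would be right 1/len(options) of the time by pure chance.
always_a = lambda order: "A"
print(robust_correct(["Gravity", "Buoyancy", "Friction"], "Gravity", always_a))  # False
```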

In the CMMU evaluations, OpenAI’s GPT-4V multimodal model showed clear limitations, achieving an accuracy of approximately 30%. This result indicates that while GPT-4V has made significant strides in natural language processing, its image understanding and reasoning capabilities still need improvement. BAAI’s findings provide a valuable reference for future multimodal model development and highlight the challenges current models face when processing complex, multidimensional information.

As artificial intelligence technology advances, so do the standards for evaluating models. The release of CMMU not only propels progress in Chinese multimodal research but also gives researchers worldwide a new yardstick for gauging and improving model performance. BAAI says it will continue building more complex and comprehensive evaluation systems to advance AI applications in education, research, and beyond.

**Source:** https://mp.weixin.qq.com/s/wegZvv4hwLef0BpdIh32-A
