智源评测揭示：闭源大模型领跑语言及多模态能力竞赛

**智源评测发布大模型全方位能力评估：闭源模型表现优异**

近日，智源研究院推出了一项名为“智源评测”的大模型评估体系，并在5月17日举办了一场大模型评测发布会。发布会公布了国内外140余个开源和商业闭源的语言及多模态大模型的全方位能力评测结果。

此次评测中，智源研究院引入了海淀区教师进修学校新编的小学三年级至高三学段的试卷，覆盖语文、数学、英语、物理、化学、历史6个学科，共计45套试卷，1400道试题。评测结果严格按照各年级、各学科的综合得分率进行排名。

值得关注的是，在综合得分率方面，表现优异的前五名均为闭源大模型。其中，通义Qwen-vl-max、百度文心一言4.0、智谱华章GLM-4、百川智能Baichuan3、GPT-4等模型表现突出。这些模型在语言理解和生成、多模态处理等方面展现出了强大的能力。

然而，在学科测验方面，大模型的表现略低于海淀各年级学生的平均水平。这一结果反映出，尽管大模型在处理通用任务方面具有明显优势，但在特定领域的应用上，仍存在一定的差距。这也为我国大模型研究和发展提供了新的思路和方向。

此次评测结果的发布，不仅展示了闭源大模型在综合能力方面的优势，也反映出我国在大模型领域的研究成果。智源评测体系的推出，有助于推动我国大模型技术的进步，进一步促进人工智能技术在各行各业中的应用。

未来，智源研究院将继续深化评测体系，进一步完善大模型的评估标准，以期为我国人工智能技术的发展提供更有力的支持。同时，随着开放数据的增多和算法研究的深入，我们有理由相信，大模型的性能将不断提高，为人类社会带来更多惊喜。

英语如下：

**Title:** “BAAI Evaluation Reveals: Closed-Source Large Models Lead the Race in Language and Multimodal Capabilities”

**Keywords:** BAAI Evaluation, Large Model Competition, Beijing Student Level

**News Content:**

**BAAI Evaluation Releases Comprehensive Capability Assessment of Large Models: Closed-Source Models Excel**

Recently, the Beijing Academy of Artificial Intelligence (BAAI, 智源研究院) launched a large-model evaluation system called “BAAI Evaluation” (智源评测) and held a launch event for its results on May 17. The event announced comprehensive capability evaluation results for more than 140 open-source and commercial closed-source language and multimodal large models from China and abroad.

In this evaluation, the institute introduced newly compiled exam papers from the Haidian District Teachers' Training School (海淀区教师进修学校), spanning grade 3 of primary school through grade 12 and covering six subjects: Chinese, Mathematics, English, Physics, Chemistry, and History, for a total of 45 papers and 1,400 questions. Results were ranked strictly by the overall score rate across all grades and subjects.
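As a rough illustration of ranking by overall score rate, the sketch below computes a score rate (points earned divided by points possible) for each grade and subject, then averages those rates with equal weight. The article does not say how the institute aggregates scores, so the equal weighting, the function names, and all the numbers here are assumptions made purely for illustration.

```python
from collections import defaultdict


def overall_score_rate(results):
    """Average the per-(grade, subject) score rates with equal weight.

    `results` is an iterable of (grade, subject, earned, possible) tuples.
    Equal weighting is an assumption; the real aggregation is not published
    in the article.
    """
    earned = defaultdict(float)
    possible = defaultdict(float)
    for grade, subject, got, full in results:
        earned[(grade, subject)] += got
        possible[(grade, subject)] += full
    rates = [earned[key] / possible[key] for key in earned]
    return sum(rates) / len(rates)


def rank_models(all_results):
    """Return model names sorted by overall score rate, highest first."""
    return sorted(
        all_results,
        key=lambda name: overall_score_rate(all_results[name]),
        reverse=True,
    )


# Hypothetical data, not taken from the evaluation.
all_results = {
    "model_a": [(3, "Mathematics", 80, 100), (12, "Physics", 45, 100)],
    "model_b": [(3, "Mathematics", 90, 100), (12, "Physics", 30, 100)],
}
ranking = rank_models(all_results)
```

With these made-up numbers, `model_a` averages 62.5% and `model_b` averages 60%, so `model_a` ranks first even though `model_b` scores higher in one subject.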

Notably, the five models with the highest overall score rates are all closed-source. Among them, Tongyi Qwen-vl-max, Baidu ERNIE Bot 4.0 (文心一言4.0), Zhipu Huazhang GLM-4, Baichuan Intelligence Baichuan3, and GPT-4 stood out. These models demonstrated strong capabilities in language understanding and generation, as well as in multimodal processing.

However, on the subject exams, the large models scored slightly below the average of Haidian District students in the corresponding grades. This suggests that, although large models have clear advantages on general tasks, a gap remains in domain-specific applications. The result also offers new ideas and directions for large-model research and development in China.

The release of these results not only showcases the advantage of closed-source large models in overall capability but also reflects China's research achievements in the large-model field. The introduction of the evaluation system should help advance large-model technology in China and further promote the application of artificial intelligence across industries.

Going forward, the institute will continue to deepen the evaluation system and refine its standards for assessing large models, with the aim of providing stronger support for the development of artificial intelligence in China. At the same time, as more open data becomes available and algorithm research deepens, there is reason to believe that large-model performance will keep improving and bring more surprises to society.

【来源】https://www.jiemian.com/article/11186669.html