智源研究院发布大模型评测体系,公布140余个模型评估结果
5月17日,智源研究院在京举办大模型评测发布会,对外公布了140余个国内外开源及商业闭源的语言及多模态大模型全方位能力评测结果。此次评测中,智源研究院引入了海淀区教师进修学校新编的小学三年级至高三的6个学科共计45套试卷,1400道试题,对各模型的学科应用能力进行了全面评估。
评测结果显示,在综合得分率方面,表现优异的前五名大模型均为闭源模型,分别是通义Qwen-vl-max、百度文心一言4.0、智谱华章GLM-4、百川智能Baichuan3、GPT-4。然而,尽管这些模型在综合能力上表现出色,但在学科测验上,其表现仍略低于海淀区各年级学生的平均水平。
智源研究院的此次评测旨在为业界提供更全面的模型能力评估数据,帮助研究人员和开发者更好地理解不同模型在实际应用中的表现,并据此进行优化和改进。同时,这也为教育领域提供了参考,帮助教育工作者了解人工智能在教育中的实际应用效果。
据悉,智源研究院将继续推进大模型评测体系的完善和更新,以期为人工智能领域的研究和应用提供更加科学、客观的评估标准。
English version:
Title: “Zhiyuan Evaluation Unveiled: Closed-Source Large Models Lead but Score Slightly Below Haidian Students’ Average on Subject Tests”
Keywords: Zhiyuan Evaluation, Large Model Assessment, Subject Test
Content: Zhiyuan Institute (Beijing Academy of Artificial Intelligence, BAAI) Releases Large Model Evaluation Framework, Publishes Results for More Than 140 Models
On May 17, the Zhiyuan Institute of Artificial Intelligence held a large model evaluation launch event in Beijing, releasing comprehensive capability evaluation results for more than 140 language and multimodal large models, both open-source and commercial closed-source, from China and abroad. For this round, the institute introduced 45 newly compiled exam papers from the Haidian District Teachers’ Continuing Education School, covering six subjects from primary Grade 3 through Grade 12 and totaling 1,400 questions, to assess each model's ability to apply subject knowledge.
The results showed that, by overall score rate, the top five models were all closed-source: Tongyi Qwen-vl-max, Baidu Wenxin Yiyan 4.0, Zhipu Huazhang GLM-4, Baichuan Intelligent Baichuan3, and GPT-4. However, despite their strong overall capabilities, these models still scored slightly below the average of Haidian District students in each grade on the subject tests.
The Zhiyuan Institute of Artificial Intelligence says the evaluation is intended to give the industry more comprehensive data on model capabilities, helping researchers and developers better understand how different models perform in practical applications and optimize and improve them accordingly. It also serves as a reference for the education sector, helping educators understand how effective artificial intelligence is in actual educational use.
Reportedly, the Zhiyuan Institute of Artificial Intelligence will continue to refine and update its large model evaluation system, with the aim of providing more scientific and objective evaluation standards for artificial intelligence research and applications.
Source: https://www.jiemian.com/article/11186669.html