
Zhiyuan Institute (BAAI) Releases Large Model Evaluation System, Publishes Results for More Than 140 Models

Keywords: Zhiyuan Evaluation, Large Model Assessment, Subject Test

On May 17, the Zhiyuan Institute (Beijing Academy of Artificial Intelligence, BAAI) held a large model evaluation launch event in Beijing, where it published comprehensive capability evaluation results for more than 140 language and multimodal large models, covering both open-source and commercial closed-source models from China and abroad. For this evaluation, the institute introduced 45 sets of exam papers newly compiled by the Haidian District Teachers' Continuing Education School, spanning six subjects from the third grade of primary school through the final year of senior high school and totaling 1,400 questions, to assess each model's ability to apply subject knowledge.

The results show that, ranked by overall score rate, the top five models were all closed-source: Tongyi Qwen-vl-max, Baidu Wenxin Yiyan 4.0, Zhipu Huazhang GLM-4, Baichuan Intelligence Baichuan3, and GPT-4. Despite their strong overall capabilities, these models still scored slightly below the average level of Haidian District students in each grade on the subject tests.

The Zhiyuan Institute's evaluation aims to give the industry more comprehensive data on model capabilities, helping researchers and developers better understand how different models perform in practical applications and optimize and improve them accordingly. It also serves as a reference for the education sector, helping educators understand how effective artificial intelligence actually is when applied in education.

According to the report, the Zhiyuan Institute will continue to refine and update its large model evaluation system, with the aim of providing more scientific and objective evaluation standards for research and applications in artificial intelligence.

Source: https://www.jiemian.com/article/11186669.html
