





Title: “Zhiyuan Evaluation Unveiled: Open Source Large Models Lead, Show Slight Disadvantage in Subject Tests Compared to Haidian’s Average”

Keywords: Zhiyuan Evaluation, Large Model Assessment, Subject Test

Content: Zhiyuan Institute of Artificial Intelligence Releases Large Model Evaluation Framework

On May 17, the Zhiyuan Institute of Artificial Intelligence held a large model evaluation launch event in Beijing, unveiling comprehensive assessment results for more than 140 open source and proprietary language and multimodal large models from China and abroad. The evaluation also introduced 45 sets of exam papers from the Haidian District Teachers’ Continuing Education School, covering six subjects from Grade 3 to Grade 12 and totaling 1,400 questions, to assess each model's ability to apply subject knowledge.

The results showed that, by overall pass rate, the top five models were all proprietary: Tongyi Qwen-vl-max, Baidu Wenxin Yiyan 4.0, Zhipu Huazhang GLM-4, Baichuan Intelligent Baichuan3, and GPT-4. Despite their strong overall capabilities, these models still scored slightly below the average performance of Haidian District students in each grade on the subject tests.

The Zhiyuan Institute of Artificial Intelligence's evaluation aims to give the industry more comprehensive data on model capabilities, helping researchers and developers understand how different models perform in practical applications and optimize them accordingly. It also serves as a reference for the education sector, helping educators understand the actual impact of artificial intelligence in education.

The Zhiyuan Institute of Artificial Intelligence will reportedly continue to refine and update its large model evaluation system, aiming to provide more scientific and objective evaluation standards for artificial intelligence research and applications.

