**BAAI Releases Evaluation Results for Chinese and International Large Models: Closed-Source Models Lead, but Score Slightly Below the Student Average**
Keywords: BAAI evaluation, large model evaluation results, subject test challenge
On May 17, the Beijing Academy of Artificial Intelligence (BAAI, 智源研究院) held a large model evaluation release event, publishing capability evaluation results for more than 140 Chinese and international language and multimodal large models. The evaluation covered both open-source models and commercial closed-source models.
In the comprehensive evaluation, the top five models were all closed-source. In order, they were Tongyi Qwen-vl-max, Baidu Wenxin Yiyan 4.0, Zhipu Huazhang GLM-4, Baichuan Intelligence Baichuan3, and GPT-4. These models performed at a high level across multiple tasks.
To bring the evaluation closer to real-world use, BAAI also introduced Haidian student exam papers compiled by the Haidian District Teachers' Training School. The papers cover six subjects from the third grade of primary school through the final year of senior high school, for a total of 45 exam papers and 1,400 questions. Although the large models handle complex tasks well, their scoring rate on these subject tests was slightly lower than the average for students in each grade.
Experts said this shows that, although artificial intelligence has made significant progress in some areas, the models still need to improve their depth of understanding and their ability to apply knowledge in specific school subjects. The release gives the industry a useful reference point by showing the strengths and weaknesses of existing large models and suggesting directions for future research and development.
BAAI's evaluation results have drawn wide attention from the industry and the public, and they provide data to inform the future development of language and multimodal large models.
Source: https://www.jiemian.com/article/11186669.html