## LMMs-Eval Released: A Comprehensive, Low-Cost, Zero-Contamination Evaluation Framework for Multimodal Models
As research on large models deepens, extending them to more modalities has become a focus of both academia and industry. Recently released closed-source models such as GPT-4o and Claude 3.5 already show strong image understanding, while open-source models such as LLaVA-NeXT, MiniCPM, and InternVL are drawing ever closer to closed-source performance. In an era of inflated claims and "a new SoTA every 10 days," a simple, standardized, transparent, and reproducible multimodal evaluation framework matters more than ever, and building one is far from easy.
To address this, researchers from LMMs-Lab at Nanyang Technological University have jointly open-sourced LMMs-Eval, an evaluation framework designed specifically for large multimodal models (LMMs) that provides a one-stop, efficient solution for evaluating them.
**Key features of LMMs-Eval:**
* **Comprehensive coverage:** LMMs-Eval covers more than 80 datasets and more than 10 models, with more being added continuously.
* **Low cost:** LMMs-Eval offers a one-command launch: no manual preparation is required, and a single command automatically downloads the data and evaluates multiple datasets and models (a minimal example follows this list).
* **Zero contamination:** LMMs-Eval ships with a unified logging tool that ensures reproducibility and transparency while guarding against data leakage.
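To make the one-command launch concrete, here is a minimal sketch that invokes the LMMs-Eval CLI from Python. The model name `llava`, the checkpoint `liuhaotian/llava-v1.5-7b`, the task names `mme` and `mmbench_en`, and the exact flags are assumptions based on the project's documentation and may differ between releases; check the repository README for the options supported by your installed version.

```python
# Minimal sketch: one command that downloads the benchmark data and runs an evaluation.
# Assumptions (verify against your installed lmms-eval release): the package is
# installed, and "llava", "mme", and "mmbench_en" are valid model/task identifiers.
import subprocess

cmd = [
    "python", "-m", "lmms_eval",
    "--model", "llava",                                     # assumed model wrapper name
    "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",  # assumed checkpoint id
    "--tasks", "mme,mmbench_en",                            # datasets are fetched automatically
    "--batch_size", "1",
    "--log_samples",                                        # keep per-sample logs for reproducibility
    "--output_path", "./logs/",
]
subprocess.run(cmd, check=True)  # same as typing the command in a shell
```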
**The vision of LMMs-Eval:**
The vision of LMMs-Eval is that future multimodal models will no longer need hand-written data processing, inference, and submission code of their own. By plugging into LMMs-Eval, model developers can put their effort into improving and optimizing the model itself instead of spending time on evaluation plumbing and on aligning results.
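To illustrate what plugging a model into LMMs-Eval might involve, below is a hypothetical adapter skeleton in the style of lm-evaluation-harness, the framework LMMs-Eval builds on. The module paths, the `lmms` base class, the `register_model` decorator, and the method names are assumptions rather than the framework's confirmed API; consult the actual interface in the repository before writing an adapter.

```python
# Hypothetical sketch of wrapping a custom vision-language model for LMMs-Eval.
# The imports, base class, decorator, and method signatures are assumptions modeled
# on the lm-evaluation-harness design; check the lmms-eval source for the real API.
from lmms_eval.api.model import lmms                # assumed base class
from lmms_eval.api.registry import register_model   # assumed registration hook


@register_model("my_vlm")  # the name later passed to the CLI as --model my_vlm
class MyVLM(lmms):
    """Skeleton adapter: wire your own checkpoint loading and inference in here."""

    def __init__(self, pretrained: str = "my-org/my-vlm", **kwargs) -> None:
        super().__init__()
        self.pretrained = pretrained  # load the real model weights here instead

    def generate_until(self, requests):
        # Return one generated string per request; each request bundles the prompt,
        # the visual inputs, and the decoding arguments.
        return ["" for _ in requests]

    def loglikelihood(self, requests):
        # Return one (log-probability, is-greedy) pair per request for likelihood tasks.
        return [(0.0, False) for _ in requests]
```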
**The "impossible triangle" of evaluation:**
The ultimate goal of LMMs-Eval is an evaluation approach for LMMs that is (1) broad in coverage, (2) low in cost, and (3) free of data leakage. Yet even with LMMs-Eval, the team found that achieving all three at once is extremely difficult, if not impossible.
To work around this, LMMs-Eval offers two solutions:
* **LMMs-Eval-Lite:** a lightweight evaluation suite with broad coverage that preserves evaluation diversity while cutting evaluation cost.
* **LiveBench:** an evaluation method aimed specifically at low cost and zero data leakage.
**The future of LMMs-Eval:**
Since its release in March 2024, LMMs-Eval has received contributions from the open-source community, companies, and universities. The project now has 1.1K stars and more than 30 contributors on GitHub, and it will continue to evolve, providing an ever more powerful and convenient tool for multimodal model evaluation.
**The release of LMMs-Eval is expected to give a strong push to research on and applications of multimodal models, and to open a new direction for the future development of artificial intelligence.**
**Related links:**
* Code repository: https://github.com/EvolvingLMMs-Lab/lmms-eval
* Official homepage: https://lmms-lab.github.io/
* Paper: https://arxiv.org/abs/2407.12772
* Leaderboard: https://huggingface.co/spaces/lmms-lab/LiveBench
Source: https://www.jiqizhixin.com/articles/2024-08-21-3