

BAAI Raises the Bar for AI Evaluation

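On September 4, 2024, the Beijing Academy of Artificial Intelligence (BAAI, 智源研究院), a well-known Chinese AI research institute, released the FlagEval Large Model Arena (FlagEval大模型角斗场), which it describes as the world's first model-battle evaluation service to include text-to-video generation. The launch marks a significant step for China in the field of AI evaluation.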

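Four Task Types, with Multi-Turn Interaction

The FlagEval Large Model Arena aims to give users a comprehensive and objective evaluation of AI models. The service supports about 40 large models from China and abroad and covers four tasks: language question answering, multimodal image-text understanding, text-to-image generation, and text-to-video generation. Users can run blind tests online or offline and hold multi-turn conversations with the models, so that the evaluation measures as closely as possible how well model outputs align with human expectations or preferences.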

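A Tiered Subjective-Preference Scoring System for Finer-Grained Evaluation

The FlagEval Large Model Arena uses a tiered subjective-preference scoring system with five levels: A much better than B, A slightly better than B, A and B about the same, B slightly better than A, and B much better than A. The "about the same" level is further split into "both good" and "both bad". Compared with the traditional three-level rating, this scale captures finer differences in generated content and exposes performance gaps between models more precisely, which yields richer and deeper evaluation insights.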
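The five-level scale can be sketched as a small data model. This is an illustrative sketch only, not FlagEval's implementation: the names `Preference`, `TieKind`, `Vote`, `mean_margin`, and `collapse_to_three` are hypothetical, and the signed values from -2 to 2 are an assumed encoding that the article does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Preference(Enum):
    # Assumed signed encoding: positive favors A, negative favors B.
    A_MUCH_BETTER = 2
    A_SLIGHTLY_BETTER = 1
    TIE = 0
    B_SLIGHTLY_BETTER = -1
    B_MUCH_BETTER = -2


class TieKind(Enum):
    # The "about the same" level is split into two sub-cases.
    BOTH_GOOD = "both_good"
    BOTH_BAD = "both_bad"


@dataclass(frozen=True)
class Vote:
    preference: Preference
    tie_kind: Optional[TieKind] = None

    def __post_init__(self) -> None:
        # A tie must say which kind it is; a non-tie must not carry one.
        is_tie = self.preference is Preference.TIE
        if is_tie != (self.tie_kind is not None):
            raise ValueError("tie_kind must be set exactly when preference is TIE")


def mean_margin(votes: Sequence[Vote]) -> float:
    """Average signed margin over a set of votes (hypothetical aggregate)."""
    if not votes:
        raise ValueError("no votes to aggregate")
    return sum(v.preference.value for v in votes) / len(votes)


def collapse_to_three(vote: Vote) -> str:
    """Reduce a five-level vote to the traditional A / tie / B rating."""
    value = vote.preference.value
    if value > 0:
        return "A"
    if value < 0:
        return "B"
    return "tie"
```

Collapsing to three levels discards both the strength of a preference and the distinction between "both good" and "both bad", which is the information the five-level scale is meant to retain.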

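Mobile Access for Convenient Use

For ease of use, the FlagEval Large Model Arena also offers what BAAI describes as the first mobile access point of its kind in China, giving users an efficient and convenient way to run model-battle evaluations.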

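Open-Sourcing End-to-End Data to Grow the Evaluation Ecosystem

BAAI says it will open-source the end-to-end data from its model-battle evaluations in the future, in order to support the development of the large-model evaluation ecosystem.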

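Summary

The release of the FlagEval Large Model Arena brings new momentum to AI evaluation in China. It should help advance the country's AI technology and give users better services. BAAI's further work in AI is worth watching.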

