The world of Large Multimodal Models (LMMs) is rapidly evolving. We’ve seen the rise of seemingly omnipotent models like GPT-4o and Gemini 2.0 Flash, capable of handling complex tasks involving both text and images. But a new benchmark has emerged, exposing the limitations of even these cutting-edge systems: ZeroBench.

This challenging new benchmark has left over 20 prominent LMMs, including GPT-4o, with a score of zero on their first attempt. The results have sent shockwaves through the AI community, prompting a closer examination of ZeroBench and its implications for the future of AI evaluation.
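
To make the “score of zero” concrete, the sketch below shows how a first-attempt score might be computed under a strict exact-match rule. The grading rule, question IDs, gold answers, and model outputs are all illustrative assumptions, not ZeroBench’s actual data or official evaluation code.

```python
# Illustrative sketch of first-attempt, exact-match scoring.
# The question IDs, gold answers, and model outputs are hypothetical placeholders.

def normalize(answer: str) -> str:
    """Crude normalization so '42.0', ' 42 ' and '42' all compare equal."""
    text = answer.strip().lower().rstrip(".")
    try:
        return str(float(text))
    except ValueError:
        return text


def first_attempt_accuracy(gold: dict[str, str], responses: dict[str, str]) -> float:
    """Fraction of questions whose first response exactly matches the gold answer."""
    correct = sum(
        1 for qid, answer in gold.items()
        if normalize(responses.get(qid, "")) == normalize(answer)
    )
    return correct / len(gold)


if __name__ == "__main__":
    gold = {"q001": "57.30", "q002": "164"}                    # hypothetical gold answers
    responses = {"q001": "around 55 dollars", "q002": "170"}   # hypothetical model outputs
    print(f"First-attempt accuracy: {first_attempt_accuracy(gold, responses):.0%}")
    # Strict exact-match grading on a very hard benchmark can drive every score to 0%.
```

Grading only the final answer, with no partial credit, is exactly the kind of setup under which a sufficiently hard benchmark can push every model’s first-attempt score to zero.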

Why ZeroBench Matters

Existing benchmarks are becoming increasingly inadequate for evaluating the true visual understanding capabilities of advanced LMMs. ZeroBench aims to address this issue by presenting a set of 100 novel and highly challenging problems.

What Makes ZeroBench So Difficult?

The problems in ZeroBench require more than just simple object recognition. They demand a combination of visual perception, reasoning, and real-world knowledge. Here are a couple of examples:

  • Problem 1: The Upside-Down Menu Challenge: Imagine being presented with a restaurant menu that’s both upside-down and obscured by glare. The task? Calculate the total cost of ordering one of each item on the menu. This requires the model to decipher distorted text, identify individual items, and perform arithmetic calculations.

  • Problem 2: The Weightlifting Conundrum: This problem involves analyzing an image of various weights, including kettlebells and dumbbells. The model must:

    • Calculate the total weight of all kettlebells.
    • Calculate the total weight of dumbbells between 5 and 15 pounds (inclusive).
    • Estimate the weight of each green kettlebell.

    Solving this requires not only visual recognition but also an understanding of weightlifting equipment and the ability to perform calculations under specific constraints; the arithmetic itself is sketched just after this list.
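
Once the scene has been read correctly, the arithmetic is straightforward; the perception is the hard part. Below is a purely illustrative Python sketch of those calculations, using invented weights and colors rather than the actual contents of the benchmark image (the same extract-then-aggregate pattern applies to the menu-total problem above).

```python
# Purely illustrative: the kinds, weights, and colors below are invented and are
# not the actual contents of the ZeroBench image. The sketch only shows the
# arithmetic the model must perform *after* correctly reading the scene.
weights = [
    {"kind": "kettlebell", "pounds": 35, "color": "green"},
    {"kind": "kettlebell", "pounds": 18, "color": "green"},
    {"kind": "kettlebell", "pounds": 53, "color": "red"},
    {"kind": "dumbbell",   "pounds": 5,  "color": "black"},
    {"kind": "dumbbell",   "pounds": 10, "color": "black"},
    {"kind": "dumbbell",   "pounds": 25, "color": "black"},
]

# 1) Total weight of all kettlebells.
kettlebell_total = sum(w["pounds"] for w in weights if w["kind"] == "kettlebell")

# 2) Total weight of dumbbells between 5 and 15 pounds, inclusive.
dumbbell_total = sum(
    w["pounds"] for w in weights
    if w["kind"] == "dumbbell" and 5 <= w["pounds"] <= 15
)

# 3) One way to "estimate" the weight of each green kettlebell: average them.
green = [w["pounds"] for w in weights if w["kind"] == "kettlebell" and w["color"] == "green"]
green_average = sum(green) / len(green)

print(kettlebell_total, dumbbell_total, green_average)  # -> 106 15 26.5
```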

Implications and Future Directions

ZeroBench’s emergence highlights the need for more robust and realistic benchmarks that can truly assess the capabilities of LMMs. It reveals that while these models may excel at many tasks, they still struggle with problems that require complex reasoning, real-world knowledge, and the ability to overcome visual challenges.

The failure of even the most advanced models on ZeroBench suggests that there’s still significant room for improvement in the development of LMMs. Future research should focus on enhancing their ability to:

  • Understand and reason about complex visual scenes.
  • Integrate visual information with real-world knowledge.
  • Overcome visual distortions and ambiguities.

ZeroBench serves as a valuable tool for guiding these efforts and pushing the boundaries of AI development. It’s a reminder that while AI has made remarkable progress, there are still significant challenges to overcome before we can truly claim that machines possess human-level visual understanding.


