
Shanghai – The Shanghai AI Laboratory, in collaboration with ShanghaiTech University, Shanghai Jiao Tong University, and the University of Hong Kong, has announced the release of MM-Eureka, a multi-modal reasoning model. The model uses rule-based reinforcement learning (RL) to extend key features of single-modal reasoning, such as stable answer length growth, accuracy rewards, and visual insight moments, into complex multi-modal scenarios.

The development of MM-Eureka marks a significant step towards AI systems that can seamlessly integrate and reason with diverse data types, including text and images. This capability is crucial for a wide range of applications, from advanced robotics and autonomous driving to sophisticated medical diagnosis and personalized education.

Two Powerful Models: MM-Eureka-8B and MM-Eureka-Zero-38B

The MM-Eureka project has yielded two core models: MM-Eureka-8B and MM-Eureka-Zero-38B. They are built on InternVL2.5-Instruct-8B and InternVL2.5-Pretrained-38B, respectively.

What sets MM-Eureka apart is its data efficiency. MM-Eureka was trained with rule-based reinforcement learning on only 54,000 image-text data points, yet it surpasses the average performance of MPO models trained on a much larger dataset of 1 million data points.

Furthermore, MM-Eureka-Zero-38B demonstrated exceptional capabilities in mathematical reasoning. Trained on a mere 8,000 image-text mathematical reasoning data points, the model outperformed instruction models by 8.2% on a self-constructed K12 benchmark. Its performance on the MathVerse dataset was equally impressive, showcasing its ability to tackle complex mathematical problems presented in a multi-modal format.

Key Features and Capabilities of MM-Eureka:

  • Multi-Modal Reasoning: MM-Eureka extends the power of rule-based reinforcement learning to the multi-modal domain, enabling it to process and reason with both textual and visual information. This allows the model to understand the relationships between different data types and draw more informed conclusions.
  • Replication of Key Single-Modal Features: The model successfully replicates key features of text-based RL systems, such as DeepSeek, within the multi-modal space. This ensures that the model maintains the strengths of existing AI systems while expanding its capabilities to handle more complex data.
  • Efficient Learning: MM-Eureka achieves remarkable performance with a relatively small training dataset, demonstrating the effectiveness of its rule-based reinforcement learning approach. This efficiency reduces the computational resources required for training and makes the model more accessible to researchers and developers.
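To make the idea of a rule-based reward concrete, here is a minimal sketch in Python. It is not MM-Eureka's actual implementation: the `<think>...</think>` response template, the `\boxed{...}` answer convention, the function names, and the 0.5 format weight are all assumptions chosen for illustration. The point is only that the reward is computed by fixed rules (string checks against a reference answer) rather than by a learned reward model.

```python
import re
from typing import Optional

# Assumed response template (hypothetical, for illustration only):
# reasoning inside <think>...</think>, final answer inside \boxed{...}.
THINK_RE = re.compile(r"^<think>.+?</think>", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the response, if any."""
    matches = BOXED_RE.findall(response)
    return matches[-1].strip() if matches else None


def accuracy_reward(response: str, reference: str) -> float:
    """1.0 if the extracted answer equals the reference (ignoring spaces), else 0.0."""
    answer = extract_answer(response)
    if answer is None:
        return 0.0
    return 1.0 if answer.replace(" ", "") == reference.replace(" ", "") else 0.0


def format_reward(response: str) -> float:
    """1.0 if the response opens with a <think> block and contains a boxed answer."""
    has_think = THINK_RE.search(response.strip()) is not None
    has_answer = extract_answer(response) is not None
    return 1.0 if has_think and has_answer else 0.0


def rule_based_reward(response: str, reference: str, format_weight: float = 0.5) -> float:
    """Combine both rules; format_weight is an assumed hyperparameter."""
    return accuracy_reward(response, reference) + format_weight * format_reward(response)


if __name__ == "__main__":
    good = "<think>2 + 2 = 4</think> The answer is \\boxed{4}."
    print(rule_based_reward(good, "4"))
```

In an RL loop, a score like this would be computed for each sampled response and used as the training signal. Because the rules are deterministic and cheap to evaluate, no separate reward model has to be trained.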

Implications and Future Directions:

The development of MM-Eureka represents a significant step forward in the field of multi-modal AI. Its ability to reason with both text and images opens up new possibilities for AI applications in a wide range of industries. The model’s efficient learning capabilities also make it a promising platform for future research and development.

The Shanghai AI Laboratory and its collaborators are committed to further refining and expanding the capabilities of MM-Eureka. Future research will focus on exploring new applications for the model, improving its performance on more complex tasks, and developing new techniques for multi-modal reasoning. The ultimate goal is to create AI systems that can seamlessly integrate and reason with all types of data, leading to more intelligent and capable machines.

References:

  • Information sourced from the AI工具集 (AI Tools Collection) website. For in-depth technical details, consult the official publications from Shanghai AI Laboratory, ShanghaiTech University, Shanghai Jiao Tong University, and the University of Hong Kong.

