In a significant development in the field of artificial intelligence, Alibaba International has announced the release of its latest open-source multimodal model, Ovis. The model has achieved remarkable results in various tasks, including image understanding, mathematical reasoning, object recognition, and complex decision-making, showcasing the power of multimodal AI.
Ovis: A Leap Forward in Multimodal AI
Ovis, short for Open VISion, is a multimodal large language model that processes and understands both text and images. This sets it apart from conventional large language models (LLMs), which focus primarily on processing and generating text.
According to OpenCompass, a widely used multimodal evaluation platform, Ovis1.6-Gemma2-9B ranks first overall among models with fewer than 30B parameters, ahead of industry-leading models such as MiniCPM-V-2.6.
Key Features of Ovis
Ovis boasts several key features that contribute to its exceptional performance:
- Innovative Architecture Design: Ovis introduces a learnable visual embedding table (a visual vocabulary). Continuous visual features are first converted into probabilistic visual tokens, which are then used to weight the entries of the embedding table and produce structured visual embeddings. Aligning visual embeddings with the way text embeddings are built improves performance on multimodal tasks.
- High-Resolution Image Processing: Ovis uses a dynamic sub-image partitioning scheme that supports images with extreme aspect ratios and high resolutions, which strengthens its image understanding capabilities.
- Comprehensive Data Optimization: Ovis is trained on a diverse set of multimodal datasets covering captions, visual question answering (VQA), optical character recognition (OCR), tables, and charts. This breadth improves performance on tasks such as multimodal question answering and instruction following.
- Outstanding Model Performance: Ovis has achieved strong results on various benchmarks, including mathematical reasoning and hallucination tasks. In particular, Ovis 1.6 performs significantly better on hallucination benchmarks than comparable models, indicating that the text it generates is more accurate and of higher quality.
- Open Source and Commercial Use: Ovis is released under the Apache 2.0 license, making it freely available for both research and commercial purposes. The Ovis 1.0, 1.5, and 1.6 series models, along with their data, models, training, and inference code, have been open-sourced, allowing for reproducibility.
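To make the first feature in the list above more concrete, here is a minimal sketch of how a probabilistic visual token can be turned into an embedding. It is an illustration under stated assumptions, not Ovis's actual code: the function names, the 4-entry table, and the 3-dimensional embeddings are all hypothetical, and in a real model the table would be learned during training rather than fixed.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def visual_embedding(patch_scores, embedding_table):
    """Return the probability-weighted average of the table rows."""
    probs = softmax(patch_scores)  # the "probabilistic visual token"
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_table))
        for d in range(dim)
    ]


# Hypothetical toy table: 4 visual "words", each a 3-dimensional embedding.
TABLE = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]

# Hypothetical scores a visual encoder might assign to one image patch.
patch_scores = [2.0, 0.5, 0.1, -1.0]
probs = softmax(patch_scores)
embedding = visual_embedding(patch_scores, TABLE)
print(probs)
print(embedding)
```

A text token selects exactly one row of its embedding table. In this sketch a visual patch instead spreads its weight over several rows, so the resulting embedding is a blend of visual "words".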
Ovis in Action
Ovis has potential applications across a range of industries, including autonomous driving, medical diagnosis, video content understanding, image description generation, and visual question answering. In autonomous driving, for example, a multimodal model of this kind could help integrate data from cameras, radar, and lidar to achieve more precise environmental perception and decision-making.
Alibaba International’s AI Efforts
Alibaba International has been actively investing in AI research and development. Last year, the company formed an AI team that has already tested AI capabilities in over 40 e-commerce scenarios, covering the entire cross-border e-commerce value chain. Many of these applications are based on the Ovis model and have helped over 500,000 small and medium-sized merchants optimize information for 100 million products.
Conclusion
The release of Ovis represents a significant step forward in the field of multimodal AI. With its impressive performance and wide range of applications, Ovis is poised to drive innovation and advancements in various industries. As AI continues to evolve, models like Ovis will play a crucial role in shaping the future of technology and society.