Beijing – In a move signaling a growing commitment to open-source technology and collaborative innovation, Geely Automobile Group and Chinese AI startup StepStar jointly announced the open-sourcing of two powerful multimodal AI models: Step-Video-T2V, a video generation model, and Step-Audio, a speech interaction model. This collaboration marks a significant contribution to the global open-source community, particularly in the rapidly evolving field of artificial intelligence.
StepStar, a rising star in China’s AI landscape, has been focusing on developing general-purpose foundation models with the ultimate goal of achieving Artificial General Intelligence (AGI). Geely, a leading Chinese automaker, has been strategically partnering with technology companies like StepStar to enhance its technological ecosystem. This collaboration leverages the strengths of both companies, combining StepStar’s AI expertise with Geely’s computational resources, algorithms, and real-world application scenarios.
The open-sourced models represent a significant leap forward in their respective domains:
- Step-Video-T2V: This video generation model boasts 30 billion parameters, enabling it to generate high-quality videos. According to StepStar’s technical report, its performance surpasses other open-source video generation models globally. The model can directly generate 204 frames of 540P resolution video, ensuring high information density and consistency. To comprehensively evaluate the performance of open-source video generation models, StepStar has also released and open-sourced Step-Video-T2V-Eval, a new benchmark dataset for text-to-video quality assessment. This dataset includes 128 Chinese evaluation questions derived from real users, designed to assess the quality of generated videos in 11 content categories, including motion, scenery, animals, combined concepts, surrealism, characters, 3D animation, and cinematography.
- Step-Audio: Touted as the industry’s first product-level open-source speech interaction model, Step-Audio supports multiple languages, dialects, emotional expression, and even voice cloning. The model has reportedly achieved top rankings in various evaluations, demonstrating its advanced capabilities in speech recognition and synthesis. The model is currently available for trial within the YueWen App.
“We firmly believe that the realization of AGI requires the joint efforts of global developers,” stated a StepStar representative. “Our open-source initiative is driven by the desire to share our latest multimodal AI technology achievements and contribute a force from China to the global open-source community.”
The open-sourcing of these models is expected to accelerate innovation in various fields, including content creation, entertainment, education, and human-machine interaction. By making these advanced AI technologies accessible to a wider audience, StepStar and Geely are fostering a more collaborative and inclusive AI development ecosystem.
Conclusion:
The joint open-sourcing of Step-Video-T2V and Step-Audio by StepStar and Geely represents a significant step towards democratizing access to advanced AI technologies. These models, together with the accompanying Step-Video-T2V-Eval benchmark, give researchers, developers, and innovators worldwide the means to explore new possibilities and contribute to the advancement of artificial intelligence. The move highlights the growing importance of open-source collaboration in driving innovation and shaping the future of AI.