The era of video generation is evolving rapidly, and a notable development has emerged from the team led by Yang You at the National University of Singapore. After six months of dedicated work, the team has introduced VideoSys, an open-source video generation system designed to make video creation accessible, fast, and cost-effective for everyone. The announcement marks a significant milestone for the field, addressing the need for robust infrastructure in the video generation domain.
Since the beginning of the year, OpenAI’s Sora and other diffusion-based video generation models have sparked a new wave of interest in the AI community. The sector is still in its infancy, however, and much of its foundational tooling has yet to catch up. In February, the team’s OpenDiT project opened up new avenues for training and deploying diffusion models, particularly for text-to-video and text-to-image generation. Known for its ease of use, speed, and memory efficiency, the system gained significant traction, prompting the team to continue refining their work.
Recently, the team consolidated these advances into VideoSys, a comprehensive video generation system tailored to the challenges specific to video models. Unlike language models, video models must process very long sequences through complex multi-stage pipelines, and each component has distinct characteristics and therefore different memory and compute demands. VideoSys aims to simplify this process, offering a streamlined and efficient solution.
As an open-source project, VideoSys provides high-performance, user-friendly infrastructure for video generation. The toolkit supports the entire pipeline, from training and inference to serving and compression. This marks a new chapter in video generation, promising to democratize the process and make it more accessible to creators worldwide.
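For orientation, the following is a minimal inference sketch of how such an engine might be driven from Python. The module, class, and method names here (videosys, OpenSoraConfig, VideoSysEngine, generate, save_video) are assumptions modelled on the project's published examples rather than a verified API reference; consult the project repository for the current interface.

```python
# Hypothetical usage sketch -- the imports, config fields, and method names
# below are assumptions for illustration, not a verified VideoSys API.
from videosys import OpenSoraConfig, VideoSysEngine  # assumed imports

# Assumed config: choose a supported backbone and the number of GPUs to use.
config = OpenSoraConfig(num_gpus=1)

# The engine is assumed to handle model loading, parallelism, and acceleration.
engine = VideoSysEngine(config)

# Assumed generation call: text prompt in, decoded video frames out.
video = engine.generate("a sailboat drifting at sunset").video[0]
engine.save_video(video, "./outputs/sailboat.mp4")
```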
The team’s work, from OpenDiT to VideoSys, has already garnered over 1,400 GitHub stars, indicating strong interest and support from the AI community. The team has also developed cutting-edge acceleration technologies to boost the performance of diffusion models.
Pyramid Attention Broadcast (PAB)
PAB is the first approach to achieve real-time, diffusion-based video generation; it requires no additional training and preserves output quality losslessly. By skipping redundant attention computations across diffusion timesteps, PAB reaches 21.6 FPS, a 10.6× speedup, without compromising the quality of models such as Open-Sora, Open-Sora-Plan, and Latte. Because the method is model-agnostic, it can also accelerate future diffusion-based video generation models, enabling real-time generation capabilities.
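The core observation behind this kind of broadcast scheme is that attention outputs change very little between adjacent denoising steps, and how little differs by attention type, so the most redundant attention outputs can be reused for the longest stretches. The sketch below illustrates that idea only; the class names, wrapper structure, and broadcast ranges are invented for illustration and are not the VideoSys implementation or the paper's exact values.

```python
# Illustrative sketch of the idea behind Pyramid Attention Broadcast: recompute
# attention only every `broadcast_range` denoising steps and reuse the cached
# output in between. All names and numbers here are assumptions, not the
# VideoSys implementation.
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    """Plain self-attention block, used here only as something to wrap."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.mha(x, x, x, need_weights=False)
        return out


class BroadcastAttention(nn.Module):
    """Recompute the wrapped attention only every `broadcast_range` steps;
    otherwise broadcast (reuse) the cached output from the last full pass."""

    def __init__(self, attn: nn.Module, broadcast_range: int):
        super().__init__()
        self.attn = attn
        self.broadcast_range = broadcast_range
        self._cache = None
        self._age = 0  # denoising steps since the cache was last refreshed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self._cache is None or self._age >= self.broadcast_range:
            self._cache = self.attn(x)  # full attention pass
            self._age = 0
        self._age += 1
        return self._cache


# Pyramid-style ranges: the more redundant the attention type, the longer its
# output is broadcast (placeholder values, not the paper's).
spatial = BroadcastAttention(SelfAttention(64, 8), broadcast_range=2)
temporal = BroadcastAttention(SelfAttention(64, 8), broadcast_range=4)
cross = BroadcastAttention(SelfAttention(64, 8), broadcast_range=6)

x = torch.randn(1, 16, 64)  # (batch, tokens, dim)
for step in range(8):       # pretend these are denoising steps
    y = spatial(x)          # recomputed at steps 0, 2, 4, 6; reused otherwise
```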
Dynamic Sequence Parallelism (DSP)
DSP is an efficient sequence-parallel algorithm tailored to multi-dimensional transformer architectures such as Open-Sora and Latte. It outperforms state-of-the-art sequence-parallel methods, delivering up to a 3× training speedup and a 2× inference speedup for Open-Sora, and it significantly reduces inference latency compared with DeepSpeed Ulysses for 10-second, 512×512 videos.
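The intuition is that a video transformer alternates between attention over different sequence dimensions (for example, spatial tokens within a frame and temporal tokens across frames). DSP keeps the activations sharded along whichever dimension the current block does not attend over, and switches the sharded dimension with a single all-to-all exchange when the block type changes, rather than re-sharding around every attention layer. The sketch below shows only that re-sharding step; the function name, tensor layout, and divisibility assumptions are illustrative, not the VideoSys implementation.

```python
# Illustrative sketch of the re-sharding step behind dynamic sequence
# parallelism: move the sharded dimension of a tensor with one all-to-all.
# Names, layout, and the divisibility assumption are illustrative only.
import torch
import torch.distributed as dist


def switch_shard_dim(x: torch.Tensor, src_dim: int, dst_dim: int,
                     group=None) -> torch.Tensor:
    """Re-shard `x` across the sequence-parallel group: on entry each rank
    holds a 1/world slice of `src_dim` and the full `dst_dim`; on return each
    rank holds the full `src_dim` and a 1/world slice of `dst_dim`.
    Assumes both dimensions are divisible by the group size."""
    world = dist.get_world_size(group)
    # Split the locally full dst_dim into one chunk per rank ...
    send = [c.contiguous() for c in torch.chunk(x, world, dim=dst_dim)]
    recv = [torch.empty_like(c) for c in send]
    # ... exchange chunks so each rank collects every rank's piece of its own
    # dst_dim slice ...
    dist.all_to_all(recv, send, group=group)
    # ... and stitch the received src_dim pieces back into a full dimension.
    return torch.cat(recv, dim=src_dim)


# How a spatial/temporal transformer might use it. Assumed layout:
# x has shape (batch, frames, tokens_per_frame, hidden), initially sharded
# along `frames` so a spatial block sees whole frames.
#
#   x = spatial_block(x)                           # attends within each frame
#   x = switch_shard_dim(x, src_dim=1, dst_dim=2)  # shard frames -> tokens
#   x = temporal_block(x)                          # attends across frames
#   x = switch_shard_dim(x, src_dim=2, dst_dim=1)  # shard tokens -> frames
```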
The development of VideoSys and its accompanying acceleration technologies signals a new era in video generation. By overcoming the challenges associated with handling complex sequences and high computational demands, VideoSys is poised to revolutionize the way videos are created and accessed. The team’s commitment to open-source solutions ensures that these advancements will be available to a broad audience, fostering innovation and creativity in the video generation landscape. For more information and to access the VideoSys project, visit https://github.com/NUS-HPC-AI-Lab/VideoSys.
Source: https://www.jiqizhixin.com/articles/2024-08-26-3