Volcano Engine Unveils Video Preprocessing Solution for Large Model Training, Applied to Doubao Video Generation Model

Beijing, China – On October 15th, Volcano Engine, the cloud computing arm of ByteDance, announced a novel video preprocessing solution designed to tackle the challenges of training large video models. The solution, which has already been implemented in the Doubao video generation model, addresses key concerns surrounding cost, quality, and performance during the training process.

The announcement was made at the Volcano Engine Video Cloud Technology Conference, where Tan Dai, President of Volcano Engine, highlighted the profound impact of AIGC and multi-modal technologies on user experiences across various dimensions. “Based on our experience with Douyin and collaborative efforts with industry clients,” Tan stated, “Volcano Engine Video Cloud is actively exploring the deep integration of AI large models with video technology, seeking solutions for businesses in terms of technical foundation, processing pipelines, and business growth.”

BMF: A Self-Developed Multimedia Processing Framework

Preprocessing training videos is crucial to large model training outcomes. The process involves standardizing video data formats, improving data quality, normalizing data, reducing data volume, and handling annotation information. With these steps in place, models can learn features and knowledge from videos more efficiently, which improves both the effectiveness and the efficiency of training.
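As a rough illustration of two of the steps listed above (normalization and data-volume reduction), the following Python sketch computes a common target resolution and a reduced frame-sampling schedule for a clip. It does not use BMF or any Volcano Engine API; the `ClipInfo` structure, the function names, and the default values (a 256-pixel short side, 8 frames per second) are hypothetical and chosen only for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClipInfo:
    """Metadata for one raw training clip (hypothetical structure)."""
    width: int
    height: int
    fps: float
    duration_s: float


def target_size(clip: ClipInfo, short_side: int = 256) -> tuple[int, int]:
    """Scale so the shorter side is about `short_side`, keeping aspect ratio.

    Both dimensions are rounded to even numbers, which many video
    codecs require.
    """
    scale = short_side / min(clip.width, clip.height)
    width = max(2, round(clip.width * scale / 2) * 2)
    height = max(2, round(clip.height * scale / 2) * 2)
    return width, height


def sampled_frame_indices(clip: ClipInfo, target_fps: float = 8.0) -> list[int]:
    """Pick source frame indices that approximate `target_fps`.

    Clips already at or below the target rate keep every frame.
    """
    total_frames = int(clip.duration_s * clip.fps)
    if total_frames <= 0:
        return []
    step = max(1.0, clip.fps / target_fps)
    indices = []
    position = 0.0
    while int(position) < total_frames:
        indices.append(int(position))
        position += step
    return indices


if __name__ == "__main__":
    clip = ClipInfo(width=1920, height=1080, fps=30.0, duration_s=2.0)
    print(target_size(clip))
    print(len(sampled_frame_indices(clip)))
```

In this sketch, a two-second 1080p clip at 30 frames per second is reduced from 60 full-resolution frames to 16 frames at 456×256, which shows how normalization and volume reduction can be planned from metadata alone before any decoding takes place.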

Wang Yue, Head of Video Architecture at Douyin Group, outlined the challenges faced by large model developers in this domain. “Firstly, the massive scale of video training datasets results in a surge in computational and processing costs,” Wang explained. “Secondly, video sample data often exhibits inconsistencies, and the processing pipeline involves numerous complex engineering steps. Finally, there’s the challenge of scheduling and deploying diverse heterogeneous computing resources like GPUs, CPUs, and ARM processors.”

Leveraging Intel CPUs and other resources, Volcano Engine’s newly released video preprocessing solution relies on its self-developed multimedia processing framework, BMF. This framework addresses the computational cost challenges associated with model training. Furthermore, the solution has been optimized in both algorithms and engineering, enabling high-quality preprocessing of massive video datasets within a short timeframe.

Impact and Future Directions

This advance in video preprocessing technology marks a significant step forward in the development and application of large video models. By addressing the key challenges of cost, quality, and performance, Volcano Engine’s solution helps developers train more efficient and effective models, paving the way for advances in video generation, understanding, and analysis.

The integration of BMF with the Doubao video generation model demonstrates the solution’s practical applicability. As the field of AI continues to evolve, Volcano Engine’s commitment to innovation in video cloud technology positions it as a key player in shaping the future of video-based AI applications.
