Tencent Open-Sources AniPortrait: A Framework for Generating High-Quality Talking Head Videos from Photos

Shenzhen, China – Tencent has released AniPortrait, an open-source framework for generating high-quality talking head videos from a single image and audio input. This new tool, similar to Alibaba’s EMO, allows users to create realistic animations of individuals speaking, with synchronized lip movements and facial expressions.

AniPortrait operates in two stages. First, it extracts 3D facial features from the audio input and converts them into 2D facial landmarks. Then, using a diffusion model and a motion module, it transforms these landmarks into a coherent and lifelike animation.

The framework’s key strengths lie in its ability to generate highly natural and diverse animations while offering flexibility in editing and reproducing facial movements. This makes AniPortrait a valuable tool for various applications, including video conferencing, entertainment, and education.

AniPortrait’s Features and Capabilities:

Audio-Driven Animation Generation: AniPortrait can automatically generate facial animations synchronized with the input audio, capturing lip movements, facial expressions, and head postures.
High-Quality Visual Effects: The framework leverages diffusion models and motion modules to produce high-resolution, visually realistic portrait animations.
Temporal Consistency: AniPortrait ensures the animation’s coherence over time, resulting in smooth and natural character movements without jarring jumps or inconsistencies.
Flexibility and Controllability: Utilizing 3D facial representations as intermediate features, AniPortrait offers flexibility in editing animations, allowing users to customize and adjust the generated output.
Precise Capture of Facial Expressions and Lip Movements: Through an improved PoseGuider module and multi-scale strategy, AniPortrait accurately captures and reproduces subtle lip movements and complex facial expressions.
Consistency with the Reference Image: The framework integrates appearance information from the reference image, ensuring that the generated animation visually aligns with the original portrait, preventing identity mismatches.

AniPortrait’s Working Mechanism:

AniPortrait consists of two main modules: Audio2Lmk and Lmk2Video.

Audio2Lmk Module (Audio to 2D Facial Landmarks): The Audio2Lmk module extracts from the audio input a sequence of 3D facial meshes and head poses that represent facial expressions and lip movements. A pre-trained wav2vec model extracts the audio features, which are crucial for generating realistic facial animations, and two fully connected layers transform these features into a 3D facial mesh. For head pose prediction, a wav2vec network also serves as the backbone, but it does not share weights with the mesh branch, because pose is more closely related to the rhythm and intonation of the audio. A transformer decoder decodes the pose sequence, integrating the audio features through a cross-attention mechanism. Finally, the 3D mesh and pose information are converted into a sequence of 2D facial landmarks through perspective projection.
Lmk2Video Module (2D Facial Landmarks to Video): The Lmk2Video module generates temporally consistent, high-quality portrait videos from the reference portrait image and a sequence of facial landmarks. Inspired by the AnimateAnyone network architecture, it employs Stable Diffusion 1.5 (SD1.5) as the backbone and incorporates a temporal motion module to transform multi-frame noise inputs into a series of video frames. A ReferenceNet, mirroring the structure of SD1.5, is introduced to extract appearance information from the reference image and integrate it into the backbone network, ensuring consistent facial identity in the video. To capture lip movements more accurately, the PoseGuider module is enhanced with a multi-scale strategy.
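
The last step of Audio2Lmk, turning a 3D mesh plus a head pose into 2D landmarks by perspective projection, can be illustrated with a minimal sketch. This is not code from the AniPortrait repository: the function names, the Euler-angle convention (R = Rz * Ry * Rx), the pose layout (yaw, pitch, roll, tx, ty, tz), and the camera parameters (focal length 500, principal point at the center of a 512x512 image) are all assumptions chosen for illustration, using a plain pinhole camera model.

```python
import math


def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(yaw, pitch, roll):
    """Rotation from Euler angles in radians (assumed convention: Rz * Ry * Rx)."""
    cx, sx = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cz, sz = math.cos(roll), math.sin(roll)
    rot_x = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    rot_y = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rot_z = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return _matmul(rot_z, _matmul(rot_y, rot_x))


def project_landmarks(vertices, pose, focal=500.0, center=(256.0, 256.0)):
    """Project 3D mesh vertices to 2D pixel landmarks for one video frame.

    vertices: iterable of (x, y, z) points in the head's local frame.
    pose: (yaw, pitch, roll, tx, ty, tz), a hypothetical pose layout.
    focal, center: assumed pinhole camera intrinsics, in pixels.
    """
    yaw, pitch, roll, tx, ty, tz = pose
    rot = rotation_matrix(yaw, pitch, roll)
    landmarks = []
    for x, y, z in vertices:
        # Rigid transform: rotate the vertex by the head pose, then translate.
        xc = rot[0][0] * x + rot[0][1] * y + rot[0][2] * z + tx
        yc = rot[1][0] * x + rot[1][1] * y + rot[1][2] * z + ty
        zc = rot[2][0] * x + rot[2][1] * y + rot[2][2] * z + tz
        if zc <= 0:
            raise ValueError("vertex lies behind the camera")
        # Perspective divide: points farther away land closer to the center.
        landmarks.append((focal * xc / zc + center[0],
                          focal * yc / zc + center[1]))
    return landmarks


if __name__ == "__main__":
    # Three made-up vertices; a real face mesh has hundreds of points.
    mesh = [(0.0, 0.0, 0.0), (0.03, -0.02, 0.01), (-0.03, -0.02, 0.01)]
    head_pose = (math.radians(15), 0.0, 0.0, 0.0, 0.0, 0.5)
    for u, v in project_landmarks(mesh, head_pose):
        print(f"{u:.1f}, {v:.1f}")
```

Running the projection once per frame over the predicted mesh and pose sequences would yield the landmark sequence that Lmk2Video takes as its conditioning input.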

Availability and Resources:

AniPortrait is available on GitHub, with its code, research paper, and pre-trained models accessible through various platforms:

GitHub Code Repository: https://github.com/Zejun-Yang/AniPortrait
arXiv Research Paper: https://arxiv.org/abs/2403.17694
Hugging Face Model: https://huggingface.co/ZJYang/AniPortrait/tree/main
Hugging Face Demo: https://huggingface.co/spaces/ZJYang/AniPortrait_official

Conclusion:

AniPortrait’s open-source release marks a notable advance in talking head video generation. Its ability to create high-quality, realistic animations from a single image and audio input opens up new possibilities for various applications. As the technology continues to evolve, we can expect even more innovative and immersive experiences powered by AniPortrait and similar frameworks.

Source: https://ai-bot.cn/aniportrait-ai/
