Tencent Open-Sources AniPortrait: A Framework for Generating High-Quality Talking Head Videos from Photos

Shenzhen, China – Tencent has released AniPortrait, an open-source framework for generating high-quality talking head videos from a single image and audio input. This new tool, similar to Alibaba’s EMO, allows users to create realistic animations of individuals speaking, with synchronized lip movements and facial expressions.

AniPortrait operates in two stages. First, it extracts 3D facial representations from the audio input and projects them into 2D facial landmarks. Then, using a diffusion model paired with a motion module, it transforms these landmarks into a coherent, lifelike animation.
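Conceptually, the whole pipeline reduces to two chained calls, as in the sketch below. All function names and shapes here are hypothetical stubs for illustration; they do not correspond to the actual repository API.

```python
import numpy as np

def audio_to_3d_face(waveform):
    """Stage 1 stub (hypothetical): audio -> per-frame 3D mesh + head pose."""
    n_frames = len(waveform) // 640               # placeholder frame rate
    return np.zeros((n_frames, 468, 3)), np.zeros((n_frames, 6))

def project_to_2d(mesh_seq, pose_seq):
    """Perspective-projection stub: 3D vertices -> 2D facial landmarks."""
    z = mesh_seq[..., 2:3] + 10.0                 # offset avoids division by zero
    return mesh_seq[..., :2] / z

def landmarks_to_video(reference_image, landmark_seq):
    """Stage 2 stub (hypothetical): diffusion + motion module -> video frames."""
    return [reference_image.copy() for _ in landmark_seq]

def animate_portrait(reference_image, audio_waveform):
    mesh_seq, pose_seq = audio_to_3d_face(audio_waveform)     # Audio2Lmk
    landmark_seq = project_to_2d(mesh_seq, pose_seq)
    return landmarks_to_video(reference_image, landmark_seq)  # Lmk2Video
```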

The framework’s key strengths lie in its ability to generate highly natural and diverse animations while offering flexibility in editing and reproducing facial movements. This makes AniPortrait a valuable tool for various applications, including video conferencing, entertainment, and education.

AniPortrait’s Features and Capabilities:

  • Audio-Driven Animation Generation: AniPortrait can automatically generate facial animations synchronized with the input audio, capturing lip movements, facial expressions, and head postures.
  • High-Quality Visual Effects: The framework leverages diffusion models and motion modules to produce high-resolution, visually realistic portrait animations, delivering an exceptional visual experience.
  • Temporal Consistency: AniPortrait ensures the animation’s coherence over time, resulting in smooth and natural character movements without jarring jumps or inconsistencies.
  • Flexibility and Controllability: Utilizing 3D facial representations as intermediate features, AniPortrait offers flexibility in editing animations, allowing users to customize and adjust the generated output.
  • Precise Capture of Facial Expressions and Lip Movements: Through an improved PoseGuider module and a multi-scale strategy, AniPortrait accurately captures and reproduces subtle lip movements and complex facial expressions.
  • Consistency with the Reference Image: The framework integrates appearance information from the reference image, ensuring that the generated animation visually aligns with the original portrait, preventing identity mismatches.

AniPortrait’s Working Mechanism:

AniPortrait consists of two main modules: Audio2Lmk and Lmk2Video.

  • Audio2Lmk Module (Audio to 2D Facial Landmarks): The Audio2Lmk module extracts a sequence of 3D facial meshes and head pose information, representing facial expressions and lip movements, from the audio input. It utilizes a pre-trained wav2vec model to extract audio features, which are crucial for generating realistic facial animations. These features are then transformed into a 3D facial mesh through two fully connected layers. For head pose prediction, the wav2vec network also serves as the backbone but does not share weights, as pose is more closely related to the rhythm and intonation of the audio. A transformer decoder decodes the pose sequence, integrating audio features into the decoder through a cross-attention mechanism. Finally, the 3D mesh and pose information are converted into a sequence of 2D facial landmarks through perspective projection. (A minimal sketch of this stage follows this list.)

  • Lmk2Video Module (2D Facial Landmarks to Video): The Lmk2Video module generates temporally consistent, high-quality portrait videos from the reference portrait image and a sequence of facial landmarks. Inspired by the AnimateAnyone network architecture, it employs Stable Diffusion 1.5 as the backbone, incorporating a temporal motion module to transform multi-frame noise inputs into a series of video frames. A ReferenceNet, mirroring the structure of SD1.5, is introduced to extract appearance information from the reference image and integrate it into the backbone network, ensuring consistent facial identity across the video. To enhance the accuracy of lip movement capture, the PoseGuider module is strengthened. (A structural sketch of this stage appears below.)
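To make the Audio2Lmk description concrete, here is a minimal PyTorch sketch of the stage, assuming the Hugging Face transformers implementation of wav2vec 2.0. The layer sizes, vertex count, learned pose queries, and simplified pinhole projection are illustrative assumptions, not the released model’s exact configuration.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class Audio2Lmk(nn.Module):
    """Illustrative sketch of the Audio2Lmk stage (sizes are assumptions)."""

    def __init__(self, n_vertices=468, hidden=768):
        super().__init__()
        # Pre-trained wav2vec 2.0 extracts audio features for the mesh branch.
        self.wav2vec_mesh = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
        # The pose branch uses its own wav2vec backbone (no weight sharing),
        # since pose tracks rhythm and intonation rather than phonemes.
        self.wav2vec_pose = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
        # Two fully connected layers map audio features to 3D mesh vertices.
        self.to_mesh = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_vertices * 3),
        )
        # A transformer decoder cross-attends to audio features to decode pose.
        layer = nn.TransformerDecoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.pose_decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.to_pose = nn.Linear(hidden, 6)  # e.g. rotation + translation

    def forward(self, waveform, pose_queries):
        # waveform: (B, samples); pose_queries: (B, T, hidden) learned queries.
        mesh_feats = self.wav2vec_mesh(waveform).last_hidden_state        # (B, T, hidden)
        mesh = self.to_mesh(mesh_feats).unflatten(-1, (-1, 3))            # (B, T, V, 3)
        pose_feats = self.wav2vec_pose(waveform).last_hidden_state
        pose = self.to_pose(self.pose_decoder(pose_queries, pose_feats))  # (B, T, 6)
        return mesh, pose

def perspective_project(points_3d, focal=1.0, z_offset=10.0):
    """Minimal pinhole projection of posed 3D vertices to 2D landmarks."""
    z = points_3d[..., 2:3] + z_offset
    return focal * points_3d[..., :2] / z
```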
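The Lmk2Video stage can likewise be pictured structurally. In the sketch below, small convolutional stubs stand in for the SD1.5 UNet, ReferenceNet, and temporal motion module, and the additive fusion at the end is a placeholder for the attention-based injection the real network uses; this mirrors the data flow only, not the actual architecture.

```python
import torch
import torch.nn as nn

class Lmk2Video(nn.Module):
    """Structural sketch of the Lmk2Video stage (stubs, not the real model)."""

    def __init__(self, latent_ch=4, feat_ch=64):
        super().__init__()
        # PoseGuider stub: encodes per-frame landmark renderings
        # (assumed here to be drawn as 3-channel images).
        self.pose_guider = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, latent_ch, 3, padding=1),
        )
        # ReferenceNet stub: extracts appearance features from the reference
        # image latent (the paper mirrors the full SD1.5 UNet instead).
        self.reference_net = nn.Conv2d(latent_ch, latent_ch, 3, padding=1)
        # Denoising backbone stub: a 3D conv stands in for the UNet plus the
        # temporal motion module that mixes information across frames.
        self.backbone = nn.Conv3d(latent_ch, latent_ch, 3, padding=1)

    def forward(self, noisy_latents, ref_latent, landmark_maps):
        # noisy_latents: (B, C, T, H, W); landmark_maps: (B, T, 3, H, W)
        B, C, T, H, W = noisy_latents.shape
        pose = self.pose_guider(landmark_maps.flatten(0, 1))    # (B*T, C, H, W)
        pose = pose.view(B, T, C, H, W).permute(0, 2, 1, 3, 4)  # (B, C, T, H, W)
        ref = self.reference_net(ref_latent).unsqueeze(2)       # (B, C, 1, H, W)
        # Naive additive fusion; the real model injects these through the
        # UNet's input conv (pose) and spatial attention layers (reference).
        return self.backbone(noisy_latents + pose + ref)
```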

Availability and Resources:

AniPortrait is available on GitHub, where its code, research paper, and pre-trained models are publicly accessible.

Conclusion:

AniPortrait’s open-source release marks a significant advancement in the field of talking head video generation. Its ability to create high-quality, realistic animations from a single image and audio input opens up new possibilities for various applications. As the technology continues to evolve, we can expect even more innovative and immersive experiences powered by AniPortrait and similar frameworks.

Source: https://ai-bot.cn/aniportrait-ai/
