Tencent Open-Sources AniPortrait: A Framework for Generating High-Quality Talking Head Videos from Photos

Shenzhen, China – Tencent has released AniPortrait, an open-source framework for generating high-quality talking head videos from a single image and audio input. This new tool, similar to Alibaba’s EMO, allows users to create realistic animations of individuals speaking, with synchronized lip movements and facial expressions.

AniPortrait operates in two stages. First, it extracts 3D facial representations from the audio input and projects them into 2D facial landmarks. Then, using a diffusion model paired with a motion module, it transforms these landmarks into a coherent, lifelike animation.
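Conceptually, the whole pipeline reduces to two chained calls, as in the sketch below. All function names and shapes here are hypothetical stubs for illustration; they do not correspond to the actual repository API.

```python
import numpy as np

def audio_to_3d_face(waveform):
    """Stage 1 stub (hypothetical): audio -> per-frame 3D mesh + head pose."""
    n_frames = len(waveform) // 640               # placeholder frame rate
    return np.zeros((n_frames, 468, 3)), np.zeros((n_frames, 6))

def project_to_2d(mesh_seq, pose_seq):
    """Perspective-projection stub: 3D vertices -> 2D facial landmarks."""
    z = mesh_seq[..., 2:3] + 10.0                 # offset avoids division by zero
    return mesh_seq[..., :2] / z

def landmarks_to_video(reference_image, landmark_seq):
    """Stage 2 stub (hypothetical): diffusion + motion module -> video frames."""
    return [reference_image.copy() for _ in landmark_seq]

def animate_portrait(reference_image, audio_waveform):
    mesh_seq, pose_seq = audio_to_3d_face(audio_waveform)     # Audio2Lmk
    landmark_seq = project_to_2d(mesh_seq, pose_seq)
    return landmarks_to_video(reference_image, landmark_seq)  # Lmk2Video
```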

The framework’s key strengths lie in its ability to generate highly natural and diverse animations while offering flexibility in editing and reproducing facial movements. This makes AniPortrait a valuable tool for various applications, including video conferencing, entertainment, and education.

AniPortrait’s Features and Capabilities:

  • Audio-Driven Animation Generation: AniPortrait can automatically generate facial animations synchronized with the input audio, capturing lip movements, facial expressions, and head postures.
  • High-Quality Visual Effects: The framework leverages diffusion models and motion modules to produce high-resolution, visually realistic portrait animations, delivering an exceptional visual experience.
  • Temporal Consistency: AniPortrait ensures the animation’s coherence over time, resulting in smooth and natural character movements without jarring jumps or inconsistencies.
  • Flexibility and Controllability: Utilizing 3D facial representations as intermediate features, AniPortrait offers flexibility in editing animations, allowing users to customize and adjust the generated output.
  • Precise Capture of Facial Expressions and Lip Movements: Through an improved PoseGuider module and a multi-scale strategy, AniPortrait accurately captures and reproduces subtle lip movements and complex facial expressions.
  • Consistency with the Reference Image: The framework integrates appearance information from the reference image, ensuring that the generated animation visually aligns with the original portrait, preventing identity mismatches.

AniPortrait’s Working Mechanism:

AniPortrait consists of two main modules: Audio2Lmk and Lmk2Video.

  • Audio2Lmk Module (Audio to 2D Facial Landmarks): The Audio2Lmk module extracts a sequence of 3D facial meshes and head pose information, representing facial expressions and lip movements, from the audio input. It utilizes a pre-trained wav2vec model to extract audio features, which are crucial for generating realistic facial animations. These features are then transformed into a 3D facial mesh through two fully connected layers. For head pose prediction, the wav2vec network also serves as the backbone but does not share weights, as pose is more closely related to the rhythm and intonation of the audio. A transformer decoder decodes the pose sequence, integrating audio features into the decoder through a cross-attention mechanism. Finally, the 3D mesh and pose information are converted into a sequence of 2D facial landmarks through perspective projection. (A minimal sketch of this stage follows this list.)

  • Lmk2Video Module (2D Facial Landmarks to Video): The Lmk2Video module generates temporally consistent, high-quality portrait videos from the reference portrait image and a sequence of facial landmarks. Inspired by the AnimateAnyone network architecture, it employs Stable Diffusion 1.5 as the backbone, incorporating a temporal motion module to transform multi-frame noise inputs into a series of video frames. A ReferenceNet, mirroring the structure of SD1.5, is introduced to extract appearance information from the reference image and integrate it into the backbone network, ensuring consistent facial identity across the video. To enhance the accuracy of lip movement capture, the PoseGuider module is strengthened. (A structural sketch of this stage appears below.)
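To make the Audio2Lmk description concrete, here is a minimal PyTorch sketch of the stage, assuming the Hugging Face transformers implementation of wav2vec 2.0. The layer sizes, vertex count, learned pose queries, and simplified pinhole projection are illustrative assumptions, not the released model’s exact configuration.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class Audio2Lmk(nn.Module):
    """Illustrative sketch of the Audio2Lmk stage (sizes are assumptions)."""

    def __init__(self, n_vertices=468, hidden=768):
        super().__init__()
        # Pre-trained wav2vec 2.0 extracts audio features for the mesh branch.
        self.wav2vec_mesh = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
        # The pose branch uses its own wav2vec backbone (no weight sharing),
        # since pose tracks rhythm and intonation rather than phonemes.
        self.wav2vec_pose = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
        # Two fully connected layers map audio features to 3D mesh vertices.
        self.to_mesh = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_vertices * 3),
        )
        # A transformer decoder cross-attends to audio features to decode pose.
        layer = nn.TransformerDecoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.pose_decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.to_pose = nn.Linear(hidden, 6)  # e.g. rotation + translation

    def forward(self, waveform, pose_queries):
        # waveform: (B, samples); pose_queries: (B, T, hidden) learned queries.
        mesh_feats = self.wav2vec_mesh(waveform).last_hidden_state        # (B, T, hidden)
        mesh = self.to_mesh(mesh_feats).unflatten(-1, (-1, 3))            # (B, T, V, 3)
        pose_feats = self.wav2vec_pose(waveform).last_hidden_state
        pose = self.to_pose(self.pose_decoder(pose_queries, pose_feats))  # (B, T, 6)
        return mesh, pose

def perspective_project(points_3d, focal=1.0, z_offset=10.0):
    """Minimal pinhole projection of posed 3D vertices to 2D landmarks."""
    z = points_3d[..., 2:3] + z_offset
    return focal * points_3d[..., :2] / z
```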
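The Lmk2Video stage can likewise be pictured structurally. In the sketch below, small convolutional stubs stand in for the SD1.5 UNet, ReferenceNet, and temporal motion module, and the additive fusion at the end is a placeholder for the attention-based injection the real network uses; this mirrors the data flow only, not the actual architecture.

```python
import torch
import torch.nn as nn

class Lmk2Video(nn.Module):
    """Structural sketch of the Lmk2Video stage (stubs, not the real model)."""

    def __init__(self, latent_ch=4, feat_ch=64):
        super().__init__()
        # PoseGuider stub: encodes per-frame landmark renderings
        # (assumed here to be drawn as 3-channel images).
        self.pose_guider = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, latent_ch, 3, padding=1),
        )
        # ReferenceNet stub: extracts appearance features from the reference
        # image latent (the paper mirrors the full SD1.5 UNet instead).
        self.reference_net = nn.Conv2d(latent_ch, latent_ch, 3, padding=1)
        # Denoising backbone stub: a 3D conv stands in for the UNet plus the
        # temporal motion module that mixes information across frames.
        self.backbone = nn.Conv3d(latent_ch, latent_ch, 3, padding=1)

    def forward(self, noisy_latents, ref_latent, landmark_maps):
        # noisy_latents: (B, C, T, H, W); landmark_maps: (B, T, 3, H, W)
        B, C, T, H, W = noisy_latents.shape
        pose = self.pose_guider(landmark_maps.flatten(0, 1))    # (B*T, C, H, W)
        pose = pose.view(B, T, C, H, W).permute(0, 2, 1, 3, 4)  # (B, C, T, H, W)
        ref = self.reference_net(ref_latent).unsqueeze(2)       # (B, C, 1, H, W)
        # Naive additive fusion; the real model injects these through the
        # UNet's input conv (pose) and spatial attention layers (reference).
        return self.backbone(noisy_latents + pose + ref)
```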

Availability and Resources:

AniPortrait is available on GitHub, where its code, research paper, and pre-trained models are publicly accessible.

Conclusion:

AniPortrait’s open-source release marks a significant advancement in the field of talking head video generation. Its ability to create high-quality, realistic animations from a single image and audio input opens up new possibilities for various applications. As the technology continues to evolve, we can expect even more innovative and immersive experiences powered by AniPortrait and similar frameworks.

Source: https://ai-bot.cn/aniportrait-ai/
