Xiaohongshu's AI Team Launches FireRedTTS, a Speech Synthesis System That Pushes Past Traditional Limits

The FireRed team at Xiaohongshu (Little Red Book), a popular Chinese social commerce platform, has introduced FireRedTTS, a speech synthesis system built on large language models. The system can generate audio in a wide variety of styles and voices, including those of well-known characters from Chinese film and television.

The Rise of FireRedTTS

FireRedTTS was developed by the FireRed team at Xiaohongshu, which has been actively working on AI technologies. The system uses a large language model to turn text sequences into natural, expressive speech sequences, and it is designed to imitate arbitrary accents, speaking styles, and voices.

Applications and Practices

The FireRed team has showcased several practical applications of FireRedTTS, including voiceovers for short videos and conversational voice interaction. The system can generate a wide range of audio content, from humorous and girlfriend-style voices to "emo" snippets, giving users many options for customization.

One notable example mixes Beijing dialect with English, which has generated a buzz among users. The blend gives the audio an urban feel with a playful sense of contrast.

Multi-style Voice Synthesis

FireRedTTS has demonstrated its versatility by reproducing the voices of well-known characters from Chinese film and television, such as Li Yunlong from the series Drawing Sword, Xu Jiang from The Knockout, and Wang Duoyu from the film Hello Mr. Billionaire. Combining these voices produces a humorous and engaging result for users.

Natural Conversational Voice Generation

Beyond multi-style synthesis, FireRedTTS has also shown potential in generating natural conversational speech. The system has been used to create a realistic and engaging virtual companion with a cute, demanding girlfriend persona. It can also generate voiceovers in the style of Xiaohongshu bloggers, which makes the audio content more relatable and personalized.

Technical Breakthroughs

FireRedTTS is built on a framework with three main components: data processing, the base system, and downstream applications. For the base system, the team developed a speech synthesis solution based on a language model, using the model's strong sequence generation capabilities to turn text sequences into natural, expressive speech sequences.

The system first trains a discrete speech encoder that focuses on semantic information. The encoder converts a speech signal into a sequence of discrete tokens and a global speaker representation. A text-to-speech language model is then trained to predict the target speech token sequence from the text and the speaker representation. To convert the predicted discrete sequence into high-fidelity audio in a stable way, FireRedTTS proposes a two-stage method. First, a mel-spectrogram generator with strong generalization is trained on large-scale data with a low sampling rate. Then a super-resolution neural vocoder is trained on a small amount of high-fidelity data to synthesize audio at a high sampling rate. In addition to the decoder based on flow matching, the system also proposes a decoder based on multi-stream language models to meet the requirements of streaming decoding.
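The data flow described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the FireRedTTS code or API: every function name, the codebook size, the hop size, and both sampling rates are assumptions made for the example, and each neural model is replaced by a trivial deterministic placeholder so that only the shapes and the order of the stages are shown.

```python
from typing import List, Tuple

# All names and numbers here are hypothetical placeholders, not FireRedTTS values.
CODEBOOK_SIZE = 16      # assumed size of the discrete speech vocabulary
N_MELS = 4              # toy number of mel bins
FRAMES_PER_TOKEN = 2    # assumed mel frames produced per speech token
LOW_SR = 16_000         # assumed "low" sampling rate of the mel stage
HIGH_SR = 48_000        # assumed "high" sampling rate of the vocoder output
HOP = 160               # assumed samples per mel frame at LOW_SR


def encode_speech(reference_audio: List[float]) -> Tuple[List[int], List[float]]:
    """Placeholder encoder: audio -> (discrete tokens, global speaker vector)."""
    tokens = [int(abs(x) * 1000) % CODEBOOK_SIZE for x in reference_audio]
    mean = sum(reference_audio) / len(reference_audio)
    peak = max(abs(x) for x in reference_audio)
    return tokens, [mean, peak]


def text_to_speech_tokens(text: str, speaker: List[float]) -> List[int]:
    """Placeholder language model: (text, speaker) -> discrete speech tokens."""
    bias = int(speaker[1] * 10)
    return [(ord(ch) + bias) % CODEBOOK_SIZE for ch in text]


def tokens_to_mel(tokens: List[int], speaker: List[float]) -> List[List[float]]:
    """Placeholder stage 1: speech tokens -> low-rate mel-spectrogram frames."""
    frames: List[List[float]] = []
    for t in tokens:
        frame = [(t + b) / CODEBOOK_SIZE + speaker[0] for b in range(N_MELS)]
        for _ in range(FRAMES_PER_TOKEN):
            frames.append(list(frame))
    return frames


def mel_to_waveform(mel: List[List[float]]) -> List[float]:
    """Placeholder stage 2: mel frames -> waveform at the higher sampling rate."""
    samples_per_frame = HOP * HIGH_SR // LOW_SR  # super-resolution factor applied here
    wave: List[float] = []
    for frame in mel:
        level = sum(frame) / len(frame)
        wave.extend([level] * samples_per_frame)
    return wave


def synthesize(text: str, reference_audio: List[float]) -> List[float]:
    """Chain the stages: speaker vector -> speech tokens -> mel -> waveform."""
    _, speaker = encode_speech(reference_audio)
    tokens = text_to_speech_tokens(text, speaker)
    mel = tokens_to_mel(tokens, speaker)
    return mel_to_waveform(mel)
```

Calling `synthesize("hi", [0.1, -0.2, 0.3])` walks a two-character text through all four stages. In a real system each placeholder would be a trained network, and the streaming decoder mentioned above would replace the last two stages; neither is modeled here.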

Open Source and Collaboration

The FireRed team has made the FireRedTTS technical report and model weights publicly available, inviting users to explore and experiment with the system. The release reflects the team's commitment to advancing AI technology and fostering collaboration within the AI community.

FireRedTTS is a significant advance in AI speech synthesis, offering a wide range of applications and customization options. As the technology continues to evolve, more uses of FireRedTTS are likely to emerge.

