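**Amazon’s Pioneering Research: The World’s Largest Text-to-Speech Model Arrives, Showing a New Level of AI**

A research team at global e-commerce giant Amazon has developed the largest text-to-speech model to date, a result that could reshape AI speech synthesis. The model is named BASE TTS (“Big Adaptive Streamable TTS with Emergent abilities”), and its 980 million parameters and large training dataset set a new record for the field.

According to Amazon’s AI research team, BASE TTS was trained on 100,000 hours of recordings, drawn mainly from public websites and predominantly in English. Training at this scale lets the model reproduce human speech more accurately and naturally, and it shows “emergent abilities,” meaning it performs better than expected on complex tasks.

Amazon researchers have posted a paper on the arXiv preprint server describing the model’s development and technical details. BASE TTS reflects Amazon’s long-standing work in AI and opens up new possibilities for voice interaction, intelligent customer service, and audiobooks. As the technology advances, synthesized speech may come even closer to human speech, making human-computer interaction more natural and intelligent.

English version: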

News Title: “Amazon Breaks Records: Creating the World’s Largest Text-to-Speech Model, Demonstrating New Heights in AI”

Keywords: Amazon, largest text-to-speech model, emergent capabilities

News Content:

**Amazon’s Groundbreaking Development: The World’s Largest Text-to-Speech Model Marks a New Era in AI**

Global e-commerce giant Amazon’s research team has developed the largest text-to-speech model to date, potentially reshaping the field of artificial intelligence (AI) voice synthesis. The model, named BASE TTS (“Big Adaptive Streamable TTS with Emergent abilities”), has 980 million parameters and a large training dataset, setting a new record for the field.

According to Amazon’s AI research team, the BASE TTS model was trained on 100,000 hours of audio recordings, predominantly in English, sourced from public websites. This extensive training enables the model to mimic human speech more accurately and naturally and to exhibit “emergent abilities”: the model performs better than anticipated on complex tasks.

Amazon researchers have documented the work in a paper posted on the arXiv preprint server, outlining the model’s development process and technical specifics. The BASE TTS model underscores Amazon’s deep AI expertise and opens up new possibilities for voice interaction, smart customer service, and audiobooks. As technology advances, AI voice synthesis may become even more human-like, further enhancing the naturalness and intelligence of human-computer interaction.

Source: https://www.ithome.com/0/750/680.htm
