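As artificial intelligence technology continues to advance, Amazon's AI research team has announced a notable result: a text-to-speech model with 980 million parameters, trained on more than 100,000 hours of recorded speech. The team presents it as a significant advance in speech synthesis.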
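The model is named BASE TTS, short for “Big Adaptive Streamable TTS with Emergent abilities”, and it is described as the largest text-to-speech model to date by both parameter count and training data volume. The researchers collected a large volume of English speech data from public sources, which gave the model a rich training corpus. Its “emergent abilities” refer to handling complex text with greater accuracy and fluency, producing more natural and realistic synthetic speech for users.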
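The research has been posted to the academic preprint server arXiv, in a paper that describes how the model was developed and trained. Beyond advancing AI technology, it opens new possibilities for voice interaction and intelligent assistants.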
English summary:
Title: Amazon Unveils Largest Text-to-Speech Model in History, Showcases Emergent Abilities
Keywords: AI, Text-to-Speech, Model Parameters, Training Datasets
News content:
In a significant advance for AI technology, Amazon’s AI research team has announced a text-to-speech model with 980 million parameters, described as the largest text-to-speech model to date by both model size and training data. The model, BASE TTS (short for “Big Adaptive Streamable TTS with Emergent abilities”), was trained on more than 100,000 hours of recorded speech, primarily in English, collected from public sources. Its “emergent abilities” refer to higher accuracy and fluency when handling complex text, giving users more natural and realistic synthetic speech. The work is detailed in a paper posted to the academic preprint server arXiv, and it opens new possibilities for voice interaction and intelligent assistants.
Source: https://www.ithome.com/0/750/680.htm