Title: Amazon Unveils Largest Text-to-Speech Model
Keywords: Amazon, Text-to-Speech, AI Model
Amazon has announced what it says is the largest text-to-speech (TTS) model built to date. The model, BASE TTS (short for “Big Adaptive Streamable TTS with Emergent abilities”), has 980 million parameters, making it the largest of its kind and a notable step forward for AI-based speech generation.
BASE TTS was trained on more than 100,000 hours of recorded speech, most of it in English, collected from public websites. Amazon’s AI research team has published a paper on the arXiv preprint server describing how the model was developed and trained.
The release showcases Amazon’s research and development capabilities in AI and points to potential applications in voice assistants, audiobooks, and spoken announcements. Amazon says BASE TTS will produce more natural, fluent speech output and improve the user experience.
Source: https://www.ithome.com/0/750/680.htm