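Amazon's artificial intelligence research team has developed what it describes as the largest text-to-speech model to date, named BASE TTS ("Big Adaptive Streamable TTS with Emergent abilities"). The model has 980 million parameters and was trained on 100,000 hours of recordings from public websites, most of them English speech. The work is described in detail in a paper posted to the arXiv preprint server.
The researchers say the new model exhibits "emergent abilities", which they regard as an important advance in text-to-speech. The model's development and training are also reported to reflect Amazon's growing research strength in human-computer interaction and natural language processing.
Text-to-speech models are widely used in intelligent voice assistants, automated telephone systems, e-readers, and similar products. With its large parameter count and training dataset, Amazon's model is expected to deliver more natural and fluent synthesized speech in these applications.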
Title: Amazon Unveils the Largest Text-to-Speech Model Ever
Keywords: Amazon, Text-to-Speech, Model
News content:
Amazon’s artificial intelligence research team has developed what it describes as the largest text-to-speech model to date, named BASE TTS (“Big Adaptive Streamable TTS with Emergent abilities”). The model has 980 million parameters and was trained on 100,000 hours of recordings from public websites, most of them in English. Its development and training are detailed in a paper posted to the arXiv preprint server.
The researchers say the model exhibits “emergent abilities”, which they regard as an important advance in text-to-speech. The work is also reported to reflect Amazon’s growing research strength in human-computer interaction and natural language processing.
Text-to-speech models are widely used in intelligent voice assistants, automated telephone systems, and e-readers. With its large parameter count and training dataset, Amazon’s model is expected to deliver more natural and fluent synthesized speech in these applications.
Source: https://www.ithome.com/0/750/680.htm