Amazon Breaks Records: Creates the World's Largest Text-to-Speech Model, Opening a New Era in Speech Synthesis

By 智能小编

April 18, 2024 #Amazon, #largest text-to-speech model, #daily AI news, #emergent abilities

Latest News

[Amazon unveils its largest text-to-speech model to date, a new milestone for AI] Amazon's AI research team has reported significant progress in text-to-speech technology: it has built what is described as the largest model of its kind to date, and the model shows "emergent abilities." The model is called BASE TTS and is described as a scalable, streamable text-to-speech model. The researchers detail its development and training in a paper posted to the arXiv preprint server.

According to the report, BASE TTS has 980 million parameters, more than any previous model of its kind, which is credited with making its synthesized speech more accurate and more natural. To train it, the Amazon team used up to 100,000 hours of recorded audio, drawn mainly from public websites and consisting mostly of English speech, which the report says helps the model handle a variety of accents and speaking styles.

The work has significant implications for applications of AI voice technology. It could improve the user experience of smart assistants, audiobooks, and voice navigation, and it could open up new possibilities in education, entertainment, and accessibility. It also underscores Amazon's leading position in AI research and points to further rapid advances in AI speech technology.

[Source] https://www.ithome.com/0/750/680.htm