Amazon Sets a Record: World’s Largest Text-to-Speech Model Shows Striking “Emergent Abilities”

【Amazon Breakthrough: Building the Largest Text-to-Speech Model to Date】

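Amazon, the global e-commerce giant, has again shown its technical strength in artificial intelligence: its research team has built what it describes as the largest text-to-speech model to date, an achievement that has drawn wide attention in the industry. The model’s headline feature is its “emergent abilities”, which could reshape AI speech synthesis.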

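According to Amazon’s AI team, the model, named BASE TTS (Big Adaptive Streamable TTS with Emergent abilities), has 980 million parameters, which the team says exceeds every known comparable model. Training was on a similar scale: the researchers used about 100,000 hours of recorded speech, drawn mainly from public websites and predominantly in English, a corpus intended to give the model broad adaptability in language understanding and speech generation.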

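The research has been published as a paper on the arXiv preprint server, which describes the model’s development and technical details. BASE TTS marks a new high in both scale and performance for text-to-speech technology, and it offers more natural, fluent speech synthesis for voice applications such as smart assistants, online education, and audiobooks.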

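Amazon’s work is likely to influence the global AI industry, advancing speech synthesis and natural language processing and giving users a more intelligent, more human-sounding experience. As the technology continues to iterate, there is reason to expect a smarter voice ecosystem driven by advanced AI.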

English version:

**News Title:** Amazon Breaks Records: Creating the World’s Largest Text-to-Speech Model with Stunning “Emergent Abilities”

**Keywords:** Amazon, largest text-to-speech model, emergent abilities

**News Content:**

**Amazon’s Innovation Milestone: Forging the Largest Text-to-Speech Model Ever**

Global e-commerce giant Amazon has once again demonstrated its technical strength in artificial intelligence: its research team has developed what it describes as the largest text-to-speech model to date, an achievement that has drawn widespread attention within the industry. The model’s distinguishing feature is its “emergent abilities”, which could reshape AI speech synthesis.

According to Amazon’s AI team, the model, named BASE TTS (Big Adaptive Streamable TTS with Emergent abilities), has 980 million parameters, which the team says surpasses all known comparable models. The training process was on a similar scale: researchers used about 100,000 hours of audio, primarily English speech collected from public websites, with the aim of giving the model broad adaptability in language understanding and speech generation.

The research has been documented in a paper published on the arXiv preprint server, detailing the model’s development process and technical specifics. BASE TTS marks a new peak in both scale and performance for text-to-speech technology and offers more natural, fluent speech synthesis for applications such as smart assistants, online education, and audiobooks.

Amazon’s work is likely to have a significant impact on the global AI industry, advancing speech synthesis and natural language processing and promising a more intelligent, user-friendly experience for consumers. As the technology continues to evolve, we can anticipate a smarter voice-driven world powered by advanced AI.

【Source】https://www.ithome.com/0/750/680.htm