**News Title:** Amazon Builds the Largest Text-to-Speech Model to Date, With 980 Million Parameters

**Keywords:** Amazon, largest text-to-speech model, emergent capabilities

**News Content:**

**Amazon Breaks Ground with the Largest Text-to-Speech Model, Marking a New Milestone in AI**

Global e-commerce giant Amazon has announced a new advance in artificial intelligence: its research team has developed what it describes as the largest text-to-speech (TTS) model to date. The model is called BASE TTS, short for “Big Adaptive Streamable TTS with Emergent abilities”.

According to Amazon’s AI team, the largest version of BASE TTS has 980 million parameters, which the team says exceeds any earlier TTS model and is intended to help it handle complex speech synthesis tasks. The model was trained on 100,000 hours of audio recordings, predominantly English speech sourced from public websites.

The work is described in a paper posted to the arXiv preprint server, which covers the model’s development process and training strategy. If the reported gains in quality and efficiency hold up, BASE TTS could have far-reaching implications for voice interaction in smart assistants, online education, and audiobooks.

The model is also reported to show “emergent abilities”: capabilities it was not explicitly trained for, which appear only once the model and its training data are scaled up. As model sizes grow and training data becomes more extensive, synthesized speech is expected to sound increasingly human-like, offering users a more natural and fluent listening experience.

Source: https://www.ithome.com/0/750/680.htm