**News Title:** “Amazon Sets Record with the World’s Largest Text-to-Speech Model, Marking a New AI Breakthrough”

**Keywords:** Amazon, largest text-to-speech model, emergent abilities

**News Content:**

**Amazon Unveils BASE TTS, the Largest Text-to-Speech Model to Date**

Global e-commerce giant Amazon has developed what its researchers describe as the largest text-to-speech (TTS) model to date, and the model’s “emergent abilities” have drawn wide attention in the industry. According to Amazon’s research team, the model, called BASE TTS (short for “Big Adaptive Streamable TTS with Emergent abilities”), has 980 million parameters, more than any previous model of its kind.

To train the model, Amazon researchers used a dataset of 100,000 hours of audio recordings, sourced mainly from public websites and predominantly in English. The scale of both the model and its training data marks a technical milestone and points to further progress in how AI systems understand and generate natural language.

The release of BASE TTS could bring significant changes to voice assistants, audiobooks, online education, and accessibility technology. More realistic and fluent speech synthesis is expected to improve user experience and to give people with speech or language impairments more efficient and more natural communication tools. A paper describing the model’s development and training has been posted to the arXiv preprint server, where it is available as a reference for researchers worldwide.

The work underscores Amazon’s strong position in AI research and points to a future in which synthetic speech comes closer to natural human conversation, making everyday services at home and at work more convenient.

Source: https://www.ithome.com/0/750/680.htm