**News Title:** “Amazon Builds Its Largest Text-to-Speech Model to Date, Reporting Emergent Abilities”
**Keywords:** Amazon, largest text-to-speech model, emergent capabilities
**News Content:**
**Amazon Research Unveils BASE TTS, Its Largest Text-to-Speech Model to Date**
Amazon's artificial intelligence researchers have developed what they describe as the largest text-to-speech (TTS) model to date. The model is called BASE TTS, short for “Big Adaptive Streamable TTS with Emergent abilities.” It has 980 million parameters, and the team reports that it shows “emergent capabilities,” meaning abilities that appear as the model is scaled up rather than being explicitly trained for.
According to the research team, BASE TTS was trained on more than 100,000 hours of recorded speech, drawn mainly from public websites and predominantly in English. The researchers credit the size of this dataset for the model's accuracy and naturalness, which they say bring the synthesized speech closer to real human pronunciation.
The work is described in a paper posted to the arXiv preprint server, which details the model's development process and core techniques. Beyond advancing AI speech synthesis, the result could open new possibilities for voice interaction applications such as smart assistants, audiobooks, and virtual customer service.
The article presents the result as further evidence of Amazon's strength in artificial intelligence and as a sign that text-to-speech technology is moving toward more natural, fluent voice experiences for users worldwide.
[Source] https://www.ithome.com/0/750/680.htm