豆包大模型“听力”惊人：方言、童声都不在话下！

## 大模型赋能语音识别，豆包“听力”再升级，方言、童声不再是难题

**8月21日，2024火山引擎AI创新巡展上海站上，豆包大模型团队发布了最新成果Seed-ASR，这款基于大语言模型的自动语音识别（ASR）技术，在方言识别、童声识别等方面展现出显著优势，为语音交互带来了全新体验。**

Seed-ASR能够准确转录各种语音信号，识别不同语言、方言、口音，甚至能结合文本语音等上下文，实现更准确转录。例如，在识别“立刃”、“雪板”、“搓雪”等滑雪专业名词时，Seed-ASR可以根据用户对字幕的编辑历史，关联并自动识别后续语音中的专业名词。

**现场演示中，Seed-ASR成功识别了多种方言夹杂的语音聊天，包括粤语、上海话、四川话、西安话、闽南语等，展现出强大的识别能力。** 即使是小朋友的口音，Seed-ASR也能准确识别，这得益于其强大的上下文感知能力和分阶段训练方法。

**据悉，Seed-ASR已在豆包APP中应用，并被网友用在英语会话、虚拟聊天伴侣、复刻亲友声音等多个场景。** 面向更多企业客户，Seed-ASR依托火山引擎，在语音交互、内容审核、会议访谈转写、音视频字幕等场景也有落地。

**公开及内部测评集显示，最新版本豆包大模型对比5月15日发布版本综合能力提升20.3%，其中，角色扮演能力提升38.3%，语言理解能力提升33.3%，数学能力提升13.5%。** 基于豆包大模型打造的豆包APP月活用户数在上半年已达2752万，为同类APP第一，是第二名的2.43倍。

**Seed-ASR的发布，标志着大模型技术在语音识别领域取得了重大突破。** 未来，随着大模型Scaling Laws的不断发展，Seed-ASR有望进一步提升识别精度，为用户带来更加智能、便捷的语音交互体验。

英语如下：

## Doubao’s Large Language Model Shows Impressive “Hearing”: Dialects and Children’s Voices Are No Problem!

**Keywords:** Doubao, ASR, Dialects

**Content:**

## Large Language Model Empowers Speech Recognition, Doubao’s “Hearing” Gets an Upgrade, Dialects and Children’s Voices Are No Longer a Challenge

**On August 21st, at the 2024 Volcano Engine AI Innovation Tour in Shanghai, the Doubao large language model team unveiled their latest achievement, Seed-ASR. This automatic speech recognition (ASR) technology, powered by a large language model, demonstrates significant advantages in dialect recognition, children’s voice recognition, and more, bringing a whole new experience to voice interaction.**

Seed-ASR can accurately transcribe a wide range of speech signals, recognizing different languages, dialects, and accents. It can also draw on text and speech context to transcribe more accurately. For example, when recognizing skiing terms such as “立刃” (li ren), “雪板” (xue ban), and “搓雪” (cuo xue), Seed-ASR can use the user’s subtitle editing history to associate and automatically recognize those terms in subsequent speech.

**During the live demonstration, Seed-ASR successfully recognized a voice chat that mixed several dialects, including Cantonese, Shanghainese, Sichuanese, the Xi’an dialect, and Minnan (Southern Min), showcasing its powerful recognition capabilities.** Even children’s voices, with their distinctive pronunciation, can be accurately recognized by Seed-ASR, thanks to its strong context awareness and staged training method.

**Seed-ASR is reportedly already in use in the Doubao app, where users have applied it in scenarios such as English conversation, virtual chat companions, and replicating the voices of friends and family.** For enterprise customers, Seed-ASR is available through Volcano Engine and has been deployed in voice interaction, content moderation, meeting and interview transcription, and audio and video subtitling.

**Public and internal evaluation sets show that the latest version of the Doubao large language model improved its overall capability by 20.3% compared with the version released on May 15th. Within that, role-playing ability improved by 38.3%, language comprehension by 33.3%, and mathematical ability by 13.5%.** The Doubao app, built on the Doubao large language model, reached 27.52 million monthly active users in the first half of the year, ranking first among similar apps with 2.43 times the user base of the second-ranked app.

**The release of Seed-ASR marks a significant breakthrough for large model technology in speech recognition.** In the future, as the Scaling Laws of large models continue to develop, Seed-ASR is expected to further improve its recognition accuracy, giving users a more intelligent and convenient voice interaction experience.

【来源】https://www.jiqizhixin.com/articles/2024-08-22-8