Interspeech 2024: Boosting Speech Tech Speed and Reducing Costs with SummaryMixing

Author: 智能小编

September 4, 2024 #cost, #Samsung

Samsung presented a series of speech technology research papers at the Interspeech 2024 conference. Interspeech is a leading global forum for speech recognition, synthesis, speaker recognition, and speech and language processing, and it showcases current trends and standards in the field.

Here is a concise summary of the research papers featured in the series:

Relational Proxy Loss for Audio-Text Based Keyword Spotting – Developed by Samsung Research, this paper introduces a novel loss function designed to improve the accuracy of keyword spotting in audio-text applications.
NL-ITI: Probing Optimization for Improvement of LLM Intervention Method – This research, presented by the Samsung R&D Institute Poland, focuses on enhancing the truthfulness of large language models (LLMs) through internal modifications, aiming to improve their reliability and coherence.
High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model – Samsung Research’s contribution outlines a method for generating high-quality, natural-sounding text-to-speech (TTS) by utilizing discrete tokens, transducers, and masked language models.
Speaker Personalization for Automatic Speech Recognition Using Weight-Decomposed Low-Rank Adaptation – This paper, presented by the Samsung R&D Institute India-Bangalore, discusses a technique for personalizing automatic speech recognition (ASR) systems to enhance accuracy for individual users.
Speech Boosting: Developing an Efficient On-Device Live Speech Enhancement – This Samsung Research paper targets low-latency, live speech enhancement for true wireless stereo (TWS) earbuds, improving speech clarity and user experience.
SummaryMixing Makes Speech Technologies Faster and Cheaper – Led by the Samsung AI Center Cambridge (SAIC-C), this paper introduces SummaryMixing, a method that replaces self-attention in deep learning models to make speech technologies more responsive and stable, as well as faster and cheaper to train.
A Unified Approach to Multilingual Automatic Speech Recognition with Improved Language Identification for Indic Languages – This paper, presented by the Samsung R&D Institute India-Bangalore, proposes a unified method for multilingual ASR that improves language identification, particularly for Indic languages.
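
To make the SummaryMixing idea more concrete, here is a minimal sketch in plain Python. It assumes the formulation commonly described for the method: each frame gets a local transformation, a single summary vector is computed as the mean over time of per-frame projections, and the two are combined per frame. The layer sizes, the ReLU activations, and every function name (`linear`, `init_layer`, `summary_mixing`) are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def linear(x, w, b):
    """Apply y = W x + b to one vector (w is a list of rows)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def init_layer(n_in, n_out, rng):
    """Hypothetical helper: small random weights, zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def summary_mixing(frames, local, summ, comb):
    """frames: list of T feature vectors. Work grows linearly with T,
    because no frame is ever compared with another frame directly."""
    # Per-frame local transformation.
    local_out = [relu(linear(x, *local)) for x in frames]
    # One summary vector per utterance: mean of per-frame projections.
    projected = [relu(linear(x, *summ)) for x in frames]
    n_frames = len(frames)
    summary = [sum(col) / n_frames for col in zip(*projected)]
    # Combine each local vector with the shared summary.
    return [relu(linear(loc + summary, *comb)) for loc in local_out]

rng = random.Random(0)
d_in, d_hid, d_out = 4, 8, 4          # assumed toy dimensions
local = init_layer(d_in, d_hid, rng)
summ = init_layer(d_in, d_hid, rng)
comb = init_layer(2 * d_hid, d_out, rng)
frames = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(10)]
out = summary_mixing(frames, local, summ, comb)
```

Self-attention compares every frame with every other frame, so its cost grows quadratically with utterance length. In this sketch the only interaction between frames is one mean over time, which is one plausible reading of why the method is reported as faster and cheaper to train.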

These advancements, showcased at the Interspeech 2024 conference, demonstrate Samsung’s commitment to driving innovation in speech technology and enhancing user experiences through research and development.