Introduction
Multilingual automatic speech recognition (ASR) is particularly challenging for Indic languages. Traditional multilingual ASR systems are limited by the scarcity of Indic language data and by their reliance on language-specific models, which leads to scalability and efficiency issues.
Background
To address these challenges, researchers at Samsung R&D Institute India-Bangalore have introduced a unified approach to multilingual ASR that includes improved language identification for Indic languages. The approach exploits the interdependence between language identification (LID) and multilingual ASR to improve performance on both tasks.
Proposed Approach
The proposed approach consists of two methods, both built on the open-source Whisper model. Whisper generalizes well across datasets and domains, and it can be further optimized for specific languages and tasks through fine-tuning.
Approach 1: Proposed-v1
In the first approach, the team fine-tunes the Whisper model on Indic languages, adapting the model's pre-trained knowledge to the specific requirements of those languages.
Approach 2: Proposed-v2
The second approach introduces a novel method that combines language identification with multilingual ASR, improving both ASR performance and language identification accuracy.
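This summary does not spell out how Proposed-v2 couples the two tasks, so the sketch below is not the authors' method. It only illustrates why LID and ASR sit naturally together in Whisper: the stock decoder output begins with a language tag (for example `<|hi|>` for Hindi) before the transcript. The helper name `split_language_and_text` and the sample string are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Whisper-style special tokens look like <|name|>.
_SPECIAL = re.compile(r"<\|([^|]+)\|>")


def split_language_and_text(decoded: str) -> Tuple[Optional[str], str]:
    """Split a Whisper-style decoded string into (language tag, transcript).

    Hypothetical helper: it assumes the language tag is the special token
    that immediately follows <|startoftranscript|>, which matches Whisper's
    standard prompt layout.
    """
    tags = _SPECIAL.findall(decoded)
    language = None
    if "startoftranscript" in tags:
        idx = tags.index("startoftranscript")
        if idx + 1 < len(tags):
            language = tags[idx + 1]
    # Drop every special token to leave only the transcript text.
    text = _SPECIAL.sub("", decoded).strip()
    return language, text


if __name__ == "__main__":
    sample = "<|startoftranscript|><|hi|><|transcribe|><|notimestamps|> नमस्ते दुनिया<|endoftext|>"
    print(split_language_and_text(sample))
```

Because the language tag and the transcript come from the same decoder pass, an error in the tag and an error in the text are not independent, which is one reason to treat the two tasks jointly.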
Model Architecture and Flow
The architecture and flow of the proposed model are depicted in Figure 1. It outlines the steps taken to integrate language identification into the multilingual ASR framework.
[Figure 1: Architecture and flow of the proposed model]
Experimental Results
Experiments on benchmark datasets demonstrate the effectiveness of the proposed approach. Word Error Rate (WER) improves by 19.1% absolute, and language identification improves by 6% in terms of Diarization Error Rate (DER).
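For readers unfamiliar with the metric, the following is a minimal sketch of how WER is commonly computed (word-level edit distance divided by the number of reference words) and of what an "absolute" improvement means as opposed to a relative one. The function names and the 40% and 30% figures are illustrative assumptions, not values or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline: float, improved: float) -> float:
    """Difference in error rate, expressed in percentage points."""
    return baseline - improved


def relative_improvement(baseline: float, improved: float) -> float:
    """Fraction of the baseline error that was removed."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
    # Hypothetical numbers: a drop from 40% WER to 30% WER.
    print(absolute_improvement(40.0, 30.0), relative_improvement(40.0, 30.0))
```

With these made-up numbers, the absolute improvement is 10 percentage points while the relative improvement is 25%, so the 19.1% absolute figure reported above should be read as a difference in percentage points.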
Conclusion
The unified approach to multilingual ASR with improved language identification for Indic languages presented by Samsung R&D Institute India-Bangalore is a significant advancement in the field. It not only addresses the challenges of limited Indic language data but also offers a scalable and efficient solution for multilingual ASR tasks. The promising results from the experiments underscore the potential of this approach for practical applications in various domains.