
Code switching refers to the phenomenon of changing languages within a sentence or discourse, and it poses a challenge for conventional automatic speech recognition systems, which are deployed to handle a single target language. The problem is compounded by the scarcity of the multilingual training data needed to build new, ad hoc multilingual acoustic and language models. In this work, we present a prototype research code-switching speech recognition system, currently not in production, that leverages existing monolingual acoustic and language models, i.e., no ad hoc training is needed. To generate high-quality pronunciations of foreign-language words in the native-language phoneme set, we combine existing acoustic phone decoders with an LSTM-based grapheme-to-phoneme model. In addition, we build a code-switching language model by using translated word pairs to borrow statistics from the native language model. We demonstrate that our approach handles accented foreign pronunciations better than techniques based on human labeling. Our best system reduces the WER on an intra-sentential code-switching task from 34.4%, obtained with a conventional monolingual speech recognition system, to 15.3%, without harming monolingual accuracy.
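To make the language-model idea concrete, the sketch below illustrates how translated word pairs could be used to borrow n-gram statistics from an existing native-language model. This is not the authors' implementation: the function name `build_code_switching_lm`, the toy bigram representation, and the `borrow_weight` parameter are illustrative assumptions, and a real system would presumably apply the same idea to full-scale n-gram or neural language models rather than a small dictionary of probabilities.

```python
from collections import defaultdict

def build_code_switching_lm(native_bigram_probs, word_pairs, borrow_weight=0.5):
    """Hypothetical sketch: share a native word's bigram probability mass
    with its foreign-language translation so the foreign word can appear
    in the same contexts without retraining the language model.

    native_bigram_probs: dict mapping (history_word, word) -> probability
                         taken from an existing native-language bigram LM.
    word_pairs:          dict mapping a foreign word to its native translation.
    borrow_weight:       assumed fraction of the native word's probability
                         mass that is transferred to the foreign word.
    """
    cs_probs = defaultdict(float, native_bigram_probs)
    for foreign, native in word_pairs.items():
        for (hist, word), p in native_bigram_probs.items():
            if word == native:
                # Split the native word's probability between itself and the
                # borrowed foreign word so each history still sums to ~1.
                cs_probs[(hist, native)] = p * (1.0 - borrow_weight)
                cs_probs[(hist, foreign)] += p * borrow_weight
    return dict(cs_probs)

if __name__ == "__main__":
    # Illustrative only: an English word borrowed into a German bigram LM.
    cs_lm = build_code_switching_lm(
        native_bigram_probs={("spiele", "Musik"): 0.3, ("spiele", "Liste"): 0.1},
        word_pairs={"music": "Musik"},
    )
    print(cs_lm[("spiele", "music")])  # half of Musik's probability mass
```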

Related readings and updates.

End-to-End Speech Translation for Code Switched Speech

Code switching (CS) refers to the phenomenon of interchangeably using words and phrases from different languages. CS can pose significant accuracy challenges to NLP, due to the often monolingual nature of the underlying systems. In this work, we focus on CS in the context of English/Spanish conversations for the task of speech translation (ST), generating and evaluating both transcript and translation. To evaluate model performance on this task…

Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice-operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and as a consequence do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The…