View publication

Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and as a consequence do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The focus of this work is on quantitative analysis of a consumer speech recognition system on individuals who stutter and production-oriented approaches for improving performance for common voice assistant tasks (i.e., "what is the weather?"). At baseline, this system introduces a significant number of insertion and substitution errors resulting in intended speech Word Error Rates (isWER) that are 13.64% worse (absolute) for individuals with fluency disorders. We show that by simply tuning the decoding parameters in an existing hybrid speech recognition system one can improve isWER by 24% (relative) for individuals with fluency disorders. Tuning these parameters translates to 3.6% better domain recognition and 1.7% better intent recognition relative to the default setup for the 18 study participants across all stuttering severities.

Related readings and updates.

Non-verbal Sound Detection for Disordered Speech

Voice assistants have become an essential tool for people with various disabilities because they enable complex phone- or tablet-based interactions without the need for fine-grained motor control, such as with touchscreens. However, these systems are not tuned for the unique characteristics of individuals with speech disorders, including many of those who have a motor-speech disorder, are deaf or hard of hearing, have a severe stutter, or are…
See paper details

Exploring Retraining-free Speech Recognition for Intra-sentential Code-switching

Code Switching refers to the phenomenon of changing languages within a sentence or discourse, and it represents a challenge for conventional automatic speech recognition systems deployed to tackle a single target language. The code switching problem is complicated by the lack of multi-lingual training data needed to build new and ad hoc multi-lingual acoustic and language models. In this work, we present a prototype research code-switching speech…
See paper details