
This paper proposes a new channel normalization algorithm, parametric cepstral mean normalization (PCMN), to increase the robustness of speech recognition to varying acoustic conditions. Traditional CMN uses a simple average of the input speech features as its channel estimate. PCMN instead weights the running average of the input speech frames in a frequency-dependent manner, and these weights are optimized jointly with the acoustic model parameters during training. Experimental results show that, in contrast to traditional CMN, which degrades performance on clean data, PCMN provides a 5% relative improvement on clean data and an 11.2% relative improvement on far-field test data. We also propose an adaptive version of PCMN, called aPCMN, in which both the input speech features and the channel estimates are weighted. These weights are computed at run time and change dynamically based on the input speech. aPCMN provides a 13.0% relative improvement on the far-field test set while maintaining the 5% relative improvement on clean data.
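The difference between CMN and PCMN can be illustrated with a small sketch. The abstract does not give the exact formulation, so the code below assumes one plausible reading: each frame has a running (causal) mean subtracted from it, scaled by a per-dimension weight vector. The function names `running_mean`, `pcmn`, and `cmn` are hypothetical, the weights are fixed by hand here rather than learned jointly with an acoustic model, and the adaptive aPCMN variant is not covered.

```python
from typing import List, Sequence


def running_mean(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Causal running average: the mean of all frames seen so far, per dimension."""
    total = [0.0] * len(frames[0])
    means = []
    for count, frame in enumerate(frames, start=1):
        total = [s + x for s, x in zip(total, frame)]
        means.append([s / count for s in total])
    return means


def pcmn(frames: Sequence[Sequence[float]],
         weights: Sequence[float]) -> List[List[float]]:
    """Assumed PCMN form: subtract a per-dimension weighted running mean.

    In the paper the weights are trained with the acoustic model;
    here they are simply passed in as constants.
    """
    means = running_mean(frames)
    return [
        [x - w * m for x, w, m in zip(frame, weights, mean)]
        for frame, mean in zip(frames, means)
    ]


def cmn(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Running-average CMN as the special case where every weight is 1."""
    return pcmn(frames, [1.0] * len(frames[0]))


# Two frames of two-dimensional features (made-up values).
frames = [[1.0, 2.0], [3.0, 6.0]]
normalized_cmn = cmn(frames)                 # [[0.0, 0.0], [1.0, 2.0]]
normalized_pcmn = pcmn(frames, [1.0, 0.5])   # [[0.0, 1.0], [1.0, 4.0]]
```

Under this reading, setting every weight to 1 recovers a running-average CMN and setting every weight to 0 leaves the features untouched, so learned weights can interpolate between the two per frequency dimension.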

Related readings and updates.

Bandwidth Embeddings for Mixed-Bandwidth Speech Recognition

In this paper, we tackle the problem of handling narrowband and wideband speech by building a single acoustic model (AM), also called mixed bandwidth AM. In the proposed approach, an auxiliary input feature is used to provide the bandwidth information to the model, and bandwidth embeddings are jointly learned as part of acoustic model training. Experimental evaluations show that using bandwidth embeddings helps the model to handle the variability…

Deep Learning for Siri’s Voice: On-device Deep Mixture Density Networks for Hybrid Unit Selection Synthesis

Siri is a personal assistant that communicates using speech synthesis. Starting in iOS 10 and continuing with new features in iOS 11, we base Siri voices on deep learning. The resulting voices are more natural, smoother, and allow Siri’s personality to shine through. This article presents more details about the deep learning based technology behind Siri’s voice.
