Pitch Accent Detection Improves Pretrained Automatic Speech Recognition

Authors: David Sasu, Natalie Schluter

We show that the performance of Automatic Speech Recognition (ASR) systems that use semi-supervised speech representations can be boosted by a complementary pitch accent detection module, by introducing a joint ASR and pitch accent detection model. The pitch accent detection component of our model achieves a significant improvement over the state of the art for the task, closing the gap in F1-score by 41%. Additionally, joint training decreases ASR word error rate (WER) by 28.3% on LibriSpeech under limited-resource fine-tuning. With these results, we show the importance of extending pretrained speech models to retain or re-learn important prosodic cues such as pitch accent.
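Both headline results are stated as relative figures. The sketch below shows the arithmetic under two stated assumptions: that "closing the gap in F1-score" means the fraction of the distance between the previous state-of-the-art F1 and a perfect score of 1.0 that the new model recovers, and that the 28.3% WER figure is a relative (not absolute) reduction. The function names `relative_gap_closed` and `relative_reduction` and the example numbers are hypothetical illustrations, not values or code from the paper.

```python
def relative_gap_closed(baseline: float, improved: float, ceiling: float = 1.0) -> float:
    """Fraction of the remaining gap to `ceiling` that `improved` closes over `baseline`.

    Assumes the gap is measured against a perfect score (F1 = 1.0 by default).
    """
    if not baseline < ceiling:
        raise ValueError("baseline must be below the ceiling")
    return (improved - baseline) / (ceiling - baseline)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative decrease of an error metric such as WER (positive means the new value is lower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the arithmetic.
    print(relative_gap_closed(0.80, 0.882))  # approximately 0.41
    print(relative_reduction(10.0, 7.17))    # approximately 0.283
```

With these hypothetical inputs, a baseline F1 of 0.80 leaves a gap of 0.20 to a perfect score, and raising F1 to 0.882 recovers 0.082 of it, about 41%; likewise, a WER falling from 10.0% to 7.17% is a 28.3% relative reduction even though the absolute drop is only 2.83 points.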

Related readings and updates.

Controllable Neural Text-To-Speech Synthesis Using Intuitive Prosodic Features

October 6, 2020. Research areas: Human-Computer Interaction; Speech and Natural Language Processing. Conference: Interspeech.

Modern neural text-to-speech (TTS) synthesis can generate speech that is indistinguishable from natural speech. However, the prosody of generated utterances often represents the average prosodic style of the database instead of having wide prosodic variation. Moreover, the generated prosody is solely defined by the input text, which does not allow for different styles for the same sentence. In this work, we train a sequence-to-sequence neural…

ICASSP 2020

May 4, 2020. Research area: Speech and Natural Language Processing. Conference: ICASSP.

Apple sponsored the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in May 2020. With a focus on signal processing and its applications, the conference took place virtually from May 4 to 8. Read Apple’s accepted papers below.
