Apple sponsored the 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). The conference, which focuses on signal processing and its applications, took place virtually from June 6 to 11.

Accepted Papers

Conference Accepted Papers

Dynamic curriculum learning via data parameters for noise robust keyword spotting

Takuya Higuchi, Shreyas Saxena, Mehrez Souden, Tien Dung Tran, Masood Delfarah, Chandra Dhir

Error-driven Pruning of Language Models for Virtual Assistants

Sashank Gondala, Lyan Verwimp, Ernie Pusateri, Manos Tsagkias, Christophe Van Gysel

Generating Natural Questions from Images for Multimodal Assistants

Alkesh Patel, Akanksha Bindal, Hadas Kotek, Christopher Klein, Jason Williams

Knowledge Transfer for Efficient On-device False Trigger Mitigation

Pranay Dighe, Erik Marchi, Srikanth Vishnubhotla, Sachin Kajarekar, Devang Naik

Multimodal Punctuation Prediction with Contextual Dropout

Andrew Silva, Barry Theobald, Nick Apostoloff

Optimize what matters: Training DNN-HMM Keyword Spotting Model Using End Metric

Ashish Shrivastava, Arnav Kundu, Chandra Dhir, Devang Naik, Oncel Tuzel

Progressive Voice Trigger Detection: Accuracy vs Latency

Siddharth Sigtia, John Bridle, Hywel Richards, Pascal Clark, Vineet Garg, Erik Marchi

SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

Ting-Yao Hu, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel

Sep-28k: A Dataset for Stuttering Event Detection from Podcasts with People Who Stutter

Colin Lea, Vikram Mitra, Aparna Joshi, Sachin Kajarekar, Jeffrey Bigham

Special Session Accepted Paper

On the role of visual cues in audiovisual speech enhancement

Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, Devang Naik, Ahmed Hussen Abdelaziz

Conference Talks and Workshops

Apple organized a special session, Recent Advances in Multichannel and Multimodal Machine Learning for Speech Applications, on June 10 at 4:30 PDT. We had an accepted paper at this session, On the Role of Visual Cues in Audiovisual Speech Enhancement.

Apple sponsored the Women in Signal Processing virtual panel, which was held on June 10 at 9:30 am PDT.

Related readings and updates.


Apple sponsored the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), which was held in a hybrid format. The virtual event took place from May 7 to 13, and the hybrid main conference from May 22 to 27. ICASSP is the IEEE Signal Processing Society’s flagship conference on signal processing and its applications.

On the Role of Visual Cues in Audiovisual Speech Enhancement

We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show that visual cues provide not only high-level information about speech activity, i.e., speech/silence, but also fine-grained visual information about the place of articulation. One byproduct of this finding is…