June 6, 2023

International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023

Apple sponsored the International Conference on Acoustics, Speech and Signal Processing (ICASSP), which took place in person from June 4 to 10 in Rhodes, Greece. ICASSP is the IEEE Signal Processing Society's flagship conference on signal processing and its applications. Below is the schedule of Apple-sponsored workshops and events at ICASSP 2023.

Schedule

Tuesday, June 6

ORAL PRESENTATION
I See What You Hear: A Vision-inspired Method to Localize Words
10:50 AM - 12:20 PM LT in Salon des Roses A
Mohammad Samragh, Arnav Kundu, Ting-Yao Hu, Aman Chadha, Ashish Srivastava, Minsik Cho, Oncel Tuzel, Devang Naik
POSTER PRESENTATION
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
10:50 AM - 12:20 PM LT in Poster Area 4 - Garden
Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang
POSTER PRESENTATION
Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis
2:00 - 3:30 PM LT in Poster Area 2 - Garden
Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel
POSTER PRESENTATION
Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
2:00 - 3:30 PM LT in Poster Area 3 - Garden
Stefan Braun, Erik McDermott, Roger Hsiao
POSTER PRESENTATION
More Speaking or More Speakers?
2:00 - 3:30 PM LT in Poster Area 3 - Garden
Dan Berrebbi, Ronan Collobert, Navdeep Jaitly, Tatiana Likhomanenko
POSTER PRESENTATION
Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR
2:00 - 3:30 PM LT in Poster Area 4 - Garden
Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik
ORAL PRESENTATION
SLT-L6: Language Modeling and Spoken Language Understanding
3:35 - 5:05 PM LT in Room Delphi

Wednesday, June 7

POSTER PRESENTATION
HEiMDaL: Highly Efficient Method for Detection and Localization of wake-words
8:15 - 9:45 AM LT in Poster Area 8 - Dome
Arnav Kundu, Mohammad Samragh Razlighi, Minsik Cho, Priyanka Padmanabhan, Devang Naik
LUNCHEON
Women in Signal Processing
12:20 - 2:20 PM LT at the Ambrosia Restaurant
PANEL
Past, Present and Future of Signal Processing
5:15 - 6:45 PM LT in the Jupiter Ballroom
Alex Acero

Thursday, June 8

ORAL PRESENTATION
Naturalistic Head Motion Generation From Speech
10:50 AM - 12:20 PM LT in Salon des Roses A
Trisha Mittal, Zakaria Aldeneh, Masha Fedzechkina, Anurag Ranjan, Barry-John Theobald
JOB FAIR
Student Job Fair and Luncheon
12:00 - 3:00 PM LT at the Ambrosia Restaurant
POSTER PRESENTATION
Pre-trained Model Representations and their Robustness against Noise for Speech Emotion Analysis
2:00 - 3:30 PM LT in Poster Area 4 - Garden
Vikramjit Mitra, Vasudha Kowtha, Hsiang-Yun Sherry Chien, Erdrin Azemi, Carlos Avendano
POSTER PRESENTATION
On the Role of Lip Articulation in Visual Speech Perception
2:00 - 3:30 PM LT in Poster Area 10 - Dome
Zakaria Aldeneh, Masha Fedzechkina, Skyler Seto, Katherine Metcalf, Miguel Sarabia, Nicholas Apostoloff, Barry-John Theobald
POSTER PRESENTATION
Less Is More: A Unified Architecture for Device-Directed Speech Detection with Multiple Invocation Types
2:00 - 3:30 PM LT in Poster Area 4 - Garden
Ognjen Rudovic, Wonil Chang, Vineet Garg, Pranay Dighe, Pramod Jaya Simha, John Berkowitz, Ahmed Hussen Abdelaziz, Erik Marchi, Sachin Kajarekar, Saurabh Adya
POSTER PRESENTATION
Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations
3:35 - 5:05 PM LT in Poster Area 2 - Garden
Vasudha Kowtha, Miquel Espi, Jonathan J Huang, Yichi Zhang, Carlos Avendano

Friday, June 9

POSTER PRESENTATION
Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings
8:15 - 9:45 AM LT in Poster Area 4 - Garden
Hao Yen, Woojay Jeon

Accepted Papers

Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR

Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik

HEiMDaL: Highly Efficient Method for Detection and Localization of wake-words

Arnav Kundu, Mohammad Samragh Razlighi, Minsik Cho, Priyanka Padmanabhan, Devang Naik

I See What You Hear: A Vision-inspired Method to Localize Words

Mohammad Samragh, Arnav Kundu, Ting-Yao Hu, Aman Chadha, Ashish Srivastava, Minsik Cho, Oncel Tuzel, Devang Naik

Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings

Hao Yen, Woojay Jeon

Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations

Vasudha Kowtha, Miquel Espi, Jonathan J Huang, Yichi Zhang, Carlos Avendano

More Speaking or More Speakers?

Dan Berrebbi, Ronan Collobert, Navdeep Jaitly, Tatiana Likhomanenko

Naturalistic Head Motion Generation From Speech

Trisha Mittal, Zakaria Aldeneh, Masha Fedzechkina, Anurag Ranjan, Barry-John Theobald

Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation

Stefan Braun, Erik McDermott, Roger Hsiao

On the Role of Lip Articulation in Visual Speech Perception

Zakaria Aldeneh, Masha Fedzechkina, Skyler Seto, Katherine Metcalf, Miguel Sarabia, Nicholas Apostoloff, Barry-John Theobald

Less Is More: A Unified Architecture for Device-Directed Speech Detection with Multiple Invocation Types

Oggi Rudovic, Wonil Chang, Vineet Garg, Pranay Dighe, Pramod Simha, Jack Berkowitz, Ahmed H. Abdelaziz, Sachin Kajarekar, Erik Marchi, Saurabh Adya

Pre-trained Model Representations and their Robustness against Noise for Speech Emotion Analysis

Vikramjit Mitra, Vasudha Kowtha, Hsiang-Yun Sherry Chien, Erdrin Azemi, Carlos Avendano

Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel

Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang

Demo

The Apple booth (number 16, next to the Dome Bar main entrance of the Rodos Palace Luxury Convention Resort) hosted our demo from Tuesday to Friday.

Contextual Understanding in Siri

This demo showcases the contextual understanding technology in Siri, which shipped in iOS 16. Users can refer to a previously mentioned entity using anaphora or nominal ellipsis, refer to an entity on screen, or correct an earlier error made by Siri or by the user. Contextual understanding in Siri relies on several backend ML components, such as query rewriting and reference resolution. This work is a step toward more natural conversations with Siri.

All ICASSP attendees were invited to stop by the Apple booth to experience this demo in person.

Acknowledgements

Tatiana Likhomanenko, Arnav Kundu, Stefan Braun, Vikram Mitra, and Pawel Swietojanski were reviewers for ICASSP 2023.

Yannis Stylianou was a Seasonal School & Short Course Chair for ICASSP 2023.

Ahmed Hussen Abdelaziz was the meta-reviewer for SLT-L6: Language Modeling and Spoken Language Understanding at ICASSP 2023.
