Apple sponsored the International Conference on Acoustics, Speech and Signal Processing (ICASSP), which took place in person from June 4 to 10 in Rhodes Island, Greece. ICASSP is the IEEE Signal Processing Society's flagship conference on signal processing and its applications. Below is the schedule of Apple-sponsored workshops and events at ICASSP 2023.
Schedule
Tuesday, June 6
- I See What You Hear: A Vision-inspired Method to Localize Words
- 10:50 AM - 12:20 PM LT in Salon des Roses A
- Mohammad Samragh, Arnav Kundu, Ting-Yao Hu, Aman Chadha, Ashish Srivastava, Minsik Cho, Oncel Tuzel, Devang Naik
- Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
- 10:50 AM - 12:20 PM LT in Poster Area 4 - Garden
- Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang
- Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis
- 2:00 - 3:30 PM LT in Poster Area 2 - Garden
- Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel
- Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
- 2:00 - 3:30 PM LT in Poster Area 3 - Garden
- Stefan Braun, Erik McDermott, Roger Hsiao
- More Speaking or More Speakers?
- 2:00 - 3:30 PM LT in Poster Area 3 - Garden
- Dan Berrebbi, Ronan Collobert, Navdeep Jaitly, Tatiana Likhomanenko
- Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR
- 2:00 - 3:30 PM LT in Poster Area 4 - Garden
- Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik
- SLT-L6: Language Modeling and Spoken Language Understanding
- 3:35 - 5:05 PM LT in Room Delphi
Wednesday, June 7
- HEiMDaL: Highly Efficient Method for Detection and Localization of wake-words
- 8:15 - 9:45 AM LT in Poster Area 8 - Dome
- Arnav Kundu, Mohammad Samragh Razlighi, Minsik Cho, Priyanka Padmanabhan, Devang Naik
- Past, Present and Future of Signal Processing
- 5:15 - 6:45 PM LT in the Jupiter Ballroom
- Alex Acero
- Women in Signal Processing
- 12:20 - 2:20 PM LT at the Ambrosia Restaurant
Thursday, June 8
- Naturalistic Head Motion Generation From Speech
- 10:50 AM - 12:20 PM LT in Salon des Roses A
- Trisha Mittal, Zakaria Aldeneh, Masha Fedzechkina, Anurag Ranjan, Barry-John Theobald
- Student Job Fair and Luncheon
- 12:00 - 3:00 PM LT at the Ambrosia Restaurant
- Pre-trained Model Representations and their Robustness against Noise for Speech Emotion Analysis
- 2:00 - 3:30 PM LT in Poster Area 4 - Garden
- Vikramjit Mitra, Vasudha Kowtha, Hsiang-Yun Sherry Chien, Erdrin Azemi, Carlos Avendano
- On the Role of Lip Articulation in Visual Speech Perception
- 2:00 - 3:30 PM LT in Poster Area 10 - Dome
- Zakaria Aldeneh, Masha Fedzechkina, Skyler Seto, Katherine Metcalf, Miguel Sarabia, Nicholas Apostoloff, Barry-John Theobald
- Less Is More: A Unified Architecture for Device-Directed Speech Detection with Multiple Invocation Types
- 2:00 - 3:30 PM LT in Poster Area 4 - Garden
- Ognjen Rudovic, Wonil Chang, Vineet Garg, Pranay Dighe, Pramod Jaya Simha, John Berkowitz, Ahmed Hussen Abdelaziz, Erik Marchi, Sachin Kajarekar, Saurabh Adya
- Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations
- 3:35 - 5:05 PM LT in Poster Area 2 - Garden
- Vasudha Kowtha, Miquel Espi, Jonathan J Huang, Yichi Zhang, Carlos Avendano
Friday, June 9
- Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings
- 8:15 - 9:45 AM LT in Poster Area 4 - Garden
- Hao Yen, Woojay Jeon
Accepted Papers
Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR
Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik
HEiMDaL: Highly Efficient Method for Detection and Localization of wake-words
Arnav Kundu, Mohammad Samragh Razlighi, Minsik Cho, Priyanka Padmanabhan, Devang Naik
I See What You Hear: A Vision-inspired Method to Localize Words
Mohammad Samragh, Arnav Kundu, Ting-Yao Hu, Aman Chadha, Ashish Srivastava, Minsik Cho, Oncel Tuzel, Devang Naik
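Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings
Hao Yen, Woojay Jeon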
Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations
Vasudha Kowtha, Miquel Espi, Jonathan J Huang, Yichi Zhang, Carlos Avendano
More Speaking or More Speakers?
Dan Berrebbi, Ronan Collobert, Navdeep Jaitly, Tatiana Likhomanenko
Naturalistic Head Motion Generation From Speech
Trisha Mittal, Zakaria Aldeneh, Masha Fedzechkina, Anurag Ranjan, Barry-John Theobald
Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
Stefan Braun, Erik McDermott, Roger Hsiao
On the Role of Lip Articulation in Visual Speech Perception
Zakaria Aldeneh, Masha Fedzechkina, Skyler Seto, Katherine Metcalf, Miguel Sarabia, Nicholas Apostoloff, Barry-John Theobald
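Less Is More: A Unified Architecture for Device-Directed Speech Detection with Multiple Invocation Types
Oggi Rudovic, Wonil Chang, Vineet Garg, Pranay Dighe, Pramod Simha, Jack Berkowitz, Ahmed H. Abdelaziz, Sachin Kajarekar, Erik Marchi, Saurabh Adya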
Pre-trained Model Representations and their Robustness against Noise for Speech Emotion Analysis
Vikramjit Mitra, Vasudha Kowtha, Hsiang-Yun Sherry Chien, Erdrin Azemi, Carlos Avendano
Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis
Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang
Demo
Attendees could stop by the Apple booth (number 16, located next to the Dome Bar main entrance of the Rodos Palace Luxury Convention Resort) anytime from Tuesday to Friday to try our demo.
Contextual Understanding in Siri
This demo showcases the contextual understanding technology that shipped in Siri with iOS 16. Users can refer to a previously mentioned entity using anaphora or nominal ellipsis, refer to an entity on screen, or correct an earlier error made by Siri or by the user. Contextual understanding in Siri relies on several backend ML components, including query rewriting and reference resolution. This work is a step toward more natural conversations with Siri.
All ICASSP attendees were invited to stop by the Apple booth to experience this demo in person.
Acknowledgements
Tatiana Likhomanenko, Arnav Kundu, Stefan Braun, Vikram Mitra, and Pawel Swietojanski served as reviewers for ICASSP 2023.
Yannis Stylianou served as a Seasonal School & Short Course Chair for ICASSP 2023.
Ahmed Hussen Abdelaziz served as the Meta Reviewer of SLT-L6: Language Modeling and Spoken Language Understanding for ICASSP 2023.