Apple is sponsoring the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), which takes place in person from April 6 to 11 in Hyderabad, India. ICASSP is the IEEE Signal Processing Society's flagship conference on signal processing and its applications. Below is the schedule of Apple-sponsored workshops and events at ICASSP 2025.
Schedule
Stop by the Apple booth (Booth C3) in the Hyderabad International Convention Center, open from 09:00 to 17:00 each day from April 6 to 11. All times listed are in GMT+5:30.
Monday, April 7
- WORKSHOP KEYNOTE TALK
- FLute: Federated Learning for Audio Understanding
- 09:00 - 10:00
- Location: Hall 4
- Private Federated Learning for Speech Recognition
- Tatiana Likhomanenko
Wednesday, April 9
- ORAL PRESENTATION
- Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
- 09:00 - 09:10, Enhancing ASR with Large Language Models, Room MRG.04
- Takaaki Hori, Martin Kocour (Brno University of Technology), Adnan Haider, Erik McDermott, Xiaodan Zhuang
- ORAL PRESENTATION
- Contextualization of ASR with LLM Using Phonetic Retrieval-Based Augmentation
- 09:30 - 09:40, Enhancing ASR with Large Language Models, Room MRG.04
- Zhihong Lei, Xingyu Na, Mingbin Xu, Ernest Pusateri, Christophe Van Gysel, Yuanyuan Zhang, Shiyi Han (Character AI), Zhen Huang
- POSTER
- Retrieval-Augmented Correction of Named Entity Speech Recognition Errors
- 11:30 - 13:00, Advances in LLMs and E2E Architectures for ASR, Room Poster 2F
- Ernest Pusateri, Anmol Walia, Anirudh Kashi, Bortik Bandyopadhyay, Nadia Hyder, Sayantan Mahinder, Raviteja Anantha, Daben Liu (Capital One), Sashank Gondala (Further AI)
- ORAL PRESENTATION
- An Efficient and Streaming Audio Visual Active Speaker Detection System
- 11:45 - 11:55, Image and Video Content Analysis I, Room MR1.04
- Arnav Kundu, Yanzi Jin, Max Horton, Mohammad Sekhavat, Danny Tormoen, Devang Naik
- ORAL PRESENTATION
- ImmerseDiffusion: Generative Spatial Audio Latent Diffusion Model
- 11:45 - 11:55, Sound Generation and Synthesis II, Room MR1.02
- Moji Heydari (University of Rochester), Mehrez Souden, Josh Atkins, Bruno Conejo
- NETWORKING LUNCHEON
- Women in Signal Processing (WiSP)
- 13:00 - 15:00
- Location: HICC Architecture Room
- Ahmed Tewfik and colleagues will be representing Apple at the Women in Signal Processing Luncheon.
- INDUSTRY FORUM TALK
- ICASSP 2025 Industry Forum
- 14:00 - 14:30
- Location: Industry Forum Hall
- Integrating Large Language Models with Distributed Systems: Architecture Patterns, Challenges and Innovations
- Venkataramanan Subramanian
- PLENARY TALK
- IEEE Norbert Wiener Plenary Talk
- 15:45 - 16:45
- Location: Hall 4
- Bridging Generative AI and Statistical Signal Processing
- Ahmed Tewfik
- POSTER
- Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector Based Pseudo-Labels
- 17:00 - 18:30, Speaker Recognition II, Room Poster 2D
- Shinji Watanabe (Carnegie Mellon University), Jeeweon Jung (Carnegie Mellon University), Ahmed Hussen Abdelaziz, Takuya Higuchi, Zak Aldeneh, Li-Wei Chen (Carnegie Mellon University), Stephen Shum, Tatiana Likhomanenko, Barry-John Theobald
Thursday, April 10
- POSTER
- SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions
- 08:30 - 10:00, Novel Audio, Speech and Language Modeling Techniques, Room Poster 2D
- Dominik Wagner (Friedrich-Alexander-Universitaet Erlangen-Nuernberg), Alex Churchill, Siddharth Sigtia, Erik Marchi
- POSTER
- SLiCK: Exploiting Subsequences for Length-Constrained Keyword Spotting
- 08:30 - 10:00, Audio-Visual and Robust Speech Synthesis, Room Poster 2C
- Kumari Nishu, Minsik Cho, Devang Naik
- POSTER
- Towards Automatic Assessment of Self-Supervised Speech Models Using Rank
- 08:30 - 10:00, Speech Datasets and Model Assessment, Room Poster 2F
- Zak Aldeneh, Vimal Thilak, Takuya Higuchi, Tatiana Likhomanenko, Barry Theobald
- ORAL PRESENTATION
- Compact Neural TTS Voices for Accessibility
- 09:15 - 09:25, TTS Architecture, Room MRG.04
- Kunal Jain, Eoin Murphy, Deepanshu Gupta, Jonathan Dyke, Saumya Hiren Shah, Vasileios Tsiaras, Petko Petkov, Alistair Conkie
- ORAL PRESENTATION
- Exploring Prediction Targets in Masked Pre-training for Speech Foundation Models
- 15:00 - 15:10, Multilingual Speech Processing and Identification, Room MRG.04
- Li-Wei Chen (Carnegie Mellon University), Zak Aldeneh, Takuya Higuchi, Tatiana Likhomanenko, Richard Bai, Ahmed Hussen Abdelaziz, Barry Theobald
Accepted Papers
An Efficient and Streaming Audio Visual Active Speaker Detection System
Arnav Kundu, Yanzi Jin, Max Horton, Mohammad Sekhavat, Danny Tormoen, Devang Naik
Compact Neural TTS Voices for Accessibility
Kunal Jain, Eoin Murphy, Deepanshu Gupta, Jonathan Dyke, Saumya Hiren Shah, Vasileios Tsiaras, Petko Petkov, Alistair Conkie
Contextualization of ASR with LLM Using Phonetic Retrieval-Based Augmentation
Zhihong Lei, Xingyu Na, Mingbin Xu, Ernest Pusateri, Christophe Van Gysel, Yuanyuan Zhang, Shiyi Han (Character AI), Zhen Huang
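Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
Takaaki Hori, Martin Kocour (Brno University of Technology), Adnan Haider, Erik McDermott, Xiaodan Zhuang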
Exploring Prediction Targets in Masked Pre-training for Speech Foundation Models
Li-Wei Chen (Carnegie Mellon University), Zak Aldeneh, Takuya Higuchi, Tatiana Likhomanenko, Richard Bai, Ahmed Hussen Abdelaziz, Barry Theobald
ImmerseDiffusion: Generative Spatial Audio Latent Diffusion Model
Moji Heydari (University of Rochester), Mehrez Souden, Josh Atkins, Bruno Conejo
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector Based Pseudo-Labels
Shinji Watanabe (Carnegie Mellon University), Jeeweon Jung (Carnegie Mellon University), Ahmed Hussen Abdelaziz, Takuya Higuchi, Zak Aldeneh, Li-Wei Chen (Carnegie Mellon University), Stephen Shum, Tatiana Likhomanenko, Barry-John Theobald
Retrieval-Augmented Correction of Named Entity Speech Recognition Errors
Ernest Pusateri, Anmol Walia, Anirudh Kashi, Bortik Bandyopadhyay, Nadia Hyder, Sayantan Mahinder, Raviteja Anantha, Daben Liu (Capital One), Sashank Gondala (Further AI)
SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions
Dominik Wagner (Friedrich-Alexander-Universitaet Erlangen-Nuernberg), Alex Churchill, Siddharth Sigtia, Erik Marchi
SLiCK: Exploiting Subsequences for Length-Constrained Keyword Spotting
Kumari Nishu, Minsik Cho, Devang Naik
Towards Automatic Assessment of Self-Supervised Speech Models Using Rank
Zak Aldeneh, Vimal Thilak, Takuya Higuchi, Tatiana Likhomanenko, Barry Theobald
Acknowledgements
Tatiana Likhomanenko is an Area Chair and Meta Reviewer for ICASSP 2025.
Arnav Kundu, Aswin Sivaraman, Kunal Jain, Kuan-Lin Chen, Kumari Nishu, Nimish Venkat Marigo, Parnia Bahar, Sameer Badaskar, Takaaki Hori, Tatiana Likhomanenko, Venki Nagesha, and Zak Aldeneh are reviewers for ICASSP 2025.
Venkataramanan Subramanian is presenting at the ICASSP 2025 Industry Forum.