Apple sponsored the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), which took place in person from June 17 to 21 in Seattle, Washington. CVPR is the annual computer vision event comprising the main conference and several co-located workshops and short courses. Below is the schedule of our sponsored workshops and events at CVPR 2024.
Schedule
Stop by the Apple booth in the Arch Building, Exhibit Hall Level 4, booth #1905, from 10:30am to 6:30pm PST on June 19 and 20, and from 10:00am to 3:00pm PST on June 21.
Monday, June 17
- Workshop on Efficient Large Vision Models 2024
- 8:00am PST - 12:35pm PST, Summit 420-422
- SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
- Haoxiang Wang (University of Illinois Urbana-Champaign), Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari
- Workshop on Image Matching: Local Features & Beyond 2024
- 1:00pm PST - 5:45pm PST, Summit 323
- Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching
- Hongkai Chen, Zixin Luo, Ray Tian, Aron Wang, Lei Zhou, Xuyang Bai, Mingmin Zhen, Tian Fang, Yanghai Tsin, David McKinnon, Long Quan (The Hong Kong University of Science and Technology)
Tuesday, June 18
- LatinX in CV (LXCV) at CVPR 2024
- 8:30am PST - 6:00pm PST, Arch 203
- Marcel Santos, Conor O'Brien, and Angus Choi are representing Apple at the LatinX workshop events.
- Women in Computer Vision (WiCV) at CVPR 2024
- 8:30am PST - 6:00pm PST, Arch 201
- Lisa Ta, Tanya Glozman, Diane Zhu, and Syenny Syenny are representing Apple at the WiCV workshop events.
Wednesday, June 19
- HUGS: Human Gaussian Splatting
- 10:30am PST - 12:00pm PST, #32, Poster Session 1 & Exhibit Hall (Arch 4A-E)
- Muhammed Kocabas (Max Planck Institute for Intelligent Systems), Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan
- HumMUSS: Human Motion Understanding using State Space Models
- 10:30am PST - 12:00pm PST, #207, Poster Session 1 & Exhibit Hall (Arch 4A-E)
- Arnab Mondal (McGill University), Stefano Alletto, Denis Tomè
- Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
- 5:15pm PST - 6:45pm PST, #382, Poster Session 2 & Exhibit Hall (Arch 4A-E)
- Yuanxun Lu (Nanjing University), Jingyang Zhang, Shiwei Li, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan (The Hong Kong University of Science and Technology), Xun Cao (Nanjing University), Yao Yao (Nanjing University)
- KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
- 5:15pm PST - 6:45pm PST, #66, Poster Session 2 & Exhibit Hall (Arch 4A-E)
- Hugues Thomas, Hubert Tsai, Tim Barfoot (University of Toronto), Jian Zhang
- Efficient Diffusion Models without Attention
- 5:15pm PST - 6:45pm PST, #334, Poster Session 2 & Exhibit Hall (Arch 4A-E)
- Jing Nathan Yan (Cornell University), Jiatao Gu, Alexander M. Rush (Cornell University)
Thursday, June 20
- MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
- 5:15pm PST - 6:45pm PST, #130, Poster Session 4 & Exhibit Hall (Arch 4A-E)
- Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel
Friday, June 21
- Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
- 5:15pm PST - 6:45pm PST, #302, Poster Session 6 & Exhibit Hall (Arch 4A-E)
- Karren Yang, Anurag Ranjan, Rick Chang, Raviteja Vemulapalli, Oncel Tuzel
Accepted Papers
- Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching
- Hongkai Chen, Zixin Luo, Ray Tian, Aron Wang, Lei Zhou, Xuyang Bai, Mingmin Zhen, Tian Fang, Yanghai Tsin, David McKinnon, Long Quan (The Hong Kong University of Science and Technology)
- Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
- Yuanxun Lu (Nanjing University), Jingyang Zhang, Shiwei Li, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan (The Hong Kong University of Science and Technology), Xun Cao (Nanjing University), Yao Yao (Nanjing University)
- KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
- Hugues Thomas, Hubert Tsai, Tim Barfoot (University of Toronto), Jian Zhang
- SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
- Haoxiang Wang (University of Illinois Urbana-Champaign), Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari
- MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
- Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel
- Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
- Karren Yang, Anurag Ranjan, Rick Chang, Raviteja Vemulapalli, Oncel Tuzel
- Efficient Diffusion Models without Attention
- Jing Nathan Yan (Cornell University), Jiatao Gu, Alexander M. Rush (Cornell University)
- HUGS: Human Gaussian Splatting
- Muhammed Kocabas (Max Planck Institute for Intelligent Systems), Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan
- HumMUSS: Human Motion Understanding using State Space Models
- Arnab Mondal (McGill University), Stefano Alletto, Denis Tomè
Demos
MobileCLIP: Real-Time Image-Text Models
Wednesday, June 19 - Friday June 21, during exhibition hours
The demo shows zero-shot scene classification running in real time on an iPhone. Because these models align the image and text modalities in a shared embedding space, they can perform zero-shot image classification and image-to-text or text-to-image retrieval at very high speed. The app showcases the research work “MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training,” presented at the same venue. The app was built by David Koski and Megan Maher Welsh, with contributions from Hugues Thomas, Mouli Sivapurapu, and Jian Zhang.
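The zero-shot classification step described above can be sketched in a few lines: each candidate label is encoded as text, the image and text embeddings are L2-normalized, and the label whose embedding has the highest cosine similarity to the image embedding wins. The sketch below is illustrative only; it uses random NumPy vectors as stand-ins for the real MobileCLIP image and text encoders, and the function names are hypothetical, not part of any released API.

```python
import numpy as np

def normalize(x):
    # Project embeddings onto the unit sphere so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def zero_shot_classify(image_emb, text_embs, temperature=100.0):
    # One text embedding per candidate label; return a probability per label.
    sims = normalize(text_embs) @ normalize(image_emb)  # cosine similarities
    logits = temperature * sims
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                # softmax over labels
    return probs

# Toy stand-in embeddings: in the real app these come from the image and text encoders.
rng = np.random.default_rng(0)
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text_embs = rng.normal(size=(3, 512))
image_emb = text_embs[1] + 0.1 * rng.normal(size=512)  # "image" nearest to label 1
probs = zero_shot_classify(image_emb, text_embs)
print(labels[int(np.argmax(probs))])  # -> "a photo of a cat"
```

Because the label set is just a list of strings encoded at runtime, the classifier can be repurposed to new categories without any retraining, which is what makes the on-device demo interactive.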
Flow Composer for Apple ML
Wednesday, June 19 - Friday June 21, during exhibition hours
The demo shows Apple ML features in use on MacBook Pro and iPad, leveraging several technologies such as Vision, Core ML, and Core Graphics.
Acknowledgements
Alex Schwing and Philipp Kraehenbuehl are Senior Area Chairs for CVPR 2024.
Alex Toshev, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari and Fartash Faghri are Area Chairs for CVPR 2024.
Fartash Faghri, Jason Ren, Jianrui Cai, Jiajia Luo, Jierui Lin, Liangchen Song, Or Dinari, Pavan Kumar Anasosalu Vasu, Peter Fu, Raviteja Vemulapalli, Haotian Zhang, Hong-You Chen, Wen Shi, Yongzhi Su, Yuyan Li, Trevine Oorloff, Yongxi Lu and Jeff Lai are reviewers for CVPR 2024.
Anshul Shah is a co-organizer for the workshop Learning from Procedural Videos and Language: What is Next?
Jeff Bigham is a co-organizer for the VizWiz Grand Challenge Workshop.
Pau Rodriguez Lopez is a co-organizer for the Workshop on Continual Learning in Computer Vision.
Jeff Lai has a PhD dissertation selected for the Doctoral Consortium.