EMBridge: Enhancing Gesture Generalization from EMG Signals through Cross-Modal Representation Learning
Authors: Wenhui Cui†, Christopher M. Sandino, Hadi Pouransar, Ran Liu, Juri Minxha, Ellen L. Zippi, Erdrin Azemi, Behrooz Mahasseni
Hand gesture classification using high-quality structured data such as videos, images, and hand skeletons is a well-explored problem in computer vision. Alternatively, leveraging low-power, cost-effective bio-signals, e.g., surface electromyography (sEMG), allows for continuous gesture prediction on wearable devices. In this work, we aim to enhance EMG representation quality by aligning it with embeddings obtained from structured, high-quality modalities that provide richer semantic guidance, ultimately enabling zero-shot gesture generalization. Specifically, we propose EMBridge, a cross-modal representation learning framework that bridges the modality gap between EMG and pose. EMBridge learns high-quality EMG representations by introducing a Querying Transformer (Q-Former), a masked pose reconstruction loss, and a community-aware soft contrastive learning objective that aligns the relative geometry of the embedding spaces. We evaluate EMBridge on both in-distribution and unseen gesture classification tasks and demonstrate consistent performance gains over all baselines. To the best of our knowledge, EMBridge is the first cross-modal representation learning framework to achieve zero-shot gesture classification from wearable EMG signals, showing potential toward real-world gesture recognition on wearable devices.
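The abstract mentions a community-aware soft contrastive objective that aligns the relative geometry of the EMG and pose embedding spaces. The exact formulation is not given here; as a minimal, illustrative sketch, one common way to realize such an objective is to replace InfoNCE's one-hot targets with soft targets derived from pose-pose similarity, so that EMG samples whose poses form a close community receive partial credit. All function names and temperature values below are assumptions, not the paper's implementation:

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize rows to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def soft_contrastive_loss(emg_emb, pose_emb, tau=0.1, tau_target=0.1):
    """Cross-modal contrastive loss with soft targets (illustrative sketch).

    Instead of one-hot targets (vanilla InfoNCE), the target distribution for
    each EMG sample is a softmax over pose-pose similarities, so semantically
    close poses receive partial credit and the relative geometry of the pose
    space is transferred to the EMG space.
    """
    e = l2_normalize(emg_emb)               # (N, d) EMG embeddings
    p = l2_normalize(pose_emb)              # (N, d) pose embeddings
    logits = e @ p.T / tau                  # EMG-to-pose cosine similarities
    target_logits = p @ p.T / tau_target    # soft targets from pose geometry
    targets = np.exp(log_softmax(target_logits))
    # Cross-entropy between the soft target rows and the model's distribution.
    return float(-(targets * log_softmax(logits)).sum(axis=1).mean())
```

When the pose-target temperature `tau_target` is driven toward zero, the soft targets collapse to one-hot and the loss reduces to standard InfoNCE; larger values spread credit across neighboring poses.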