Automatic Class Discovery and One-Shot Interactions for Acoustic Activity Recognition

In collaboration with Carnegie Mellon University

Authors: Jason Wu, Chris Harrison, Jeffrey P. Bigham, Gierad Laput

Acoustic activity recognition has emerged as a foundational element for imbuing devices with context-driven capabilities, enabling richer, more assistive, and more accommodating computational experiences. Traditional approaches rely either on custom models trained in situ, or general models pre-trained on preexisting data, with each approach having accuracy and user burden implications. We present Listen Learner, a technique for activity recognition that gradually learns events specific to a deployed environment while minimizing user burden. Specifically, we built an end-to-end system for self-supervised learning of events labelled through one-shot interaction. We describe and quantify system performance 1) on preexisting audio datasets, 2) on real-world datasets we collected, and 3) through user studies which uncovered system behaviors suitable for this new type of interaction. Our results show that our system can accurately and automatically learn acoustic events across environments (e.g., 97% precision, 87% recall), while adhering to users’ preferences for non-intrusive interactive behavior.
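The abstract does not spell out how class discovery and one-shot labelling fit together, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes each audio event has already been turned into a non-zero embedding vector, and it swaps in a simple online nearest-centroid clustering with a cosine-distance threshold. The names `ClusterDiscovery`, `observe`, `ask_label`, `threshold`, and `min_size` are all hypothetical. The user is asked for a label only once per cluster, and only after the cluster has recurred often enough to look like a real event class.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class ClusterDiscovery:
    """Hypothetical online clustering with a single label query per cluster."""

    def __init__(self, threshold=0.2, min_size=3):
        self.threshold = threshold  # assumed max distance to join a cluster
        self.min_size = min_size    # assumed recurrences needed before asking
        self.clusters = []          # dicts with "centroid", "count", "label"

    def observe(self, embedding, ask_label):
        # Find the nearest existing cluster within the distance threshold.
        best, best_dist = None, self.threshold
        for cluster in self.clusters:
            dist = cosine_distance(embedding, cluster["centroid"])
            if dist <= best_dist:
                best, best_dist = cluster, dist

        if best is None:
            # Nothing close enough: start a new candidate class.
            best = {"centroid": list(embedding), "count": 1, "label": None}
            self.clusters.append(best)
        else:
            # Fold the new event into the cluster's running mean.
            n = best["count"]
            best["centroid"] = [
                (m * n + x) / (n + 1) for m, x in zip(best["centroid"], embedding)
            ]
            best["count"] = n + 1

        # One-shot interaction: ask once, when the cluster is well established.
        if best["label"] is None and best["count"] >= self.min_size:
            best["label"] = ask_label(best)
        return best["label"]
```

In use, `ask_label` would be whatever prompt the device presents to the user (for example, a spoken question); here it is any callable that takes the cluster and returns a string. Events that fall into an already labelled cluster are recognized without further interruption.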


Related readings and updates.

Apple Workshop on Human-Centered Machine Learning 2024

CHI 2020
