Acoustic activity recognition has emerged as a foundational element for imbuing devices with context-driven capabilities, enabling richer, more assistive, and more accommodating computational experiences. Traditional approaches rely either on custom models trained in situ, or general models pre-trained on preexisting data, with each approach having accuracy and user burden implications. We present Listen Learner, a technique for activity recognition that gradually learns events specific to a deployed environment while minimizing user burden. Specifically, we built an end-to-end system for self-supervised learning of events labelled through one-shot interaction. We describe and quantify system performance 1) on preexisting audio datasets, 2) on real-world datasets we collected, and 3) through user studies which uncovered system behaviors suitable for this new type of interaction. Our results show that our system can accurately and automatically learn acoustic events across environments (e.g., 97% precision, 87% recall), while adhering to users’ preferences for non-intrusive interactive behavior.
Related readings and updates.
Apple had three papers accepted at the ACM CHI Conference on Human Factors in Computing Systems (CHI), the premier international conference on interactive technology, in April 2020. Researchers from across the world gather at CHI to discuss, research, and design new ways for people to interact using technology. Although the conference was not held in 2020, you can read the accepted papers below.
Apple introduced the "Hey Siri" feature with the iPhone 6 (iOS 8). This feature allows users to invoke Siri without having to press the home button. When a user says, "Hey Siri, how is the weather today?" the phone wakes up upon hearing "Hey Siri" and processes the rest of the utterance as a Siri request. Because the feature listens continuously for the "Hey Siri" trigger phrase, users can access Siri when their hands are otherwise occupied, such as while driving or cooking, or when the device is not within arm's reach. Imagine a user asking an iPhone 6 on the kitchen counter to set a timer while putting a turkey into the oven.