View publication

Millions of people reach out to digital assistants such as Siri every day, asking for information, making phone calls, seeking assistance, and much more. The expectation is that such assistants should understand the intent of the user’s query. Detecting the intent of a query from a short, isolated utterance is a difficult task. Intent cannot always be obtained from speech-recognized transcriptions. A transcription-driven approach can interpret what has been said but fails to acknowledge how it has been said, and as a consequence, may ignore the expres- sion present in the voice. Our work investigates whether a system can reliably detect vocal expression in queries using acoustic and paralinguistic embedding. Results show that the proposed method offers a relative equal error rate (EER) decrease of 60% compared to a bag-of-word based system, corroborating that expression is significantly represented by vocal attributes, rather than being purely lexical. Addition of emotion embedding helped to reduce the EER by 30% relative to the acoustic embedding, demonstrating the relevance of emotion in expressive voice.

Related readings and updates.

Apple at ICASSP 2020

Apple sponsored the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in May 2020. With a focus on signal processing and its applications, the conference took place virtually from May 4 - 8. Read Apple’s accepted papers below.

See event details

Apple at Interspeech 2019

Apple attended Interspeech 2019, the world's largest conference on the science and technology of spoken language processing. The conference took place in Graz, Austria from September 15th to 19th. See accepted papers below.

Apple continues to build cutting-edge technology in the space of machine hearing, speech recognition, natural language processing, machine translation, text-to-speech, and artificial intelligence, improving the lives of millions of customers every day.

See event details