
With the help of creative prompt engineering and in-context learning, large language models (LLMs) are known to generalize well on a variety of text-based natural language processing (NLP) tasks. However, to perform well on spoken language understanding (SLU) tasks, LLMs either need to be equipped with an in-built speech modality or they need to rely on speech-to-text conversion from an off-the-shelf automatic speech recognition (ASR) system. In this work, we focus on the latter setup, where the accuracy of the LLM on SLU tasks is constrained by the accuracy of a frozen ASR system on the given speech input. Specifically, we tackle the task of speech intent classification, where a high word-error-rate (WER) implies that the LLM may not have the correct textual information to understand the spoken intent. To alleviate this problem, we propose to prompt the LLM with an n-best list of ASR hypotheses instead of only the error-prone 1-best hypothesis. We first explore prompting the LLM with descriptive prompts that explain the concept of n-best lists to invoke the LLM's emergent abilities to understand the task, followed by finetuning of LoRA adapters on the intent classification task. We demonstrate the efficacy of our approach on a binary device-directed speech detection task as well as on a keyword spotting task on the Google Speech Commands dataset, where systems using n-best list prompts outperform those using 1-best ASR outputs, thus paving the way for an efficient method to exploit ASR uncertainty via LLMs for speech-based applications.
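To make the n-best prompting idea concrete, the sketch below shows one way such a descriptive prompt could be assembled from ranked ASR hypotheses before being sent to an LLM. The prompt wording, confidence scores, intent labels, and function names are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of n-best prompting for speech intent classification.
# Hypothesis texts, scores, and intent labels below are hypothetical examples.

from typing import List, Tuple


def build_nbest_prompt(nbest: List[Tuple[str, float]], intents: List[str]) -> str:
    """Format an n-best ASR list into a descriptive prompt that explains
    the concept of multiple hypotheses to the LLM."""
    hypotheses = "\n".join(
        f"{rank}. {text} (ASR confidence: {score:.2f})"
        for rank, (text, score) in enumerate(nbest, start=1)
    )
    return (
        "The following are candidate transcriptions of a single spoken utterance, "
        "produced by a speech recognizer and ranked from most to least likely. "
        "Some words may be misrecognized, so consider all hypotheses together.\n\n"
        f"{hypotheses}\n\n"
        f"Classify the speaker's intent as one of: {', '.join(intents)}.\n"
        "Answer with the intent label only."
    )


# Example usage with hypothetical ASR output for a device-directed query.
nbest_list = [
    ("turn on the living room lights", 0.62),
    ("turn on the living room light", 0.24),
    ("turning on the living room lights", 0.09),
]
prompt = build_nbest_prompt(
    nbest_list, intents=["device-directed", "not device-directed"]
)
print(prompt)
```

The resulting string could be used either as a zero-shot descriptive prompt or as the input text when finetuning LoRA adapters on the classification task, consistent with the two settings described above.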

Related readings and updates.

Keyframer: Empowering Animation Design using Large Language Models

Large language models (LLMs) have the potential to impact a wide range of creative domains, as exemplified in popular text-to-image generators like DALL·E and Midjourney. However, the application of LLMs to motion-based visual design has not yet been explored and presents novel challenges such as how users might effectively describe motion in natural language. Further, many existing generative design tools lack support for iterative refinement of…
See paper details

Gender Bias in LLMs

Large Language Models (LLMs) have made substantial progress in the past several months, shattering state-of-the-art benchmarks in many domains. This paper investigates LLMs' behavior with respect to gender stereotypes, a known stumbling block for prior models. We propose a simple paradigm to test the presence of gender bias, building on but differing from WinoBias, a commonly used gender bias dataset which is likely to be included in the training…
See paper details