Using LLMs for Late Multimodal Sensor Fusion for Activity Recognition
Authors: Ilker Demirel, Karan Ketankumar Thakkar, Benjamin Elizalde, Miquel Espi Marques, Shirley Ren, Jaya Narain
This paper was accepted at the Learning from Time Series for Health workshop at NeurIPS 2025.
Sensor data streams provide valuable information about activities and context for downstream applications, though integrating complementary information can be challenging. We show that large language models (LLMs) can perform late fusion for activity classification from audio and motion time-series data. We curated a subset of the Ego4D dataset covering diverse activities across contexts (e.g., household activities, sports). The evaluated LLMs achieved 12-class zero- and one-shot classification F1-scores significantly above chance, with no task-specific training. Zero-shot classification via LLM-based fusion of modality-specific models can enable multimodal temporal applications where aligned training data for learning a shared embedding space is limited. LLM-based fusion can also enable model deployment without the additional memory and computation required by targeted, application-specific multimodal models.
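A minimal sketch of the general idea of LLM-based late fusion: modality-specific classifiers produce per-class scores, which are serialized into a text prompt for zero-shot classification. The activity list, score format, and `query_llm` helper are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of LLM-based late fusion for activity recognition.
# ACTIVITIES, the prompt wording, and `query_llm` are assumptions for
# illustration; they are not from the paper.
from typing import Callable, Dict

ACTIVITIES = ["cooking", "cleaning", "playing_sports", "walking"]  # example subset

def build_fusion_prompt(audio_scores: Dict[str, float],
                        motion_scores: Dict[str, float]) -> str:
    """Serialize per-modality classifier outputs into a zero-shot prompt."""
    audio_str = ", ".join(f"{k}: {v:.2f}" for k, v in audio_scores.items())
    motion_str = ", ".join(f"{k}: {v:.2f}" for k, v in motion_scores.items())
    return (
        "An audio model and a motion model scored a short clip.\n"
        f"Audio scores: {audio_str}\n"
        f"Motion scores: {motion_str}\n"
        f"Pick the single most likely activity from: {', '.join(ACTIVITIES)}.\n"
        "Answer with the activity name only."
    )

def classify(audio_scores: Dict[str, float],
             motion_scores: Dict[str, float],
             query_llm: Callable[[str], str]) -> str:
    """Late fusion: the LLM reconciles modality-specific predictions in text."""
    return query_llm(build_fusion_prompt(audio_scores, motion_scores)).strip()
```

Because fusion happens in text, no shared embedding space or joint multimodal training is needed; only the per-modality classifier outputs are exchanged.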
Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
January 18, 2025 | Research area: Speech and Natural Language Processing | Conference: ICASSP
This paper presents an efficient decoding approach for end-to-end automatic speech recognition (E2E-ASR) with large language models (LLMs). Although shallow fusion is the most common approach for incorporating language models into E2E-ASR decoding, we face two practical problems with LLMs: (1) LLM inference is computationally costly, and (2) there may be a vocabulary mismatch between the ASR model and the LLM. To resolve this mismatch, we need to…
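For context, shallow fusion (the common baseline this paper contrasts with) interpolates ASR and LM log-probabilities during beam search. A minimal sketch, assuming per-token log-probability callables `asr_logp` and `lm_logp` (hypothetical names); this illustrates shallow fusion, not the paper's delayed-fusion method.

```python
# Sketch of shallow fusion scoring for one hypothesis during beam search.
# `asr_logp(prefix, token)` and `lm_logp(prefix, token)` are assumed to
# return log P(token | prefix) from the ASR model and LM, respectively.
def shallow_fusion_score(asr_logp, lm_logp, hypothesis, lam=0.3):
    """Combine ASR and LM log-probabilities token by token, weighted by lam."""
    score = 0.0
    for t, token in enumerate(hypothesis):
        prefix = hypothesis[:t]
        score += asr_logp(prefix, token) + lam * lm_logp(prefix, token)
    return score
```

Running the LM on every partial hypothesis at every step is what makes this costly when the LM is an LLM, motivating approaches that defer or reduce LLM calls.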
Towards Time-Series Reasoning with LLMs
December 3, 2024 | Research areas: Methods and Algorithms; Speech and Natural Language Processing | Conference: NeurIPS
Multi-modal large language models (MLLMs) have enabled numerous advances in understanding and reasoning in domains like vision, but this broad success has not yet extended to time series. Although prior work on time-series MLLMs has shown promising performance in time-series forecasting, very few works show how an LLM could be used for time-series reasoning in natural language. We propose a novel multi-modal time-series LLM approach that…