Online Automatic Speech Recognition With Listen, Attend and Spell Model
Authors: Roger Hsiao, Dogan Can, Tim Ng, Ruchir Travadi, Arnab Ghoshal
The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In this letter, we analyze the online operation of LAS models and show that these limitations stem from the handling of silence regions and from the reliability of the online attention mechanism at the edges of input buffers. We propose a novel and simple technique that achieves fully online recognition while meeting accuracy and latency targets. On a Mandarin dictation task, the proposed approach achieves a character error rate in online operation that is within 4% relative of an offline LAS model. The proposed online LAS model also operates at 12% lower latency than a conventional neural network hidden Markov model hybrid of comparable accuracy. We have validated the proposed method through a production-scale deployment, which, to the best of our knowledge, is the first such deployment of a fully online LAS model.
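The accuracy and latency figures above are relative, not absolute, differences. The following is a minimal sketch of how such a relative figure is computed. The function name `relative_change` and every numeric value are assumptions chosen for illustration; none of them are results from the paper.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Return (candidate - baseline) / baseline as a fraction of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline


# Hypothetical character error rates (CER), for illustration only.
offline_cer = 0.0500  # assumed offline LAS CER (5.00%)
online_cer = 0.0515   # assumed online LAS CER (5.15%)

rel = relative_change(offline_cer, online_cer)
absolute_gap = online_cer - offline_cer

# A relative degradation of 3% here corresponds to only 0.15 absolute points.
within_target = rel <= 0.04

print(f"relative degradation: {rel:.1%}")
print(f"absolute gap: {absolute_gap * 100:.2f} points")
print(f"within 4% relative: {within_target}")
```

With these assumed values, a gap of 0.15 absolute CER points is a 3% relative degradation, so a "within 4% relative" claim describes a small absolute difference when the baseline error rate is itself low.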
Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection
September 23, 2025 | Research areas: Methods and Algorithms; Tools, Platforms, Frameworks
With the deployment of Large Language Models (LLMs) in interactive applications, online malicious intent detection has become increasingly critical. However, existing approaches fall short of handling diverse and complex user queries in real time. To address these challenges, we introduce ADRAG (Adversarial Distilled Retrieval-Augmented Guard), a two-stage framework for robust and efficient online malicious intent detection. In the training…
BayesCNS: A Unified Bayesian Approach to Address Cold Start and Non-Stationarity in Search Systems at Scale
December 11, 2024 | Research areas: Knowledge Bases and Search; Methods and Algorithms | Conference: AAAI
Information Retrieval (IR) systems used in search and recommendation platforms frequently employ Learning-to-Rank (LTR) models to rank items in response to user queries. These models heavily rely on features derived from user interactions, such as clicks and engagement data. This dependence introduces cold start issues for items lacking user engagement and poses challenges in adapting to non-stationary shifts in user behavior over time. We…