
In this paper, we address the problem of noise-robust multiple-input multiple-output (MIMO) adaptive filtering that is optimal in the least-squares sense, with application to multichannel acoustic echo cancellation. We formulate the problem as the minimization of a multichannel least-squares cost function that incorporates near-end speech and noise statistics, resulting in a novel noise-robust framework for MIMO adaptive filtering. Although numerical stability has been widely explored in the context of recursive least-squares (RLS) filtering, a rigorous mathematical treatment of the MIMO case in the context of numerically stable, noise-robust multichannel echo cancellation is still lacking. Guided by quantization-error modeling, we resolve the numerical instability of our noise-robust scheme by using transversal RLS filtering of Type 2. We then present an explicit derivation of its inverse QR-decomposition (IQRD) counterpart based on Givens rotations. We also derive computationally efficient lattice forms for our noise-robust RLS Type-2 and IQRD algorithms. We highlight that the propagation of angle-normalized errors occurs naturally within the numerically stable least-squares lattice (LSL). Our approach thus combines four attributes sought after in a multichannel echo cancellation scheme: computational efficiency, numerical stability, fast convergence and tracking, and robustness against noise. We evaluate our formulations through simulations in terms of convergence, re-convergence, robustness in the presence of double-talk, and numerical stability.
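For readers less familiar with RLS filtering, the following is a minimal sketch of a conventional exponentially weighted RLS update identifying a single-channel FIR echo path. It is an illustrative assumption, not the paper's method: it is neither the noise-robust Type-2 transversal filter, the IQRD form, nor a lattice form, and it does not model near-end speech or noise statistics. The direct recursion on the inverse correlation matrix `P` used here is known to be sensitive to finite-precision errors, which is the kind of issue numerically stable variants address. The function name `rls_identify`, the forgetting factor `lam`, the initialization constant `delta`, and the echo path `h` are all hypothetical.

```python
import random


def rls_identify(x, d, order, lam=0.99, delta=0.01):
    """Conventional exponentially weighted RLS (illustrative sketch only).

    x: far-end (loudspeaker) samples; d: desired (microphone) samples.
    Returns the final FIR tap-weight estimate of length `order`.
    """
    # Inverse correlation matrix estimate, initialised to I / delta.
    P = [[(1.0 / delta) if i == j else 0.0 for j in range(order)]
         for i in range(order)]
    w = [0.0] * order
    u = [0.0] * order  # regressor: the `order` most recent input samples
    for n in range(len(x)):
        u = [x[n]] + u[:-1]
        # Gain vector k = P u / (lam + u^T P u).
        Pu = [sum(P[i][j] * u[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(u[i] * Pu[i] for i in range(order))
        k = [v / denom for v in Pu]
        # A priori error e = d[n] - w^T u, then the weight update w += k e.
        e = d[n] - sum(w[i] * u[i] for i in range(order))
        w = [w[i] + k[i] * e for i in range(order)]
        # P <- (P - k u^T P) / lam, using u^T P = (P u)^T for symmetric P.
        P = [[(P[i][j] - k[i] * Pu[j]) / lam for j in range(order)]
             for i in range(order)]
    return w


# Usage: identify a hypothetical 4-tap echo path from noiseless data.
rng = random.Random(0)
h = [0.5, -0.3, 0.2, 0.1]
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
d = [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
     for n in range(len(x))]
w = rls_identify(x, d, order=len(h))
print([round(v, 4) for v in w])
```

With noiseless data the estimated taps should closely match `h`. Adding near-end speech or noise to `d` would perturb this conventional estimator, which is the situation the noise-robust formulation above is designed to handle.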

Related readings and updates.

Robust Multichannel Linear Prediction for Online Speech Dereverberation Using Weighted Householder Least Squares Lattice Adaptive Filter

Speech dereverberation has been an important component of effective far-field voice interfaces in many applications. Algorithms based on multichannel linear prediction (MCLP) have been shown to be especially effective for blind speech dereverberation and numerous variants have been introduced in the literature. Most of these approaches can be derived from a common framework, where the MCLP problem for speech dereverberation is formulated as a…

Optimizing Siri on HomePod in Far‑Field Settings

The typical audio environment for HomePod has many challenges — echo, reverberation, and noise. Unlike Siri on iPhone, which operates close to the user’s mouth, Siri on HomePod must work well in a far-field setting. Users want to invoke Siri from many locations, like the couch or the kitchen, without regard to where HomePod sits. A complete online system, which addresses all of the environmental issues that HomePod can experience, requires a tight integration of various multichannel signal processing technologies. Accordingly, the Audio Software Engineering and Siri Speech teams built a system that integrates both supervised deep learning models and unsupervised online learning algorithms and that leverages multiple microphone signals. The system selects the optimal audio stream for the speech recognizer by using top-down knowledge from the “Hey Siri” trigger phrase detectors. In this article, we discuss the machine learning techniques we use for online signal processing, as well as the challenges we faced and our solutions for achieving environmental and algorithmic robustness while ensuring energy efficiency.
