Double-talk Robust Multichannel Acoustic Echo Cancellation Using Least Squares MIMO Adaptive Filtering: Transversal, Array, and Lattice Forms
Authors: Sarmad Malik, Jason Wung, Joshua Atkins, Devang Naik
In this paper, we address the problem of noise-robust multiple-input multiple-output (MIMO) adaptive filtering that is optimal in the least-squares sense, with application to multichannel acoustic echo cancellation. We formulate the problem as the minimization of a multichannel least squares cost function that incorporates near-end speech and noise statistics, resulting in a novel noise-robust framework for MIMO adaptive filtering. Although the issue of numerical stability has been widely explored in the context of recursive least squares (RLS) filtering, a rigorous mathematical treatment of the MIMO case in the context of numerically stable, noise-robust multichannel echo cancellation remains absent. Guided by quantization-error modeling, we resolve the issue of numerical instability in our noise-robust scheme by utilizing transversal RLS filtering of Type 2. Thereafter, an explicit derivation of its inverse QR-decomposition (IQRD) counterpart based on Givens rotations is presented. We also derive computationally efficient lattice forms for our noise-robust RLS Type-2 and IQRD algorithms. It is highlighted that the propagation of angle-normalized errors occurs naturally within the numerically stable least squares lattice (LSL). Thus, our approach combines the four sought-after attributes of a multichannel echo cancellation scheme: computational efficiency, numerical stability, fast convergence and tracking, and robustness against noise. We analyze our formulations using simulations in terms of convergence, re-convergence, robustness in the presence of double-talk, and numerical stability.
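To make the adaptive-filtering setting concrete, the sketch below shows a conventional single-channel transversal RLS echo canceller in Python. It is a generic textbook illustration of RLS-based echo cancellation only; it is not the paper's noise-robust MIMO Type-2, IQRD, or lattice algorithm, and the filter length, forgetting factor, and regularization value are arbitrary assumptions.

```python
import numpy as np

def rls_echo_canceller(far_end, mic, num_taps=64, forgetting=0.999, delta=1e-2):
    """Generic single-channel transversal RLS echo canceller (illustration only).

    far_end : loudspeaker (reference) signal, shape (N,)
    mic     : microphone signal containing the echo, shape (N,)
    Returns the error (echo-cancelled) signal, shape (N,).
    """
    w = np.zeros(num_taps)                  # adaptive filter taps
    P = np.eye(num_taps) / delta            # inverse correlation matrix estimate
    x_buf = np.zeros(num_taps)              # tap-delay line of the far-end signal
    err = np.zeros(len(mic))

    for n in range(len(mic)):
        # shift the tap-delay line and insert the newest far-end sample
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        # a priori error: microphone minus current echo estimate
        e = mic[n] - w @ x_buf
        # Kalman gain vector and standard RLS updates
        Px = P @ x_buf
        k = Px / (forgetting + x_buf @ Px)
        w = w + k * e
        P = (P - np.outer(k, Px)) / forgetting
        err[n] = e
    return err

# Toy usage: the echo is the far-end signal convolved with a short synthetic
# room response; the canceller should drive the residual toward zero.
rng = np.random.default_rng(0)
far = rng.standard_normal(8000)
room = rng.standard_normal(32) * np.exp(-0.2 * np.arange(32))
echo = np.convolve(far, room)[:8000]
residual = rls_echo_canceller(far, echo, num_taps=64)
```

The direct propagation of the inverse correlation matrix shown here is exactly where finite-precision errors can accumulate, which motivates the numerically stable Type-2, IQRD, and lattice forms developed in the paper.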
The typical audio environment for HomePod has many challenges — echo, reverberation, and noise. Unlike Siri on iPhone, which operates close to the user’s mouth, Siri on HomePod must work well in a far-field setting. Users want to invoke Siri from many locations, like the couch or the kitchen, without regard to where HomePod sits. A complete online system, which addresses all of the environmental issues that HomePod can experience, requires a tight integration of various multichannel signal processing technologies. Accordingly, the Audio Software Engineering and Siri Speech teams built a system that integrates both supervised deep learning models and unsupervised online learning algorithms and that leverages multiple microphone signals. The system selects the optimal audio stream for the speech recognizer by using top-down knowledge from the “Hey Siri” trigger phrase detectors. In this article, we discuss the machine learning techniques we use for online signal processing, as well as the challenges we faced and our solutions for achieving environmental and algorithmic robustness while ensuring energy efficiency.
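As a rough illustration of stream selection driven by top-down trigger knowledge, the sketch below keeps the current audio stream unless another stream's "Hey Siri" detector score exceeds it by a margin. The scores, margin, and hysteresis rule are hypothetical assumptions for illustration; the article only states that trigger-phrase detector knowledge is used to pick the stream sent to the recognizer.

```python
from typing import Sequence

def select_stream(trigger_scores: Sequence[float], current: int, margin: float = 0.1) -> int:
    """Hypothetical stream selection: switch streams only when another
    stream's trigger-phrase confidence clearly beats the current one."""
    best = max(range(len(trigger_scores)), key=lambda i: trigger_scores[i])
    if trigger_scores[best] > trigger_scores[current] + margin:
        return best
    return current

# Example: stream 2 wins only if its score exceeds the current stream's by 0.1.
chosen = select_stream([0.42, 0.55, 0.71], current=1)
```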