Neural Network Language Models (NNLMs) of Virtual Assistants (VAs) are generally language-, region-, and in some cases, device-dependent, which increases the effort to scale and maintain them. Combining NNLMs across one or more of these categories is one way to improve scalability. In this work, we combine regional variants of English by building a "World English" NNLM. We examine three data sampling techniques, experiment with adding adapter bottlenecks to the existing production NNLMs to model dialect-specific characteristics, and investigate different strategies to train the adapters. We find that adapter modules are more effective in modeling dialects than specialized sub-networks containing a set of feedforward layers. Our experimental results show that adapter-based architectures can achieve up to 4.57% Word Error Rate (WER) reduction over single-dialect baselines on head-heavy test sets and up to 8.22% on tail entities.
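To make the adapter idea concrete, the following is a minimal, hypothetical sketch of a residual adapter bottleneck (down-projection, nonlinearity, up-projection, residual add) in NumPy. The dimensions, initialization, and class name are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class AdapterBottleneck:
    """Illustrative residual adapter inserted after a frozen backbone layer.

    Sketch only: sizes and init are assumptions, not the paper's settings.
    """
    def __init__(self, d_model=32, d_bottleneck=8, seed=0):
        rng = np.random.default_rng(seed)
        # Down-projection to a small bottleneck dimension.
        self.W_down = rng.normal(scale=0.02, size=(d_model, d_bottleneck))
        self.b_down = np.zeros(d_bottleneck)
        # Up-projection initialized to zero, so the adapter starts as an
        # identity map and the frozen backbone's behavior is unchanged
        # at the beginning of dialect-specific training.
        self.W_up = np.zeros((d_bottleneck, d_model))
        self.b_up = np.zeros(d_model)

    def forward(self, h):
        # h: (batch, d_model) hidden states from the shared backbone.
        z = relu(h @ self.W_down + self.b_down)
        return h + z @ self.W_up + self.b_up  # residual connection

h = np.ones((2, 32))
adapter = AdapterBottleneck()
out = adapter.forward(h)  # same shape as h; identity at initialization
```

Only the adapter's few parameters would be trained per dialect, which is what makes adapters cheaper to maintain than full specialized sub-networks.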

Related readings and updates.

Training Large-Vocabulary Neural Language Model by Private Federated Learning for Resource-Constrained Devices

*= Equal Contributors Federated Learning (FL) is a technique to train models using data distributed across devices. Differential Privacy (DP) provides a formal privacy guarantee for sensitive data. Our goal is to train a large neural network language model (NNLM) on compute-constrained devices while preserving privacy using FL and DP. However, the DP noise introduced to the model increases as the model size grows, which often prevents…

Bridging the Domain Gap for Neural Models

Deep neural networks are a milestone technique in the advancement of modern machine perception systems. However, in spite of their exceptional learning capacity and improved generalizability, these neural models still suffer from poor transferability. This is the challenge of domain shift: a shift in the data distribution between domains (e.g., computer-generated images vs. images captured by real cameras). Models trained on data collected in one domain generally have poor accuracy on other domains. In this article, we discuss a new domain adaptation process that takes advantage of task-specific decision boundaries and the Wasserstein metric to bridge the domain gap, allowing the effective transfer of knowledge from one domain to another. As an additional advantage, this process is completely unsupervised, i.e., there is no need for new-domain data to have labels or annotations.
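As a minimal illustration of the kind of discrepancy measure involved, the snippet below computes the 1-D Wasserstein-1 distance between two equal-size empirical samples (the mean absolute difference of their sorted values). This toy example is an assumption for illustration only; the article's actual method also uses task-specific decision boundaries and operates on learned feature representations, not raw 1-D samples:

```python
import numpy as np

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal-weight empirical distributions this reduces to the mean
    absolute difference of the sorted values.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    assert a.shape == b.shape, "sketch assumes equal-size samples"
    return float(np.mean(np.abs(a - b)))

# Toy "domain gap": features from a source domain and a mean-shifted target.
rng = np.random.default_rng(0)
source = rng.normal(loc=0.0, scale=1.0, size=1000)
target = rng.normal(loc=2.0, scale=1.0, size=1000)
gap = wasserstein_1d(source, target)  # close to the mean shift of 2.0
```

Minimizing such a distance between source- and target-domain feature distributions is the basic intuition behind using the Wasserstein metric to bridge a domain gap.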
