Efficient Inference For Neural Machine Translation

Authors: Yi-Te Hsu, Sarthak Garg, Yi-Hsiu Liao, Ilya Chatsviorkin

Large transformer models have achieved state-of-the-art results in neural machine translation and have become standard in the field. In this work, we look for the optimal combination of known techniques to optimize inference speed without sacrificing translation quality. We conduct an empirical study that stacks various approaches and demonstrates that combining three techniques (replacing decoder self-attention with simplified recurrent units, adopting a deep-encoder, shallow-decoder architecture, and pruning multi-head attention) can achieve up to 109 percent and 84 percent speedup on CPU and GPU, respectively, and reduce the number of parameters by 25 percent while maintaining the same translation quality in terms of BLEU.
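To put the reported figures in more familiar terms, the short sketch below converts a percent speedup into a throughput multiplier and a relative decoding time. It assumes that "X percent speedup" means throughput rises by X percent over the baseline; the abstract does not spell out the measurement, so this reading is an assumption, and the function names are illustrative only.

```python
# Assumption: "X percent speedup" = throughput is (1 + X/100) times the baseline.
# The function names below are hypothetical helpers, not part of the paper.

def speedup_factor(percent_speedup: float) -> float:
    """Throughput multiplier relative to the baseline model."""
    return 1.0 + percent_speedup / 100.0


def relative_latency(percent_speedup: float) -> float:
    """Fraction of the baseline decoding time that remains."""
    return 1.0 / speedup_factor(percent_speedup)


def remaining_parameters(percent_reduction: float) -> float:
    """Fraction of the baseline parameter count that remains."""
    return 1.0 - percent_reduction / 100.0


# Upper-bound figures quoted in the abstract.
for device, pct in (("CPU", 109.0), ("GPU", 84.0)):
    print(
        f"{device}: {speedup_factor(pct):.2f}x throughput, "
        f"{relative_latency(pct):.1%} of baseline time"
    )
print(f"Parameters: {remaining_parameters(25.0):.0%} of baseline")
```

Under this reading, the best case corresponds to roughly halving decoding time on both devices while keeping three quarters of the parameters.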

Related readings and updates.

Improving How Machine Translations Handle Grammatical Gender Ambiguity

October 7, 2024 | Research area: Speech and Natural Language Processing

Machine Translation (MT) enables people to connect with others and engage with content across language barriers. Grammatical gender presents a difficult challenge for these systems, as some languages require specificity for terms that can be ambiguous or neutral in other languages. For example, when translating the English word “nurse” into Spanish, one must decide whether the feminine “enfermera” or the masculine “enfermero” is appropriate…

Variational Neural Machine Translation with Normalizing Flows

May 28, 2020 | Research area: Speech and Natural Language Processing | Conference: ACL

Variational Neural Machine Translation (VNMT) is an attractive framework for modeling the generation of target translations, conditioned not only on the source sentence but also on some latent random variables. The latent variable modeling may introduce useful statistical dependencies that can improve translation accuracy. Unfortunately, learning informative latent variables is non-trivial, as the latent space can be prohibitively large, and the…
