State Spaces Aren’t Enough: Machine Translation Needs Attention

In collaboration with the University of Amsterdam

Authors: Ali Vardasbi*, Telmo Pessoa Pires*, Robin M. Schmidt, Stephan Peitz

* = Equal Contributors

Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g., vision, language modeling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state and is able to capture long-range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Translation (MT) and evaluate several encoder-decoder variants on WMT’14 and WMT’16. In contrast with its success in language modeling, we find that S4 lags behind the Transformer by approximately 4 BLEU points and, counter-intuitively, struggles with long sentences. Finally, we show that this gap is caused by S4’s inability to summarize the full source sentence in a single hidden state, and that it can be closed by introducing an attention mechanism.
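The contrast between a single hidden state and attention can be illustrated with a toy example. The sketch below is not S4 itself, which uses a structured state matrix and is not reproduced here. It is a scalar linear recurrence, x_k = a * x_{k-1} + b * u_k, chosen only to show what it means to fold a whole sequence into one fixed-size state. The function names, the parameters a and b, and the dot-product attention readout are all illustrative assumptions, not the paper's implementation.

```python
import math


def ssm_final_state(inputs, a, b):
    """Toy scalar state space recurrence (hypothetical, not S4's parameterization).

    Applies x_k = a * x_{k-1} + b * u_k and returns only the last state,
    so the whole input sequence is summarized by a single number.
    """
    x = 0.0
    for u in inputs:
        x = a * x + b * u
    return x


def attention_readout(query, keys, values):
    """Toy scalar dot-product attention (hypothetical helper).

    Every source position stays individually addressable: the output is a
    softmax-weighted average of all values, with weights set by query * key.
    """
    scores = [query * k for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# A source "sentence" whose only informative token is the first one.
source = [1.0] + [0.0] * 9

# With |a| < 1, the first token's share of the final state is a ** (len - 1).
summary = ssm_final_state(source, a=0.5, b=1.0)

# Attention can still look the first position up directly.
keys = [10.0] + [0.0] * 9
readout = attention_readout(query=1.0, keys=keys, values=source)

print(summary, readout)
```

In this toy setting the recurrence keeps only a small trace of the first token by the end of the sequence, while the attention readout can recover it almost exactly because it reads every position. This is only an intuition for the abstract's claim about long sentences, under the stated assumptions, not a reproduction of the paper's experiments.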

Related readings and updates.

Improving How Machine Translations Handle Grammatical Gender Ambiguity

October 7, 2024 | Research area: Speech and Natural Language Processing

Machine Translation (MT) enables people to connect with others and engage with content across language barriers. Grammatical gender presents a difficult challenge for these systems, as some languages require specificity for terms that can be ambiguous or neutral in other languages. For example, when translating the English word “nurse” into Spanish, one must decide whether the feminine “enfermera” or the masculine “enfermero” is appropriate…

Efficient Representation Learning via Adaptive Context Pooling

July 11, 2022 | Research area: Methods and Algorithms | Conference: ICML

Self-attention mechanisms model long-range context by using pairwise attention between all input tokens. In doing so, they assume a fixed attention granularity defined by the individual tokens (e.g., text characters or image pixels), which may not be optimal for modeling complex dependencies at higher levels. In this paper, we propose ContextPool to address this problem by adapting the attention granularity for each token. Inspired by the success…
