Machine Translation (MT) enables people to connect with others and engage with content across language barriers. Grammatical gender presents a difficult challenge for these systems, as some languages require specificity for terms that can be ambiguous or neutral in other languages. For example, when translating the English word "nurse" into Spanish, one must decide whether the feminine "enfermera" or the masculine "enfermero" is appropriate. However, particularly when contextual clues are absent, such as in translating a single sentence, a model cannot determine which would be correct. This challenge is especially prevalent for many European languages, which often require gender specificity not only for professional titles, but also for terms like child, friend, and member, and sometimes for animals. Machine translation systems are often biased toward the gendered form most prevalent in their training data. Besides not necessarily giving the user an accurate translation, this bias can inadvertently reinforce harmful societal stereotypes.
To address this problem and give people more control over grammatical gender in machine translations, we presented "Generating Gender Alternatives in Machine Translation" at GeBNLP 2024 (Workshop on Gender Bias in NLP). Our approach trains translation models that give users fine-grained control over how gendered entities are translated, without requiring any additional components or inference overhead. With our method, a single pass of translation inference delivers all grammatically correct alternatives for gendered terms, enabling a user to select the most appropriate for their context. In addition to publishing this work, we have released training and test datasets to enable the broader ML community to more easily develop systems that allow such control over translation of gendered entities.
At Apple, we have also leveraged this research advancement to benefit users of the Translate app, which uses the underlying method to power the "grammatical gender" feature. This feature currently allows users to choose the most appropriate translations from all possible combinations when translating English content with ambiguously gendered entities into Spanish, French, or Portuguese.
Addressing the Challenge of Grammatical Gender in Machine Translation
Prior work simplified the problem by producing only "all masculine" or "all feminine" translations, i.e., by forcing all entities in a translation to share the same gender. However, requiring "doctor" and "nurse" in the sentence "The doctor met the nurse" to be either both masculine or both feminine clearly does not provide adequate flexibility.
A central challenge to giving users control over the grammatical gender of translated terms is the number of possible combinations. For $n$ gendered entities, there are $2^n$ possible translations. For the very brief example sentence above with only two entities, there are four possible Spanish translations, depending on the gender choices for "nurse" and "doctor"; with four entities, however, there would be 16 possible translations. Prior approaches have used specialized systems like "ambiguous entity detection" and/or "rewriters" at inference time to provide different gendered translations, but these systems add computational overhead and latency at inference time and do not scale well to handle all possible combinations.
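To make the combinatorial growth concrete, the short sketch below enumerates every masculine/feminine assignment for a list of ambiguous entities. Each entity independently takes one of two forms, so the count doubles with every entity added. The function name `gender_combinations` and the "M"/"F" labels are illustrative assumptions, not part of the published method.

```python
from itertools import product


def gender_combinations(entities):
    """Return every assignment of masculine ("M") or feminine ("F") to the given entities."""
    return [dict(zip(entities, choice)) for choice in product("MF", repeat=len(entities))]


two = gender_combinations(["doctor", "nurse"])
four = gender_combinations(["doctor", "nurse", "host", "painter"])

print(len(two))   # 4  (2^2)
print(len(four))  # 16 (2^4)
```

Producing each of these as a separate full translation at inference time is what makes rewriter-based approaches scale poorly.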
Rather than using those systems at inference time, our approach instead uses them to distill training data for our translation models. During the distillation process, these specialized systems add gender structure and alignment information to the training data (see Figure 1). Each gender-sensitive phrase is translated into a "gender structure" with both masculine and feminine forms. These gender structures are also aligned to the corresponding gender-ambiguous entity in the source sentence that controls its form.
The different forms of gender-sensitive phrases like "El anfitrión / La anfitriona," "el pintor / la pintora" are structured together as gender structures. Each gender structure is aligned to its corresponding ambiguously gendered entity (shown here by color coding). Given a translation containing gender structures and alignments, translations corresponding to any combination of gender choices can be easily derived.
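The last step, deriving a concrete translation for any combination of gender choices from gender structures and alignments, can be sketched as follows. This is a minimal illustration under stated assumptions: the `GenderStructure` class, the entity indices, and the example sentence ("The host and the painter arrived late.") are hypothetical and do not represent the paper's internal representation. Several structures may share one entity index (for example, an article and an adjective agreeing with the same noun). They then all follow the same choice.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Union


@dataclass
class GenderStructure:
    """Hypothetical container for one gender-sensitive phrase in a translation."""
    masculine: str
    feminine: str
    entity: int  # index of the ambiguous source entity this phrase is aligned to


# A translation is a sequence of plain text and gender structures.
Segment = Union[str, GenderStructure]


def realize(segments: List[Segment], choices: Dict[int, str]) -> str:
    """Build one concrete translation; choices maps entity index -> "M" or "F"."""
    parts = []
    for seg in segments:
        if isinstance(seg, GenderStructure):
            # Every structure aligned to the same entity follows the same choice.
            parts.append(seg.feminine if choices[seg.entity] == "F" else seg.masculine)
        else:
            parts.append(seg)
    return "".join(parts)


# "The host and the painter arrived late." (entity 0 = host, entity 1 = painter)
segments: List[Segment] = [
    GenderStructure("El anfitrión", "La anfitriona", entity=0),
    " y ",
    GenderStructure("el pintor", "la pintora", entity=1),
    " llegaron tarde.",
]

for host, painter in product("MF", repeat=2):
    print(realize(segments, {0: host, 1: painter}))
# Prints:
# El anfitrión y el pintor llegaron tarde.
# El anfitrión y la pintora llegaron tarde.
# La anfitriona y el pintor llegaron tarde.
# La anfitriona y la pintora llegaron tarde.
```

Because the alternatives are stored once and combined on demand, all four variants come from a single structured translation, with no extra model calls.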
By training on this distilled data with gender structures and alignments, our translation models learn to provide a single translation with the appropriate grammatical gender alternatives, without requiring any additional components that would add computational overhead and latency at inference time.
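Because the model emits one translation that carries both the alternatives and their alignments, a client only needs to parse that single output to know which entities the user can choose a gender for. The sketch below assumes a purely hypothetical inline markup, `<g e=K>masculine|feminine</g>`. The actual output format used in the paper and in the Translate app is not described here and may differ. The names `parse` and `entities_to_choose` are likewise illustrative.

```python
import re
from typing import List, Tuple, Union

# Hypothetical inline markup for one gender-sensitive phrase:
#   <g e=K>masculine form|feminine form</g>
# where K is the index of the aligned ambiguous entity in the source sentence.
ALT = re.compile(r"<g e=(\d+)>([^|<]*)\|([^<]*)</g>")

Alternative = Tuple[int, str, str]  # (entity index, masculine form, feminine form)


def parse(output: str) -> List[Union[str, Alternative]]:
    """Split one model output into plain-text spans and aligned gender alternatives."""
    segments: List[Union[str, Alternative]] = []
    pos = 0
    for m in ALT.finditer(output):
        if m.start() > pos:
            segments.append(output[pos:m.start()])  # text before this alternative
        segments.append((int(m.group(1)), m.group(2), m.group(3)))
        pos = m.end()
    if pos < len(output):
        segments.append(output[pos:])  # trailing text
    return segments


def entities_to_choose(segments: List[Union[str, Alternative]]) -> List[int]:
    """Source entities whose gender the user can pick (one control per entity)."""
    return sorted({seg[0] for seg in segments if isinstance(seg, tuple)})


raw = "<g e=0>El anfitrión|La anfitriona</g> y <g e=1>el pintor|la pintora</g> llegaron tarde."
segments = parse(raw)
print(entities_to_choose(segments))  # [0, 1]
```

An output with no markup parses to a single plain-text segment, so unambiguous sentences need no special handling.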
New Resources for the Research Community
To enable the research community to build translation systems that generate gender alternatives, we have released supervised training datasets for five language pairs (English to Spanish, French, German, Russian, and Portuguese) and evaluation benchmarks for six language pairs (those listed above, as well as English to Italian). These datasets contain gender ambiguity annotations in the English source sentences and gender structure and alignment annotations in the translations.
Giving Users Control of Grammatical Gender in the Translate App
At Apple, we’ve brought this advancement from research into production to enhance the Translate app experience for users (see Figure 2). With iOS 17, we shipped the "grammatical gender" feature, which is powered by the underlying method from this research. The feature currently allows users to select the appropriate grammatical gender of ambiguous entities when translating from English to Spanish, French, or Portuguese. Because our method trains a single translation model and doesn’t require multiple complete translations or other components at inference time, it adds no computational overhead or latency for users, making it suitable for production use at scale.
Conclusion
Translating content with ambiguously gendered entities into languages with gender-specific forms has been a challenge for machine translation, and it has often resulted in systems providing translations that reinforce societal stereotypes reflected in models' training data. Our research provides a new approach to training translation models, which enables them to provide all appropriate alternative translations for gendered entities, without requiring additional components or computational overhead at inference time. Using this method, we have given users of the Translate app greater control over their translations, so that they can select the appropriate grammatical gender in cases of ambiguity when translating from English into several European languages.
While we have made important progress on this problem in machine translation, challenges remain for future research, such as expanding to additional language pairs, supporting gender-neutral forms, and appropriately addressing non-binary gender identities. Our hope is that our publication and the new resources we have shared help to accelerate progress within the broader research community to continue to improve translation systems for everyone.
Acknowledgements
Many people contributed to this project including: Qin Gao, Sarthak Garg, Mozhdeh Gheini, Yi-Hsiu Liao, Tatiana Likhomanenko, Louie Livon-Bemel, Udhay Nallasamy, Alex Ovchinnikov, Matthias Paulik, Telmo Pessoa Pires, and Hendra Setiawan.