ProText: A Benchmark Dataset for Measuring (Mis)gendering in Long-Form Texts
Authors: Hadas Kotek, Margit Bowler, Patrick Sonnenberg, Yu'an Yang
We introduce ProText, a dataset for measuring gendering and misgendering in stylistically diverse long-form English texts. ProText spans three dimensions: Theme nouns (names, occupations, titles, kinship terms), Theme category (stereotypically male, stereotypically female, gender-neutral/non-gendered), and Pronoun category (masculine, feminine, gender-neutral, none). The dataset is designed to probe (mis)gendering in text transformations such as summarization and rewrites using state-of-the-art Large Language Models, extending beyond traditional pronoun resolution benchmarks and beyond the gender binary. We validated ProText through a mini case study, showing that even with just two prompts and two models, we can draw nuanced insights regarding gender bias, stereotyping, misgendering, and gendering. We reveal systematic gender bias, particularly when inputs contain no explicit gender cues or when models default to heteronormative assumptions.
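The three dimensions above suggest a simple record structure per dataset entry. The following is a hypothetical sketch only; the field names and example values are illustrative assumptions, not taken from the released ProText dataset.

```python
# Hypothetical sketch of a ProText-style entry along the three dimensions
# described above. Field names and values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ProTextEntry:
    theme_noun: str        # a name, occupation, title, or kinship term
    theme_category: str    # "stereotypically male" | "stereotypically female" | "gender-neutral"
    pronoun_category: str  # "masculine" | "feminine" | "gender-neutral" | "none"
    text: str              # long-form passage to be summarized or rewritten

# An illustrative entry with no explicit pronoun cues in the passage,
# the condition under which the case study probes model defaults.
entry = ProTextEntry(
    theme_noun="nurse",
    theme_category="stereotypically female",
    pronoun_category="none",
    text="The nurse reviewed every chart before the shift change.",
)
```

Crossing the Theme and Pronoun categories in this way is what lets the benchmark test, for example, whether a model introduces a gendered pronoun during summarization when the input contained none.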
Improving How Machine Translations Handle Grammatical Gender Ambiguity
October 7, 2024. Research area: Speech and Natural Language Processing
Machine Translation (MT) enables people to connect with others and engage with content across language barriers. Grammatical gender presents a difficult challenge for these systems, as some languages require specificity for terms that can be ambiguous or neutral in other languages. For example, when translating the English word “nurse” into Spanish, one must decide whether the feminine “enfermera” or the masculine “enfermero” is appropriate…
Generating Gender Alternatives in Machine Translation
August 7, 2024. Research area: Speech and Natural Language Processing. Conference: ACL
This paper was accepted at the Workshop on Gender Bias in Natural Language Processing 2024.
Machine translation (MT) systems often translate terms with ambiguous gender (e.g., the English term “the nurse”) into whichever gendered form is most prevalent in the systems’ training data (e.g., “enfermera”, the Spanish term for a female nurse). This often reflects and perpetuates harmful stereotypes present in society. With MT user interfaces in mind that…
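The behavior described, surfacing both gendered alternatives instead of silently defaulting to the training-data majority form, can be illustrated with a minimal sketch. This is not the paper's system; the lookup table and function are illustrative assumptions, with only the nurse/enfermera/enfermero example taken from the abstracts above.

```python
# Illustrative sketch: for an English term whose gender is ambiguous,
# return both gendered Spanish forms rather than defaulting to one.
# The table entries are assumptions for demonstration; "nurse" ->
# ("enfermera", "enfermero") comes from the abstract's own example.
GENDER_ALTERNATIVES = {
    "nurse": ("enfermera", "enfermero"),
    "doctor": ("doctora", "doctor"),
}

def translate_with_alternatives(term: str) -> list[str]:
    """Return all gendered translations for a gender-ambiguous term,
    or the term unchanged when no alternatives are known."""
    if term in GENDER_ALTERNATIVES:
        feminine, masculine = GENDER_ALTERNATIVES[term]
        return [feminine, masculine]
    return [term]  # fall back: pass the term through unchanged

print(translate_with_alternatives("nurse"))  # ['enfermera', 'enfermero']
```

A user interface built on such output could then let the reader pick the appropriate form, rather than having the system impose the statistically dominant one.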