Bias after Prompting: Persistent Discrimination in Large Language Models
Authors: Nivedha Sivakumar*, Natalie Mackraz*, Samira Khorshidi, Krishna Patel†, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff
*Equal Contributors
A dangerous assumption that can be made from prior work on the bias transfer hypothesis (BTH) is that biases do not transfer from pre-trained large language models (LLMs) to adapted models. We invalidate this assumption by studying the BTH in causal models under prompt adaptations, as prompting is an extremely popular and accessible adaptation strategy used in real-world applications. In contrast to prior work, we find that biases can transfer through prompting and that popular prompt-based mitigation methods do not consistently prevent biases from transferring. Specifically, the correlation between intrinsic biases and those after prompt adaptation remains moderate to strong across demographics and tasks: for example, gender (rho >= 0.94) in co-reference resolution, and age (rho >= 0.98) and religion (rho >= 0.69) in question answering. Further, we find that biases remain strongly correlated when varying few-shot composition parameters, such as sample size, stereotypical content, occupational distribution, and representational balance (rho >= 0.90). We evaluate several prompt-based debiasing strategies and find that different approaches have distinct strengths, but none consistently reduce bias transfer across models, tasks, or demographics. These results demonstrate that correcting bias, and potentially improving reasoning ability, in intrinsic models may prevent propagation of biases to downstream tasks.
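The rho values above are Spearman rank correlations between per-group bias scores measured intrinsically and measured again after prompt adaptation. As a minimal illustrative sketch (not the paper's released code), assuming hypothetical per-group bias scores, such a comparison could be computed with SciPy as follows:

# Minimal sketch (not the paper's code): correlating intrinsic bias scores
# with bias scores measured after prompt adaptation, per demographic group.
# The numbers below are illustrative placeholders, not results from the paper.
from scipy.stats import spearmanr

# Hypothetical per-group bias scores for a pre-trained (intrinsic) model,
# e.g. one score per demographic group in a co-reference or QA benchmark.
intrinsic_bias = [0.12, 0.31, 0.45, 0.08, 0.27, 0.52]

# Bias scores for the same groups after the model is adapted via prompting
# (zero-shot instructions or few-shot exemplars).
prompted_bias = [0.10, 0.29, 0.47, 0.11, 0.25, 0.50]

# Spearman's rank correlation: values near 1.0 indicate that the ranking of
# biases is preserved after prompting, i.e. bias transfers to the adapted model.
rho, p_value = spearmanr(intrinsic_bias, prompted_bias)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3g})")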
Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models
December 10, 2024. Research areas: Fairness, Speech and Natural Language Processing. Conference: NeurIPS.
Large language models (LLMs) are increasingly being adapted to achieve task-specificity for deployment in real-world decision systems. Several previous works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness, finding that fairness in pre-trained masked language models has limited effect on the fairness of models adapted using fine-tuning…
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP
November 4, 2024. Research areas: Computer Vision, Methods and Algorithms. Conference: NeurIPS.
Large pretrained vision-language models like CLIP have shown promising generalization capability, but may struggle in specialized domains (e.g., satellite imagery) or fine-grained classification (e.g., car models) where the visual concepts are unseen or under-represented during pretraining. Prompt learning offers a parameter-efficient finetuning framework that can adapt CLIP to downstream tasks even when limited annotation data are available. In…