Improving Language Model Personas via Rationalization with Psychological Scaffolds
Authors: Brihi Joshi†, Xiang Ren†, Swabha Swayamdipta†, Rik Koncel-Kedziorski, Tim Paek
Language models prompted with a user description or persona are being used to predict the user's preferences and opinions. However, existing approaches to building personas mostly rely on a user's demographic attributes and/or prior judgments, not on any underlying reasoning behind those judgments. We introduce PB&J (Psychology of Behavior and Judgments), a framework that improves LM personas by incorporating potential rationales for why the user could have made a certain judgment. Our rationales are generated by a language model to explicitly reason about a user's behavior on the basis of their experiences, personality traits, or beliefs. Our method employs psychological scaffolds: structured frameworks such as the Big 5 Personality Traits or Primal World Beliefs that help ground the generated rationales in existing theories. Experiments on public opinion and movie preference prediction tasks demonstrate that language model personas augmented with PB&J rationales consistently outperform personas conditioned only on user demographics and/or judgments, including those that use the model's default chain-of-thought, which is not grounded in psychological theories. Additionally, our PB&J personas perform competitively with those using human-written rationales, suggesting the potential of synthetic rationales guided by existing theories.
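To make the pipeline concrete, here is a minimal sketch of how scaffold-grounded rationalization could be wired up. This is an illustration, not the authors' implementation: the function names, prompt wording, and the `generate` callable (any text-in, text-out LM interface) are assumptions; only the scaffold names come from the abstract, and the dimension lists are the standard ones from the underlying theories.

```python
# Minimal sketch of a PB&J-style persona pipeline (illustrative, not the
# paper's actual code). `generate` is assumed to be any text-in/text-out
# language-model interface supplied by the caller.
from typing import Callable, List

# Scaffolds named in the abstract; dimensions are the standard ones from
# the underlying psychological theories, not taken from the paper itself.
SCAFFOLDS = {
    "big5": ["openness", "conscientiousness", "extraversion",
             "agreeableness", "neuroticism"],
    "primals": ["the world is safe", "the world is enticing",
                "the world is alive"],
}

def generate_rationales(generate: Callable[[str], str],
                        demographics: str,
                        judgments: List[str],
                        scaffold: str = "big5") -> List[str]:
    """Ask an LM to rationalize each prior judgment, grounded in a scaffold."""
    dims = ", ".join(SCAFFOLDS[scaffold])
    rationales = []
    for judgment in judgments:
        prompt = (
            f"User demographics: {demographics}\n"
            f"Observed judgment: {judgment}\n"
            f"In one sentence, explain why this user might hold this judgment, "
            f"referring to their experiences, personality traits, or beliefs "
            f"along these dimensions: {dims}."
        )
        rationales.append(generate(prompt))
    return rationales

def build_persona(demographics: str, judgments: List[str],
                  rationales: List[str]) -> str:
    """Assemble the augmented persona: demographics + judgments + rationales."""
    lines = [f"Demographics: {demographics}"]
    lines += [f"Judgment: {j}\nRationale: {r}"
              for j, r in zip(judgments, rationales)]
    return "\n".join(lines)

def predict(generate: Callable[[str], str], persona: str, question: str) -> str:
    """Condition the LM on the augmented persona to predict a new judgment."""
    return generate(f"{persona}\n\nPredict this user's answer to: {question}")
```

The key design point the abstract emphasizes is the scaffold: rather than letting the model free-associate a chain-of-thought, the rationale prompt constrains its reasoning to dimensions from an established psychological theory.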
PrimeX: A Dataset of Worldview, Opinion, and Explanation
October 27, 2025. Research areas: Data Science and Annotation; Speech and Natural Language Processing. Conference: EMNLP.
As the adoption of language models advances, so does the need to better represent individual users to the model. Are there aspects of an individual’s belief system that a language model can utilize for improved alignment? Following prior research, we investigate this question in the domain of opinion prediction by developing PrimeX, a dataset of public opinion survey data from 858 US residents with two additional sources of belief information:…
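As a rough illustration of the dataset's shape, a per-respondent record might look like the sketch below. The field names are hypothetical, and the two additional belief sources are left as generic placeholders because the excerpt above is truncated before naming them.

```python
# Hypothetical sketch of a per-respondent PrimeX record. Field names are
# illustrative assumptions, not the dataset's actual schema; the two extra
# belief sources stay generic since the abstract excerpt is truncated.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Respondent:
    respondent_id: str
    demographics: Dict[str, str]   # e.g., {"age": "34", "state": "CA"}
    opinions: Dict[str, str]       # survey question id -> response
    belief_source_1: List[str] = field(default_factory=list)  # placeholder
    belief_source_2: List[str] = field(default_factory=list)  # placeholder
```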
PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming
September 26, 2025. Research areas: Human-Computer Interaction; Methods and Algorithms. Workshop at NeurIPS.
This paper was accepted at the Workshop on Regulatable ML (ReML) at NeurIPS 2025.
Recent developments in AI governance and safety research have called for red-teaming methods that can effectively surface potential risks posed by AI models. Many of these calls have emphasized how the identities and backgrounds of red-teamers can shape their red-teaming strategies, and thus the kinds of risks they are likely to uncover. While automated red-teaming…