PERSONA: A Reproducible Testbed for Pluralistic Alignment
đź“„ Full Paper
đź’¬ Ask
The rapid advancement of language models (LMs) necessitates robust alignment with diverse user values. However, current preference optimization approaches often fail to capture the plurality of user opinions, instead reinforcing majority viewpoints and marginalizing minority perspectives.
This paper introduces PERSONA, a reproducible testbed designed to evaluate and improve pluralistic alignment of LMs. PERSONA generates a large-scale evaluation dataset containing 3,868 prompts and 317,200 feedback pairs obtained from procedurally generated synthetic personas. These personas are built from US Census data, incorporating idiosyncratic attributes and resulting in 1,586 unique individuals with varied demographic characteristics. The authors leverage this dataset to systematically evaluate LM capabilities in role-playing diverse users.
For example, one persona in the PERSONA dataset might be a 73-year-old retired Filipino woman who lives in a married-couple household with children ages 5 to 17, identifies as a liberal Democrat, and enjoys gardening and reading in her free time. This persona carries a specific set of opinions on topics such as politics, religion, and healthcare.
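As a rough illustration of how such a persona might be represented and sampled programmatically, here is a minimal sketch. The field names, attribute pools, and uniform sampling below are our own assumptions for illustration; the actual testbed draws attributes from US Census distributions rather than the toy lists used here.

```python
import random
from dataclasses import dataclass, field

# Hypothetical attribute pools; the real testbed samples from census-derived
# distributions, not the uniform choices shown here.
SEX = ["female", "male"]
HOUSEHOLD = ["married-couple household with children ages 5 to 17",
             "single-person household", "multigenerational household"]
POLITICS = ["liberal Democrat", "moderate independent", "conservative Republican"]
HOBBIES = ["gardening", "reading", "hiking", "woodworking"]

@dataclass
class Persona:
    """A synthetic user: census-style demographics plus idiosyncratic attributes."""
    age: int
    sex: str
    ethnicity: str
    household: str
    political_lean: str
    hobbies: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the persona as a role-play description for an LM."""
        return (f"You are a {self.age}-year-old {self.ethnicity} {self.sex}. "
                f"You live in a {self.household}. "
                f"Politically, you identify as a {self.political_lean}. "
                f"In your free time you enjoy {', '.join(self.hobbies)}.")

def sample_persona(rng: random.Random) -> Persona:
    """Draw one synthetic individual (uniform here; census-weighted in practice)."""
    return Persona(
        age=rng.randint(18, 94),
        sex=rng.choice(SEX),
        ethnicity=rng.choice(["Filipino", "German", "Mexican", "Nigerian"]),
        household=rng.choice(HOUSEHOLD),
        political_lean=rng.choice(POLITICS),
        hobbies=rng.sample(HOBBIES, k=2),
    )
```

Generating 1,586 such individuals is then a matter of repeated sampling with a fixed seed, which is what makes the testbed reproducible.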
The researchers evaluate LMs on role-playing tasks such as generating text that reflects a specific persona's preferences and stays consistent with that persona's values. They find that LMs struggle to accurately reflect the preferences of diverse personas, particularly when those preferences diverge from the majority opinion.
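One simple way to frame that evaluation is to have the model role-play the persona, choose between two candidate responses, and score agreement with the persona's recorded preference. The sketch below is our own framing under stated assumptions: `generate_fn` stands in for whatever LM call the experimenter uses, and the feedback-pair field names are hypothetical, not the paper's schema.

```python
from typing import Callable

def judge_preference(generate_fn: Callable[[str], str],
                     persona_description: str,
                     prompt: str,
                     response_a: str,
                     response_b: str) -> str:
    """Ask the LM, role-playing the persona, which response it prefers ('A' or 'B')."""
    judge_prompt = (
        f"{persona_description}\n\n"
        f"Someone asks: {prompt}\n\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n\n"
        "Which response do you, as this person, prefer? Answer with 'A' or 'B' only."
    )
    answer = generate_fn(judge_prompt).strip().upper()
    return "A" if answer.startswith("A") else "B"

def preference_accuracy(generate_fn: Callable[[str], str], feedback_pairs: list[dict]) -> float:
    """Fraction of feedback pairs where the role-played LM matches the persona's label.

    Each item is assumed to have keys 'persona', 'prompt', 'response_a',
    'response_b', and 'preferred' ('A' or 'B').
    """
    correct = 0
    for pair in feedback_pairs:
        choice = judge_preference(generate_fn, pair["persona"], pair["prompt"],
                                  pair["response_a"], pair["response_b"])
        correct += int(choice == pair["preferred"])
    return correct / max(len(feedback_pairs), 1)
```

Accuracy computed this way can then be broken down by demographic subgroup, which is how a drop on minority-preference pairs would show up.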
The paper also explores the role of chain-of-thought prompting in personalized responses. Prompting the LM to reason step by step about the persona before answering does not necessarily improve performance; in some cases it actually degrades it.
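To make the two prompting modes concrete, they can be sketched as follows. The prompt wording and the `generate_fn` hook are illustrative assumptions on our part, not the paper's exact prompts.

```python
from typing import Callable

def direct_prompt(persona_description: str, question: str) -> str:
    """Ask for the persona's answer with no intermediate reasoning."""
    return (f"{persona_description}\n\n"
            f"Question: {question}\n"
            "Answer as this person, in one short paragraph.")

def chain_of_thought_prompt(persona_description: str, question: str) -> str:
    """Ask the model to reason step by step about the persona before answering."""
    return (f"{persona_description}\n\n"
            f"Question: {question}\n"
            "First, think step by step about how this person's background and "
            "values shape their view. Then give their answer.")

def compare_modes(generate_fn: Callable[[str], str],
                  persona_description: str,
                  question: str) -> dict:
    """Collect both response styles so they can be scored side by side."""
    return {
        "direct": generate_fn(direct_prompt(persona_description, question)),
        "chain_of_thought": generate_fn(chain_of_thought_prompt(persona_description, question)),
    }
```

Scoring both outputs against the persona's recorded preferences is what reveals whether the added reasoning step helps or hurts.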
The paper’s main contribution is the creation of PERSONA, a comprehensive testbed for evaluating pluralistic alignment in LMs. It enables researchers to systematically evaluate the performance of different alignment methods and to explore the challenges of aligning LMs with diverse user preferences.
The paper also provides valuable insights into the challenges of aligning LMs with diverse user preferences. The researchers highlight the importance of considering both demographic and idiosyncratic attributes, and they caution against relying on “representative” users or on simplistic methods for capturing diverse preferences. They also argue that there is no single “right” way to align LMs with diverse user preferences, and that further research is needed to develop more robust and effective alignment methods.
The paper’s findings have important implications for the development and deployment of LMs. As LMs are deployed in an ever-wider range of applications, it is essential that they are aligned with the values of diverse users. The PERSONA testbed provides a valuable tool for evaluating and improving pluralistic alignment methods, and the paper’s insights into the challenges of aligning LMs with diverse user preferences are essential reading for anyone working in this field.