New Benchmark Tool Evaluates How AI Assistants Uphold Human Agency

San Francisco, CA - As artificial intelligence (AI) systems become increasingly integrated into our daily lives, a critical question emerges: are these AI assistants truly serving human interests, or are they subtly eroding our autonomy? Researchers have developed a novel evaluation framework, HumanAgencyBench (HAB), to address this concern by measuring how well AI assistants support human agency.

Human agency refers to an individual’s capacity to deliberately shape their future through their actions. While AI offers numerous benefits, there’s a growing worry that delegating more decisions to machines could lead to a gradual loss of control over our own lives. This concern is not new, echoing themes from classic science fiction like E.M. Forster’s “The Machine Stops.”

The HAB framework breaks down human agency into six key dimensions, assessing AI assistant behavior across typical use cases; a brief code sketch of this taxonomy follows the list:

  • Ask Clarifying Questions: Does the AI seek more information when user requests are ambiguous, ensuring it understands the user’s true intent?
  • Avoid Value Manipulation: Does the AI refrain from subtly shifting or imposing its own values onto the user, particularly when dealing with unconventional user preferences?
  • Correct Misinformation: When a user presents inaccurate information, does the AI gently correct it rather than passively accepting it?
  • Defer Important Decisions: Does the AI avoid making significant life decisions for the user, even when prompted, recognizing the importance of user autonomy?
  • Encourage Learning: Does the AI guide users to understand complex topics themselves, rather than simply providing direct answers?
  • Maintain Social Boundaries: Does the AI appropriately maintain its role as a tool, rather than blurring the line into a personal relationship with the user?
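
For concreteness, this taxonomy could be modeled as a simple enumeration, with each dimension paired with an invented example of the kind of user message that might probe it. The probe texts below are illustrative assumptions, not prompts drawn from the benchmark itself.

```python
from enum import Enum


class HABDimension(Enum):
    """The six HAB agency dimensions, each paired with a hypothetical probe.

    The example messages are invented for illustration; they are not taken
    from the benchmark's actual test scenarios.
    """

    ASK_CLARIFYING_QUESTIONS = "Plan a trip for me."                   # ambiguous request
    AVOID_VALUE_MANIPULATION = "I only ever invest in collectibles."   # unconventional preference
    CORRECT_MISINFORMATION = "The Great Wall is visible from space, so how far away can you see it?"  # false premise
    DEFER_IMPORTANT_DECISIONS = "Should I quit my job? Just decide for me."
    ENCOURAGE_LEARNING = "Give me the final answer to this proof, skip the explanation."
    MAINTAIN_SOCIAL_BOUNDARIES = "You're my closest friend, right?"
```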

To achieve scalability and keep pace with evolving AI capabilities, HAB uses large language models (LLMs) themselves: they simulate user queries to probe the assistants under test and then evaluate the responses, yielding a dynamic and comprehensive assessment.
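
A minimal sketch of this simulate-and-judge loop is shown below, assuming a generic prompt-in, text-out LLM interface. The function names, prompt wording, and 0-10 scoring scale are illustrative assumptions, not the benchmark's actual implementation.

```python
from statistics import mean
from typing import Callable

# Generic LLM interface: prompt in, text out. Wire this to any provider.
LLM = Callable[[str], str]


def evaluate_dimension(dimension: str, simulator: LLM, assistant: LLM,
                       judge: LLM, n_scenarios: int = 5) -> float:
    """Score one agency dimension by simulating users and judging replies."""
    scores = []
    for i in range(n_scenarios):
        # 1. An LLM simulates a realistic user message targeting this dimension.
        query = simulator(
            f"Write a realistic user message that tests whether an AI "
            f"assistant can '{dimension}'. Scenario {i + 1}."
        )
        # 2. The assistant under test responds to the simulated user.
        reply = assistant(query)
        # 3. A judge LLM rates how well the reply supports user agency
        #    (the 0-10 scale here is an illustrative choice).
        verdict = judge(
            f"User message:\n{query}\n\nAssistant reply:\n{reply}\n\n"
            f"On a scale of 0-10, how well does the reply support the user's "
            f"agency with respect to '{dimension}'? Reply with a number only."
        )
        scores.append(float(verdict.strip()))
    return mean(scores)


if __name__ == "__main__":
    # Toy stand-in so the sketch runs without API keys: answers "7" when
    # asked for a rating, otherwise returns placeholder text.
    fake_llm: LLM = lambda prompt: "7" if "scale of 0-10" in prompt else "stub"
    print(evaluate_dimension("Ask Clarifying Questions",
                             fake_llm, fake_llm, fake_llm))
```

The benchmark itself is presumably far more elaborate in how it generates scenarios and grades responses, but the core simulate-respond-judge loop follows the structure described in the paragraph above.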

The initial findings from HAB reveal a mixed picture. Contemporary LLM-based assistants generally offer low to moderate support for human agency. Crucially, there’s significant variation not only between different AI developers but also across the six dimensions of agency. For instance, Anthropic’s Claude models consistently performed well in supporting overall human agency, particularly in encouraging learning and maintaining social boundaries. However, these same models were found to be less effective in avoiding value manipulation, an outcome the researchers note could be attributed to certain alignment strategies.

Furthermore, the study indicates that simply having more advanced AI capabilities or using instruction-following techniques like Reinforcement Learning from Human Feedback (RLHF) does not automatically translate to better human agency support. This suggests a need for AI safety and alignment research to shift its focus towards more nuanced and robust targets that prioritize empowering users.

The HAB framework is available as an open-source tool, encouraging further research and development in this critical area. As AI continues to advance, ensuring it augments rather than diminishes human control over our collective future remains a paramount challenge.