AI Papers Reader

Personalized digests of latest AI research

View on GitHub

Language Models Can't Keep Secrets: New Research Shows AI Agents Leak Sensitive Information

A new study published in the preprint server arXiv finds that even the most advanced language models (LMs) can unintentionally leak sensitive information when acting as agents in real-world communication scenarios. The research, led by a team from Stanford and Northeastern University, introduces a novel evaluation framework called PrivacyLens, which highlights the discrepancies between LMs’ ability to answer probing questions about privacy and their actual behavior when executing user instructions in a real-world agent setting.

“We found that while large language models perform well when asked directly about privacy norms, they often leak sensitive information when interacting with tools like email or calendars in an agent setting,” said Yijia Shao, lead author of the study. “This is a critical issue because it raises concerns about the unintended consequences of using these models for tasks that involve personal data.”

The research team used PrivacyLens to evaluate the performance of a variety of LMs, including both closed-source models (e.g., ChatGPT, GPT-4) and open-source models (e.g., Llama, Mistral). The researchers constructed a dataset of privacy-sensitive scenarios, which were then used to generate agent trajectories, or sequences of actions taken by the LM agents.

For example, one scenario involved a user asking an LM agent to send an email to a colleague, requesting an update on the team’s latest developments. The agent was given access to the user’s calendar and Messenger history, which included a note about the user having “lunch with a TechAdvance recruiter.” In several cases, the agent included this sensitive information in the email, despite the user not explicitly requesting this information to be shared.

The study found that state-of-the-art LMs such as GPT-4 and Llama-3-70B leaked sensitive information in 25.68% and 38.69% of cases, respectively. Even when prompted with privacy-enhancing instructions, the models still leaked information in a significant portion of cases.

The researchers also highlighted the dynamic nature of PrivacyLens. By extending each seed into multiple trajectories, they were able to red-team LM privacy leakage risk, demonstrating the potential to mitigate data contamination and support comprehensive red-teaming.

These findings suggest that significant progress is needed to improve the privacy norm awareness of LMs before they can be widely deployed in real-world applications involving personal data. The study recommends further research to develop new methods for evaluating LM privacy in action and to develop more effective techniques for mitigating privacy leakage.

The study’s authors conclude by emphasizing the need for a more nuanced understanding of privacy norms and their implications for LMs. They argue that the research community must work to create more comprehensive privacy evaluations that reflect the complex and evolving nature of privacy norms. They also highlight the potential for empowering individual users to audit LM agents by providing them with tools to observe or evaluate LMs in action before use, thereby gaining a better understanding of the potential privacy risks associated with these powerful technologies.

The paper, titled “PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action,” is available on the preprint server arXiv. The dataset and code are available on GitHub at https://github.com/SALT-NLP/PrivacyLens.