The Sum of the Parts: How Multi-Agent AI Systems Can Accidentally Reveal Sensitive Information

In our increasingly interconnected digital world, Large Language Models (LLMs) are no longer confined to single, isolated tasks. Instead, they are being integrated into complex multi-agent systems, where several AI agents collaborate to achieve sophisticated goals. While this collaboration promises enhanced capabilities, a new study by researchers at UNC Chapel Hill and The University of Texas at Austin reveals a critical, and previously underappreciated, privacy risk: “compositional privacy leakage.”

This phenomenon occurs when seemingly innocuous pieces of information, shared independently by different AI agents, can be combined by an adversary to reveal sensitive or private data that was never explicitly disclosed by any single agent. Imagine, for example, an AI assistant that handles employee expenses, another that manages HR records, and a third that oversees compliance. Individually, their responses might seem harmless. However, an attacker could potentially piece together fragments from each to infer deeply personal information.

The researchers have developed a framework to understand this risk, demonstrating how auxiliary knowledge and the very interactions between agents can amplify privacy vulnerabilities. They illustrate this with a compelling example: an attacker might obtain customer-ID-to-name mappings from one agent, product purchase logs from another, and insurance claim information from a third. While each piece of data is benign on its own, their combination could reveal something like: “John, who has no diagnosed heart condition, is self-monitoring for potential undiagnosed heart issues.” This inference comes from correlating his purchase of a blood pressure monitor and a cholesterol test kit with the absence of a known heart condition in his insurance claims.
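To make the compositional nature of the leak concrete, here is a minimal Python sketch of how an adversary could join the three benign fragments described above. The records, field names, and inference rule are illustrative assumptions for this digest, not the paper's data or method.

```python
# Illustrative sketch: composing benign fragments from three agents into a
# sensitive inference. All records, field names, and the inference rule are
# hypothetical stand-ins, not taken from the paper.

# Fragment from agent 1: customer-ID-to-name mapping
id_to_name = {"C123": "John"}

# Fragment from agent 2: product purchase logs keyed by customer ID
purchases = {"C123": ["blood pressure monitor", "cholesterol test kit"]}

# Fragment from agent 3: conditions appearing in insurance claims
claim_conditions = {"C123": []}  # no heart condition on record

HEART_RELATED_PRODUCTS = {"blood pressure monitor", "cholesterol test kit"}

def infer_sensitive(customer_id):
    """Join the three fragments; each is harmless alone, the join is what leaks."""
    name = id_to_name.get(customer_id)
    bought = set(purchases.get(customer_id, []))
    conditions = claim_conditions.get(customer_id, [])

    if bought & HEART_RELATED_PRODUCTS and "heart condition" not in conditions:
        return (f"{name}, who has no diagnosed heart condition, "
                f"appears to be self-monitoring for heart issues.")
    return None

print(infer_sensitive("C123"))
```

No single agent in this toy example ever states the final sentence; it only emerges once the fragments are cross-referenced, which is exactly the failure mode the authors call compositional privacy leakage.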

To combat this emerging threat, the study proposes two defense strategies. The first is a “Theory-of-Mind (ToM) defense.” In this approach, defender agents are trained to anticipate an adversary’s intentions: they reason about whether their responses could be used to infer sensitive information and, if so, withhold or obfuscate the data. The second strategy is a “Collaborative Consensus Defense (CoDef).” Here, multiple defender agents collaborate, sharing aggregated information about their interactions and collectively voting on whether a query is safe to answer. A single dissenting vote is enough to block the query, creating a robust collective safeguard.
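The consensus rule at the heart of CoDef is easy to state in code: every defender casts a vote, and a single block suffices. The sketch below assumes hypothetical defender objects with a vote(query) method standing in for the LLM agents; it illustrates the voting rule only, not the paper's implementation.

```python
from enum import Enum

class Vote(Enum):
    ANSWER = "answer"
    BLOCK = "block"

class KeywordDefender:
    """Hypothetical defender whose policy is a simple keyword check, standing in
    for an LLM agent reasoning about how its answer could be combined."""
    def __init__(self, sensitive_terms):
        self.sensitive_terms = sensitive_terms

    def vote(self, query):
        lowered = query.lower()
        return Vote.BLOCK if any(t in lowered for t in self.sensitive_terms) else Vote.ANSWER

def codef_decision(defenders, query):
    """Collaborative consensus: poll every defender and block the query
    if any single agent votes to block."""
    votes = [d.vote(query) for d in defenders]
    return Vote.BLOCK if Vote.BLOCK in votes else Vote.ANSWER

defenders = [
    KeywordDefender({"insurance claim"}),
    KeywordDefender({"purchase history"}),
]
print(codef_decision(defenders, "Share John's purchase history"))  # Vote.BLOCK
```

The unanimity requirement is the design choice worth noticing: no one defender needs a complete picture of the adversary's plan, as long as at least one of them sees enough context to object.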

The research found that while simple “chain-of-thought” reasoning (where an agent explains its steps) offered minimal protection against compositional leakage, the ToM defense substantially improved the blocking of sensitive queries, reaching block rates as high as 97% in some settings. However, this came at the cost of sometimes hindering legitimate tasks. The CoDef strategy, on the other hand, struck a better balance: it effectively blocked sensitive information while maintaining a higher rate of successful, benign interactions.

The study highlights that as LLMs become more integrated into multi-agent systems, privacy risks evolve beyond simple data memorization. The way these agents communicate and share contextual fragments can inadvertently create pathways for sensitive information to be exposed. The findings underscore the need for novel, collaborative defense mechanisms that can protect against these emergent, context-driven privacy leaks.