Progent: A New Programming Language for Securing AI Agents

Large language model (LLM) agents are powerful AI systems that can automate tasks by utilizing a diverse set of tools. However, their ability to interact with the external world also presents significant security risks, as malicious actors can inject prompts that lead agents to perform dangerous actions. To address these challenges, researchers from UC Berkeley and UC Santa Barbara have introduced Progent, a novel privilege control mechanism that allows developers to programmatically define and enforce security policies for LLM agents.

Progent operates on the principle of least privilege, permitting only the actions essential for completing the user's task while blocking unnecessary ones. At its core is a domain-specific language (DSL) for flexibly expressing privilege control policies that are enforced during agent execution. These policies place fine-grained constraints on tool calls, deciding when a call is permissible and specifying fallback actions when it is not.
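
The paper presents this as a full DSL; purely as an illustration of the underlying idea, a policy can be thought of as a set of per-tool rules, each carrying a condition over the call's arguments and a fallback action, with anything not explicitly allowed being denied. The Python sketch below uses invented names (ToolRule, Policy) and is not the actual Progent syntax:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# Illustrative model only: the real Progent DSL has its own syntax, and the
# names ToolRule and Policy are invented here for explanation.

@dataclass
class ToolRule:
    tool: str                                  # which tool call the rule governs
    allow: Callable[[Dict[str, Any]], bool]    # fine-grained condition on the arguments
    fallback: Optional[str] = None             # message returned when the call is blocked

@dataclass
class Policy:
    rules: List[ToolRule] = field(default_factory=list)
    default_fallback: str = "This action is not permitted by the current policy."

    def check(self, tool: str, args: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
        """Decide whether a proposed tool call is permissible."""
        for rule in self.rules:
            if rule.tool == tool:
                if rule.allow(args):
                    return True, None
                return False, rule.fallback or self.default_fallback
        # Least privilege: tools without an explicit rule are blocked by default.
        return False, self.default_fallback
```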

For instance, imagine an LLM agent designed to manage a user’s email. Using Progent, a developer can create a policy that allows the agent to send emails and select recipients but blocks the ability to delete emails, preventing malicious actors from exploiting the agent to erase important messages.
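
Expressed in the illustrative model above, that email policy might look roughly like this (the tool names send_email, search_contacts, and delete_email are assumed for the example):

```python
# Hypothetical email-agent policy: sending and recipient lookup are allowed,
# deletion is always blocked and answered with an explanatory fallback.
email_policy = Policy(rules=[
    ToolRule("send_email",      allow=lambda args: True),
    ToolRule("search_contacts", allow=lambda args: True),
    ToolRule("delete_email",    allow=lambda args: False,
             fallback="Deleting emails is outside this agent's privileges."),
])

print(email_policy.check("delete_email", {"id": 42}))
# -> (False, "Deleting emails is outside this agent's privileges.")
```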

Another example is an agent designed to help clinicians manage electronic health records (EHRs). A Progent policy could allow the agent to retrieve patient information but forbid it from deleting database entries, safeguarding data integrity against potential attacks.
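
Enforcement can be pictured as a thin layer that checks each proposed tool call against the policy before anything runs. The sketch below reuses the illustrative model from above, with assumed tool names such as get_patient_record and delete_record; a blocked call never executes, and the agent receives the fallback message instead:

```python
# Hypothetical EHR-agent policy: reads are allowed, deletions are not.
ehr_policy = Policy(rules=[
    ToolRule("get_patient_record", allow=lambda args: True),
    ToolRule("delete_record",      allow=lambda args: False,
             fallback="Record deletion is disabled to protect data integrity."),
])

def guarded_call(policy: Policy, tools: Dict[str, Callable], tool: str, args: Dict[str, Any]):
    """Run the tool only if the policy permits it; otherwise return the fallback."""
    permitted, fallback = policy.check(tool, args)
    if not permitted:
        return fallback            # the blocked call is never executed
    return tools[tool](**args)
```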

Progent also accounts for the dynamic nature of agent execution by supporting policy updates. For example, a banking agent tasked with sending money to a recipient may initially have a broad policy allowing transfers to any account. However, once the agent retrieves the correct account number for the recipient, the policy can be tightened to permit transfers only to that specific account, preventing attackers from redirecting funds to their own accounts.
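
In the same illustrative model, such an update amounts to swapping the broad transfer rule for one pinned to the retrieved account (the tool names and account value are made up for the example):

```python
# Initially: the agent may look up the recipient and send money to any account.
banking_policy = Policy(rules=[
    ToolRule("get_recipient_account", allow=lambda args: True),
    ToolRule("send_money",            allow=lambda args: True),
])

# After the agent retrieves the recipient's account number, tighten the policy so
# that transfers to any other account are rejected.
recipient_account = "DE89 3704 0044 0532 0130 00"   # value returned by the lookup (example)
banking_policy.rules = [
    ToolRule("get_recipient_account", allow=lambda args: True),
    ToolRule("send_money",
             allow=lambda args: args.get("to_account") == recipient_account,
             fallback="Transfers are restricted to the verified recipient account."),
]
```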

The researchers evaluated Progent across various scenarios, including the AgentDojo benchmark for prompt injection attacks. Their results show that Progent significantly reduces attack success rates while preserving high utility. For example, Progent reduced the attack success rate on AgentDojo from 41.2% to 2.2% using a combination of manual and LLM-managed policies.

The modular design of Progent enhances its practicality and potential for widespread adoption: it does not alter the agent's internals and requires only minimal changes to the agent's implementation, so developers can integrate it into existing agent systems with ease. Furthermore, to automate policy writing, the researchers leverage state-of-the-art LLMs to generate Progent policies from user queries, and these policies can then be dynamically updated to improve both security and utility.
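
As a rough sketch of how that automated path could look (the call_llm function below is a stand-in for whatever model API is used, and the prompt and output format are not taken from the paper):

```python
import json

def call_llm(prompt: str) -> str:
    """Stand-in for an actual LLM API call; not part of Progent itself."""
    raise NotImplementedError

def generate_policy(user_query: str) -> dict:
    """Ask an LLM to draft a least-privilege policy for the user's task (illustrative)."""
    prompt = (
        "Given the task below, list the tools the agent genuinely needs, any "
        "constraints on their arguments, and a fallback message for blocked calls. "
        "Answer as JSON.\n\nTask: " + user_query
    )
    return json.loads(call_llm(prompt))   # e.g. {"send_email": {"effect": "allow"}, ...}
```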

In conclusion, Progent offers a promising approach to securing LLM agents by providing a programmable privilege control mechanism that balances security and utility. By enabling developers to define and enforce fine-grained security policies, Progent helps mitigate the risks associated with LLM agent interactions with the external world, paving the way for safer and more reliable AI systems.