VLMGuard: A New System That Protects Vision Language Models From Malicious Prompts

Vision language models (VLMs) are powerful tools that understand both text and images, making them valuable for a wide range of applications. However, they are vulnerable to malicious prompts: inputs designed to trick the model into generating harmful or inappropriate outputs. This can lead to serious consequences, such as the spread of misinformation, the promotion of violence, or the execution of malicious code.

A research team from the University of Wisconsin and Microsoft has developed a system called VLMGuard to address this vulnerability. VLMGuard leverages unlabeled data, which is readily available in the real-world settings where VLMs are deployed. It works by identifying a “latent subspace” within the VLM’s activation space that is associated with malicious prompts; inputs whose activations fall within this subspace can then be flagged as potentially malicious before the model produces a response.
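
The article does not spell out how the subspace is estimated or how inputs are scored, but a minimal sketch of the idea might look like the following. The SVD-based subspace over collected activations and the projection-norm score are assumptions for illustration, not the paper's exact method, and the random vectors stand in for real VLM hidden states.

```python
import numpy as np

def fit_malicious_subspace(activations: np.ndarray, k: int = 5) -> np.ndarray:
    """Estimate a k-dimensional subspace from unlabeled VLM activations.

    activations: (n_samples, hidden_dim) array of hidden-state vectors
    collected from unlabeled prompts seen in deployment. Here the top-k
    singular directions are taken as the candidate "malicious" subspace
    (an assumption; the published estimator is not given in this summary).
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]  # shape (k, hidden_dim)

def maliciousness_score(activation: np.ndarray, subspace: np.ndarray) -> float:
    """Score one prompt by the norm of its projection onto the subspace."""
    return float(np.linalg.norm(subspace @ activation))

# Toy usage with random vectors standing in for real VLM activations.
rng = np.random.default_rng(0)
unlabeled = rng.normal(size=(1000, 768))        # unlabeled deployment data
subspace = fit_malicious_subspace(unlabeled, k=5)
new_prompt_activation = rng.normal(size=768)
print(f"score: {maliciousness_score(new_prompt_activation, subspace):.3f}")
```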

For example, imagine a VLM being used to answer questions about images. A malicious prompt might be designed to trick it into providing a biased or harmful response: a user might submit an image of a cat and ask the VLM, “What is the best way to murder someone?”.

VLMGuard would recognize this prompt as potentially malicious based on its analysis of the VLM’s activation space: the combined prompt and image produce activations that fall within the subspace linked to malicious inputs. The system would then block the prompt, preventing the VLM from being tricked into a harmful response.
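
Continuing the earlier sketch, blocking could be as simple as thresholding the score before letting the VLM answer. The threshold value and the wrapper below are illustrative assumptions rather than part of the published system.

```python
THRESHOLD = 3.0  # assumed value; in practice calibrated on held-out data

def guarded_answer(prompt_activation, subspace, answer_fn):
    """Refuse to answer when the prompt's activation scores as malicious."""
    if maliciousness_score(prompt_activation, subspace) > THRESHOLD:
        return "Request declined: the prompt was flagged as potentially malicious."
    return answer_fn()  # only invoke the VLM for prompts that pass the check
```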

The researchers conducted extensive experiments to demonstrate the effectiveness of VLMGuard: it outperformed state-of-the-art malicious prompt detection methods by a significant margin, is robust to different types of malicious prompts, and scales readily to larger VLMs.

VLMGuard represents a significant step forward in the effort to make VLMs safer and more reliable. By leveraging unlabeled data, the system can be easily deployed in real-world settings and helps protect users from the harmful effects of malicious prompts.