Personalized Image Generation with Visual Preferences
📄 Full Paper
💬 Ask
A new paper published in arXiv.org explores a new way to personalize image generation based on individual preferences.
The paper introduces ViPer, a method for personalizing image generation models, such as Stable Diffusion.
Most image generation models are trained on massive datasets of images, which means they are designed to produce outputs that appeal to a broad audience, rather than individual preferences. As a result, users are often forced to use iterative manual prompt engineering to achieve specific desired outcomes.
ViPer circumvents this problem by capturing the generic preferences of a user in a one-time process. Users are asked to comment on a small selection of images, explaining why they like or dislike each one.
Based on these comments, a large language model is used to infer a user’s structured liked and disliked visual attributes. This visual preference information is then used to guide the text-to-image generation process towards producing images that are aligned with the individual user’s aesthetic preferences.
The research paper highlights several key benefits of ViPer.
First, it allows users to express their preferences in a more flexible way, using natural language instead of binary choices or ranking a set of images.
Second, it personalizes the image generation process without requiring any fine-tuning of the underlying model.
Finally, it provides a proxy metric for evaluating the quality of personalized generations, which can be used to improve the accuracy and efficiency of the process.
The paper includes several user studies and large language model guided evaluations to demonstrate the effectiveness of ViPer. The results show that ViPer generates images that are well aligned with individual user preferences, as measured by both human evaluation and the proxy metric.
ViPer offers a significant advancement in the field of personalized image generation. It provides users with a more intuitive and efficient way to control the output of generative models, leading to more satisfying and creative results.
The research paper is available online at: https://arxiv.org/abs/2407.17365. The authors have also open-sourced their code and model weights at https://viper.epfl.ch.
Chat about this paper
To chat about this paper, you'll need a free Gemini API key from Google AI Studio.
Your API key will be stored securely in your browser's local storage.