TurboEdit: Fast Text-Based Image Editing with Diffusion Models
Researchers from Adobe have developed a new method called TurboEdit for text-based image editing that leverages the power of few-step diffusion models. Unlike previous methods that rely on multi-step diffusion models and are often slow and computationally intensive, TurboEdit performs image edits in just a few steps, making it suitable for real-time applications.
Here’s how TurboEdit works:
- Real Image Inversion: The method uses an encoder-based iterative inversion technique to project a real image into the noise space of the diffusion model. Inversion takes only 8 steps, far fewer than the 50 or more that earlier multi-step inversion methods often require.
- Disentangled Text Prompt Conditioning: Instead of relying on attention-based methods that freeze attention maps to preserve structural similarity, TurboEdit exploits a property of few-step diffusion models: conditioning on a long, detailed text prompt yields inherently disentangled control. Modifying a single attribute in such a prompt changes the text embedding only slightly, which leads to a disentangled edit in image space.
- Instruction-Based Editing: Users can leverage the power of large language models (LLMs) to generate highly descriptive text prompts. By providing a short instruction to the LLM, like “change the dog to a cat,” the LLM can generate a detailed text prompt that captures the desired edits. This makes it easier for users to create complex edits with just a few instructions.
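The intuition behind detailed prompts can be illustrated with a toy example. The sketch below is not TurboEdit's text encoder: it assumes a bag-of-words vector as a crude stand-in for a text embedding, and the function names and prompts are hypothetical. It only shows that swapping one word moves a long, detailed prompt much less than it moves a short one.

```python
import math
from collections import Counter


def bag_of_words(prompt: str) -> Counter:
    """Toy stand-in for a text encoder (assumption): lowercase word counts."""
    return Counter(prompt.lower().replace(",", " ").split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def edit_similarity(prompt: str, old: str, new: str) -> float:
    """Similarity between a prompt and the same prompt with one word swapped."""
    edited = prompt.replace(old, new)
    return cosine_similarity(bag_of_words(prompt), bag_of_words(edited))


# Hypothetical prompts describing the same image at two levels of detail.
short_prompt = "a dog"
long_prompt = (
    "a fluffy brown dog sitting on a wooden bench in a sunny park, "
    "with green trees behind"
)

short_sim = edit_similarity(short_prompt, "dog", "cat")
long_sim = edit_similarity(long_prompt, "dog", "cat")

print(f"short prompt similarity after edit: {short_sim:.3f}")
print(f"long prompt similarity after edit:  {long_sim:.3f}")
```

In this toy setting the short prompt scores 0.5 after the edit while the long prompt stays near 0.96, which mirrors the claim that a single-attribute change in a detailed prompt is a small perturbation. A real text encoder behaves differently in detail, so treat the numbers as illustrative only.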
According to the authors, TurboEdit significantly outperforms state-of-the-art multi-step methods in both speed and quality. It achieves high-quality edits with minimal artifacts, requires only 8 steps for inversion (a one-time cost) and 4 steps per edit, and can perform continuous editing by interpolating between text prompts.
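Continuous editing and the step budget can be sketched in a few lines. The summary does not say how the interpolation is carried out, so the sketch assumes plain linear interpolation between two text embeddings represented as lists of floats. `interpolate_embeddings` and `total_steps` are hypothetical helper names; only the 8-step and 4-step figures come from the text above.

```python
from typing import List


def interpolate_embeddings(
    source: List[float], target: List[float], strength: float
) -> List[float]:
    """Linearly blend two embeddings (assumed scheme).

    strength=0.0 returns the source embedding, strength=1.0 the target.
    """
    if len(source) != len(target):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return [(1.0 - strength) * s + strength * t for s, t in zip(source, target)]


def total_steps(num_edits: int, inversion_steps: int = 8, steps_per_edit: int = 4) -> int:
    """Inversion is paid once; every edit adds a fixed number of steps."""
    return inversion_steps + steps_per_edit * num_edits


# Toy 2-D embeddings for a source and a target prompt (made-up values).
source_embedding = [0.0, 2.0]
target_embedding = [1.0, 4.0]

# Sweep the editing strength to get a sequence of intermediate conditions.
strengths = [0.0, 0.25, 0.5, 0.75, 1.0]
sweep = [interpolate_embeddings(source_embedding, target_embedding, s) for s in strengths]

print(sweep[2])                      # midpoint between the two embeddings
print(total_steps(len(strengths)))   # one inversion plus five edits
```

Because the inversion cost is paid once, a five-point strength sweep costs 8 + 4 × 5 = 28 steps under these figures, rather than 5 × (8 + 4) = 60.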
Here are some examples of what TurboEdit can do:
- Replace an object: A user can replace a cat in an image with a dog by providing a simple instruction like “change the cat to a dog.”
- Add or remove attributes: TurboEdit can add a hat to a person, change the color of their hair, or make an image look like a child’s drawing.
- Perform complex edits: Users can modify multiple attributes simultaneously, such as changing a person’s shirt and hair color at the same time.
- Apply edits to specific regions: TurboEdit supports local masking, which enables users to apply edits only to specific regions of the image.
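One simple way to picture a region-restricted edit is per-pixel compositing. The summary does not describe how TurboEdit applies its masks internally, so the sketch below is an assumption and not the paper's procedure: it blends an edited image with the original in pixel space, using toy grayscale images stored as nested lists. `composite_with_mask` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities


def composite_with_mask(original: Image, edited: Image, mask: Image) -> Image:
    """Blend per pixel: mask=1.0 keeps the edit, mask=0.0 keeps the original."""
    return [
        [m * e + (1.0 - m) * o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


# Made-up 2x2 example: the edit is fully applied top-left, half applied
# bottom-left, and suppressed in the right column.
original = [[0.0, 0.0], [0.0, 0.0]]
edited = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1.0, 0.0], [0.5, 0.0]]

result = composite_with_mask(original, edited, mask)
print(result)
```

Fractional mask values give a soft boundary between edited and untouched regions, which tends to hide seams better than a hard binary mask.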
TurboEdit is a significant advancement in text-based image editing. It is faster, more efficient, and more user-friendly than previous methods, opening up new possibilities for creative image manipulation.