TurboEdit: Fast Text-Based Image Editing with Diffusion Models

🔊

💬 Ask

Researchers from Adobe have developed a new method called TurboEdit for text-based image editing that leverages the power of few-step diffusion models. Unlike previous methods that rely on multi-step diffusion models and are often slow and computationally intensive, TurboEdit performs image edits in just a few steps, making it suitable for real-time applications.

Here’s how TurboEdit works:

Real Image Inversion: The method uses a single-step encoder-based inversion technique to efficiently project a real image into the noise space of the diffusion model. This is significantly faster than previous methods that rely on multi-step inversion, which often requires 50 or more steps.
Disentangled Text Prompt Conditioning: Instead of relying on attention-based methods that freeze attention maps to preserve structural similarity, TurboEdit exploits the property of few-step diffusion models, where long and detailed text prompts are inherently disentangled. This means that modifying a single attribute in the text prompt results in only minor changes to the text embedding, leading to disentangled edits in the image space.
Instruction-Based Editing: Users can leverage the power of large language models (LLMs) to generate highly descriptive text prompts. By providing a short instruction to the LLM, like “change the dog to a cat,” the LLM can generate a detailed text prompt that captures the desired edits. This makes it easier for users to create complex edits with just a few instructions.

TurboEdit significantly outperforms state-of-the-art methods in both speed and quality. It achieves high-quality edits with minimal artifacts, requires only 8 steps for inversion (a one-time cost) and 4 steps per edit, and can perform continuous editing by interpolating between text prompts.

Here are some examples of what TurboEdit can do:

Replace an object: A user can replace a cat in an image with a dog by providing a simple instruction like “change the cat to a dog.”
Add or remove attributes: TurboEdit can add a hat to a person, change the color of their hair, or make an image look like a child’s drawing.
Perform complex edits: Users can modify multiple attributes simultaneously, such as changing a person’s shirt and hair color at the same time.
Apply edits to specific regions: TurboEdit supports local masking, which enables users to apply edits only to specific regions of the image.

TurboEdit is a significant advancement in text-based image editing. It is faster, more efficient, and more user-friendly than previous methods, opening up new possibilities for creative image manipulation.

AI Papers Reader

Personalized digests of latest AI research

TurboEdit: Fast Text-Based Image Editing with Diffusion Models

Chat about this paper