TextToon: Real-Time Text Toonify Head Avatar from Single Video

A team of researchers from the University of Rochester has developed a method for generating high-quality, real-time toonified head avatars from a single video. The system, called TextToon, lets users create stylized, animatable versions of themselves or others using simple text descriptions.

For example, a user might say "Turn him into an American comic style" or "Make her look like a Pixar character." TextToon then restyles the avatar accordingly and animates it with expressions and head movements synchronized to the original video, effectively translating the user's instructions into moving imagery.
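As a concrete illustration, such a text-driven workflow might look like the sketch below. The `texttoon` module, the `TextToonPipeline` class, and every method shown are hypothetical names invented for this example; the paper does not specify a public API.

```python
# Hypothetical usage sketch -- illustrative only, not the paper's actual API.
from texttoon import TextToonPipeline  # hypothetical package

# Build a head avatar from a single monocular video of the subject.
pipeline = TextToonPipeline.from_video("subject.mp4")

# Restyle the avatar's appearance from a plain-text instruction.
pipeline.stylize(prompt="Turn him into an American comic style")

# Re-animate the stylized avatar with expressions from a driving video.
for frame in pipeline.animate(driving_video="subject.mp4"):
    frame.show()
```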

The key to TextToon's quality lies in its conditional Tri-plane: a learnable 3D feature representation of the head whose appearance features are conditioned on facial-expression signals. This lets the system render complex facial expressions and movements realistically even after the avatar's style has been changed. To keep the toonified output sharp, TextToon also incorporates patch-aware contrastive learning, which suppresses the blurriness that commonly afflicts diffusion-based image editing.
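A rough sketch of the tri-plane idea in PyTorch: three learnable 2D feature planes stand in for a full 3D feature volume, a query point is projected onto each plane, and the sampled features are fused with an expression code. The plane resolution, fusion MLP, and conditioning scheme below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalTriPlane(nn.Module):
    """Illustrative tri-plane feature field conditioned on expression codes.

    Three learnable 2D planes (XY, XZ, YZ) replace a dense 3D volume; a 3D
    query point is projected onto each plane, features are bilinearly
    sampled and summed, then fused with an expression code by a small MLP.
    Dimensions and fusion scheme are assumptions for this sketch.
    """

    def __init__(self, feat_dim=32, res=128, expr_dim=64):
        super().__init__()
        # One (feat_dim, res, res) feature plane per axis-aligned pair.
        self.planes = nn.Parameter(torch.randn(3, feat_dim, res, res) * 0.01)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + expr_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, xyz, expr):
        # xyz: (N, 3) points in [-1, 1]^3; expr: (N, expr_dim) expression code.
        coords = torch.stack(
            [xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]], dim=0
        )                                    # (3, N, 2) plane projections
        grid = coords.unsqueeze(1)           # (3, 1, N, 2) for grid_sample
        feats = F.grid_sample(self.planes, grid, align_corners=True)
        feats = feats.squeeze(2).sum(dim=0).transpose(0, 1)  # (N, feat_dim)
        return self.mlp(torch.cat([feats, expr], dim=-1))
```

Patch-aware contrastive learning can likewise be sketched as a patch-level InfoNCE objective: each patch of the generated image is pulled toward the reference patch at the same location and pushed away from patches elsewhere, which discourages washed-out, blurry textures. The loss below assumes per-patch embeddings have already been extracted; it is a generic formulation, not the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def patch_contrastive_loss(gen_feats, ref_feats, tau=0.07):
    """Patch-level InfoNCE over (P, D) per-patch embeddings.

    The patch at index i in the generated image treats the reference patch
    at the same index as its positive and all other patches as negatives.
    """
    gen = F.normalize(gen_feats, dim=-1)
    ref = F.normalize(ref_feats, dim=-1)
    logits = gen @ ref.t() / tau                  # (P, P) similarities
    labels = torch.arange(gen.size(0), device=gen.device)
    return F.cross_entropy(logits, labels)        # diagonal = positives
```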

TextToon is also fast. The researchers built a real-time system that renders the animated avatar at 48 frames per second on a GPU and about 15 frames per second on a mobile device.
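For scale, 48 frames per second leaves roughly 21 ms per frame for the entire pipeline. A generic way to sanity-check a throughput claim like this is to time a render callable, as in the sketch below; this is a simple harness for illustration, not the authors' benchmark.

```python
import time

def measure_fps(render_frame, n_frames=500, warmup=50):
    """Rough throughput check for any per-frame renderer.

    render_frame is a callable producing one avatar frame. On a GPU,
    wrap the timed region with torch.cuda.synchronize() for accuracy,
    since CUDA kernels launch asynchronously.
    """
    for _ in range(warmup):               # let caches/JIT settle
        render_frame()
    start = time.perf_counter()
    for _ in range(n_frames):
        render_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed             # e.g. 48 FPS ~= 21 ms/frame
```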

Overall, TextToon represents a significant advance in the field of avatar creation. It offers a user-friendly, efficient, and high-quality way to generate stylized avatars from video, opening up exciting possibilities for entertainment, social media, and more.

This story summarizes the main points of the paper "TextToon: Real-Time Text Toonify Head Avatar from Single Video."