Expressive Whole-Body 3D Gaussian Avatars from Short Videos
📄 Full Paper
💬 Ask
Recent advancements in computer vision have made it possible to create 3D human avatars from casually captured videos. However, most of these avatars only support body movements, and lack facial expressions and hand motions. In this paper, the researchers introduce ExAvatar, a novel expressive 3D human avatar that can be animated with a variety of facial expressions, hand gestures, and body poses, learned from just a short monocular video.
The key innovation of ExAvatar is a hybrid representation that combines a standard 3D mesh model (SMPL-X) with a 3D Gaussian splatting (3DGS) model. SMPL-X provides a detailed model of the human body’s structure and pose, while 3DGS represents the avatar’s appearance using a collection of 3D Gaussian distributions. These Gaussians are placed on the surface of the mesh, with each one corresponding to a vertex. This hybrid representation is key to ExAvatar’s ability to produce highly expressive avatars.
To overcome the limitations of limited pose and facial expression data in typical videos, ExAvatar employs a number of techniques:
- Connectivity-based regularizers: To ensure smoothness and prevent unrealistic deformations, the researchers enforce local similarity between neighboring 3D Gaussians, leveraging their connectivity.
- Pose-dependent Gaussian assets: The model learns how to adjust the Gaussian assets (position, scale, color) to match the current pose of the avatar, ensuring that the avatar moves realistically.
The researchers evaluate ExAvatar on several challenging datasets, demonstrating superior results compared to previous methods. For instance, ExAvatar produces much more realistic and detailed representations of faces and hands than prior techniques, especially in novel poses.
Here are some concrete examples of how ExAvatar works:
- Imagine a video of a person walking in a park. The video is likely to have limited facial expressions and hand gestures.
- ExAvatar takes this video as input and, using its hybrid representation, learns to create a 3D avatar that captures the person’s overall shape and appearance.
- Using the connectivity-based regularizers, the model ensures that the avatar moves smoothly and realistically, even when animated with novel poses and facial expressions.
- Using the pose-dependent Gaussian assets, the model can dynamically adjust the appearance of the avatar’s face and hands to match the current pose, creating a more natural and expressive animation.
ExAvatar is a promising step towards the creation of more expressive and realistic 3D human avatars. It has the potential to be used in a variety of applications, such as virtual reality, video games, and animation.
Chat about this paper
To chat about this paper, you'll need a free Gemini API key from Google AI Studio.
Your API key will be stored securely in your browser's local storage.