AI Papers Reader

Personalized digests of latest AI research

RewardSDS: Aligning Score Distillation with Reward-Weighted Sampling Boosts 3D Generation

Researchers at the Hebrew University of Jerusalem have developed RewardSDS, a novel technique that significantly improves the quality and alignment of 3D object generation. Their findings, published in a recent preprint, demonstrate that integrating reward models into the score distillation sampling (SDS) process leads to substantial advancements in text-to-3D and other generation tasks.

Score distillation sampling, a widely used method in 3D generation, leverages pre-trained 2D diffusion models to generate 3D objects. It optimizes the parameters of a 3D representation so that images rendered from it look plausible to the 2D diffusion model: each rendering is perturbed with random noise, and the model's noise prediction provides the gradient that updates the 3D parameters. However, SDS often struggles with fine-grained alignment to user intent. For instance, a prompt like “a penguin with a brown bag in the snow” might produce a penguin, but the bag might be misplaced or poorly rendered.
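To make the SDS update concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the "renderer" is the identity (the parameters are the image), the timestep weighting is set to 1, and `toy_noise_predictor` is a hypothetical stand-in for a diffusion model that assumes the data distribution is a single point `target`. All function names are illustrative.

```python
import math
import random


def toy_noise_predictor(x_t, alpha, sigma, target):
    """Hypothetical stand-in for a diffusion model's noise prediction.

    Assumes the data distribution is the single point `target`, for which
    the ideal prediction is eps_hat = (x_t - alpha * target) / sigma.
    """
    return [(xt - alpha * tg) / sigma for xt, tg in zip(x_t, target)]


def sds_step(theta, target, rng, lr=0.05):
    """One SDS-style update with an identity 'renderer' (image == theta)."""
    t = rng.uniform(0.2, 0.8)  # noise level, kept away from 0 and 1
    alpha, sigma = math.sqrt(1.0 - t), math.sqrt(t)
    eps = [rng.gauss(0.0, 1.0) for _ in theta]
    # Perturb the rendering with noise.
    x_t = [alpha * x + sigma * e for x, e in zip(theta, eps)]
    eps_hat = toy_noise_predictor(x_t, alpha, sigma, target)
    # SDS gradient: (eps_hat - eps) times d(render)/d(theta).
    # The Jacobian is the identity here and the timestep weight is 1.
    grad = [eh - e for eh, e in zip(eps_hat, eps)]
    return [x - lr * g for x, g in zip(theta, grad)]


def run_sds(theta, target, steps=500, seed=0):
    """Repeat the update; theta drifts toward what the toy model prefers."""
    rng = random.Random(seed)
    for _ in range(steps):
        theta = sds_step(theta, target, rng)
    return theta
```

Calling `run_sds([0.0, 0.0, 0.0], [1.0, -2.0, 0.5])` returns parameters close to the target, because in this toy setup the residual `eps_hat - eps` always points from the target toward the current rendering. With a real diffusion model and a real renderer, the same residual is backpropagated through the renderer into the 3D parameters.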

RewardSDS addresses this limitation by incorporating a reward model, which judges the quality of a generated image and how well it matches the user prompt. Instead of treating all noise samples equally during optimization, RewardSDS weights each sample by its reward score. High-reward samples, which are more closely aligned with the user’s intent, receive larger weights and therefore contribute more to the gradient, steering the optimization toward more desirable outcomes. This is analogous to focusing on the most promising search results rather than examining every result of an internet search.
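The weighting idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the paper's exact scheme: the softmax mapping from rewards to weights, the `temperature` parameter, and both function names are hypothetical choices made here, and the per-sample gradients and reward scores are taken as given inputs.

```python
import math


def softmax_weights(rewards, temperature=1.0):
    """Turn reward scores into positive weights that sum to 1.

    Assumption: a softmax is used here for illustration only.
    """
    m = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp((r - m) / temperature) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def reward_weighted_gradient(per_sample_grads, rewards, temperature=1.0):
    """Combine per-noise-sample SDS gradients, weighted by reward.

    per_sample_grads: one gradient vector (list of floats) per noise sample.
    rewards: one reward score per noise sample, in the same order.
    """
    weights = softmax_weights(rewards, temperature)
    dim = len(per_sample_grads[0])
    return [
        sum(w * g[i] for w, g in zip(weights, per_sample_grads))
        for i in range(dim)
    ]
```

When all rewards are equal, the weights are uniform and the result is the plain average of the per-sample gradients, which corresponds to treating all noise samples equally. As the rewards spread apart (or the hypothetical `temperature` shrinks), the combined gradient is dominated by the highest-reward samples.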

The researchers tested RewardSDS on several tasks, including zero-shot text-to-image generation, text-to-3D generation, and image editing. In text-to-image generation, they found that RewardSDS consistently outperforms standard SDS across multiple metrics, including image quality, alignment to the prompt, and aesthetic appeal. They evaluated it with different reward models, including CLIPScore (judging text-image alignment), Aesthetic Score Predictor (judging aesthetic quality), and ImageReward (combining alignment and aesthetics), and observed improvements with each. In one example, the prompt “a green apple and a black backpack” yields more realistic and appropriately arranged objects with RewardSDS than with standard SDS.

In text-to-3D generation, RewardSDS, used with the state-of-the-art MVDream framework, yielded better results than the standard MVDream approach. This improvement was consistent across different 3D representations (NeRF and 3D Gaussian Splatting, 3DGS), demonstrating the versatility and general applicability of RewardSDS. For instance, a prompt like “a skyscraper that reaches the cloud” produced a more realistic and detailed skyscraper.

The researchers also adapted RewardSDS for image editing tasks, creating RewardDDS. Here, they report improvements over existing editing methods, indicating that the approach also carries over to image manipulation.

Through extensive ablation studies, the team showed that even a small increase in the number of noise samples considered per step, or in the number of optimization steps, already leads to noticeable quality improvements with RewardSDS. They also analyzed the trade-off between computation time and performance and found that a modest increase in computation is enough to yield significant quality benefits.

In summary, RewardSDS is a notable step forward for 3D generation. By weighting noise samples according to a reward model, it achieves better alignment with user intent and produces higher-quality, more relevant outputs across several tasks. Its simplicity, broad applicability, and strong results suggest that RewardSDS could become an influential technique for future 3D generative models.