Improving Video Generation with Human Feedback: A New Approach to Aligning AI with Human Preferences

Recent advances in video generation have produced impressive results, but challenges remain. Videos can suffer from unstable motion, misalignment with prompts, and poor visual quality. A new paper, “Improving Video Generation with Human Feedback,” tackles these issues by systematically incorporating human feedback into the training process. The researchers developed a comprehensive pipeline to align advanced flow-based video generation models with human preferences.

Building a Better Dataset: A major hurdle in improving video generation has been the lack of high-quality, large-scale datasets reflecting human preferences for video. The researchers addressed this by creating a new dataset with approximately 182,000 annotated examples. Each example consists of a text prompt and two videos generated by different models. Human annotators rated the videos across three dimensions:

  • Visual Quality (VQ): Assessing factors like clarity, detail, and overall aesthetic appeal.
  • Motion Quality (MQ): Evaluating smoothness, stability, and naturalness of movement.
  • Text Alignment (TA): Measuring how well the video content matches the prompt’s description.

This multi-dimensional approach provides a more nuanced picture of human preferences than previous datasets that focused on a single metric. For example, one video might excel in visual quality but fail to accurately reflect the prompt, while another might capture the prompt well but have jerky motion. The new dataset allows these tradeoffs to be assessed precisely. The researchers also created a secondary benchmark, VideoGen-RewardBench, for evaluating video reward models on outputs from modern video generation models, supplementing the existing GenAI-Bench.
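To make the dataset structure concrete, a single record might look roughly like the following. This is only a sketch: the field names, label encoding, and example values are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    """One annotated example: a prompt, two candidate videos, and
    per-dimension preference labels (field names are illustrative)."""
    prompt: str
    video_a: str     # path or ID of the first generated video
    video_b: str     # path or ID of the second generated video
    vq_label: str    # "A", "B", or "tie" for Visual Quality
    mq_label: str    # "A", "B", or "tie" for Motion Quality
    ta_label: str    # "A", "B", or "tie" for Text Alignment

example = PreferenceRecord(
    prompt="A golden retriever surfing a wave at sunset",
    video_a="gen_model_1/clip_0421.mp4",
    video_b="gen_model_2/clip_0421.mp4",
    vq_label="A",    # video A looks sharper
    mq_label="tie",  # both move smoothly
    ta_label="B",    # video B follows the prompt more closely
)
```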

Reward Modeling and Alignment Algorithms: The researchers then developed VideoReward, a multi-dimensional reward model trained on this new dataset. It is trained with a Bradley-Terry formulation extended to handle ties, which accommodates the common case where annotators judge two videos to be equally good. Experiments showed that the model trained with ties outperformed variants trained only on win/loss pairs.
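To illustrate the idea, here is a minimal sketch of a Bradley-Terry-with-ties pairwise loss (in the Rao-Kupper form) in PyTorch. The function name, the tie parameter `theta`, and the label encoding are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def bt_with_ties_nll(r_a, r_b, label, theta=1.5):
    """Negative log-likelihood under a Bradley-Terry model with ties.
    r_a, r_b: reward-model scores for the two videos in each pair, shape (B,).
    label: 0 = A preferred, 1 = B preferred, 2 = tie.
    theta > 1 is the tie parameter (value here is a hypothetical default)."""
    log_theta = torch.log(torch.tensor(theta, dtype=r_a.dtype))
    # Log-denominators, computed stably with logsumexp:
    # den_a = e^{r_a} + theta * e^{r_b}, den_b = e^{r_b} + theta * e^{r_a}
    den_a = torch.logsumexp(torch.stack([r_a, r_b + log_theta]), dim=0)
    den_b = torch.logsumexp(torch.stack([r_b, r_a + log_theta]), dim=0)
    log_p_a_wins = r_a - den_a
    log_p_b_wins = r_b - den_b
    # P(tie) = (theta^2 - 1) * e^{r_a + r_b} / (den_a * den_b)
    log_p_tie = (torch.log(torch.tensor(theta ** 2 - 1.0, dtype=r_a.dtype))
                 + r_a + r_b - den_a - den_b)
    log_probs = torch.stack([log_p_a_wins, log_p_b_wins, log_p_tie], dim=-1)
    return F.nll_loss(log_probs, label)

# Example: first pair prefers A, second pair is a tie.
r_a = torch.tensor([1.2, 0.3])
r_b = torch.tensor([0.8, 0.9])
labels = torch.tensor([0, 2])
loss = bt_with_ties_nll(r_a, r_b, labels)
```

As `theta` approaches 1, the tie probability shrinks toward zero and the model reduces to the standard win/loss Bradley-Terry objective.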

To actually improve the video generation models, three new algorithms were introduced, all viewed from a reinforcement learning perspective:

  • Flow-DPO (Direct Preference Optimization for Flow): A training-time algorithm that optimizes the model directly on preference pairs, implicitly maximizing reward while keeping the model from drifting too far from the pre-trained reference model (a minimal sketch follows this list).
  • Flow-RWR (Reward Weighted Regression for Flow): Another training-time algorithm that weights training updates by the reward signal, prioritizing higher-reward samples.
  • Flow-NRG (Noisy Reward Guidance): An inference-time technique that guides generation with reward signals at each sampling step. This enables personalized video generation: users can apply custom weights to the different objectives (VQ, MQ, TA) at inference time without retraining the model.
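As referenced above, here is a minimal sketch of what a Flow-DPO-style objective could look like, assuming a Diffusion-DPO-style derivation carried over to flow matching. The tensor names and the `beta` value are illustrative; this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def flow_dpo_loss(v_theta_w, v_theta_l, v_ref_w, v_ref_l,
                  target_w, target_l, beta=500.0):
    """Sketch of a Flow-DPO-style preference loss.
    v_theta_*: policy model's predicted velocities for the preferred (w) and
    dispreferred (l) videos at a sampled noise level; v_ref_*: predictions of
    a frozen reference model; target_*: the flow-matching regression targets.
    beta acts like the strength of the KL constraint (hypothetical value)."""
    # Per-sample flow-matching errors for the policy and the reference model.
    err_theta_w = (v_theta_w - target_w).pow(2).flatten(1).mean(1)
    err_theta_l = (v_theta_l - target_l).pow(2).flatten(1).mean(1)
    err_ref_w = (v_ref_w - target_w).pow(2).flatten(1).mean(1)
    err_ref_l = (v_ref_l - target_l).pow(2).flatten(1).mean(1)
    # Implicit reward margin: how much more the policy improves on the
    # preferred sample than on the dispreferred one, relative to the reference.
    margin = (err_ref_w - err_theta_w) - (err_ref_l - err_theta_l)
    return -F.logsigmoid(beta * margin).mean()
```

The intuition: the policy is rewarded for reducing its flow-matching error on the preferred video more than on the dispreferred one, relative to the frozen reference model, with `beta` playing the role of the KL-constraint strength.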

Experimental Results: Experiments demonstrated that VideoReward significantly outperforms existing reward models, and Flow-DPO, when using a constant KL divergence constraint, yielded superior performance compared to Flow-RWR and standard supervised fine-tuning. Flow-NRG proved highly effective in allowing users to tailor the output video by emphasizing specific qualities. For example, a user could prioritize Text Alignment if the video needs to match the narrative accurately, or Visual Quality if the video needs to be highly realistic.
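To show how such per-objective weighting could be wired into sampling, here is a minimal sketch of a reward-guided inference step in the spirit of Flow-NRG. The reward-model interface, the guidance form, and the step-size convention are all assumptions for illustration.

```python
import torch

def guided_sampling_step(x_t, t, velocity_model, reward_model,
                         weights=(0.3, 0.3, 0.4), guidance_scale=1.0, dt=0.02):
    """One reward-guided step of flow-based sampling (illustrative sketch).
    Assumes `reward_model(x_t, t)` scores the noisy latent and returns
    per-dimension rewards (VQ, MQ, TA); `weights` are the user's trade-offs."""
    # Score the noisy latent and backpropagate through the reward model only.
    x_t = x_t.detach().requires_grad_(True)
    vq, mq, ta = reward_model(x_t, t)  # per-dimension scores, shape (B,)
    total_reward = (weights[0] * vq + weights[1] * mq + weights[2] * ta).sum()
    grad = torch.autograd.grad(total_reward, x_t)[0]  # ascent direction for reward
    with torch.no_grad():
        v = velocity_model(x_t, t)  # base flow-matching velocity
        # Nudge the update toward higher weighted reward; dt's sign/convention
        # depends on the sampler's time parameterization.
        x_next = x_t + dt * v + guidance_scale * grad
    return x_next.detach()
```

Changing `weights` at inference time (for example, up-weighting TA for narrative fidelity or VQ for realism) shifts the guidance without touching the generator's weights, which is the personalization property described above.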

Conclusion: This research marks a significant advance in video generation by addressing the limitations of previous methods. The large-scale, multi-dimensional dataset, combined with the new reward model and alignment algorithms, is a substantial step towards aligning AI video generation with human preferences, and it opens up new possibilities for personalized video creation. Future work will focus on further refining the reward models and exploring other reinforcement learning algorithms to address potential biases and ensure robust performance.