VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Update: 2025-01-07
Description

🤗 Upvotes: 12 | cs.CV



Authors:

Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, Yuan Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li, Jiayan Teng, Zhuoyi Yang, Wendi Zheng, Xiao Liu, Ming Ding, Xiaohan Zhang, Xiaotao Gu, Shiyu Huang, Minlie Huang, Jie Tang, Yuxiao Dong



Title:

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation



Arxiv:

http://arxiv.org/abs/2412.21059v1



Abstract:

We present a general strategy for aligning visual generation models -- both image and video generation -- with human preference. To start with, we build VisionReward -- a fine-grained and multi-dimensional reward model. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions whose answers are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance for video preference prediction. Based on VisionReward, we develop a multi-objective preference learning algorithm that effectively addresses the issue of confounding factors within preference data. Our approach significantly outperforms existing image and video scoring methods on both machine metrics and human evaluation. All code and datasets are provided at https://github.com/THUDM/VisionReward.
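
The scoring idea in the abstract (answers to judgment questions, linearly weighted and summed) can be illustrated with a small sketch. This is not the VisionReward code: the function name, the yes = 1 / no = 0 encoding, the optional bias term, and the example weights are all assumptions made for illustration; the actual questions, encoding, and learned weights are in the paper and repository.

```python
from typing import Sequence


def linear_preference_score(
    answers: Sequence[bool],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Combine yes/no judgment answers into one score by a weighted sum.

    Hypothetical helper: each answer is encoded as 1.0 (yes) or 0.0 (no),
    multiplied by its weight, and the products are summed with a bias.
    """
    if len(answers) != len(weights):
        raise ValueError("answers and weights must have the same length")
    return bias + sum(w * (1.0 if a else 0.0) for a, w in zip(answers, weights))


# Made-up example: three judgment questions with illustrative weights.
# A negative weight models a question where "yes" signals a defect.
example_answers = [True, False, True]
example_weights = [0.5, 0.25, -1.0]
example_score = linear_preference_score(example_answers, example_weights)
```

Because each weight attaches to one named question, the contribution of any single judgment to the final score can be read off directly, which is what makes a score of this form interpretable.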

Jingwen Liang, Gengyu Wang