Best AI papers explained

Direct Preference Optimization with Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences

Update: 2025-10-24
Description

The paper argues that pairwise-comparison-based RLHF cannot learn heterogeneous preferences, whereas ternary comparisons can. The authors propose **Expectation-Maximization Direct Preference Optimization (EM-DPO)**, a clustering algorithm that discovers latent user preference groups and trains an ensemble of specialized LLMs, one per group. Crucially, they establish a theoretical link to econometrics, arguing that **binary comparisons are insufficient** for identifying heterogeneous preferences and demonstrating the necessity of collecting **ternary preferences** (preferences among three options). Finally, the paper introduces **MinMax Regret Aggregation (MMRA)** to combine the ensemble models into a single "fair" policy that minimizes the worst-case performance loss across all identified user subgroups, ensuring equitable deployment.


Enoch H. Kang