The Era of Real-World Human Interaction: RL from User Conversations

Update: 2025-10-24
Description

This paper introduces Reinforcement Learning from Human Interaction (RLHI), a method for aligning large language models by learning directly from in-the-wild user conversations rather than expert-annotated data. The paradigm rests on two complementary approaches: User-Guided Rewrites, which leverage users' natural-language follow-ups to revise unsatisfactory model outputs, and User-Based Rewards, which use a reward model conditioned on a user's long-term interaction history (persona) to rank candidate responses. The authors argue that this technique enables personalized, contextual, and continual learning, linking long-term user preferences to turn-level feedback. Experimental results show that RLHI variants significantly outperform baselines on personalization and instruction-following and also offer gains on reasoning tasks, suggesting that organic human feedback is a scalable and effective source of supervision. The paper concludes that learning from diverse, dynamic user interactions is essential for multifaceted model improvement beyond current static fine-tuning methods.
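
To make the two mechanisms concrete, here is a minimal sketch of how preference pairs might be constructed under each scheme. It is illustrative only: the data structures, function names, and the toy string-overlap "reward" are hypothetical placeholders, not the paper's actual models or training pipeline.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    prompt: str
    response: str
    follow_up: str  # the user's natural-language reaction to the response

@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str

def rewrite_with_feedback(prompt: str, response: str, follow_up: str) -> str:
    """Hypothetical helper: revise the response in light of the user's follow-up
    (in practice this would be done by the language model itself)."""
    return f"[revised per feedback: {follow_up!r}] {response}"

def user_guided_rewrite_pairs(history: List[Turn]) -> List[PreferencePair]:
    """User-Guided Rewrites: treat the feedback-informed rewrite as the preferred
    response and the original unsatisfactory output as the rejected one."""
    pairs = []
    for turn in history:
        revised = rewrite_with_feedback(turn.prompt, turn.response, turn.follow_up)
        pairs.append(PreferencePair(turn.prompt, chosen=revised, rejected=turn.response))
    return pairs

def persona_conditioned_reward(persona: str, prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model conditioned on the user's
    long-term interaction history (persona); a real system would use a
    trained scorer, not word overlap."""
    return float(sum(1 for w in persona.lower().split() if w in response.lower()))

def user_based_reward_pair(persona: str, prompt: str, candidates: List[str]) -> PreferencePair:
    """User-Based Rewards: rank sampled candidate responses with the
    persona-conditioned reward and keep the best and worst as a training pair."""
    ranked = sorted(
        candidates,
        key=lambda c: persona_conditioned_reward(persona, prompt, c),
        reverse=True,
    )
    return PreferencePair(prompt, chosen=ranked[0], rejected=ranked[-1])
```

Either set of pairs could then feed a standard preference-optimization step; the episode description does not specify the exact training objective the paper uses.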

Enoch H. Kang