ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Update: 2025-11-04

Description

In this episode, we discuss "ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models" by Mingjie Liu, Shizhe Diao, Ximing Lu, Jian Hu, Xin Dong, Yejin Choi, Jan Kautz, and Yi Dong. The paper introduces ProRL, a reinforcement learning training method that uncovers reasoning strategies not present in the base language model. Empirical results show that models trained with ProRL consistently outperform their base models on challenging reasoning tasks, including cases where the base models fail even after extensive attempts. The study demonstrates that prolonged RL training can meaningfully expand reasoning capabilities by exploring new solution spaces over time, advancing our understanding of how RL enhances language model reasoning.

agibreakdown