(Voiceover) OpenAI's Reinforcement Finetuning and RL for the masses

Update: 2024-12-11
Description

Original post:

https://www.interconnects.ai/p/openais-reinforcement-finetuning

Chapters

00:00 Introduction

04:19 The impact of reinforcement finetuning’s existence

07:29 Hypotheses on reinforcement finetuning’s implementation

Figures

Fig. 1, Yann’s Cake

Fig. 2, Grader config

Fig. 3, RLVR learning curves



Nathan Lambert