(Voiceover) OpenAI's Reinforcement Finetuning and RL for the masses

Update: 2024-12-11

Description

Original post:

https://www.interconnects.ai/p/openais-reinforcement-finetuning

Chapters

00:00 Introduction

04:19 The impact of reinforcement finetuning’s existence

07:29 Hypotheses on reinforcement finetuning’s implementation

Figures

Fig. 1, Yann’s Cake

Fig. 2, Grader config

Fig. 3, RLVR learning curves

Get full access to Interconnects at www.interconnects.ai/subscribe

In Channel

(Voiceover) The AI agent spectrum
2024-12-18 · 11:00

(Voiceover) OpenAI's Reinforcement Finetuning and RL for the masses
2024-12-11 · 12:40

Interviewing Finbarr Timbers on the "We are So Back" Era of Reinforcement Learning
2024-12-05 · 01:08:33

(Voiceover) OpenAI's o1 using "search" was a PSYOP
2024-12-04 · 12:13

(Voiceover) OLMo 2 and building effective teams for training language models
2024-11-26 · 10:26

(Voiceover) Tülu 3: The next era in open post-training
2024-11-21 · 07:59

(Voiceover) Scaling realities
2024-11-14 · 04:21

(Voiceover) Saving the National AI Research Resource & my AI policy outlook
2024-11-13 · 11:22

Interviewing Tim Dettmers on open-source AI: Agents, scaling, quantization and what's next
2024-11-07 · 01:15:45

Interviewing Andrew Carr of Cartwheel on the State of Generative AI
2024-10-31 · 54:10

(Voiceover) Why I build open language models
2024-10-30 · 10:19

(Voiceover) Claude's agentic future and the current state of the frontier models
2024-10-23 · 11:23

Interviewing Arvind Narayanan on making sense of AI hype
2024-10-17 · 54:21

(Voiceover) Building on evaluation quicksand
2024-10-16 · 16:36

Interviewing Andrew Trask on how language models should store (and access) information
2024-10-10 · 01:00:12

How scaling changes model behavior
2024-10-09 · 11:47

[Article Voiceover] AI Safety's Crux: Culture vs. Capitalism
2024-10-02 · 10:29

Interviewing Riley Goodside on the science of prompting
2024-09-30 · 01:08:39

[Article Voiceover] Llama 3.2 Vision and Molmo: Foundations for the multimodal open-source ecosystem
2024-09-27 · 14:04

[Article Voiceover] Reverse engineering OpenAI's o1
2024-09-17 · 18:51

By Nathan Lambert