RLHF Roundup: Trying to get good at PPO, charting RLHF's impact, RewardBench retrospective, and a reward model competition

Update: 2024-06-26

Description

Things to be aware of if you work on language model fine-tuning.
This is AI-generated audio made with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/rlhf-roundup-2024

00:00 RLHF Roundup: Trying to get good at PPO, charting RLHF's impact, RewardBench retrospective, and a reward model competition
04:32 How big is the impact of RLHF relative to pretraining?
05:54 RewardBench retrospective after 100 models and 90% peak accuracy
09:19 LMSYS's reward modeling competition

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/rlhf-roundup/img_009.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/rlhf-roundup/img_012.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/rlhf-roundup/img_017.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/rlhf-roundup/img_026.png

Nathan Lambert
