RLHF: A thin line between useful and lobotomized

Update: 2024-05-01

Description

Many, many signs of life for preference fine-tuning beyond spoofing chat evaluation tools.
This is AI-generated audio, produced with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/how-rlhf-works-2

00:00 How RLHF works, part 2: A thin line between useful and lobotomized
04:27 The chattiness paradox
08:09 The mechanism for making models chattier
10:42 Next steps for RLHF research

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/rlhf/img_012.webp
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/rlhf/img_018.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/rlhf/img_025.png


Nathan Lambert