Reward Mismatches in RL Cause Emergent Misalignment

Update: 2025-12-02

Description

Podcast episode for Reward Mismatches in RL Cause Emergent Misalignment.

* 00:00 - Introduction

* 02:53 - Abstract Of The Paper

* 04:15 - The Problem Statement

* 06:16 - The Inoculation Solution

* 08:48 - Cleaning The Data Versus Cleaning The Environments

* 10:31 - No, All Of This Does Not Solve Our Most Important Problems

* 15:46 - It Does Help On Important Short Term Problems

The Don’t Worry About the Vase Podcast is a listener-supported podcast. To receive new posts and support the cost of creation, consider becoming a free or paid subscriber.

https://open.substack.com/pub/thezvi/p/reward-mismatches-in-rl-cause-emergent?utm_campaign=post-expanded-share&utm_medium=web




Askwho Casts AI
