Reward Mismatches in RL Cause Emergent Misalignment

Update: 2025-12-02

Description

Podcast episode for Reward Mismatches in RL Cause Emergent Misalignment.

* 00:00 - Introduction

* 02:53 - Abstract Of The Paper

* 04:15 - The Problem Statement

* 06:16 - The Inoculation Solution

* 08:48 - Cleaning The Data Versus Cleaning The Environments

* 10:31 - No, All Of This Does Not Solve Our Most Important Problems

* 15:46 - It Does Help On Important Short Term Problems

The Don’t Worry About the Vase Podcast is a listener-supported podcast. To receive new posts and support the cost of creation, consider becoming a free or paid subscriber.

https://open.substack.com/pub/thezvi/p/reward-mismatches-in-rl-cause-emergent?utm_campaign=post-expanded-share&utm_medium=web

Get full access to DWAtV Podcast at dwatvpodcast.substack.com/subscribe

Askwho Casts AI