Reward Mismatches in RL Cause Emergent Misalignment
Update: 2025-12-02
Description
Podcast episode for Reward Mismatches in RL Cause Emergent Misalignment.
* 00:00 - Introduction
* 02:53 - Abstract Of The Paper
* 04:15 - The Problem Statement
* 06:16 - The Inoculation Solution
* 08:48 - Cleaning The Data Versus Cleaning The Environments
* 10:31 - No, All Of This Does Not Solve Our Most Important Problems
* 15:46 - It Does Help On Important Short-Term Problems
The Don’t Worry About the Vase Podcast is a listener-supported podcast. To receive new posts and support the cost of creation, consider becoming a free or paid subscriber.
Get full access to DWAtV Podcast at dwatvpodcast.substack.com/subscribe