“AI Corrigibility Debate: Max Harms vs. Jeremy Gillen” by Liron, Max Harms, Jeremy Gillen

Update: 2025-11-14

Description

Is focusing on corrigibility our best shot at getting to ASI alignment?

Max Harms and Jeremy Gillen are current and former MIRI alignment researchers who both see superintelligent AI as an imminent extinction threat, but disagree about Max's proposal of Corrigibility as Singular Target (CAST).

Max thinks focusing on corrigibility is the most plausible path to build ASI without losing control and dying, while Jeremy is skeptical that attempting CAST would lead to better superintelligent AI behavior on a sufficiently early try.

We recorded a friendly debate to understand the crux of Max and Jeremy's disagreement. The conversation also doubles as a way to learn about Max's Corrigibility as Singular Target proposal.

Video

Podcast

Listen on Spotify, import the RSS feed, or search "Doom Debates" in your podcast player.

Plus: Max's New Book, Red Heart

Max just published Red Heart, a realistic sci-fi thriller that brings the corrigibility problem to life through a high-stakes Chinese government AI project.

I thoroughly enjoyed reading it and highly recommend it! The last 20 minutes of my conversation with Max are all about Red Heart.

Transcript

Episode Preview

Max Harms 00:00:00
If you mess up real bad, this thing goes and eats [...]

---

Outline:

(00:14 ) Is focusing on corrigibility our best shot at getting to ASI alignment?

(01:08 ) Video

(01:14 ) Podcast

(01:24 ) Plus: Max's New Book, Red Heart

(01:55 ) Transcript

(01:58 ) Episode Preview

(13:32 ) Why Corrigibility Matters

(15:20 ) What's Your P(Doom)™

(20:42 ) Max's Case for Corrigibility

(23:46 ) Jeremy's Case Against Corrigibility

(26:21 ) Max's Mainline AI Scenario

(32:44 ) 4 Strategies: Alignment, Control, Corrigibility, Don't Build It

(41:53 ) Corrigibility vs HHH (Helpful, Harmless, Honest)

(47:31 ) Asimov's 3 Laws of Robotics

(52:45 ) Is Corrigibility a Coherent Concept?

(01:03:21 ) Corrigibility vs Shutdown-ability

(01:09:21 ) CAST: Corrigibility as Singular Target, Near Misses, Iterations

(01:19:20 ) Debating if Max is Over-Optimistic

(01:32:46 ) Debating if Corrigibility is the Best Target

(01:39:58 ) Would Max Work for Anthropic?

(01:42:37 ) Max's Modest Hopes

(02:02:35 ) Max's New Book: Red Heart

(02:21:52 ) Outro

---

First published:

November 14th, 2025

Source:

https://www.lesswrong.com/posts/CsXAg8dHSgghDAoPx/ai-corrigibility-debate-max-harms-vs-jeremy-gillen

---

Narrated by TYPE III AUDIO.

---


