“Why I don’t believe Superalignment will work” by Simon Lermen

Update: 2025-09-23

Description

We skip over [...] where we move from the human-ish range to strong superintelligence[1]. [...] the period where we can harness potentially vast quantities of AI labour to help us with the alignment of the next generation of models

- Will MacAskill in his critique of IABIED

I want to respond to Will MacAskill's claim in his IABIED review that we may be able to use AI to solve alignment.[1] Will believes that recent developments in AI have made it more likely that takeoff will be relatively slow: "Sudden, sharp, large leaps in intelligence now look unlikely". Because of this, he and many others believe that there will likely be a period at some point in the future when we can essentially direct AIs to align more powerful AIs. But it appears to me that a “slow takeoff” is not at all sufficient and that a [...]

---

Outline:

(01:47) Fast takeoff is possible

(02:49) AIs are unlikely to speed up alignment before capabilities

(04:21) What would the AI alignment researchers actually be doing?

(05:29) Alignment problem might require genius breakthroughs

(06:57) Most labs won't use the time

(07:26) The plan could have negative consequences

The original text contained 2 footnotes which were omitted from this narration.

---

First published:

September 22nd, 2025

Source:

https://www.lesswrong.com/posts/kyBGcHfzfZziHm5xL/why-i-don-t-believe-superalignment-will-work

---

Narrated by TYPE III AUDIO.

