Abstract advice to researchers tackling the difficult core problems of AGI alignment
Description
Crosspost from my blog.
This is some quickly written, better-than-nothing advice for people who want to make progress on the hard problems of technical AGI alignment.
Background assumptions
- The following advice assumes that you're aiming to help solve the core, important technical problem of designing AGI that does stuff humans would want it to do.
- This excludes everything that isn't about minds and designing minds; so it excludes governance, recruiting, anything social, fieldbuilding, fundraising, and so on. (Not saying those are unimportant; just, this guide is not about that.)
- I don't especially think you should try to do that. It's very hard, and it's more important that AGI capabilities research gets stopped. I think it's so hard that human intelligence amplification is a better investment.
- However, many people say that they want to help with technical AI safety. If you're mainly looking to get a job, this is not the guide for you. This guide is only aimed at helping you help solve the important parts of the problem, which is a very neglected task even among people who say they want to help with technical AI safety.
- [...]
---
Outline:
(00:21 ) Background assumptions
(02:29 ) Dealing with deference
(04:44 ) Sacrifices
(06:28 ) True doubt
(07:27 ) Iterative babble and prune
(08:43 ) Learning to think
(09:22 ) Grappling with the size of minds
(10:05 ) Zooming
(11:05 ) Generalize a lot
(12:51 ) Notes to mentors
(13:59 ) Object level stuff
---
First published:
November 22nd, 2025
---
Narrated by TYPE III AUDIO.