“The Plan - 2025 Update” by johnswentworth, David Lorell

Update: 2025-12-31
Description

What's “The Plan”?

For several years now, around the end of each year, I (John) have written a post on our plan for AI alignment. That plan hasn’t changed much over the past few years, so both this year’s post and last year’s are written as updates to The Plan - 2023 Version.

I’ll give a very quick outline here of what's in the 2023 Plan post. If you have questions or want to argue about points, you should probably go to that post to get the full version.

  • What is The Plan for AI alignment? Briefly: Sort out our fundamental confusions about agency and abstraction enough to do interpretability that works and generalizes robustly. Then, look through our AI's internal concepts for a good alignment target, and Retarget the Search.
  • That plan is not the One Unique Plan we’re targeting; it's a median plan, among a whole space of possibilities. Generally, we aim to work on things which are robust bottlenecks to a broad space of plans. In particular, our research mostly focuses on natural abstraction, because that seems like the most robust bottleneck on which (not-otherwise-doomed) plans get stuck.
  • Most of the 2023 Plan post explains why natural abstraction [...]

---

Outline:

(00:11) What's The Plan?

(03:00) So, how's progress? What are you up to?

(04:05) What are the next bottlenecks to understanding natural abstraction?

(04:28) What's the territory-first prong?

(06:36) What's the mind-first prong?

(08:26) So what has and hasn't been figured out on the territory prong?

(11:40) And what has and hasn't been figured out on the mind prong?

---


First published: December 31st, 2025

Source: https://www.lesswrong.com/posts/vh5ZjdmJYJgnbpq8C/the-plan-2025-update


---


Narrated by TYPE III AUDIO.
