“A Rocket–Interpretability Analogy” by plex

Update: 2024-10-25

Description

1.

4.4% of the US federal budget went into the space race at its peak.

This was surprising to me, until a friend pointed out that landing rockets on specific parts of the moon requires very similar technology to landing rockets in Soviet cities.[1]

I wonder how much more enthusiastic the scientists working on Apollo were, with the convenient motivating story of “I’m working towards a great scientific endeavor” vs “I’m working to make sure we can kill millions if we want to”.

2.

The field of alignment seems to be increasingly dominated by interpretability (and obedience[2]).

This was surprising to me[3], until a friend pointed out that partially opening the black box of NNs is the kind of technology that would let scaling labs find new unhobblings, by noticing ways in which the internals of their models are being inefficient and by having better tools to evaluate capabilities advances.[4]

I [...]

---

Outline:

(00:03) 1.

(00:35) 2.

(01:20) 3.

The original text contained 6 footnotes which were omitted from this narration.

---

First published:
October 21st, 2024

Source:
https://www.lesswrong.com/posts/h4wXMXneTPDEjJ7nv/a-rocket-interpretability-analogy

---

Narrated by TYPE III AUDIO.


