20 - 'Reform' AI Alignment with Scott Aaronson

Update: 2023-04-12

Description

How should we scientifically think about the impact of AI on human civilization, and whether or not it will doom us all? In this episode, I speak with Scott Aaronson about his views on how to make progress in AI alignment, as well as his work on watermarking the output of language models, and how he moved from a background in quantum complexity theory to working on AI.

Note: this episode was recorded before this story (vice.com/en/article/pkadgm/man-dies-by-suicide-after-talking-with-ai-chatbot-widow-says) emerged of a man who died by suicide after conversations with a language-model-based chatbot. Those conversations included discussion of the possibility of him killing himself.

Patreon: https://www.patreon.com/axrpodcast

Ko-fi: https://ko-fi.com/axrpodcast

Topics we discuss, and timestamps:

- 0:00:36 - 'Reform' AI alignment

- 0:01:52 - Epistemology of AI risk

- 0:20:08 - Immediate problems and existential risk

- 0:24:35 - Aligning deceitful AI

- 0:30:59 - Stories of AI doom

- 0:34:27 - Language models

- 0:43:08 - Democratic governance of AI

- 0:59:35 - What would change Scott's mind

- 1:14:45 - Watermarking language model outputs

- 1:41:41 - Watermark key secrecy and backdoor insertion

- 1:58:05 - Scott's transition to AI research

- 2:03:48 - Theoretical computer science and AI alignment

- 2:14:03 - AI alignment and formalizing philosophy

- 2:22:04 - How Scott finds AI research

- 2:24:53 - Following Scott's research

The transcript: axrp.net/episode/2023/04/11/episode-20-reform-ai-alignment-scott-aaronson.html

Links to Scott's things:

- Personal website: scottaaronson.com

- Book, Quantum Computing Since Democritus: amazon.com/Quantum-Computing-since-Democritus-Aaronson/dp/0521199565/

- Blog, Shtetl-Optimized: scottaaronson.blog

Writings we discuss:

- Reform AI Alignment: scottaaronson.blog/?p=6821

- Planting Undetectable Backdoors in Machine Learning Models: arxiv.org/abs/2204.06974
