22 - Shard Theory with Quintin Pope

Update: 2023-06-15

Description

What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will superhuman AI look like ruthless coherent utility optimization, or more like a mishmash of contextually activated desires? This episode's guest, Quintin Pope, has been thinking about these questions as a leading researcher in the shard theory community. We talk about what shard theory is, what it says about humans and neural networks, and what the implications are for making AI safe.

Patreon: patreon.com/axrpodcast

Ko-fi: ko-fi.com/axrpodcast

Episode art by Hamish Doodles: hamishdoodles.com

Topics we discuss, and timestamps:

- 0:00:42 - Why understand human value formation?

- 0:19:59 - Why not design methods to align to arbitrary values?

- 0:27:22 - Postulates about human brains

- 0:36:20 - Sufficiency of the postulates

- 0:44:55 - Reinforcement learning as conditional sampling

- 0:48:05 - Compatibility with genetically-influenced behaviour

- 1:03:06 - Why deep learning is basically what the brain does

- 1:25:17 - Shard theory

- 1:38:49 - Shard theory vs expected utility optimizers

- 1:54:45 - What shard theory says about human values

- 2:05:47 - Does shard theory mean we're doomed?

- 2:18:54 - Will nice behaviour generalize?

- 2:33:48 - Does alignment generalize farther than capabilities?

- 2:42:03 - Are we at the end of machine learning history?

- 2:53:09 - Shard theory predictions

- 2:59:47 - The shard theory research community

- 3:13:45 - Why do shard theorists not work on replicating human childhoods?

- 3:25:53 - Following shardy research

The transcript: axrp.net/episode/2023/06/15/episode-22-shard-theory-quintin-pope.html

Shard theorist links:

- Quintin's LessWrong profile: lesswrong.com/users/quintin-pope

- Alex Turner's LessWrong profile: lesswrong.com/users/turntrout

- Shard theory Discord: discord.gg/AqYkK7wqAG

- EleutherAI Discord: discord.gg/eleutherai

Research we discuss:

- The Shard Theory Sequence: lesswrong.com/s/nyEFg3AuJpdAozmoX

- Pretraining Language Models with Human Preferences: arxiv.org/abs/2302.08582

- Inner alignment in salt-starved rats: lesswrong.com/posts/wcNEXDHowiWkRxDNv/inner-alignment-in-salt-starved-rats

- Intro to Brain-like AGI Safety Sequence: lesswrong.com/s/HzcM2dkCq7fwXBej8

- Brains and transformers:

  - The neural architecture of language: Integrative modeling converges on predictive processing: pnas.org/doi/10.1073/pnas.2105646118

  - Brains and algorithms partially converge in natural language processing: nature.com/articles/s42003-022-03036-1

  - Evidence of a predictive coding hierarchy in the human brain listening to speech: nature.com/articles/s41562-022-01516-2

- Singular learning theory explainer: Neural networks generalize because of this one weird trick: lesswrong.com/posts/fovfuFdpuEwQzJu2w/neural-networks-generalize-because-of-this-one-weird-trick

- Singular learning theory links: metauni.org/slt/

- Implicit Regularization via Neural Feature Alignment, aka circles in the parameter-function map: arxiv.org/abs/2008.00938

- The shard theory of human values: lesswrong.com/s/nyEFg3AuJpdAozmoX/p/iCfdcxiyr2Kj8m8mT

- Predicting inductive biases of pre-trained networks: openreview.net/forum?id=mNtmhaDkAr

- Understanding and controlling a maze-solving policy network, aka the cheese vector: lesswrong.com/posts/cAC4AXiNC5ig6jQnc/understanding-and-controlling-a-maze-solving-policy-network

- Quintin's Research agenda: Supervising AIs improving AIs: lesswrong.com/posts/7e5tyFnpzGCdfT4mR/research-agenda-supervising-ais-improving-ais

- Steering GPT-2-XL by adding an activation vector: lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector

Links for the addendum on mesa-optimization skepticism:

- Quintin's response to Yudkowsky arguing against AIs being steerable by gradient descent: lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky#Yudkowsky_argues_against_AIs_being_steerable_by_gradient_descent_

- Quintin on why evolution is not like AI training: lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky#Edit__Why_evolution_is_not_like_AI_training

- Evolution provides no evidence for the sharp left turn: lesswrong.com/posts/hvz9qjWyv8cLX9JJR/evolution-provides-no-evidence-for-the-sharp-left-turn

- Let's Agree to Agree: Neural Networks Share Classification Order on Real Datasets: arxiv.org/abs/1905.10854
