Davidad Dalrymple: Towards Provably Safe AI
Description
Episode 137
I spoke with Davidad Dalrymple about:
* His perspectives on AI risk
* ARIA (the UK’s Advanced Research and Invention Agency) and its Safeguarded AI Programme
Enjoy—and let me know what you think!
Davidad is a Programme Director at ARIA. He was most recently a Research Fellow in technical AI safety at Oxford. He co-invented the top-40 cryptocurrency Filecoin, led an international neuroscience collaboration, and was a senior software engineer at Twitter and multiple startups.
Find me on Twitter for updates on new episodes, and reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (00:36) Calibration and optimism about breakthroughs
* (03:35) Calibration and AGI timelines, effects of AGI on humanity
* (07:10) Davidad’s thoughts on the Orthogonality Thesis
* (10:30) Understanding how our current direction relates to AGI and breakthroughs
* (13:33) What Davidad thinks is needed for AGI
* (17:00) Extracting knowledge
* (19:01) Cyber-physical systems and modeling frameworks
* (20:00) Continuities between Davidad’s earlier work and ARIA
* (22:56) Path dependence in technology, race dynamics
* (26:40) More on Davidad’s perspective on what might go wrong with AGI
* (28:57) Vulnerable world, interconnectedness of computers and control
* (34:52) Formal verification and world modeling, Open Agency Architecture
* (35:25) The Semantic Sufficiency Hypothesis
* (39:31) Challenges for modeling
* (43:44) The Deontic Sufficiency Hypothesis and mathematical formalization
* (49:25) Oversimplification and quantitative knowledge
* (53:42) Collective deliberation in expressing values for AI
* (55:56) ARIA’s Safeguarded AI Programme
* (59:40) Anthropic’s ASL levels
* (1:03:12) Guaranteed Safe AI
* (1:03:38) AI risk and (in)accurate world models
* (1:09:59) Levels of safety specifications for world models and verifiers — steps to achieve high safety
* (1:12:00) Davidad’s portfolio research approach and funding at ARIA
* (1:15:46) Earlier concerns about ARIA — Davidad’s perspective
* (1:19:26) Where to find more information on ARIA and the Safeguarded AI Programme
* (1:20:44) Outro
Links:
* Davidad’s Twitter
* Papers
* Davidad’s Open Agency Architecture for Safe Transformative AI
* Dioptics: a Common Generalization of Open Games and Gradient-Based Learners (2019)
* Asynchronous Logic Automata (2008)
Get full access to The Gradient at thegradientpub.substack.com/subscribe