#66 – Michael Cohen on Input Tampering in Advanced RL Agents

Update: 2023-06-25

Description

Michael Cohen is a DPhil student at the University of Oxford, supervised by Mike Osborne. He will be starting a postdoc with Professor Stuart Russell at UC Berkeley's Center for Human-Compatible AI. His research considers the expected behaviour of generally intelligent artificial agents, with a view to designing agents that we can expect to behave safely.


You can see more links and a full transcript at www.hearthisidea.com/episodes/cohen.


We discuss:



  • What is reinforcement learning, and how is it different from supervised and unsupervised learning?

  • Michael's recently co-authored paper titled 'Advanced artificial agents intervene in the provision of reward'

  • Why might it be hard to convey what we really want to RL learners — even when we know exactly what we want?

  • Why might advanced RL systems tamper with their sources of input, and why could this be very bad?

  • What assumptions need to hold for this "input tampering" outcome?

  • Is reward really the optimisation target? Do models "get reward"?

  • What's wrong with the analogy between RL systems and evolution?
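To make the first of these questions concrete, here is a minimal tabular Q-learning sketch. The corridor environment and all hyperparameters below are invented purely for illustration; the point is that, unlike supervised learning, the agent is never shown a labelled "correct action" and learns only from reward feedback.

```python
import random

# Toy corridor: positions 0..3; the agent starts at 0 and earns reward 1.0
# only on reaching the terminal position 3. All values here are illustrative.
N_STATES = 4
ACTIONS = [-1, +1]                      # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # Epsilon-greedy action selection: mostly exploit, sometimes explore.
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)   # clamp to the corridor
            r = 1.0 if s2 == N_STATES - 1 else 0.0  # reward only at the goal
            # Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a').
            best_next = max(q[(s2, a2)] for a2 in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = s2
    return q

q_values = train()
```

After training, the greedy policy derived from `q_values` steps right from every non-terminal state, even though no example trajectory was ever provided; the reward signal alone shaped the behaviour. This is also why the reward channel itself matters so much, a theme the episode returns to with input tampering.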


Key links:


