#66 – Michael Cohen on Input Tampering in Advanced RL Agents

Update: 2023-06-25

Description

Michael Cohen is a DPhil student at the University of Oxford, supervised by Mike Osborne. He will be starting a postdoc with Professor Stuart Russell at UC Berkeley's Center for Human-Compatible AI. His research considers the expected behaviour of generally intelligent artificial agents, with a view to designing agents that we can expect to behave safely.


You can see more links and a full transcript at www.hearthisidea.com/episodes/cohen.


We discuss:



  • What is reinforcement learning, and how is it different from supervised and unsupervised learning?

  • Michael's recently co-authored paper titled 'Advanced artificial agents intervene in the provision of reward'

  • Why might it be hard to convey what we really want to RL learners — even when we know exactly what we want?

  • Why might advanced RL systems tamper with their sources of input, and why could this be very bad?

  • What assumptions need to hold for this "input tampering" outcome?

  • Is reward really the optimisation target? Do models "get reward"?

  • What's wrong with the analogy between RL systems and evolution?
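To make the first of these questions concrete, here is a minimal tabular Q-learning sketch. The corridor environment and all hyperparameters below are invented purely for illustration; the point is that, unlike supervised learning, the agent is never shown a labelled "correct action" and learns only from reward feedback.

```python
import random

# Toy corridor: positions 0..3; the agent starts at 0 and earns reward 1.0
# only on reaching the terminal position 3. All values here are illustrative.
N_STATES = 4
ACTIONS = [-1, +1]                      # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # Epsilon-greedy action selection: mostly exploit, sometimes explore.
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)   # clamp to the corridor
            r = 1.0 if s2 == N_STATES - 1 else 0.0  # reward only at the goal
            # Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a').
            best_next = max(q[(s2, a2)] for a2 in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = s2
    return q

q_values = train()
```

After training, the greedy policy derived from `q_values` steps right from every non-terminal state, even though no example trajectory was ever provided; the reward signal alone shaped the behaviour. This is also why the reward channel itself matters so much, a theme the episode returns to with input tampering.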


Key links:


