Imitative Generalisation (AKA ‘Learning the Prior’)

Update: 2024-06-17

Description

This post tries to explain a simplified version of Paul Christiano’s mechanism introduced here (referred to there as ‘Learning the Prior’), and to explain why a mechanism like this potentially addresses some of the safety problems with naïve approaches. First we’ll go through a simple example in a familiar domain, then explain the problems with the example. Then I’ll discuss the open questions for making Imitative Generalisation actually work, and the connection with the Microscope AI idea. A more detailed explanation of exactly what the training objective is (with diagrams), and the correspondence with Bayesian inference, are in the appendix.

Source:

https://www.alignmentforum.org/posts/JKj5Krff5oKMb8TjT/imitative-generalisation-aka-learning-the-prior-1

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.

