“Learnings from AI safety course so far” by boazbarak
Description
I have been teaching CS 2881r: AI safety and alignment this semester. While I plan to do a longer recap post once the semester is over, I thought I'd share some of what I've learned so far, and use this opportunity to also get more feedback.
Lectures are recorded and uploaded to a YouTube playlist, and @habryka has kindly created a wikitag for this course, so you can view lecture notes here.
Let's start with the good parts
Aspects that are working:
Experiments are working well! I am trying something new this semester: in every lecture there is a short presentation by a group of students who are carrying out a small experiment related to that lecture. (For example, in lecture 1 there was an experiment on generalizations of emergent misalignment by @Valerio Pepe). I was worried that the short time will not allow [...]
---
Outline:
(00:39) Aspects that are working:
(02:50) Aspects that perhaps could work better:
(04:20) Aspects I am unsure of
---
First published:
September 27th, 2025
Source:
https://www.lesswrong.com/posts/2pZWhCndKtLAiWXYv/learnings-from-ai-safety-course-so-far
---
Narrated by TYPE III AUDIO.