“Learnings from AI safety course so far” by boazbarak

Update: 2025-09-27

Description

I have been teaching CS 2881r: AI safety and alignment this semester. While I plan to do a longer recap post once the semester is over, I thought I'd share some of what I've learned so far, and use this opportunity to also get more feedback.

Lectures are recorded and uploaded to a YouTube playlist, and @habryka has kindly created a wikitag for this course, so you can view lecture notes here.

Let's start with the good parts

Aspects that are working:

Experiments are working well! I am trying something new this semester: every lecture includes a short presentation by a group of students who are carrying out a small experiment related to that lecture. (For example, in lecture 1 there was an experiment on generalizations of emergent misalignment by @Valerio Pepe.) I was worried that the short time would not allow [...]

---

Outline:

(00:39) Aspects that are working:

(02:50) Aspects that perhaps could work better:

(04:20) Aspects I am unsure of

---

First published:

September 27th, 2025

Source:

https://www.lesswrong.com/posts/2pZWhCndKtLAiWXYv/learnings-from-ai-safety-course-so-far

---

Narrated by TYPE III AUDIO.
