“All the lab’s AI safety Plans: 2025 edition” by Algon


Description

Three out of three CEOs of top AI companies agree: "Mitigating the risk of extinction from AI should be a global priority."

How do they plan to do this?

Anthropic has a Responsible Scaling Policy, Google DeepMind has a Frontier Safety Framework, and OpenAI has a Preparedness Framework, all of which were updated in 2025.

Overview of the policies

All three policies have similar “bones”.[1] They:

  • Take the same high-level approach: the company promises to test its AIs for dangerous capabilities during development; if they find that an AI has dangerous capabilities, they'll put safeguards in place to get the risk down to "acceptable levels" before they deploy it.
  • Land on basically the same three AI capability areas to track: biosecurity threats, cybersecurity threats, and autonomous AI development.
  • Focus on misuse more than misalignment. 

TL;DR summary table for the rest of the article:

                       | Anthropic                  | Google DeepMind           | OpenAI
Safety policy document | Responsible Scaling Policy | Frontier Safety Framework | Preparedness Framework
[...]

---

Outline:

(00:44) Overview of the policies

(02:00) Anthropic

(02:18) What capabilities are they monitoring for?

(06:07) How do they monitor these capabilities?

(07:35) What will they do if an AI looks dangerous?

(09:59) Deployment Constraints

(10:32) Google DeepMind

(11:03) What capabilities are they monitoring for?

(13:08) How do they monitor these capabilities?

(14:06) What will they do if an AI looks dangerous?

(15:46) Industry Wide Recommendations

(16:44) Some details of note

(17:49) OpenAI

(18:21) What capabilities are they monitoring for?

(21:04) How do they monitor these capabilities?

(22:35) What will they do if an AI looks dangerous?

(26:24) Notable differences between the companies' plans

(27:21) Commentary on the safety plans

(29:12) The current situation

The original text contained 14 footnotes which were omitted from this narration.

---


First published: October 28th, 2025

Source: https://www.lesswrong.com/posts/dwpXvweBrJwErse3L/all-the-lab-s-ai-safety-plans-2025-edition


---


Narrated by TYPE III AUDIO.


---

Images from the article:

Statement about AI extinction risk with signatories from major AI organizations.

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
