Claude Opus 4.5: Model Card, Alignment and Safety

Update: 2025-11-28

Description

They saved the best for last.

The contrast in model cards is stark. Google provided a brief overview of its tests for Gemini 3 Pro, with a lot of ‘we did this test, and we learned a lot from it, and we are not going to tell you the results.’

Anthropic gives us a 150-page book, including their capability assessments. This makes sense. Capability is directly relevant to safety, and frontier capability safety tests are often also credible indications of capability.

Which still has several instances of ‘we did this test, and we learned a lot from it, and we are not going to tell you the results.’ Damn it. I get it, but damn it.

Anthropic claims Opus 4.5 is the most aligned frontier model to date, although ‘with many subtleties.’

I agree with Anthropic's assessment, especially for practical purposes right now.

Claude is also miles ahead of other models on aspects of alignment that do not directly appear on a frontier safety assessment.

In terms of surviving superintelligence, it's still the scene from The Phantom Menace. As in, that won’t be enough.

(Above: Claude Opus 4.5 self-portrait as [...]

---

Outline:

(01:37) Claude Opus 4.5 Basic Facts

(03:12) Claude Opus 4.5 Is The Best Model For Many But Not All Use Cases

(05:38) Misaligned?

(09:04) Section 3: Safeguards and Harmlessness

(11:15) Section 4: Honesty

(12:33) 5: Agentic Safety

(17:09) Section 6: Alignment Overview

(23:45) Alignment Investigations

(24:23) Sycophancy Course Correction Is Lacking

(25:37) Deception

(28:05) Ruling Out Encoded Content In Chain Of Thought

(30:16) Sandbagging

(31:05) Evaluation Awareness

(35:05) Reward Hacking

(36:24) Subversion Strategy

(37:19) 6.13: UK AISI External Testing

(37:31) 6.14: Model Welfare

(38:22) 7: RSP Evaluations

(40:01) CBRN

(47:34) Autonomy

(54:50) Cyber

(58:29) The Whisperers Love The Vibes

---

First published:

November 28th, 2025

Source:

https://www.lesswrong.com/posts/gfby4vqNtLbehqbot/claude-opus-4-5-model-card-alignment-and-safety

---

Narrated by TYPE III AUDIO.

---
