“Thinking about reasoning models made me less worried about scheming” by Fabien Roger

Description

Reasoning models like Deepseek r1:

  • Can reason in consequentialist ways and have vast knowledge about AI training
  • Can reason for many serial steps, with enough slack to think about takeover plans
  • Sometimes reward hack

If you had told this to my 2022 self without specifying anything else about scheming models, I might have put a non-negligible probability on such AIs scheming (i.e. strategically performing well in training in order to protect their long-term goals).

Despite this, the scratchpads of current reasoning models do not contain traces of scheming in regular training environments, even when there is no harmlessness pressure on the scratchpads, as in Deepseek-r1-Zero.

In this post, I argue that:

  • Classic explanations for the absence of scheming (in non-wildly superintelligent AIs) like the ones listed in Joe Carlsmith's scheming report only partially rule out scheming in models like Deepseek r1;
  • There are other explanations for why Deepseek r1 doesn’t scheme that are often absent from past armchair reasoning about scheming:
    • The human-like pretraining prior is mostly benign and applies to some intermediate steps of reasoning: it puts a very low probability on helpful-but-scheming agents doing things like trying very hard to solve math and [...]

---

Outline:

(04:08) Classic reasons to expect AIs to not be schemers

(04:14) Speed priors

(06:11) Preconditions for scheming not being met

(08:27) There are indirect pressures against scheming on intermediate steps of reasoning

(09:07) Human priors on intermediate steps of reasoning

(11:43) Correlation between short and long reasoning

(13:07) Other pressures

(13:48) Rewards are not so cursed as to strongly incentivize scheming

(13:54) Maximizing rewards teaches you things mostly independent of scheming

(14:46) Using situational awareness to get higher reward is hard

(16:45) Maximizing rewards doesn't push you far away from the human prior

(18:07) Will it be different for future rewards?

(19:32) Meta-level update and conclusion

The original text contained 1 footnote which was omitted from this narration.

---


First published: November 20th, 2025

Source: https://www.lesswrong.com/posts/HYCGA2p4bBG68Yufh/thinking-about-reasoning-models-made-me-less-worried-about
---


Narrated by TYPE III AUDIO.
