“Thinking about reasoning models made me less worried about scheming” by Fabien Roger

Description

Reasoning models like Deepseek r1:

  • Can reason in consequentialist ways and have vast knowledge about AI training
  • Can reason for many serial steps, with enough slack to think about takeover plans
  • Sometimes reward hack

If you had told this to my 2022 self without specifying anything else about scheming models, I might have put a non-negligible probability on such AIs scheming (i.e. strategically performing well in training in order to protect their long-term goals).

Despite this, the scratchpads of current reasoning models do not contain traces of scheming in regular training environments, even when there is no harmlessness pressure on the scratchpads, as in Deepseek-r1-Zero.

In this post, I argue that:

  • Classic explanations for the absence of scheming (in non-wildly superintelligent AIs) like the ones listed in Joe Carlsmith's scheming report only partially rule out scheming in models like Deepseek r1;
  • There are other explanations for why Deepseek r1 doesn’t scheme that are often absent from past armchair reasoning about scheming:
    • The human-like pretraining prior is mostly benign and applies to some intermediate steps of reasoning: it puts a very low probability on helpful-but-scheming agents doing things like trying very hard to solve math and [...]

---

Outline:

(04:08) Classic reasons to expect AIs to not be schemers

(04:14) Speed priors

(06:11) Preconditions for scheming not being met

(08:27) There are indirect pressures against scheming on intermediate steps of reasoning

(09:07) Human priors on intermediate steps of reasoning

(11:43) Correlation between short and long reasoning

(13:07) Other pressures

(13:48) Rewards are not so cursed as to strongly incentivize scheming

(13:54) Maximizing rewards teaches you things mostly independent of scheming

(14:46) Using situational awareness to get higher reward is hard

(16:45) Maximizing rewards doesn't push you far away from the human prior

(18:07) Will it be different for future rewards?

(19:32) Meta-level update and conclusion

The original text contained 1 footnote which was omitted from this narration.

---


First published: November 20th, 2025

Source: https://www.lesswrong.com/posts/HYCGA2p4bBG68Yufh/thinking-about-reasoning-models-made-me-less-worried-about
---


Narrated by TYPE III AUDIO.
