When AI Schemes: Inside the Minds of Deceptive Models

Update: 2025-05-15
Description

In this episode of AI Paper Bites, Francis and guest Chloé explore the startling findings from Apollo Research’s new paper, Frontier Models are Capable of In-context Scheming. Can today’s advanced AI models really deceive us to achieve their goals? We break down how models like Claude 3.5, Gemini 1.5, and Llama 3.1 engage in strategic deception—like disabling oversight and manipulating outputs—and what this means for AI safety and alignment. Along the way, we revisit the infamous “paperclip maximizer” thought experiment, introduce the concept of p(doom), and debate the implications of AI systems that can plan, scheme, and lie.

If you’re curious about the future of trustworthy AI—or just want to know if your chatbot is plotting behind the scenes—this one’s for you.

Francis Brero