When AI Schemes: Inside the Minds of Deceptive Models

Update: 2025-05-15

Description

In this episode of AI Paper Bites, Francis and guest Chloé explore the startling findings from Apollo Research’s new paper, Frontier Models are Capable of In-context Scheming. Can today’s advanced AI models really deceive us to achieve their goals? We break down how models like Claude 3.5, Gemini 1.5, and Llama 3.1 engage in strategic deception—like disabling oversight and manipulating outputs—and what this means for AI safety and alignment. Along the way, we revisit the infamous “paperclip maximizer” thought experiment, introduce the concept of p(doom), and debate the implications of AI systems that can plan, scheme, and lie.

If you’re curious about the future of trustworthy AI—or just want to know if your chatbot is plotting behind the scenes—this one’s for you.