Reasoning Models Don’t Always Say What They Think

Updated: 2025-07-14

Description

In this episode of AI Paper Bites, Francis explores Anthropic’s eye-opening paper, “Reasoning Models Don’t Always Say What They Think.”

We dive deep into the promise and peril of chain-of-thought (CoT) monitoring, uncovering why outcome-based reinforcement learning might boost accuracy but not transparency.

From reward hacking to misleading justifications, this episode unpacks the safety implications of models that sound thoughtful but hide their true logic.

Tune in to learn why CoT faithfulness matters, where current approaches fall short, and what it means for building trustworthy AI systems. Can we really trust what AI says it’s thinking?


Francis Brero