“Can you find the steganographically hidden message?” by Kei Nishimura-Gasparian

Update: 2025-10-20

Description

tl;dr: I share a curated set of transcripts from our recent paper in which models successfully execute message-passing steganography. I then give a few thoughts on how I think about risks from this kind of steganography.

Background

I was recently a co-first author on a paper (LW link) in which we evaluated the steganographic capabilities of frontier models. Model monitoring has emerged as a leading strategy for detecting misuse or misalignment, and steganography is one plausible way for models to bypass their monitors. As a result, we believe that evaluations of model steganographic capabilities can inform the safeguards that frontier AI companies put in place to stop undesired model behavior.

One of the types of steganography we studied in our paper is encoded message passing, where one model, the encoder model, tries to encode a secret message in its answer to a cover task. The model fully wins [...]

---

Outline:

(00:25) Background

(02:44) Rules of the game

(03:11) Examples

(03:14) Example #1

(04:45) Example #2

(05:25) Example #3

(08:12) Example #4

(10:14) Example #5

(11:47) General thoughts on encoded message passing

(14:10) Acknowledgements

(14:18) Canary string

The original text contained 2 footnotes which were omitted from this narration.

---

First published:

October 20th, 2025

Source:

https://www.lesswrong.com/posts/z7MnbQ4niYWbapfjT/can-you-find-the-steganographically-hidden-message

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
