“Rogue internal deployments via external APIs” by Fabien Roger, Buck

Update: 2025-10-16

Description

Once AI companies build powerful AIs, they may:

Give internal AIs access to sensitive internal privileges (e.g. access to the internal infra that touches model weights, help with training of the next generation of AIs, …);
Have mitigations specific to those internal deployments enforced via monitoring of the internal LLM API;
Have an external API deployment without those mitigations.

In this situation, an internal AI may build an agent scaffold that makes calls to the external API and uses the internal sensitive privileges. Such an agent scaffold would be in a better position to cause a catastrophe because it would not be subject to the same monitoring as the internal API. I call this a rogue internal deployment via external APIs. (It is “internal” because the model weights and the agent scaffold never leave the cluster.)

I think preventing those is similarly important from a misalignment perspective as preventing [...]
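To make the mitigation named in the outline below (monitoring for the creation of scaffolds that use external LLM APIs) more concrete, here is a minimal sketch in Python. It is not the authors' method: the hostnames, the `Finding` type, and the `find_external_llm_calls` function are all hypothetical, and a plain string match like this would be easy to evade. It only illustrates the shape of the check, which is to inspect code written by an internal AI and flag any reference to an external LLM endpoint for review.

```python
import re
from dataclasses import dataclass

# Hypothetical hostnames standing in for external LLM API endpoints.
# These are illustrative assumptions, not real services.
EXTERNAL_LLM_HOSTS = ("api.external-llm.example", "llm.partner.example")


@dataclass
class Finding:
    line_number: int  # 1-indexed line in the inspected code
    host: str         # which external host was referenced
    text: str         # the offending line, stripped of whitespace


def find_external_llm_calls(code: str, hosts=EXTERNAL_LLM_HOSTS) -> list[Finding]:
    """Return one Finding per (line, host) pair where the host appears.

    This is a naive textual check. It does not parse the code and would
    miss obfuscated or dynamically constructed URLs.
    """
    findings = []
    for number, line in enumerate(code.splitlines(), start=1):
        for host in hosts:
            if re.search(re.escape(host), line, flags=re.IGNORECASE):
                findings.append(Finding(number, host, line.strip()))
    return findings


def should_escalate(code: str) -> bool:
    """Flag the code for review if it references any external LLM host."""
    return len(find_external_llm_calls(code)) > 0
```

In this sketch, `should_escalate` would run on code produced by an internal agent before it is executed, and anything flagged would go to a stronger reviewer (human or automated) rather than being blocked outright.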

---

Outline:

(01:43) Rogue internal deployments via external APIs

(03:29) A variation: rogue internal deployments via cross-company APIs

(04:28) A possible mitigation: preventing the creation of scaffolds that use external LLM APIs via monitoring

(06:29) Why I am more pessimistic about other solutions

(06:34) Monitoring the external API

(08:47) Preventing access to external APIs

(10:22) Monitoring access to sensitive permissions

(10:57) Final thoughts

---

First published:

October 15th, 2025

Source:

https://www.lesswrong.com/posts/fqRmcuspZuYBNiQuQ/rogue-internal-deployments-via-external-apis

---

Narrated by TYPE III AUDIO.
