Redwood Research Blog

“Four places where you can put LLM monitoring” by Fabien Roger, Buck Shlegeris

Update: 2025-08-09
Description

Subtitle: To wit: LLM APIs, agent scaffolds, code review, and detection-and-response systems.

To prevent potentially misaligned LLM agents from taking actions with catastrophic consequences, you can try to monitor LLM actions: that is, try to detect dangerous or malicious actions, and respond when you detect one (like blocking the action, starting an investigation, …).

But where in your infrastructure should your LLM monitor be implemented, and which team in your organization should own it? Historically, researchers have mostly focused on implementing monitoring inside the agent scaffold, which means it would be owned by agent product teams. But there are also three other important places where you might put it, and I think these are comparable in importance to LLM monitors in agent scaffolds.

This is important for a few reasons:

  1. This might inspire us to focus on developing better monitoring [...]

---

Outline:

(01:55) Monitors in agent scaffolds

(04:17) Monitors in cyber threat detection-and-response systems

(07:41) Monitors in LLM APIs

(10:32) Monitors in code reviews

(12:08) So where should you put the monitors?

(13:29) Appendix: Further subdivisions

---


First published: August 9th, 2025

Source: https://blog.redwoodresearch.org/p/four-places-where-you-can-put-llm


---


Narrated by TYPE III AUDIO.


---

Images from the article:


Comparison table showing monitor types across six evaluation criteria.
System architecture diagram showing LLM API connecting to cybersecurity execution workflow.
System diagram showing LLM API, agent scaffold, execution, and threat detection.
System architecture diagram showing LLM API connecting to agent scaffold and executions.
System diagram showing LLM API connecting to agent scaffold and execution.
Simple diagram showing LLM API, Agent scaffold, and Execution components connected sequentially.

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
