AI Guardrails Don't Work — Lenny's Podcast with Sander Schulhoff
Update: 2025-12-21
Description
Hook: Powerful AI plus real-world permissions creates an urgent security problem few teams are prepared for.
This condensed version (original 2 hours → 4 minutes) of Lenny Rachitsky's interview with Sander Schulhoff distills why guardrails fail, how jailbreaking and prompt injection differ, and why agents that can act (email, browser, payments) dramatically amplify risk. You'll learn practical defenses—tight permissioning, least-privilege architectures, logging and observability, and hiring cross-disciplinary security and AI talent—plus why static test metrics and marketing claims about adversarial robustness are misleading. Sander emphasizes concrete steps for deployment, adaptive evaluation of robustness, and avoiding the publication of jailbreaks that could help attackers. Topics covered include adversarial robustness, prompt injection, AI security, agent permissioning, and misinformation risk. Listen now to get the key ideas in minutes.