The coming AI security crisis (and what to do about it) | Sander Schulhoff
Digest
This podcast delves into the critical vulnerabilities within AI security, emphasizing that current AI guardrails are largely ineffective and easily bypassed by determined individuals. The discussion highlights the prevalence of prompt injection and jailbreaking attacks, which trick AI systems into performing unintended or harmful actions. Real-world examples, such as attacks on AI agents in customer service and the potential for AI-powered cyber threats, illustrate the escalating risks. The podcast critiques the AI security industry, questioning the effectiveness of automated red teaming and guardrails, and points out that foundational model companies struggle with adversarial robustness. It stresses the need for specialized AI security expertise, traditional cybersecurity best practices, and a cautious approach to deploying AI agents and robotics. The CAMEL framework is presented as a promising security measure for AI agents. Ultimately, the episode warns against a false sense of security from current measures and underscores the growing danger as AI capabilities advance.
Outlines

Introduction to AI Security Concerns and Guest Expertise
The podcast opens by highlighting major issues in AI security, stating that AI guardrails are ineffective and easily bypassed. It introduces Sander Schulhoff, a leading researcher in adversarial robustness, whose expertise lies in making AI systems perform unintended actions and who runs an AI red teaming competition.

Prompt Injection, Jailbreaking, and Real-World Exploits
Current AI systems are vulnerable to prompt injection and jailbreaking, with risks increasing as AI gains autonomy through agents and robots. Examples include ServiceNow Assist AI attacks, a Twitter chatbot spreading hate speech, and a math problem solver exfiltrating API keys.

Escalating Risks with AI Agents and Cyber Attacks
The Vegas Cybertruck explosion is cited as a potential jailbreaking example. The discussion covers AI-powered viruses performing cyber attacks via independent API requests, and the significant harm potential as AI systems evolve into agents and robots, leading to data breaches and financial losses.

The AI Security Industry: Red Teaming vs. Guardrails
The AI security industry offers solutions like automated red teaming to find vulnerabilities and guardrails to prevent malicious outputs. However, the effectiveness of these solutions, particularly guardrails, is questioned, as they are claimed to be easily bypassed.

Understanding Adversarial Robustness and Attack Success Rate (ASR)
Adversarial robustness measures a system's defense against attacks, quantified by Attack Success Rate (ASR). Companies use these metrics to demonstrate the effectiveness of their AI security solutions, with lower ASR indicating higher robustness.

Enterprise Engagement with AI Security Firms
CISOs hire AI security firms for audits and to implement automated red teaming and guardrails. These firms identify vulnerabilities and offer solutions to improve the adversarial robustness of enterprise AI systems.

The Critical Failure of AI Guardrails and Red Teaming Effectiveness
Automated red teaming is highly effective at finding vulnerabilities, indicating AI models are easily tricked. Conversely, AI guardrails are largely ineffective, failing to prevent malicious outputs due to the vast attack surface and inability to deter determined attackers.

Concerns Regarding AI Security Company Practices and Foundational Challenges
Some AI security companies allegedly fabricate statistics, and their products may not work effectively, especially with non-English languages. The expertise of frontier AI labs highlights the limitations of third-party solutions, and adversarial robustness remains a long-standing, difficult problem.

Knowledge Gaps and Flawed Defenses in AI Security
Many AI security issues stem from a misunderstanding of AI's differences from classical cybersecurity. Prompt-based defenses are considered the worst type, easily bypassed and offering minimal protection.

Practical Recommendations and the "Angry God" Analogy
For simple chatbots, security concerns are minimal. For complex systems, focus on traditional cybersecurity. The "angry god" analogy highlights the need for containment. Adding security layers helps, but over-reliance on guardrails is dangerous. Monitoring logs is crucial.

The Future of Security: AI, Cybersecurity, and Agentic Systems
The future of security lies at the intersection of AI and traditional cybersecurity. Professionals need to understand AI's unique vulnerabilities and apply classical security principles. Agentic AI systems pose risks, but frameworks like CAMEL offer promising approaches by restricting agent actions.

Importance of Education, Foundational Models, and Promising Research
Educating teams about AI security risks and having AI security experts is vital. Foundational model companies must prioritize security, with adaptive evaluations and adversarial training being key areas for improvement. Early adversarial training and new architectures show promise but need validation.

Companies in AI Security and Market Predictions
Companies like Anthropic are making progress in AI safety. Trustable focuses on AI governance, and Rappello offers tools for discovering AI deployments. A market correction is predicted for guardrail companies, with increased real-world harms from AI agents expected.

Advice for Researchers and Final Takeaways
Researchers are advised against focusing solely on offensive adversarial security research. Human-in-the-loop systems are useful now but may not align with future AI capabilities. Guardrails are fundamentally ineffective, and as AI agents become more capable, the potential for real-world harm increases.

The Crucial Role of AI Expertise and Connecting with Sandra
The discussion emphasizes the need for AI researchers on security teams to understand AI intricacies, combining AI and classical security knowledge. Sandra shares her contact information (@sandarshuloff) and recommends hackai.co for AI security courses and risk evaluation.
Keywords
AI Guardrails
Systems designed to prevent AI models from generating harmful outputs, but argued to be largely ineffective and easily bypassed.
Prompt Injection
An attack where malicious input manipulates an AI system into performing unintended actions, overriding its original instructions.
Jailbreaking
Tricking an AI model into bypassing safety restrictions to generate harmful, unethical, or forbidden content.
Adversarial Robustness
An AI system's ability to maintain performance and security against adversarial attacks, crucial for safe and reliable operation.
Red Teaming
Simulating attacks on AI systems to identify vulnerabilities, testing defenses against prompt injection and jailbreaking.
AI Agents
AI systems that perform tasks autonomously, posing increased risks due to their ability to take actions and access external tools.
CAMEL Framework
A security approach for AI agents that restricts actions based on user prompts, limiting potential malicious behavior through secure permissioning.
AI Security Expertise
Specialized knowledge in understanding and mitigating AI-specific security risks like prompt injection and developing effective defenses.
AI System Deployment
The process of implementing AI systems, requiring careful consideration of security risks, ethical implications, and defense feasibility.
Classical Cybersecurity
Traditional security practices that need to be integrated with AI security knowledge for comprehensive risk mitigation.
Q&A
What are AI guardrails and why are they considered ineffective?
AI guardrails are systems designed to filter AI inputs and outputs to prevent harmful content. However, they are largely ineffective because the number of potential attacks against AI models is virtually infinite. Determined attackers can easily find ways to bypass these guardrails, rendering them useless and potentially creating a false sense of security.
What is the difference between prompt injection and jailbreaking in AI security?
Jailbreaking involves directly tricking an AI model into bypassing its safety features. Prompt injection, on the other hand, occurs when a malicious user manipulates an AI application or agent to ignore its developer's instructions and execute harmful commands, often by exploiting the system's underlying prompts.
Why is adversarial robustness a critical concept in AI security?
Adversarial robustness refers to an AI system's ability to withstand attacks designed to deceive or manipulate it. It's crucial because AI models, especially LLMs, are vulnerable to various adversarial attacks like prompt injection and jailbreaking. Measuring and improving adversarial robustness is key to ensuring AI systems operate safely and reliably.
What are the main risks associated with AI agents and robotics?
AI agents and robots, due to their ability to take actions in the real world, pose significant risks. If compromised through prompt injection or jailbreaking, they could leak sensitive data, cause financial damage, or even inflict physical harm. The increasing autonomy and capabilities of these systems amplify these security concerns.
What is the CAMEL framework and how does it improve AI agent security?
The CAMEL framework is a security approach for AI agents that restricts their actions based on the user's prompt. By granting only the necessary permissions for a specific task, it limits the agent's ability to perform malicious actions, even if tricked by an attacker. This focuses on secure permissioning rather than just filtering prompts.
What advice is given regarding the current AI security industry, particularly guardrail companies?
The podcast suggests that the AI security industry, especially companies selling guardrails and automated red teaming, may face a market correction. Many guardrails are ineffective, and open-source alternatives often perform better. The focus should be on understanding AI's unique security challenges rather than relying on easily bypassed solutions.
Why is specialized AI expertise crucial for security teams?
Specialized AI expertise is vital because AI systems have unique vulnerabilities, like prompt injection, that traditional security knowledge might not cover. AI researchers can better understand and address these specific risks.
What is the primary advice given regarding AI system deployment?
The key advice is to thoroughly evaluate AI systems before deployment. Consider if they are vulnerable to prompt injection, if defenses are feasible, and if the system should be deployed at all, prioritizing security over immediate implementation.
Where can people find Sandra and learn more about AI security?
Sandra can be found on Twitter at @sandarshuloff. For those interested in learning more and checking out a course, hackai.co is recommended, with a team available to answer questions and provide training.
Show Notes
Sander Schulhoff is an AI researcher specializing in AI security, prompt injection, and red teaming. He wrote the first comprehensive guide on prompt engineering and ran the first-ever prompt injection competition, working with top AI labs and companies. His dataset is now used by Fortune 500 companies to benchmark their AI systems security, he’s spent more time than anyone alive studying how attackers break AI systems, and what he’s found isn’t reassuring: the guardrails companies are buying don’t actually work, and we’ve been lucky we haven’t seen more harm so far, only because AI agents aren’t capable enough yet to do real damage.
We discuss:
1. The difference between jailbreaking and prompt injection attacks on AI systems
2. Why AI guardrails don’t work
3. Why we haven’t seen major AI security incidents yet (but soon will)
4. Why AI browser agents are vulnerable to hidden attacks embedded in webpages
5. The practical steps organizations should take instead of buying ineffective security tools
6. Why solving this requires merging classical cybersecurity expertise with AI knowledge
—
Brought to you by:
Datadog—Now home to Eppo, the leading experimentation and feature flagging platform: https://www.datadoghq.com/lenny
Metronome—Monetization infrastructure for modern software companies: https://metronome.com/
GoFundMe Giving Funds—Make year-end giving easy: http://gofundme.com/lenny
—
Transcript: https://www.lennysnewsletter.com/p/the-coming-ai-security-crisis
—
My biggest takeaways (for paid newsletter subscribers): https://www.lennysnewsletter.com/i/181089452/my-biggest-takeaways-from-this-conversation
—
Where to find Sander Schulhoff:
• X: https://x.com/sanderschulhoff
• LinkedIn: https://www.linkedin.com/in/sander-schulhoff
• Website: https://sanderschulhoff.com
• AI Red Teaming and AI Security Masterclass on Maven: https://bit.ly/44lLSbC
—
Where to find Lenny:
• Newsletter: https://www.lennysnewsletter.com
• X: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/
—
In this episode, we cover:
(00:00 ) Introduction to Sander Schulhoff and AI security
(05:14 ) Understanding AI vulnerabilities
(11:42 ) Real-world examples of AI security breaches
(17:55 ) The impact of intelligent agents
(19:44 ) The rise of AI security solutions
(21:09 ) Red teaming and guardrails
(23:44 ) Adversarial robustness
(27:52 ) Why guardrails fail
(38:22 ) The lack of resources addressing this problem
(44:44 ) Practical advice for addressing AI security
(55:49 ) Why you shouldn’t spend your time on guardrails
(59:06 ) Prompt injection and agentic systems
(01:09:15 ) Education and awareness in AI security
(01:11:47 ) Challenges and future directions in AI security
(01:17:52 ) Companies that are doing this well
(01:21:57 ) Final thoughts and recommendations
—
Referenced:
• AI prompt engineering in 2025: What works and what doesn’t | Sander Schulhoff (Learn Prompting, HackAPrompt): https://www.lennysnewsletter.com/p/ai-prompt-engineering-in-2025-sander-schulhoff
• The AI Security Industry is Bullshit: https://sanderschulhoff.substack.com/p/the-ai-security-industry-is-bullshit
• The Prompt Report: Insights from the Most Comprehensive Study of Prompting Ever Done: https://learnprompting.org/blog/the_prompt_report?srsltid=AfmBOoo7CRNNCtavzhyLbCMxc0LDmkSUakJ4P8XBaITbE6GXL1i2SvA0
• OpenAI: https://openai.com
• Scale: https://scale.com
• Hugging Face: https://huggingface.co
• Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition: https://www.semanticscholar.org/paper/Ignore-This-Title-and-HackAPrompt%3A-Exposing-of-LLMs-Schulhoff-Pinto/f3de6ea08e2464190673c0ec8f78e5ec1cd08642
• Simon Willison’s Weblog: https://simonwillison.net
• ServiceNow: https://www.servicenow.com
• ServiceNow AI Agents Can Be Tricked Into Acting Against Each Other via Second-Order Prompts: https://thehackernews.com/2025/11/servicenow-ai-agents-can-be-tricked.html
• Alex Komoroske on X: https://x.com/komorama
• Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack: https://arstechnica.com/information-technology/2022/09/twitter-pranksters-derail-gpt-3-bot-with-newly-discovered-prompt-injection-hack
• MathGPT: https://math-gpt.org
• 2025 Las Vegas Cybertruck explosion: https://en.wikipedia.org/wiki/2025_Las_Vegas_Cybertruck_explosion
• Disrupting the first reported AI-orchestrated cyber espionage campaign: https://www.anthropic.com/news/disrupting-AI-espionage
• Thinking like a gardener not a builder, organizing teams like slime mold, the adjacent possible, and other unconventional product advice | Alex Komoroske (Stripe, Google): https://www.lennysnewsletter.com/p/unconventional-product-advice-alex-komoroske
• Prompt Optimization and Evaluation for LLM Automated Red Teaming: https://arxiv.org/abs/2507.22133
• MATS Research: https://substack.com/@matsresearch
• CBRN: https://en.wikipedia.org/wiki/CBRN_defense
• CaMeL offers a promising new direction for mitigating prompt injection attacks: https://simonwillison.net/2025/Apr/11/camel
• Trustible: https://trustible.ai
• Repello: https://repello.ai
• Do not write that jailbreak paper: https://javirando.com/blog/2024/jailbreaks
—
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.
—
Lenny may be an investor in the companies discussed.
To hear more, visit www.lennysnewsletter.com
























