Exploring AI, APIs, and the Social Engineering of LLMs
Description
Summary:
Timothy De Block is joined by Keith Hoodlet, Engineering Director at Trail of Bits, for a fascinating, in-depth look at AI red teaming and the security challenges posed by Large Language Models (LLMs). They discuss how prompt injection is effectively a new form of social engineering against machines, exploiting the training data's inherent human biases and logical flaws. Keith breaks down the mechanics of LLM inference, the rise of middleware for AI security, and cutting-edge attacks using everything from emojis and bad grammar to weaponized image scaling. The episode stresses that the fundamental solutions—logging, monitoring, and robust security design—are simply timeless principles being applied to a terrifyingly fast-moving frontier.
Key Takeaways
The Prompt Injection Threat
Social Engineering the AI: Prompt injection works by exploiting the LLM's vast training data, which includes all of human history in digital format, including movies and fiction. Attackers use techniques that mirror social engineering to trick the model into doing something it's not supposed to, such as a customer service chatbot issuing an unauthorized refund.
Business Logic Flaws: Successful prompt injections are often tied to business logic flaws or a lack of proper checks and guardrails, similar to vulnerabilities seen in traditional applications and APIs.
Novel Attack Vectors: Attackers are finding creative ways to bypass guardrails:
Image Scaling: Trail of Bits discovered how to weaponize image scaling to hide prompt injections within images that appear benign to the user, but which pop out as visible text to the model when downscaled for inference.
Invisible Text: Attacks can use white text, zero-width characters (which don't show up when displayed or highlighted), or Unicode character smuggling in emails or prompts to covertly inject instructions.
Syntax & Emojis: Research has shown that bad grammar, run-on sentences, or even a simple sequence of emojis can successfully trigger prompt injections or jailbreaks.
Defense and Design
LLM Security is API Security: Since LLMs rely on APIs for their "tool access" and to perform actions (like sending an email or issuing a refund), security comes down to the same principles used for APIs: proper authorization, access control, and eliminating misconfiguration.
The Middleware Layer: Some companies are using middleware that sits between their application and the Frontier LLMs (like GPT or Claude) to handle system prompting, guard-railing, and filtering prompts, effectively acting as a Web Application Firewall (WAF) for LLM API calls.
Security Design Patterns: To defend against prompt injection, security design patterns are key:
Action-Selector Pattern: Instead of a text field, users click on pre-defined buttons that limit the model to a very specific set of safe actions.
Code-Then-Execute Pattern (CaMeL): The first LLM is used to write code (e.g., Pythonic code) based on the natural language prompt, and a second, quarantined LLM executes that safer code.
Map-Reduce Pattern: The prompt is broken into smaller chunks, processed, and then passed to another model, making it harder for a prompt injection to be maintained across the process.
Timeless Hygiene: The most critical defenses are logging, monitoring, and alerting. You must log prompts and outputs and monitor for abnormal behavior, such as a user suddenly querying a database thousands of times a minute or asking a chatbot to write Python code.
Resources & Links Mentioned
Trail of Bits Research:
Blog: blog.trailofbits.com
Company Site: trailofbits.com
Call Me A Jerk: Persuading AI to Comply with Objectionable Requests
Securing LLM Agents Paper: Design Patterns for Securing LLM Agents against Prompt Injections.
Defending LLM applications against Unicode character smuggling
Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models
LLM Explanation: Three Blue One Brown (3Blue1Brown) has a great short video explaining how Large Language Models work.
Lakera Gandalf: Game for learning how to use prompt injection against AI
Keith Hoodlet's Personal Sites:
Website: securing.dev and thought.dev
Support the Podcast:
Enjoyed this episode? Leave us a review and share it with your network! Subscribe for more insightful discussions on information security and privacy.
Contact Information:
Leave a comment below or reach out via the contact form on the site, email timothy.deblock[@]exploresec[.]com, or reach out on LinkedIn.
Check out our services page and reach out if you see any services that fit your needs.
Social Media Links:
[RSS Feed] [iTunes] [LinkedIn][YouTube]
<form autocomplete="on" class="newsletter-form" method="POST">
Subscribe
Sign up with your email address to receive news and updates.
<label class="newsletter-form-field-label title" for="email-yui_3_17_2_1_1704234756218_68248-field">Email Address</label>
<input autocomplete="email" class="newsletter-form-field-element field-element" id="email-yui_3_17_2_1_1704234756218_68248-field" name="email" type="email" />
<button class="
newsletter-form-button
sqs-system-button
sqs-editable-button-layout
sqs-editable-button-style
sqs-editable-button-shape
sqs-button-element--primary
" type="submit" value="Sign Up">
Sign Up
</button>
We respect your privacy.
Thank you!
</form>