Intro to AI Agent Security
By Jono
whoami
DISCLAIMER
AI IS SCARY
Agenda
Agents 101
Some quick vocab
How does an Agent Work?
Agents: Turn overview (simplified)
Agents: An example turn
Agents: Preprocessing
Agents: Actions/Tool calls
Agents: Post-processing
Putting it all together
Attacking Agents
The Threat Model
Zero-click attack example: Markdown rich text
Owasp top 10 for Agent Security
Owasp top 10 for Agent Security
We’ll only focus on these for this talk
Most of them build on each other
System Prompt Leaks
System prompt leak: Viewer Exercise
Prompt Injection
A side note on jailbreaks
Direct Prompt Injection
Direct Prompt Injection
Direct Prompt Injection: Example
Direct Prompt Injection: Example
Direct Prompt Injection: Example
Indirect Prompt Injection
Indirect Prompt Injection: Example
What happens�next?��Why is this dangerous?
Indirect Prompt Injection: Example
Indirect Prompt Injection: Example
A quick thought exercise
A quick thought exercise: A “real” attack
A Short Break
Securing Agents
(against Prompt Injection)
Deterministic vs Probabilistic defenses
Model Level Resistance
Agent Level Resistance: Preprocessing/Input Sanitation
Agent Level Resistance: Postprocessing/Front-end
Framework Level Resistance: Human In the Loop
Framework Level Resistance: The Lethal Trifecta
Framework Level Resistance: The Lethal Trifecta
Case Study: ShadowLeak
Case Study: ShadowLeak
Case study 2: An unnamed (hypothetical) company
Thank You!
If you want to hear some of the really dumb things I’ve seen, feel free to message me on Discord or talk w/ me later!