8 lessons learned building threat detection systems as an MLE
Jeremy Jordan
Machine Learning Engineer
Preface
Today’s talk:
Many thanks to: Brian Jones, Zachary Abzug, Jeremy Hedges, Dan Salo, Cameron Schmauch, Mike Ciavarella, Konstantin Klinger, Sherrod DeGrippo, Dan Rapp, Wes Drone, Kirk Soluk, Tim Hopper, Bronwyn Woods, Brian Lindauer, Rich Harang, Mike Moran, Becca Lynch, Jeremie Vallee, Josh Terry, Joe Duggan …and many more wonderful people I’ve had the opportunity to work with who have influenced how I think about the intersection of cybersecurity and machine learning.
Disclaimer
The views and opinions expressed in this talk are my own and do not necessarily reflect the views and opinions of my current or previous employers.
Cybersecurity products in a nutshell
Implement secure protocols + Detect and block threats
Cybersecurity products in a nutshell – today’s focus
Implement secure protocols + Detect and block threats
How do you build a threat detection system?
We require a system which:
Lesson 0: Understand the threats you’re trying to detect
All successful machine learning projects start with a well-defined task.
Lesson 1: Start with rules
From Martin Zinkevich’s “Rules of ML”
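As a rough illustration of what "starting with rules" can look like, here is a minimal sketch of a rule-based URL check. The rule names, the keyword list, and the `min_hits` threshold are all hypothetical and chosen only for illustration; real rules would come from domain experts who understand the threat.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical rules for illustration only; not taken from the talk.
SUSPICIOUS_KEYWORDS = ("login", "verify", "account")
IP_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def rule_hits(url: str) -> List[str]:
    """Return the names of every rule that fires for this URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    hits = []
    if IP_HOST.match(host):
        # Host is a raw IPv4 address rather than a domain name.
        hits.append("ip_address_host")
    if parsed.scheme != "https":
        hits.append("no_tls")
    if any(keyword in parsed.path.lower() for keyword in SUSPICIOUS_KEYWORDS):
        hits.append("credential_keyword_in_path")
    return hits


def is_flagged(url: str, min_hits: int = 2) -> bool:
    """Flag a URL when at least `min_hits` rules fire (assumed threshold)."""
    return len(rule_hits(url)) >= min_hits
```

Returning the names of the rules that fired, rather than a bare boolean, keeps each detection explainable, which tends to help when analysts review the results later.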
Lesson 2: Annotate your data
The sometimes painful, but always necessary part of building a threat detection system.
Labeling workflows can be complex as we strive to efficiently label a lot of data.
A big challenge in labeling cybersecurity data is having sufficient context.
Is this URL a phish? 🐟
https://interbank.pe/solicitar/tarjeta/extracash/inicio
Is this login attempt legit?
What context is required in order to confidently annotate the data? What information do domain experts use? Let’s make sure we’re collecting that!
Lesson 3: Scale your detections with ML
Machine learning complements rules-based detection engines; each has unique strengths.
rules
machine learning
Lesson 4: Pay attention when detection systems disagree
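One way to act on disagreements, sketched under assumptions: collect the samples where a rules engine and a model reach different verdicts, and queue them for human review. The `Verdict` structure, the 0.5 threshold, and the ordering heuristic are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    sample_id: str
    rule_flagged: bool
    model_score: float  # assumed: model's probability of "malicious", in [0, 1]


def find_disagreements(verdicts: List[Verdict], threshold: float = 0.5) -> List[Verdict]:
    """Return samples where rules and model disagree, strongest disagreement first."""
    queue = []
    for verdict in verdicts:
        model_flagged = verdict.model_score >= threshold
        if model_flagged != verdict.rule_flagged:
            # Distance from the threshold is a rough proxy for model confidence.
            queue.append((abs(verdict.model_score - threshold), verdict))
    queue.sort(key=lambda item: item[0], reverse=True)
    return [verdict for _, verdict in queue]
```

Samples at the top of this queue are cases where the model is confident and the rules say otherwise. They may point to a missing rule, a mislabeled training example, or a model blind spot, so they are plausible candidates for the annotation workflow.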
Lesson 5: Build cascading detection systems
⚖️ efficacy vs. cost
There’s a need to balance our detection accuracy with the associated cost of detection.
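A minimal sketch of a cascade, assuming each stage either returns a verdict or abstains: cheap checks run first, and the expensive stage runs only on what remains undecided. The stage names, the cost figures (hypothetical milliseconds), and the stand-in checks are assumptions for illustration only.

```python
from typing import Callable, List, Optional, Tuple

# Each stage: (name, cost in hypothetical milliseconds, check function).
# A check returns True (malicious), False (benign), or None (abstain).
Stage = Tuple[str, int, Callable[[str], Optional[bool]]]

KNOWN_GOOD = {"https://example.com/"}  # hypothetical allowlist


def allowlist(url: str) -> Optional[bool]:
    return False if url in KNOWN_GOOD else None


def keyword_rule(url: str) -> Optional[bool]:
    return True if "verify-account" in url else None


def sandbox(url: str) -> Optional[bool]:
    # Stand-in for an expensive analysis step; it always returns a verdict.
    return url.endswith(".exe")


STAGES: List[Stage] = [
    ("allowlist", 1, allowlist),
    ("keyword_rule", 10, keyword_rule),
    ("sandbox", 5000, sandbox),
]


def run_cascade(url: str, stages: List[Stage] = STAGES) -> Tuple[bool, str, int]:
    """Run stages in order; stop at the first one that returns a verdict."""
    total_cost = 0
    for name, cost, check in stages:
        total_cost += cost
        verdict = check(url)
        if verdict is not None:
            return verdict, name, total_cost
    # No stage decided: default to benign (an assumed policy choice).
    return False, "default", total_cost
```

Returning the accumulated cost alongside the verdict makes the efficacy-versus-cost trade-off measurable per sample: traffic settled by the allowlist never pays for the sandbox.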
Lesson 6: Focus on the threats (...not the anomalies)
weird ≠ malicious
Archive of a bank’s website…
Is this weird? ✅
Is it a threat? ❌
Lesson 7: Mitigate detection errors with design
Even when we can’t outright block content, we can redirect select users to more secure environments.
We can make intentional choices about when to increase friction, so that benign and malicious users get different experiences.
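A sketch of how graduated friction might be expressed, assuming the detection system emits a risk score in [0, 1]. The action names and thresholds are hypothetical. The point is only that a detection score can map to more than two outcomes, which softens the impact of detection errors.

```python
def choose_action(
    score: float,
    block_at: float = 0.95,
    isolate_at: float = 0.7,
    challenge_at: float = 0.4,
) -> str:
    """Map a risk score in [0, 1] to a response with increasing friction."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= block_at:
        return "block"      # high confidence: stop the request outright
    if score >= isolate_at:
        return "isolate"    # e.g. redirect to a more secure environment
    if score >= challenge_at:
        return "challenge"  # e.g. ask for an additional verification step
    return "allow"          # low risk: no added friction
```

With a policy like this, a false positive in the middle bands costs a benign user an extra step rather than a blocked request.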
Lesson 8: Understand you’re playing an infinite game
Cybercrime is big business. As we improve our detections against existing threats, new attack vectors will always emerge.
Thanks for listening! (now go enable 2FA)