1 of 31

Machine Learning

in Security

Nosacz Meetup #128.10.20.19

Mariusz Wołoszyn

2 of 31

You've probably seen this one?

If it’s in Python it can be Machine Learning,

If it’s in PowerPoint it’s AI

3 of 31

ML is everywhere

Cars
Helicopters: Flying Car and Autonomous Flight Engineer https://www.udacity.com/course/flying-car-nanodegree--nd787
Lasers: World's 1st AI Laser Beam https://www.indiegogo.com/projects/world-s-1st-ai-laser-beam#
Mould-ripened cheeses: Arla's robo 'milk maids' are using AI to churn your cheese https://www.wired.co.uk/article/arla-dairy-artificial-intelligence

Hype

4 of 31

Gun Detection

Image recognition

Supervised learning, image classification, object detection

API: https://valossa.com/image-recognition-demo-is-now-live/

5 of 31

Adversarial attacks

6 of 31

Adversarial training

7 of 31

Fooling humans

8 of 31

Cat and mouse

“Facebook AI Research (FAIR) has developed a state-of-the-art ‘de-identification’ system that works on video, including even live video. It works by altering key facial features of a video subject in real time using machine learning, to trick a facial recognition system into improperly identifying the subject.”

https://www.theverge.com/2019/10/25/20932879/facebook-ai-facial-recognition-live-video-de-identification-deepfakes

9 of 31

Summary

There’s AI hype all over
For each ML use there’s a way to “hack it”
ML capabilities are much like human ones (with their strengths and weaknesses)
For each advance there is a countermeasure

16 of 31

Is ML in Security a thing?

Probably, someone wrote a book about it...
and another…
yet another…
and another…
and one more…
and more…
and more...

OK, I was cheating a bit, but you got the point.

17 of 31

ML in Security

Machine Learning

In Security

Pattern recognition

Anomaly detection

18 of 31

Pattern recognition

Spam detection
Malware detection
Botnet detection
Identity verification
...

19 of 31

Anomaly detection

User authentication?
Behavioral analysis?
Network outlier detection
Malicious URL detection

20 of 31

Caveats

There are many measures to optimise (not just accuracy, precision and recall).
At 99% precision and 100,000 sessions per day we may interrupt about 1,010 legitimate connections. Is that acceptable?
What costs more? Too many false positives or lower recall?
Maintain a Bayesian viewpoint. Assess your prior and incorporate it into your system.
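
One way to read the 1010 figure, sketched in Python: if the detector correctly flags 100,000 sessions at 99% precision, the flagged set also contains about 1,010 legitimate ones. The second function applies Bayes' rule to show how a low prior changes what an alert means. The function names, the 0.1% prior and the 1% false positive rate are assumptions for illustration, not figures from the talk.

```python
def false_positives_at_precision(true_positives: float, precision: float) -> float:
    """Expected false positives that accompany `true_positives` correct alerts.

    precision = TP / (TP + FP)  =>  FP = TP * (1 - precision) / precision
    """
    if not 0.0 < precision <= 1.0:
        raise ValueError("precision must be in (0, 1]")
    return true_positives * (1.0 - precision) / precision


def posterior_malicious(prior: float, tpr: float, fpr: float) -> float:
    """P(malicious | alert) via Bayes' rule.

    prior: assumed share of malicious sessions
    tpr:   true positive rate (recall) of the detector
    fpr:   false positive rate of the detector
    """
    evidence = tpr * prior + fpr * (1.0 - prior)
    return tpr * prior / evidence


if __name__ == "__main__":
    # Interpretation assumed here: 100,000 correctly flagged sessions.
    print(round(false_positives_at_precision(100_000, 0.99)))
    # Hypothetical prior of 0.1% malicious traffic.
    print(round(posterior_malicious(prior=0.001, tpr=0.99, fpr=0.01), 3))
```

With the assumed 0.1% prior, only about 9% of alerts point at truly malicious sessions, even though the detector looks 99% "accurate" on paper.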

21 of 31

Choose wisely

Pick the right algorithm (people tend to use a sledgehammer to crack a nut).

In a security context it’s often better to have a slightly less reliable answer faster.
It’s also easier to run a cheaper, less accurate algorithm at scale than the best but prohibitively expensive one.

Is explainability required? (It’s usually beneficial even if not strictly required.)

22 of 31

Calibrate your models

Using default thresholds (0.5 for probability scores or 0 for SVM) is usually a bad idea.

In sklearn, build your models around decision_function rather than predict_proba.
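
A minimal sketch of choosing a threshold on decision_function scores from a validation set instead of relying on the default 0. The pick_threshold helper, the synthetic data and the 95% precision target are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split


def pick_threshold(y_true, scores, min_precision):
    """Return the lowest score threshold whose precision meets `min_precision`.

    The lowest qualifying threshold keeps recall as high as possible.
    Returns None when no threshold reaches the target.
    """
    precision, _recall, thresholds = precision_recall_curve(y_true, scores)
    # precision has one more element than thresholds; drop the final point.
    ok = np.where(precision[:-1] >= min_precision)[0]
    if ok.size == 0:
        return None
    return float(thresholds[ok[0]])


def demo():
    # Synthetic, imbalanced data standing in for real traffic (assumption).
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.9, 0.1], random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    val_scores = model.decision_function(X_val)  # raw scores, not probabilities
    threshold = pick_threshold(y_val, val_scores, min_precision=0.95)
    if threshold is None:
        print("target precision not reachable on this validation set")
    else:
        flagged = int((val_scores >= threshold).sum())
        print(f"threshold={threshold:.3f}, sessions flagged={flagged}")


if __name__ == "__main__":
    demo()
```

The threshold is picked on held-out data so that it reflects the cost trade-off you chose, not the library default.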

23 of 31

Retrain your models regularly

Patterns do change...
and so do customs, preferences and the environment.
New technologies and trends keep emerging.
Limit the training data span (6 to 12 months is usually good for behavioral patterns).
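
Limiting the training span can be as simple as filtering by timestamp before each retraining run. A sketch, assuming events are dicts with a "timestamp" key (a hypothetical layout) and a 365-day default window:

```python
from datetime import datetime, timedelta


def recent_window(events, now, max_age_days=365):
    """Keep only events whose timestamp falls within the last `max_age_days`."""
    cutoff = now - timedelta(days=max_age_days)
    return [e for e in events if cutoff <= e["timestamp"] <= now]


if __name__ == "__main__":
    now = datetime(2019, 10, 28)
    events = [
        {"timestamp": datetime(2018, 1, 15), "user": "a"},  # too old, dropped
        {"timestamp": datetime(2019, 6, 1), "user": "b"},   # kept
    ]
    training_set = recent_window(events, now)
    print(len(training_set))
```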

24 of 31

Example: Malware Detection

Ergo (a command line tool that makes machine learning with Keras easier)
Under the hood it’s classification (a binary one).
We need a lot of input features (characteristics of the file to be examined).
Throw the features at classification algorithm(s):

Logistic regression, SVM, tree ensembles (say XGBoost) or NN.

Assess the results in terms of precision, recall, AUROC, execution (inference/predict) time, resource utilization and explainability.
Build a process and a pipeline!

https://www.evilsocket.net/2019/05/22/How-to-create-a-Malware-detection-system-with-Machine-Learning/#.XOXPbZSbnyM.twitter
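
The "throw features at several classifiers and compare" step could look roughly like this. It is a sketch only: the data is synthetic, the two candidate models are picked for brevity, and evaluate and compare_models are hypothetical helper names, not part of Ergo.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit `model` and report precision, recall, AUROC and prediction time."""
    model.fit(X_train, y_train)
    start = time.perf_counter()
    y_pred = model.predict(X_test)
    predict_seconds = time.perf_counter() - start
    scores = model.predict_proba(X_test)[:, 1]  # score of the positive class
    return {
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "auroc": roc_auc_score(y_test, scores),
        "predict_seconds": predict_seconds,
    }


def compare_models(random_state=0):
    # Synthetic stand-in for real file features (assumption for this sketch).
    X, y = make_classification(n_samples=2000, n_features=30,
                               n_informative=10, random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100,
                                                random_state=random_state),
    }
    return {name: evaluate(model, X_train, y_train, X_test, y_test)
            for name, model in candidates.items()}


if __name__ == "__main__":
    for name, report in compare_models().items():
        print(name, {k: round(v, 4) for k, v in report.items()})
```

Keeping all measures in one report makes the trade-off between quality and prediction time visible per model.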

25 of 31

Feature Engineering

Extract file characteristics (https://github.com/lief-project/LIEF)
Bytes histogram
Used APIs
…
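
As an illustration of one such feature, a byte histogram can be computed in plain Python. This sketch does not use LIEF; normalising the counts to relative frequencies is an assumption made here so that files of different sizes are comparable.

```python
def byte_histogram(data: bytes) -> list:
    """Return 256 relative frequencies, one per possible byte value."""
    counts = [0] * 256
    for value in data:  # iterating over bytes yields ints in 0..255
        counts[value] += 1
    total = len(data)
    if total == 0:
        return [0.0] * 256
    return [count / total for count in counts]


if __name__ == "__main__":
    sample = b"MZ\x90\x00\x03\x00\x00\x00"  # made-up bytes, not a real file
    features = byte_histogram(sample)
    print(len(features), features[0])
```

The resulting 256 numbers form a fixed-length feature vector regardless of file size.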

26 of 31

Model

97% accuracy around epoch 30 https://www.evilsocket.net/2019/05/22/How-to-create-a-Malware-detection-system-with-Machine-Learning/#.XOXPbZSbnyM.twitter

28 of 31

XGBoost

Explainability.
Speed (parallelizes nicely on CPU).
No need for GPU (cheaper).
Thanks to explainability we can remove features with little relevance, boosting performance even further.
Knowing which features are important, we can come up with new ones.
Accuracy: 0.987548132219833

https://github.com/emsi/ergo/tree/master/eda-notebooks
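
Pruning low-relevance features from importance scores can be sketched as below. Note the swap: this uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost, synthetic data, and a hypothetical prune_features helper; it is not the notebook code linked above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


def prune_features(X, y, keep=10, random_state=0):
    """Fit a boosted-tree model and return (indices of top `keep` features,
    importance of every feature)."""
    model = GradientBoostingClassifier(random_state=random_state).fit(X, y)
    importances = model.feature_importances_
    ranked = np.argsort(importances)[::-1]  # most important first
    return np.sort(ranked[:keep]), importances


if __name__ == "__main__":
    # shuffle=False puts the 5 informative columns first (synthetic data).
    X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                               n_redundant=0, shuffle=False, random_state=0)
    kept, importances = prune_features(X, y, keep=5)
    print("kept feature indices:", kept.tolist())
    X_small = X[:, kept]  # retrain on this reduced matrix
    print(X_small.shape)
```

The importance ranking is also a hint for feature engineering: columns that score high suggest which kinds of file characteristics are worth extending.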

29 of 31

There’s always a way to fool it...

Evading Machine Learning Malware Classifiers
