Machine Learning
DeapSECURE module 3
https://deapsecure.gitlab.io/deapsecure-lesson03-ml/
Machine Learning at a Glance
Face unlock in Smartphones
Natural language processing
E-mail spam alert
Play Store recommendations
Autonomous driving
Types of Machine Learning Models
Regression
Classification
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
Supervised Learning
Regression
Classification
Unsupervised/Semi-supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
General Process in Machine Learning
Involve data preprocessing
Need to determine the model parameters
What Is a Machine Learning Model?
Illustration of a ML model
ML Model = mathematical function to perform a certain task
Parameters yet to be determined (example: w(1), w(2), …)
What makes machine learning powerful is that the model contains parameters that can be systematically improved according to a prescribed algorithm.
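As a minimal sketch of this idea, the toy model below is a function with two parameters that are improved step by step by gradient descent. The data, the function names, the learning rate, and the choice of gradient descent are all assumptions made for illustration; they are not taken from the lesson.

```python
# Toy model: f(x) = w1 * x + w2, with parameters w1 and w2 to be determined.
def model(x, w1, w2):
    return w1 * x + w2

# Made-up training data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def mean_squared_error(w1, w2):
    """Average squared difference between predictions and targets."""
    return sum((model(x, w1, w2) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w1, w2 = 0.0, 0.0        # initial guess for the parameters
learning_rate = 0.05     # assumed step size
initial_error = mean_squared_error(w1, w2)

for step in range(2000):
    # Gradients of the mean squared error with respect to w1 and w2.
    grad_w1 = sum(2 * (model(x, w1, w2) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_w2 = sum(2 * (model(x, w1, w2) - y) for x, y in zip(xs, ys)) / len(xs)
    # Move each parameter a small step in the direction that lowers the error.
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

final_error = mean_squared_error(w1, w2)
print(f"w1 = {w1:.3f}, w2 = {w2:.3f}, error = {final_error:.6f}")
```

The parameters start from an arbitrary guess and the prescribed update rule moves them toward values that fit the data.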
Phases for Machine Learning Models
Training Phase
Testing Phase & Deployment
Case Study: Classification of Smartphone Applications
Dataset: SherLock
Detailed phone information
Goal for this workshop:
Develop simple machine learning models to predict the names of smartphone apps
Data Preparation!
Data Wrangling & Visualization
Data Wrangling
The importance of good data:
Data Wrangling:
Goal: clean, consistent, and processable data ⇒ input of further analysis such as machine learning
Understand each Feature
Remove Bad Data
Deal with Missing Data
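The three wrangling steps above could look roughly like this in pandas. The table is made up, the column names are hypothetical stand-ins rather than the real dataset schema, and treating negative CPU usage as "bad data" is an assumption for the sake of the example.

```python
import numpy as np
import pandas as pd

# Made-up table; column names and values are hypothetical.
df = pd.DataFrame({
    "ApplicationName": ["WhatsApp", "Facebook", "WhatsApp", "Facebook"],
    "cpu_usage": [0.12, np.nan, 0.30, -1.0],
    "memory_kb": [2100, 2300, 2150, 2250],
})

# Understand each feature: data types and summary statistics.
print(df.dtypes)
print(df.describe())

# Deal with missing data: count the gaps, then drop incomplete rows
# (filling them in is another option).
print(df.isna().sum())
df_clean = df.dropna()

# Remove bad data: here we assume a negative CPU usage is invalid.
df_clean = df_clean[df_clean["cpu_usage"] >= 0]
print(df_clean)
```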
Types of Data — A Data Scientist’s View
Numerical vs. Categorical
Discrete vs. Continuous
Qualitative vs. Quantitative
Issues with Data
Address issues in data:
Visualization
Powerful aid for discovery and comprehension:
Helps you “see” many numbers in a quick glance!
About SherLock “2-apps” dataset
Questions to Explore
Hands-on
Go & Explore the SherLock Dataset!
Obtaining Hands-On Files
You have a home directory => your own storage
/home/YOUR_MIDAS_ID
Copying Hands-On Files:
/shared/DeapSECURE/install-modules
From Jupyter: Prepend “!”
Machine Learning Workflow
Practical Tool to Build Machine Learning Models
https://scikit-learn.org/stable/
Data Preprocessing
Remove Irrelevant Features
Dealing with Missing Data
Removing Duplicate Features
Separating Labels from Features
Data Normalization
Label Encoding / One-Hot Encoding
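Several of the preprocessing steps above can be sketched with pandas and scikit-learn. The table and its column names (including the categorical `network_type` column) are hypothetical, and `StandardScaler` is only one of several possible normalization choices.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up table; column names are hypothetical, not the real dataset schema.
df = pd.DataFrame({
    "ApplicationName": ["WhatsApp", "Facebook", "WhatsApp", "Facebook"],
    "cpu_usage": [0.10, 0.40, 0.20, 0.50],
    "memory_kb": [2100.0, 2300.0, 2150.0, 2250.0],
    "network_type": ["wifi", "cellular", "wifi", "wifi"],
})

# Separate labels from features.
labels = df["ApplicationName"]
features = df.drop(columns=["ApplicationName"])

# Label encoding: map each class name to an integer.
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)

# One-hot encoding: one indicator column per category value.
features = pd.get_dummies(features, columns=["network_type"])

# Normalization: rescale numeric columns to zero mean and unit variance.
numeric_columns = ["cpu_usage", "memory_kb"]
scaler = StandardScaler()
features[numeric_columns] = scaler.fit_transform(features[numeric_columns])

print(features)
print(y, list(label_encoder.classes_))
```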
Machine Learning Steps
Split data into “Train” and “Test” sets
Build model (see next slides)
Train model
Evaluate model
Use the model!
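The five steps above can be sketched end to end with scikit-learn. This example uses the iris dataset bundled with scikit-learn rather than the workshop data, and the 80/20 split, the random seed, and the logistic regression model are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 1. Split data into "train" and "test" sets (80% / 20%, an assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 2. Build the model.
model = LogisticRegression(max_iter=1000)

# 3. Train the model on the training set only.
model.fit(X_train, y_train)

# 4. Evaluate the model on data it has not seen during training.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", test_accuracy)

# 5. Use the model on a new measurement (four iris features, in cm).
new_prediction = model.predict([[5.1, 3.5, 1.4, 0.2]])
print("Predicted class index:", new_prediction[0])
```

Keeping the test set out of training is what makes the reported accuracy an estimate of performance on new data.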
Logistic Regression
Figure: Logistic regression of iris plant species based on two features. (Source: scikit-learn)
Decision Tree
each leaf node denotes a class label
sklearn.tree.DecisionTreeClassifier
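A small sketch of `sklearn.tree.DecisionTreeClassifier` makes the tree structure visible. It uses the iris dataset bundled with scikit-learn, and the depth limit of 2 is an assumption chosen only to keep the printed tree short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree (max_depth=2 is an illustrative choice) is easy to read.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned rules: internal nodes test a feature against a threshold,
# and every leaf line ends with the class label it assigns.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```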
Logistic Regression or Decision Tree?
Support Vector Machine
Metrics to Assess the Machine Learning Models
Confusion Matrix
Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$
Precision: $\frac{TP}{TP + FP}$
Recall: $\frac{TP}{TP + FN}$
Precision tells us how many of the cases predicted as positive actually turned out to be positive.
Recall tells us how many of the actual positive cases we were able to predict correctly with our model.
Accuracy tells us how many of all the cases were predicted correctly.
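As a worked example of the three metrics, the sketch below computes them from the four confusion-matrix counts. The counts are made up for illustration.

```python
# Made-up confusion-matrix counts for a binary classifier.
TP = 40   # true positives:  predicted positive, actually positive
TN = 45   # true negatives:  predicted negative, actually negative
FP = 5    # false positives: predicted positive, actually negative
FN = 10   # false negatives: predicted negative, actually positive

accuracy = (TP + TN) / (TP + TN + FP + FN)   # 85 / 100
precision = TP / (TP + FP)                   # 40 / 45
recall = TP / (TP + FN)                      # 40 / 50

print(f"accuracy  = {accuracy:.3f}")
print(f"precision = {precision:.3f}")
print(f"recall    = {recall:.3f}")
```

With these counts the model is right 85% of the time overall, about 89% of its positive predictions are correct, and it finds 80% of the actual positives.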
Metrics to Assess the Machine Learning Models
Bias: how far the average prediction is from the actual values
Variance: how scattered the predicted values are around their average
High Bias
Low Bias
Low Variance
High Variance
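A small numeric sketch can make the two quantities concrete. The true value and both sets of predictions are made up; "model A" and "model B" are hypothetical names for a high-bias, low-variance case and a low-bias, high-variance case.

```python
from statistics import mean, pvariance

true_value = 10.0

# Made-up repeated predictions of the same quantity by two hypothetical models.
predictions_a = [7.9, 8.0, 8.1, 8.0]     # tightly clustered, but off target
predictions_b = [6.0, 14.0, 9.0, 11.0]   # centered on target, but scattered

def bias(predictions):
    """Difference between the average prediction and the true value."""
    return mean(predictions) - true_value

def variance(predictions):
    """Spread of the predictions around their own average."""
    return pvariance(predictions)

print("Model A: bias =", bias(predictions_a), "variance =", variance(predictions_a))
print("Model B: bias =", bias(predictions_b), "variance =", variance(predictions_b))
```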
Hands-On
Build and Train a Machine Learning Model!
Feature Selection
Histograms
Correlation
Simple Group Analysis
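The three feature-selection aids above can be sketched without any plotting. The table is made up and the column names `feature_a`, `feature_b`, and `feature_c` are hypothetical; `feature_b` is deliberately constructed as a linear function of `feature_a` so that the pair looks redundant.

```python
import numpy as np
import pandas as pd

# Made-up table; feature names are hypothetical.
df = pd.DataFrame({
    "ApplicationName": ["WhatsApp", "Facebook"] * 3,
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "feature_b": [3.0, 5.0, 7.0, 9.0, 11.0, 13.0],   # exactly 2 * feature_a + 1
    "feature_c": [0.5, 0.1, 0.4, 0.2, 0.6, 0.3],
})

# Histogram: how the values of one feature are distributed across bins.
counts, bin_edges = np.histogram(df["feature_a"], bins=3)
print(counts, bin_edges)

# Correlation: a pair of features with correlation near +1 or -1 carries
# largely the same information, so one of them may be dropped.
correlation = df.select_dtypes("number").corr()
print(correlation)

# Simple group analysis: compare a feature's average across the classes.
group_means = df.groupby("ApplicationName")["feature_c"].mean()
print(group_means)
```

A feature whose group averages differ clearly between classes is a candidate for keeping, since it may help the model tell the classes apart.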
Data Visualization
Thank You!
The DeapSECURE team
PI: Dr. Hongyi Wu (ECE), Dr. Masha Sosonkina (CMSE), Dr. Wirawan Purwanto (ITS)
Assessor: Dr. Karina Arcaute
Assistants:
Qiao Zhang, Liuwan Zhu, Jacob Strother, Rosby Asiamah, Yuming He
Funding: NSF OAC grant #1829771
Regression
Linear Regression:
Logistic Regression:
Support Vector Machines
Strengths: