Machine Learning I:
Artificial Neural Networks
Patrick Hall
Visiting Faculty, Department of Decision Sciences
George Washington University
Lecture 5 Agenda
Where are we in the modeling lifecycle?
Data Collection & ETL
Feature Selection & Engineering
Supervised Learning
Unsupervised Learning
Deployment
Cost Intensive
Revenue Generating
Assessment & Validation
Brief Introduction
Overview
Historical Development
The name “neural network” originates from the model’s initial conception as a model of neurotransmission, where each unit represents a neuron and each connection represents a synapse.
Brief History
Sources: Introduction to Statistical Learning
Neural Network Structure
Sources: Introduction to Statistical Learning
A simple feed-forward neural network using 4 predictors and a single hidden layer (5 hidden units) for modeling a numeric response:
Image Source: https://www.asimovinstitute.org/neural-network-zoo/
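A minimal NumPy sketch of this architecture (not from the original slides): 4 predictors feed a single hidden layer of 5 units, followed by a linear output for a numeric response. All weight values and variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weights for a 4-input, 5-hidden-unit, 1-output network.
W1 = rng.normal(size=(4, 5))   # input -> hidden weights
b1 = np.zeros(5)               # hidden biases
W2 = rng.normal(size=(5, 1))   # hidden -> output weights
b2 = np.zeros(1)               # output bias

def forward(x):
    """Forward pass: nonlinear hidden layer, then a linear output."""
    h = np.tanh(x @ W1 + b1)   # 5 hidden activations
    return h @ W2 + b2         # single numeric response

x = rng.normal(size=(1, 4))    # one observation with 4 predictors
print(forward(x))
```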
Network Structure & Data Structure
Sources: Demystifying Deep Learning, SAS Institute; Explainable Neural Networks based on Additive Index Models
Convolutional neural networks for images:
Specialized, GAM-like, architectures for structured data:
Neural Networks
Neural Network Algorithm Overview
Supervised: MLP
Unsupervised: Autoencoder
Sources: Introduction to Statistical Learning & Elements of Statistical Learning
Neural Network Algorithm Overview
Sources: Introduction to Data Mining
Hidden Layers: Activations
Sources: Introduction to Statistical Learning
Image Source: https://www.spiedigitallibrary.org/
Activation Function
The piecewise-linear ReLU function is popular for its efficiency and ease of computation; the graph above has been scaled for ease of comparison.
Sources: Introduction to Statistical Learning & Demystifying Deep Learning, SAS Institute
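A short sketch of the common activation functions compared above (ReLU, logistic sigmoid, hyperbolic tangent). The scaling mentioned on the slide is only for plotting and is not applied here.

```python
import numpy as np

def relu(z):
    """Piecewise-linear ReLU: max(0, z)."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Logistic sigmoid: squashes inputs to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes inputs to (-1, 1)."""
    return np.tanh(z)

z = np.linspace(-3, 3, 7)
print(relu(z), sigmoid(z), tanh(z), sep="\n")
```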
Training Neural Networks
[Network diagram: inputs x1, x2, x3; hidden units h1, h2; output y]
This neural network has one output unit for an interval target, but neural networks can have an arbitrary number of output units.
Hyperbolic tangent activation function
One hidden layer with two hidden units
Multilayer Perceptron
Sources: Introduction to Statistical Learning
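A hedged sketch of the forward pass for the small network diagrammed above: three inputs (x1–x3), two hyperbolic tangent hidden units (h1, h2), and one interval output (y). The parameter values are made up for illustration.

```python
import numpy as np

# Illustrative (made-up) parameters for the 3-2-1 network above.
W1 = np.array([[ 0.5, -0.3],
               [ 0.8,  0.1],
               [-0.2,  0.7]])     # shape: (3 inputs, 2 hidden units)
b1 = np.array([0.1, -0.1])        # hidden biases
w2 = np.array([1.2, -0.9])        # hidden -> output weights
b2 = 0.05                         # output bias

def mlp_forward(x):
    """h_k = tanh(b_k + sum_j w_jk x_j);  y = b + sum_k w_k h_k."""
    h = np.tanh(x @ W1 + b1)      # hyperbolic tangent hidden units h1, h2
    return h @ w2 + b2            # single interval (numeric) output y

print(mlp_forward(np.array([2.54, 1.65, 0.02])))
```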
Training Neural Networks
Sources: Introduction to Statistical Learning
Gradient Descent & Back Propagation
Illustration of gradient descent:
The objective function is not convex; it has two minima. Start from some value (typically chosen at random) and repeatedly step against the gradient until the objective can no longer decrease. Here, gradient descent reaches the global minimum in seven steps.
Sources: Introduction to Statistical Learning
Image Source: https://en.wikipedia.org/wiki/Stochastic_gradient_descent
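A minimal gradient-descent sketch echoing the illustration: a one-dimensional non-convex objective with two minima, a random starting value, and repeated steps against the gradient. The objective function and step size are made up for illustration.

```python
import numpy as np

def f(x):
    """A made-up non-convex objective with two minima."""
    return x**4 - 3 * x**2 + x

def grad_f(x):
    """Analytic gradient of f."""
    return 4 * x**3 - 6 * x + 1

x = np.random.default_rng(1).uniform(-2, 2)  # random starting value
lr = 0.05                                    # step size (learning rate)
for step in range(50):
    x -= lr * grad_f(x)                      # move against the gradient
print(x, f(x))  # ends in whichever minimum the starting value leads to
```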
Stochastic Gradient Descent (SGD)
[Iteration plot: error function vs. iterations]
Sources: Introduction to Statistical Learning; Wikipedia
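A sketch of stochastic gradient descent for a simple linear model, showing the key difference from full-batch gradient descent: each parameter update uses the error gradient from a single randomly chosen observation rather than the whole training set. The data and learning rate are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # made-up predictors
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

w = np.zeros(3)                          # parameters to learn
lr = 0.01                                # learning rate
for epoch in range(20):
    for i in rng.permutation(len(X)):    # visit observations in random order
        err = X[i] @ w - y[i]            # error on one observation
        w -= lr * err * X[i]             # gradient step for squared error
print(w)
```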
Regularization: Ridge (L2)
Sample from MNIST data:
Iteration plots with regularization:
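A hedged sketch of how an L2 (ridge) penalty enters the objective: the loss becomes the squared error plus lambda times the sum of squared weights, which shrinks weights toward zero during training. The function names, data, and lambda value are illustrative.

```python
import numpy as np

def ridge_loss(w, X, y, lam=0.01):
    """Squared-error loss plus an L2 (ridge) penalty on the weights."""
    residual = X @ w - y
    return 0.5 * np.mean(residual**2) + lam * np.sum(w**2)

def ridge_grad(w, X, y, lam=0.01):
    """Gradient of the penalized loss; the penalty contributes 2*lam*w."""
    residual = X @ w - y
    return X.T @ residual / len(y) + 2 * lam * w

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
w = np.zeros(5)
for _ in range(500):
    w -= 0.1 * ridge_grad(w, X, y)       # penalized gradient descent
print(ridge_loss(w, X, y))
```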
Regularization: Dropout
Dropout Learning - relatively new and efficient form of regularization:
Fully connected mini-batch
Mini-batch with dropout
Sources: Introduction to Statistical Learning & Random Search for Hyper-Parameter Optimization
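A minimal sketch of dropout applied to a hidden layer during training: each hidden unit is kept with probability p, and surviving activations are rescaled (inverted dropout) so no scaling is needed at prediction time. Everything here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_keep=0.8, training=True):
    """Randomly zero hidden activations during training (inverted dropout)."""
    if not training:
        return h                          # no dropout at prediction time
    mask = rng.random(h.shape) < p_keep   # keep each unit with prob p_keep
    return h * mask / p_keep              # rescale so expected value is unchanged

h = np.tanh(rng.normal(size=(4, 5)))      # made-up hidden activations
print(dropout(h))
```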
Hyperparameter Tuning
Sources: Demystifying Deep Learning, SAS Institute
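A hedged sketch of hyperparameter tuning by random search, assuming scikit-learn and SciPy are available; the particular hyperparameters, ranges, and dataset are illustrative choices, not the ones from the slides.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

param_distributions = {
    "hidden_layer_sizes": [(25,), (50,), (50, 25)],  # network size
    "alpha": loguniform(1e-5, 1e-1),                 # L2 penalty strength
    "learning_rate_init": loguniform(1e-4, 1e-1),    # SGD step size
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=200),
    param_distributions,
    n_iter=10,        # number of random configurations to try
    cv=3,             # 3-fold cross validation per configuration
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```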
Unsupervised: Autoencoder
y | x1 | x2 | x3 |
1 | 2.54 | 1.65 | 0.02 |
0 | 1.14 | 0.70 | 0.82 |
1 | 0.99 | 0.51 | 2.11 |
⁞ | ⁞ | ⁞ | ⁞ |
Target vector
Input vectors
Supervised training - predict target y from input vectors X
Unsupervised training - predict X from X
Sources: Demystifying Deep Learning, SAS Institute
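A minimal NumPy sketch of the idea in the table above: unsupervised training fits a network to predict X from X through a narrow hidden layer, so the hidden units learn a compressed representation. Layer sizes and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))             # made-up input vectors (x1, x2, x3)

# One hidden layer with 2 units: encoder (W1, b1), decoder (W2, b2).
W1, b1 = rng.normal(size=(3, 2)) * 0.1, np.zeros(2)
W2, b2 = rng.normal(size=(2, 3)) * 0.1, np.zeros(3)

lr = 0.01
for _ in range(2000):
    H = np.tanh(X @ W1 + b1)              # encode: hidden representation
    X_hat = H @ W2 + b2                   # decode: reconstruct the inputs
    err = X_hat - X                       # reconstruction error (target is X itself)
    # Backpropagate the mean squared reconstruction error.
    gW2 = H.T @ err / len(X)
    gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H**2)        # tanh derivative
    gW1 = X.T @ dH / len(X)
    gb1 = dH.mean(axis=0)
    W2 -= lr * gW2
    b2 -= lr * gb2
    W1 -= lr * gW1
    b1 -= lr * gb1

print(np.mean((X_hat - X) ** 2))          # reconstruction error after training
```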
Unsupervised Training
Is it trivial to learn X from X?
Why is unsupervised training of neural networks useful?
MNIST Data on Autoencoder
[Scatter plot: MNIST data projected onto Hidden Unit 1 vs. Hidden Unit 2]
Issues of Neural Network Overview
Requires grid search:
Sources: Demystifying Deep Learning, SAS Institute
Demystifying Deep Learning
Very simply put: a neural network with more than one hidden layer, used for a supervised or unsupervised learning task.
Sonar Case Study
Using Python
Source: Machine Learning Algorithms from Scratch
Stochastic Gradient Descent & k-fold Cross Validation Approach
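A hedged sketch of the case-study approach from Machine Learning Algorithms from Scratch: train a model with stochastic gradient descent and estimate its accuracy with k-fold cross validation. Here a single logistic unit stands in for the network, and randomly generated data stands in for the 208-row, 60-feature sonar dataset; loading the real CSV is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_logistic(X, y, lr=0.1, epochs=100):
    """Train a single logistic unit with stochastic gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))  # predicted probability
            w -= lr * (p - y[i]) * X[i]                # gradient step for weights
            b -= lr * (p - y[i])                       # gradient step for bias
    return w, b

def k_fold_accuracy(X, y, k=5):
    """Estimate accuracy with k-fold cross validation."""
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)                # all rows not in this fold
        w, b = sgd_logistic(X[train], y[train])
        pred = (X[fold] @ w + b > 0).astype(int)
        scores.append(np.mean(pred == y[fold]))
    return scores

# Stand-in data; the real case study would load the 60-feature sonar CSV instead.
X = rng.normal(size=(208, 60))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(k_fold_accuracy(X, y))
```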
Reading