Brief history & Basic theories
Exploring neural networks from outside to inside
Congcong Yuan
10/24/2019
Outline
01
AI & Machine Learning & Deep learning
Brief history
Venn diagram showing the inclusion relations among AI, machine learning, and deep learning (Goodfellow et al., 2017)
Flowchart showing how the different parts of an AI system relate to each other (Goodfellow et al., 2017)
Machine Learning & Deep learning
Matlab.mathworks.com
One Example
Machine Learning & Deep learning
Types of machine learning algorithms (Kong et al., 2018)
Deep learning
2011
2012
2013
2014
2015
2016
2017
IBM Watson beats humans at Jeopardy!
Different Housing materials
02
Basic Units
Neural Network
Question: What are the basic units?
Perceptron
Neurons in brain
Perceptron
Linear classification
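A minimal sketch of a perceptron used as a linear classifier, assuming the classic perceptron learning rule and a step activation. The function names and the toy AND dataset are illustrative and not taken from the slides.

```python
def predict(weights, bias, x):
    """Perceptron output: 1 if w.x + b > 0, otherwise 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def train_perceptron(samples, labels, lr=1.0, max_epochs=100):
    """Perceptron learning rule; stops once an epoch makes no mistakes."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            if error != 0:
                # Move the decision boundary toward the misclassified point.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


# Logical AND is linearly separable, so the rule converges on it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_and = [0, 0, 0, 1]
w, b = train_perceptron(X, y_and)
and_predictions = [predict(w, b, x) for x in X]
```

The learned weights define a single straight line in the input plane, which is all a single perceptron can represent.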
Perceptron
Linearly separable
Non-linearly separable (XOR problem)
Multi-layered Perceptron
(Goodfellow et al., 2017)
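As a sketch of why an extra layer helps with the XOR problem, the tiny network below uses two ReLU hidden units and a linear output. The weights are hand-picked for illustration rather than learned, and the function names are assumptions.

```python
def relu(z):
    return max(0.0, z)


def mlp_xor(x1, x2):
    # Hidden layer: two ReLU units with hand-picked weights and biases.
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Linear output layer combines the two hidden features.
    return 1.0 * h1 - 2.0 * h2


xor_outputs = [mlp_xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

The hidden layer remaps the four input points so that a single linear output can separate them, which a lone perceptron cannot do.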
Activation function
Sigmoid function
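A minimal sketch of the sigmoid function and its derivative, written in plain Python. The two-branch form is an implementation choice made here to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), computed stably for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

The output lies in (0, 1), and the gradient peaks at 0.25 at x = 0 and shrinks toward zero for large |x|.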
Activation function
Tanh function
Only for 2-class classification!
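For the tanh function, a short sketch showing that it is a rescaled and shifted sigmoid with outputs in (-1, 1). The helper names are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    """tanh(x) = 2 * sigmoid(2x) - 1, so tanh is a zero-centered sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2
```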
Activation function
Softmax function for multi-class classification
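A minimal sketch of softmax for multi-class outputs. Subtracting the maximum logit is a common numerical-stability trick assumed here; the example logits are made up.

```python
import math


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(logits)                       # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])          # hypothetical scores for 3 classes
```

With two classes and logits `[z, 0]`, the first softmax output equals the sigmoid of `z`, which ties the multi-class case back to the 2-class one.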
Activation function
ReLU function
Leaky ReLU function
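A short sketch of the two functions above. The negative-side slope `alpha=0.01` is a commonly used default and an assumption here, not a value from the slides.

```python
def relu(x):
    """ReLU: pass positive inputs through, clamp the rest to zero."""
    return x if x > 0 else 0.0


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: keep a small slope alpha for negative inputs."""
    return x if x > 0 else alpha * x
```

The small negative slope keeps a nonzero gradient for negative inputs, whereas plain ReLU has zero gradient there.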
Cost function
Regression problems: mean squared error, mean absolute error
Classification problems: cross entropy
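The three cost functions can be sketched as follows, assuming their standard textbook definitions (averaged over samples for the regression losses, one-hot targets for cross entropy). The `eps` clamp is an added safeguard against `log(0)`.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))
```

Mean squared error penalizes large errors more heavily than mean absolute error, while cross entropy only looks at the probability assigned to the true class.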
Back propagation
Chain rule:
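As an illustration of the chain rule in back propagation, the sketch below uses a single sigmoid neuron with a squared-error loss (both choices are assumptions) and checks the analytic gradient against a finite-difference estimate.

```python
import math


def forward(w, b, x, target):
    """One neuron: z = w*x + b, y = sigmoid(z), loss = 0.5 * (y - target)^2."""
    z = w * x + b
    y = 1.0 / (1.0 + math.exp(-z))
    loss = 0.5 * (y - target) ** 2
    return z, y, loss


def backward(w, b, x, target):
    """Chain rule: dL/dw = dL/dy * dy/dz * dz/dw (and likewise for b)."""
    _, y, _ = forward(w, b, x, target)
    dloss_dy = y - target
    dy_dz = y * (1.0 - y)
    dz_dw = x
    dz_db = 1.0
    return dloss_dy * dy_dz * dz_dw, dloss_dy * dy_dz * dz_db


def numeric_grad_w(w, b, x, target, h=1e-6):
    """Central finite difference, used only to check the analytic gradient."""
    up = forward(w + h, b, x, target)[2]
    down = forward(w - h, b, x, target)[2]
    return (up - down) / (2 * h)


grad_w, grad_b = backward(0.5, -0.2, 1.5, 1.0)
```

In a deeper network the same product of local derivatives is accumulated layer by layer, from the loss back to the inputs.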
Optimization method
Stochastic gradient descent (SGD)
Batch/Minibatch SGD
Difficult to choose learning rate!
Slow + unstable!
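A minimal mini-batch SGD sketch, assuming a one-variable linear model fitted with a mean-squared-error loss. The data, learning rate, and batch size are hypothetical values chosen so the toy problem converges.

```python
import random


def minibatch_sgd(xs, ys, lr=0.1, batch_size=4, epochs=200, seed=0):
    """Fit y = w*x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)              # new random batches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the batch-averaged squared error.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


xs = [i / 10 for i in range(20)]          # inputs 0.0 .. 1.9
ys = [2.0 * x + 1.0 for x in xs]          # noiseless targets from y = 2x + 1
w_fit, b_fit = minibatch_sgd(xs, ys)
```

Setting `batch_size=1` gives plain SGD and `batch_size=len(xs)` gives full-batch gradient descent; in every case the learning rate still has to be picked by hand.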
Optimization method
Momentum SGD: follows the accumulated gradient blindly
Nesterov accelerated gradient (NAG): smarter, looks ahead and slows down before the slope rises again
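The two update rules can be compared on a toy objective. This is a sketch assuming f(theta) = theta^2 and commonly used hyperparameters (`lr=0.1`, `gamma=0.9`); none of these values come from the slides.

```python
def grad(theta):
    """Gradient of the toy objective f(theta) = theta**2."""
    return 2.0 * theta


def momentum_sgd(theta, lr=0.1, gamma=0.9, steps=200):
    """Momentum: the velocity accumulates gradients taken at the current point."""
    v = 0.0
    for _ in range(steps):
        v = gamma * v + lr * grad(theta)
        theta -= v
    return theta


def nesterov(theta, lr=0.1, gamma=0.9, steps=200):
    """NAG: the gradient is evaluated at the look-ahead point theta - gamma*v."""
    v = 0.0
    for _ in range(steps):
        v = gamma * v + lr * grad(theta - gamma * v)
        theta -= v
    return theta
```

Evaluating the gradient at the look-ahead position is what lets NAG correct its velocity before overshooting.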
Optimization method
Adagrad
Adapts the learning rate to each parameter, but the learning rate only ever decreases and eventually decays toward zero
Adaptive Moment Estimation (Adam)
Straightforward to implement and computationally efficient
Mean of gradients (first moment, similar to momentum)
Uncentered variance of gradients (second moment)
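A sketch of the Adagrad and Adam update rules on a toy one-parameter objective, f(theta) = theta^2. The hyperparameters are commonly quoted defaults or illustrative choices, not values from the slides.

```python
import math


def grad(theta):
    """Gradient of the toy objective f(theta) = theta**2."""
    return 2.0 * theta


def adagrad(theta, lr=0.5, eps=1e-8, steps=500):
    """Adagrad: divide by the root of the running sum of squared gradients."""
    g_sq_sum = 0.0
    for _ in range(steps):
        g = grad(theta)
        g_sq_sum += g * g                 # only grows, so the step only shrinks
        theta -= lr * g / (math.sqrt(g_sq_sum) + eps)
    return theta


def adam(theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: bias-corrected running mean and uncentered variance of gradients."""
    m = 0.0                               # first moment (mean of gradients)
    v = 0.0                               # second moment (uncentered variance)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)      # correct the bias from zero init
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta
```

Adagrad's accumulator never shrinks, which is why its effective learning rate can only decay; Adam replaces the sum with an exponential moving average.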
Convolution
(Goodfellow et al., 2017)
Question: What are the differences between the networks above?
Convolution
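A minimal 1-D sketch of the convolution used in neural networks, assuming "valid" mode (no padding) and no kernel flip, which is strictly a cross-correlation. The function name and example values are illustrative.

```python
def conv1d(signal, kernel, stride=1):
    """Slide the kernel over the signal and take a weighted sum at each position."""
    out = []
    for start in range(0, len(signal) - len(kernel) + 1, stride):
        window = signal[start:start + len(kernel)]
        out.append(sum(k * s for k, s in zip(kernel, window)))
    return out


edges = conv1d([1, 2, 3, 4, 5], [1, 0, -1])            # -> [-2, -2, -2]
strided = conv1d([1, 2, 3, 4, 5], [1, 0, -1], stride=2)
```

The same small kernel is reused at every position, so the layer has far fewer weights than a fully connected one and each output depends only on a local window.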
Dilated convolution
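Dilated convolution can be sketched in 1-D by spacing the kernel taps `dilation` samples apart. This assumes valid mode and no kernel flip; the names are illustrative.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1-D convolution whose kernel taps are `dilation` samples apart."""
    span = dilation * (len(kernel) - 1) + 1   # receptive field of one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(k * signal[start + j * dilation]
                       for j, k in enumerate(kernel)))
    return out


dense = dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], dilation=1)
dilated = dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], dilation=2)
```

With `dilation=2` the three-tap kernel covers five input samples, so the receptive field grows without adding weights.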
Pooling/down-sampling
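A short 1-D sketch of max and average pooling, assuming a window of 2 with stride 2 (a common but not universal choice).

```python
def max_pool1d(values, size=2, stride=2):
    """Keep the largest value in each window."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, stride)]


def avg_pool1d(values, size=2, stride=2):
    """Keep the mean of each window."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, stride)]
```

Both halve the length of the input here; max pooling keeps the strongest response, while average pooling smooths it.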
Up-sampling
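Two simple up-sampling schemes, sketched in 1-D: nearest-neighbor repetition and linear interpolation. These are assumed examples; learned up-sampling such as transposed convolution is not shown.

```python
def upsample_nearest(values, factor=2):
    """Repeat every sample `factor` times."""
    return [v for v in values for _ in range(factor)]


def upsample_linear(values, factor=2):
    """Insert linearly interpolated samples between neighbors."""
    out = []
    for a, b in zip(values, values[1:]):
        out.extend(a + (b - a) * k / factor for k in range(factor))
    out.append(values[-1])
    return out
```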
Overfitting & regularization
(SlidePlayer)
Overfitting & regularization
L1 & L2 parameter regularization
L1: decays weights to zero
L2: decays weights close to zero
Dropout
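A sketch of how the three regularizers act on a single update. The L1 step uses a proximal (soft-threshold) update, a technique swapped in here to make the "exactly zero" behavior visible, and dropout is written in its "inverted" form. All names and values are illustrative.

```python
import random


def l2_step(w, grad, lr, lam):
    """Gradient step with L2 penalty lam * w**2: shrinks w proportionally."""
    return w - lr * (grad + 2.0 * lam * w)


def l1_step(w, grad, lr, lam):
    """Proximal step with L1 penalty lam * |w|: small weights become exactly 0."""
    w = w - lr * grad
    shrink = lr * lam
    if w > shrink:
        return w - shrink
    if w < -shrink:
        return w + shrink
    return 0.0


def dropout(activations, p_drop, rng):
    """Inverted dropout: zero units with probability p_drop, rescale the rest."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
dropped = dropout([1.0] * 1000, 0.5, rng)
```

Rescaling by `1 / keep` keeps the expected activation unchanged, so nothing needs to be rescaled at test time when dropout is switched off.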
Overfitting & regularization
No regularization
Appropriate regularization
Large regularization
Overfitting & regularization
Data Augmentation
Early stopping
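Both ideas can be sketched briefly: a horizontal flip as one simple augmentation, and a patience-based rule for early stopping. The validation losses below are made-up numbers, and `patience=3` is an arbitrary choice.

```python
def horizontal_flip(image):
    """Data augmentation: mirror each row of a 2-D image (list of rows)."""
    return [row[::-1] for row in image]


def early_stopping(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has not improved for `patience` epochs.
    """
    best_loss, best_epoch = float("inf"), -1
    stop_epoch = -1
    for epoch, loss in enumerate(val_losses):
        stop_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, stop_epoch


# Hypothetical validation curve: improves, then starts to overfit.
losses = [1.0, 0.8, 0.6, 0.55, 0.57, 0.6, 0.7, 0.9, 0.5]
best, stopped = early_stopping(losses, patience=3)
```

In practice the weights from `best_epoch` are the ones kept; a larger `patience` trades extra training time for robustness to noisy validation curves.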
03
Architecture
Neural Network
Architecture - DNN
Architecture - CNN
(Simonyan and Zisserman, 2014)
Architecture - ResNet
Skip connection
Advantages:
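One commonly cited advantage of a skip connection is that the identity path lets signals and gradients pass through a block unchanged, so a block can fall back to the identity mapping. The sketch below assumes a small fully connected residual block; layer sizes and names are illustrative.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """y = relu(x + F(x)) with F = linear -> relu -> linear."""
    f = linear(relu(linear(x, w1, b1)), w2, b2)
    return relu([xi + fi for xi, fi in zip(x, f)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
identity_out = residual_block([1.0, 2.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0])
```

With all weights at zero the block simply returns its (non-negative) input, so stacking more blocks does not have to make the network worse.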
Architecture - autoencoder
Architecture - encoder-decoder
Fully convolutional neural network for Semantic Segmentation
Architecture - encoder-decoder
U-Net (fully convolutional neural network + skip connections) for biomedical segmentation
Architecture - GAN
(LeCun, NIPS 2016)
Future
(Westworld S1)