1 of 45

Brief history & Basic theories

Exploring neural networks from outside to inside

Congcong Yuan

10/24/2019

2 of 45

Outline

  1. Deep learning: a brief history

  2. Basic units of neural networks

  3. Popular architectures of neural networks

3 of 45

01

AI & Machine Learning & Deep learning

Brief history

Venn diagram showing the inclusion relations among AI, machine learning, and deep learning (Goodfellow et al., 2017)

Flowchart showing how the different parts of an AI system relate to each other (Goodfellow et al., 2017)

4 of 45

01

Machine Learning & Deep learning

Brief history

One example (Matlab.mathworks.com)

5 of 45

01

Machine Learning & Deep learning

Brief history

Types of machine learning algorithms (Kong et al., 2018)

6 of 45

01

Deep learning

Brief history

7 of 45

01

Deep learning

Brief history

Milestones, 2011–2017:

  • 2011: IBM Watson beats humans in Jeopardy!
  • 2012: AlexNet, a deep neural network, wins the ImageNet challenge launched by Fei-Fei Li; Google Brain learns to find cat videos; Dropout, a new regularization
  • 2013: Deep RNNs for speech recognition
  • 2014: VGG-16/19, a very deep neural network; Adam, a very efficient optimization method; GAN, the generative adversarial network
  • 2015: Deep residual learning; Batch Normalization; Convolutional LSTM; Unet; Fast R-CNN; TensorFlow
  • 2016: Real-time object detection
  • 2017: AlphaGo Zero; Wasserstein GANs; Cycle-consistent adversarial networks

8 of 45

Different housing materials (an analogy: networks, like houses, are built from basic units)

02

Basic Units

Neural Network

9 of 45

02

Basic Units

Neural Network

Question: what are the basic units?

10 of 45

02

Basic Units

Neural Network

11 of 45

02

Perceptron

Neural Network

Neurons in the brain

12 of 45

02

Perceptron

Neural Network

Linear classification

13 of 45

02

Perceptron

Neural Network

Linearly separable

Non-linearly separable (XOR problem)
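A minimal plain-Python sketch of the perceptron learning rule (my illustration, not from the slides; the learning rate and epoch count are assumptions): it converges on the linearly separable AND function, but no choice of weights can ever separate XOR.

```python
def train_perceptron(data, epochs=20, lr=0.1):
    """Perceptron rule: w <- w + lr * (target - prediction) * x."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in data:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - pred
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])   # [0, 0, 0, 1] — AND is learned

w, b = train_perceptron(XOR)
print([predict(w, b, x) for x, _ in XOR])   # never matches [0, 1, 1, 0]
```

Because XOR is not linearly separable, the second run is guaranteed to misclassify at least one point no matter how long it trains, which is what motivates multi-layered perceptrons on the next slides.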

14 of 45

02

Multi-layered Perceptron

Neural Network

(Goodfellow et al., 2017)

15 of 45

02

Activation function

Neural Network

Sigmoid function

16 of 45

02

Activation function

Neural Network

Tanh function

Only for 2-class classification!

17 of 45

02

Activation function

Neural Network

Softmax function for multi-class classification

18 of 45

02

Activation function

Neural Network

ReLU function

Leaky ReLU function
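The activation functions from the last few slides, sketched in plain Python (my illustration; only the standard textbook definitions are used):

```python
import math

def sigmoid(x):            # squashes to (0, 1); suits 2-class outputs
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):               # squashes to (-1, 1), zero-centered
    return math.tanh(x)

def relu(x):               # max(0, x): cheap, no saturation for x > 0
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):   # small slope for x < 0 keeps gradients alive
    return x if x > 0 else alpha * x

def softmax(z):            # turns a score vector into a probability distribution
    m = max(z)                        # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

print(sigmoid(0.0))         # 0.5
print(softmax([1.0, 1.0]))  # [0.5, 0.5]
```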

19 of 45

02

Cost function

Neural Network

Classification problems:

Regression problems:

Mean squared error:

Mean absolute error:

Cross entropy:
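The three cost functions above, as a plain-Python sketch (my illustration; the toy targets and predictions are assumptions):

```python
import math

def mse(y_true, y_pred):   # mean squared error: penalizes large errors strongly
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):   # mean absolute error: more robust to outliers
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred):   # y_true one-hot, y_pred probabilities
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

print(mse([1.0, 2.0], [1.0, 4.0]))                # 2.0
print(mae([1.0, 2.0], [1.0, 4.0]))                # 1.0
print(cross_entropy([0, 1, 0], [0.1, 0.8, 0.1]))  # -log(0.8) ≈ 0.223
```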

20 of 45

02

Cost function

Neural Network

21 of 45

02

Back propagation

Neural Network

Chain rule:
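Back propagation is nothing more than the chain rule applied layer by layer. A one-neuron sketch (plain Python, my illustration; the weight, bias, and sample values are assumptions) computes the analytic gradient and checks it against a finite-difference estimate:

```python
import math

# One sigmoid neuron, squared-error loss: L = (sigmoid(w*x + b) - y)^2.
# Backprop is the chain rule: dL/dw = dL/da * da/dz * dz/dw.
def forward(w, b, x):
    z = w * x + b
    return 1.0 / (1.0 + math.exp(-z))

def grad_w(w, b, x, y):
    a = forward(w, b, x)
    dL_da = 2.0 * (a - y)        # derivative of the loss w.r.t. the activation
    da_dz = a * (1.0 - a)        # sigmoid'(z) = a * (1 - a)
    dz_dw = x                    # z = w*x + b, so dz/dw = x
    return dL_da * da_dz * dz_dw

# Verify against a numerical (central finite-difference) gradient:
w, b, x, y, eps = 0.5, -0.2, 1.5, 1.0, 1e-6
loss = lambda w_: (forward(w_, b, x) - y) ** 2
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(abs(grad_w(w, b, x, y) - numeric) < 1e-8)   # True
```

The same check (analytic vs. numerical gradient) is the standard way to debug a hand-written backprop implementation.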

22 of 45

02

Optimization method

Neural Network

23 of 45

02

Optimization method

Neural Network

Stochastic gradient descent (SGD)

Batch/Minibatch SGD

Difficult to choose the learning rate!

Slow and unstable!
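The SGD update w ← w − η∇L on a toy least-squares problem, in plain Python (my illustration; the data, learning rate, and epoch count are assumptions). Batch size 1 gives vanilla SGD; a larger `batch_size` gives minibatch SGD:

```python
import random

# Data drawn from y = 2x; gradient descent recovers the slope w.
random.seed(0)
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

def sgd(data, lr=0.1, epochs=100, batch_size=1):
    w = 0.0
    for _ in range(epochs):
        random.shuffle(data)                      # stochastic: random order
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # gradient of the mean of (w*x - y)^2 over the (mini)batch
            g = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * g
    return w

print(round(sgd(data, batch_size=1), 3))   # 2.0
print(round(sgd(data, batch_size=2), 3))   # 2.0
```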

24 of 45

02

Optimization method

Neural Network

Momentum SGD

Nesterov accelerated gradient (NAG)

Momentum steps blindly along the accumulated gradient; NAG is smarter: it looks ahead to where momentum is carrying it and slows down before overshooting.
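The two update rules side by side on the toy objective f(w) = w², in plain Python (my illustration; the learning rate, momentum coefficient, and step count are assumptions):

```python
# Minimize f(w) = w^2 (gradient 2w) with momentum SGD vs. NAG.
def momentum_sgd(w, lr=0.1, mu=0.9, steps=300):
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * 2.0 * w          # accumulate velocity from the gradient
        w += v                             # then step "blindly" along it
    return w

def nag(w, lr=0.1, mu=0.9, steps=300):
    v = 0.0
    for _ in range(steps):
        lookahead = w + mu * v             # peek where momentum is taking us
        v = mu * v - lr * 2.0 * lookahead  # gradient at the look-ahead point
        w += v
    return w

print(momentum_sgd(5.0), nag(5.0))   # both end near 0
```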

25 of 45

02

Optimization method

Neural Network

Adagrad

Adapts the learning rate to each parameter, but the accumulated gradient history makes the effective learning rate decay monotonically

Adaptive Moment Estimation (Adam)

Straightforward to implement and computationally efficient

First moment: mean of the gradients (similar to momentum)

Second moment: uncentered variance of the gradients
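Both update rules on the same toy objective f(w) = w², in plain Python (my illustration; learning rates and step counts are assumptions; β₁, β₂, ε follow the commonly used defaults):

```python
import math

# Minimize f(w) = w^2 (gradient 2w) with the Adagrad and Adam update rules.
def adagrad(w, lr=0.5, steps=200, eps=1e-8):
    G = 0.0
    for _ in range(steps):
        g = 2.0 * w
        G += g * g                           # accumulated squared gradients
        w -= lr * g / (math.sqrt(G) + eps)   # per-parameter shrinking step
    return w

def adam(w, lr=0.1, b1=0.9, b2=0.999, steps=200, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = 2.0 * w
        m = b1 * m + (1 - b1) * g        # first moment: mean of gradients
        v = b2 * v + (1 - b2) * g * g    # second moment: uncentered variance
        m_hat = m / (1 - b1 ** t)        # bias correction for the zero init
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

print(adagrad(5.0), adam(5.0))   # both approach 0
```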

26 of 45

02

Convolution

Neural Network

(Goodfellow et al., 2017)

Question: what are the differences between the networks above?

27 of 45

02

Convolution

Neural Network
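A 1-D convolution in "valid" mode (strictly, the cross-correlation CNNs compute), in plain Python (my illustration; the signal and kernel are assumptions). The same small kernel slides over every position, which is the weight sharing that distinguishes convolutional from fully connected layers:

```python
# 1-D convolution ("valid" mode, no kernel flip — i.e. cross-correlation).
def conv1d(x, k):
    n = len(x) - len(k) + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(n)]

signal = [1, 2, 3, 4, 5]
edge_kernel = [1, -1]               # finite difference: responds to changes
print(conv1d(signal, edge_kernel))  # [-1, -1, -1, -1]
```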

28 of 45

02

Dilated convolution

Neural Network
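A dilated convolution spaces the kernel taps `dilation` samples apart, enlarging the receptive field without adding weights. A plain-Python sketch (my illustration; the input and kernel are assumptions):

```python
# Dilated 1-D convolution: taps are `dilation` apart, so a kernel of length L
# covers (L - 1) * dilation + 1 input samples with the same number of weights.
def dilated_conv1d(x, k, dilation=1):
    span = (len(k) - 1) * dilation + 1       # receptive field of one output
    n = len(x) - span + 1
    return [sum(x[i + j * dilation] * k[j] for j in range(len(k)))
            for i in range(n)]

x = [1, 2, 3, 4, 5, 6]
print(dilated_conv1d(x, [1, 1], dilation=1))  # [3, 5, 7, 9, 11]
print(dilated_conv1d(x, [1, 1], dilation=2))  # [4, 6, 8, 10]
```

With `dilation=1` this reduces to the ordinary convolution; with `dilation=2` each output sums inputs two apart.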

29 of 45

02

Pooling/down-sampling

Neural Network
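Max pooling in plain Python (my illustration; the feature map is an assumption): each non-overlapping 2×2 window keeps only its strongest activation, halving both spatial dimensions:

```python
# 2x2 max pooling with stride 2 on a square feature map.
def max_pool2x2(a):
    return [[max(a[i][j], a[i][j + 1], a[i + 1][j], a[i + 1][j + 1])
             for j in range(0, len(a[0]), 2)]
            for i in range(0, len(a), 2)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 8]]
print(max_pool2x2(fmap))   # [[4, 2], [2, 8]]
```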

30 of 45

02

Up-sampling

Neural Network
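The simplest up-sampling is nearest-neighbour: repeat each value `factor` times along both axes, the inverse of pooling's size reduction. A plain-Python sketch (my illustration):

```python
# Nearest-neighbour up-sampling: each value becomes a factor x factor block,
# the simplest way a decoder grows feature maps back to input resolution.
def upsample_nn(a, factor=2):
    return [[v for v in row for _ in range(factor)]
            for row in a for _ in range(factor)]

print(upsample_nn([[1, 2],
                   [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```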

31 of 45

02

Overfitting & regularization

Neural Network

(SlidePlayer)

32 of 45

02

Overfitting & regularization

Neural Network

L1 & L2 parameter regularization

L1 decays weights exactly to zero (sparsity)

L2 decays weights close to zero

Dropout
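The regularizers above, in plain Python (my illustration; the penalty coefficient and dropout rate are assumptions). L2 adds λw² to the loss, so its gradient acts as weight decay; L1 adds λ|w|, pushing weights exactly to zero; inverted dropout zeroes units at train time and rescales the survivors so the expected activation is unchanged:

```python
import random

def l2_penalty_grad(w, lam):   # gradient of lam * w^2: "weight decay"
    return 2.0 * lam * w

def l1_penalty_grad(w, lam):   # gradient of lam * |w|: constant push to zero
    return lam * (1.0 if w > 0 else -1.0 if w < 0 else 0.0)

def dropout(activations, p, rng=random.random):
    # Inverted dropout: drop each unit with probability p, scale survivors.
    return [0.0 if rng() < p else a / (1.0 - p) for a in activations]

random.seed(1)
print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5))   # each unit is 0.0 or scaled to 2.0
```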

33 of 45

02

Overfitting & regularization

Neural Network

No regularization

Appropriate regularization

Large regularization

34 of 45

02

Overfitting & regularization

Neural Network

Data Augmentation

Early stopping
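Early stopping as a plain-Python sketch (my illustration; the loss curve and patience value are assumptions): keep training while the validation loss improves, stop after `patience` epochs without improvement, and keep the best epoch's model:

```python
# Returns the epoch with the lowest validation loss and that loss,
# stopping `patience` epochs after improvement last occurred.
def early_stop(val_losses, patience=2):
    best, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best

# Validation loss falls, then rises as overfitting sets in at epoch 3:
print(early_stop([1.0, 0.6, 0.4, 0.5, 0.7, 0.9]))   # (2, 0.4)
```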

35 of 45

03

Architecture

Neural Network

36 of 45

03

Architecture - DNN

Neural Network

37 of 45

03

Architecture - CNN

Neural Network

(Simonyan and Zisserman, 2014)

38 of 45

03

Architecture - ResNet

Neural Network

Skip connection

Advantages:

  1. Avoids the problem of vanishing gradients;
  2. Effectively simplifies the network and speeds up learning;
  3. Less vulnerable to perturbations in the data.
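The advantages above follow from the residual form y = F(x) + x: the skip connection passes the identity (and its gradient) straight through even if F is driven toward zero. A plain-Python sketch (my illustration; the toy layer is an assumption):

```python
# A residual block computes y = F(x) + x element-wise.
def residual_block(x, layer):
    return [fx + xi for fx, xi in zip(layer(x), x)]

# With the layer's weights at zero, the block is exactly the identity,
# so adding depth cannot make the network worse:
zero_layer = lambda x: [0.0 for _ in x]
print(residual_block([1.0, 2.0, 3.0], zero_layer))   # [1.0, 2.0, 3.0]
```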

39 of 45

03

Architecture - autoencoder

Neural Network

40 of 45

03

Architecture – encoder-decoder

Neural Network

Fully convolutional neural network (FCN) for semantic segmentation

41 of 45

03

Architecture – encoder-decoder

Neural Network

Unet (fully convolutional neural network + skip connections) for biomedical image segmentation

42 of 45

03

Architecture – GAN

Neural Network

43 of 45

44 of 45

(LeCun, NIPS 2016)

Future

45 of 45

(WestWorld S1)