1 of 152

Machine Learning

Prof. Seungtaek Choi

2 of 152

Last Time

  • Logistic Regression
  • Classification (SVM)

3 of 152

Today

  • Neural Networks
  • Backpropagation
  • Announcement: 2nd Assignment!
    • Practice with PyTorch

4 of 152

Neural Networks

5 of 152

Neural Networks

  • Origins: Algorithms that try to mimic the brain.
  • Widely used in the 80s and early 90s; popularity diminished in the late 90s.
  • Recent resurgence: State-of-the-art technique for many applications.

6 of 152

The brain adapts its function to the input it receives.

7 of 152

8 of 152

The brain flexibly adapts to incoming sensory channels and can even learn entirely new senses. – A general-purpose, multimodal learning machine.

9 of 152

Neuron = input integration → threshold → output.

10 of 152

Neurons form networks: from sensory input to motor output.

11 of 152

12 of 152

13 of 152

14 of 152

15 of 152

16 of 152

17 of 152

18 of 152

19 of 152

20 of 152

Perceptron: Binary Linear Classifier

  • Given weights


21 of 152

Perceptron: Geometric Interpretation

  • Given weights


22 of 152

Perceptron for New Data

  • Forward propagation with new data
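As a minimal sketch of this forward propagation (the weights and bias below are illustrative assumptions, not values from the slides), a perceptron is just a weighted sum followed by a threshold:

```python
def perceptron(x, w, b):
    """Binary linear classifier: output 1 if w.x + b > 0, else 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # input integration
    return 1 if z > 0 else 0                      # threshold

# A 2D example: the line x1 + x2 - 1.5 = 0 splits the plane.
w, b = [1.0, 1.0], -1.5
print(perceptron([1, 1], w, b))  # → 1 (above the line)
print(perceptron([0, 0], w, b))  # → 0 (below the line)
```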


23 of 152

Binary Linear Classifier in 2D


24 of 152

25 of 152

26 of 152

27 of 152

Idea: Nonlinear Curve Approximated by Multiple Lines

  • Nonlinear regression
  • Nonlinear classification


28 of 152

AND Problem

  • A single sigmoid neuron with weights (w0, w1, w2) = (-30, 20, 20) computes AND:
    h(x1, x2) = g(-30 + 20*x1 + 20*x2)

    x1   x2 | z = -30 + 20*x1 + 20*x2 | g(z) ~ | x1 AND x2
     0    0 |          -30            |   0    |    0
     0    1 |          -10            |   0    |    0
     1    0 |          -10            |   0    |    0
     1    1 |           10            |   1    |    1
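The AND neuron above can be checked directly with the weights from the slide:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# AND as a single sigmoid neuron with the slide's weights (-30, 20, 20):
# h(x1, x2) = g(-30 + 20*x1 + 20*x2)
def and_neuron(x1, x2):
    return sigmoid(-30 + 20 * x1 + 20 * x2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(and_neuron(x1, x2)))  # outputs 0, 0, 0, 1
```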

29 of 152

OR Problem

  • A single sigmoid neuron with weights (w0, w1, w2) = (-10, 20, 20) computes OR:
    h(x1, x2) = g(-10 + 20*x1 + 20*x2)

    x1   x2 | z = -10 + 20*x1 + 20*x2 | g(z) ~ | x1 OR x2
     0    0 |          -10            |   0    |    0
     0    1 |           10            |   1    |    1
     1    0 |           10            |   1    |    1
     1    1 |           30            |   1    |    1

30 of 152

How about the XOR Problem?

  • Minsky–Papert controversy on XOR
  • XOR is not linearly separable

  • Single neuron = one linear classification boundary
    • A perceptron cannot solve XOR due to its linear nature

    x1   x2 | x1 XOR x2
     0    0 |     0
     0    1 |     1
     1    0 |     1
     1    1 |     0

31 of 152

XOR Problem

  • At least two lines are required


32 of 152

XOR Problem

  • At least two lines are required
  • If two perceptrons are stacked, it represents two hyperplanes.


33 of 152

XOR Problem

  • At least two lines are required
  • If two perceptrons are stacked, it represents two hyperplanes.


34 of 152

Multiple Perceptrons

  • Multiple neurons = multiple linear classification boundaries

35 of 152

Multiple Perceptrons

  • Multiple neurons = multiple linear classification boundaries

36 of 152

Multiple Perceptrons

  • Sigmoid as the nonlinear activation function
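As a sketch of such a layer (the weights below are illustrative assumptions), each neuron computes a weighted sum, i.e. one linear boundary, and the sigmoid is applied elementwise:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(x, W, b):
    """One layer of perceptrons: each row of W and entry of b defines
    one neuron (one linear boundary), passed through the sigmoid."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# Two neurons = two linear boundaries (OR-like and AND-like biases):
W = [[20.0, 20.0], [20.0, 20.0]]
b = [-10.0, -30.0]
print([round(v) for v in layer([1.0, 0.0], W, b)])  # → [1, 0]
```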

37 of 152

Multiple Perceptrons

  • In a compact representation


38 of 152

Multiple Perceptrons

  • In a compact representation


39 of 152

Multiple Perceptrons

  • In a compact representation


First layer

with neurons

40 of 152

Multiple Perceptrons

  • With one more layer…


First layer

Second layer

41 of 152

42 of 152

Another Interpretation

  • XOR can be represented with only AND, OR, NOT.
  • A XOR B = (A OR B) AND NOT (A AND B)
    • Combination of simple operations

43 of 152

Another Interpretation

  • A XOR B = (A OR B) AND NOT (A AND B)
    • z1 = g(-10 + 20*A + 20*B)   (A OR B)
    • z2 = g(-30 + 20*A + 20*B)   (A AND B)

44 of 152

Another Interpretation

  • A XOR B = (A OR B) AND NOT (A AND B) = z1 AND NOT z2
    • z1 = g(-10 + 20*A + 20*B)    (A OR B)
    • z2 = g(-30 + 20*A + 20*B)    (A AND B)
    • y  = g(-10 + 20*z1 - 20*z2)  (z1 AND NOT z2)

    A    B | z1 | z2 | y = A XOR B
    0    0 |  0 |  0 |      0
    0    1 |  1 |  0 |      1
    1    0 |  1 |  0 |      1
    1    1 |  1 |  1 |      0
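The two-layer XOR construction can be verified in code. The OR and AND weights are the slide's values; the output neuron's weights (-10, 20, -20) implement z1 AND NOT z2 consistently with the truth table:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(a, b):
    z1 = sigmoid(-10 + 20 * a + 20 * b)      # hidden neuron: a OR b
    z2 = sigmoid(-30 + 20 * a + 20 * b)      # hidden neuron: a AND b
    return sigmoid(-10 + 20 * z1 - 20 * z2)  # output: z1 AND NOT z2

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(xor_net(a, b)))  # outputs 0, 1, 1, 0
```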

45 of 152

46 of 152

47 of 152

48 of 152

49 of 152

50 of 152

51 of 152

Looks a lot like logistic regression

The only difference is that, instead of the raw input feature vector, the features are values computed by the hidden layer

52 of 152

Feature Learning

Looks a lot like logistic regression

The only difference is that, instead of the raw input feature vector, the features are values computed by the hidden layer

53 of 152

Another Perspective: Hidden Layers as Kernel Learning

54 of 152

Nonlinear Classification


https://www.youtube.com/watch?v=3liCbRZPrZA

55 of 152

Neuron

  • We can represent this “neuron” as follows:


56 of 152

Second Way of Looking at Multiple Perceptrons

  • Can represent nonlinear relationships between inputs and outputs due to the nonlinear activation function

57 of 152

Common Activation Functions


Source: 6.S191 Intro. to Deep Learning at MIT

Discuss later
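The common activation functions named on this slide can be written directly (a quick reference sketch):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # squashes to (0, 1)

def tanh(z):
    return math.tanh(z)                # squashes to (-1, 1)

def relu(z):
    return max(0.0, z)                 # zero for negative inputs

print(sigmoid(0.0), tanh(0.0), relu(-3.0), relu(3.0))  # 0.5 0.0 0.0 3.0
```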

58 of 152

XOR Problem in Perceptron

  • The main weakness of linear predictors is their lack of capacity.
  • For classification, the populations have to be linearly separable.


59 of 152

Nonlinear Mapping

  • The XOR example can be solved by pre-processing the data to make the two populations linearly separable.


Source: Dr. Francois Fleuret at EPFL
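One concrete pre-processing (an illustrative choice, not necessarily the mapping on the slide) adds the product feature x1*x2; in that feature space XOR becomes a linear function:

```python
# Assumed feature map for illustration: phi(x1, x2) = (x1, x2, x1*x2).
# In this space XOR is linear: XOR = x1 + x2 - 2 * x1 * x2.
def phi(x1, x2):
    return (x1, x2, x1 * x2)

def linear_xor(features):
    x1, x2, x1x2 = features
    return x1 + x2 - 2 * x1x2  # a linear function of the mapped features

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, linear_xor(phi(a, b)))  # outputs 0, 1, 1, 0
```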

60 of 152

Nonlinear Mapping


Source: Dr. Francois Fleuret at EPFL

61 of 152

Nonlinear Mapping


Source: Dr. Francois Fleuret at EPFL

62 of 152

Neuron

  • Suppose the data are not linearly separable

63 of 152

Kernel + Neuron

  • Nonlinear mapping + neuron

  • User-defined Kernel


64 of 152

Neuron + Neuron

  • Nonlinear mapping can be represented by another layer (or neurons)

  • Learnable Kernel
    • Nonlinear activation functions


65 of 152

Multi Layer Perceptron (MLP)

  • Nonlinear mapping can be represented by another layer (or neurons)
  • We can generalize an MLP


66 of 152

Multi Layer Perceptron (MLP) = Artificial Neural Networks

  • Why do we need multiple layers?

Nonlinear mapping

67 of 152

Multi Layer Perceptron (MLP) = Artificial Neural Networks

  • Why do we need multiple layers?

Nonlinear mapping

68 of 152

Multi Layer Perceptron (MLP) = Artificial Neural Networks

  • Why do we need multiple layers?

Nonlinear mappings

Linearly separable

69 of 152

Multi Layer Perceptron (MLP) = Artificial Neural Networks

  • Why do we need multiple layers?

Nonlinear mappings

Multiple Linear classifiers

Linearly separable

70 of 152

Multi Layer Perceptron (MLP) = Artificial Neural Networks

  • Why do we need multiple layers?

Linear classification

Feature Learning

Nonlinear mappings

Linearly separable

71 of 152

Two Ways of Looking at Artificial Neural Networks

  • Still represent lines

  • Can represent nonlinear relationships between inputs and outputs due to the nonlinear activation function

72 of 152

Two Ways of Looking at Artificial Neural Networks


(1)

(2)

  • Still represent lines

  • Can represent nonlinear relationships between inputs and outputs due to the nonlinear activation function

73 of 152

74 of 152

75 of 152

76 of 152

Backpropagation

77 of 152

Training Neural Networks: Optimization


78 of 152

Training Neural Networks: Loss Function

  • Measures error between target values and predictions

  • Example
    • Squared loss (for regression):

    • Cross entropy (for classification):
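The two losses above, in code (following the forms named on the slide, with the conventional 1/2 factor on the squared loss):

```python
import math

def squared_loss(y, y_hat):
    """Squared loss for regression: (1/2)(y - y_hat)^2."""
    return 0.5 * (y - y_hat) ** 2

def cross_entropy(y, y_hat):
    """Binary cross entropy: y in {0, 1}, y_hat in (0, 1) a probability."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

print(round(squared_loss(1.0, 0.8), 3))   # 0.02
print(round(cross_entropy(1, 0.8), 3))    # 0.223
```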

79 of 152

Training Neural Networks: Gradient Descent

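Gradient descent in its simplest form: repeatedly step against the gradient. A one-parameter sketch on an assumed toy loss f(w) = (w - 3)^2:

```python
def gradient_descent(w0, lr=0.1, steps=100):
    """Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)   # dL/dw
        w = w - lr * grad    # update rule: w <- w - lr * dL/dw
    return w

print(round(gradient_descent(0.0), 4))  # → 3.0, the minimizer
```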

80 of 152

81 of 152

Gradients in ANN


82 of 152

Gradients in ANN


83 of 152

Training Neural Networks: Backpropagation Learning

  • Forward propagation
    • the initial information propagates up to the hidden units at each layer and finally produces output

  • Backpropagation
    • allows the information from the cost to flow backwards through the network in order to compute the gradients

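The two passes can be sketched on a tiny one-hidden-unit network (the shapes and weight values are illustrative assumptions). The backward pass applies the chain rule from the cost back to each weight, and the analytic gradient is checked against a numerical difference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2, y):
    h = sigmoid(w1 * x)             # hidden activation
    y_hat = sigmoid(w2 * h)         # output
    loss = 0.5 * (y_hat - y) ** 2   # squared loss
    return h, y_hat, loss

def backward(x, w1, w2, y):
    h, y_hat, _ = forward(x, w1, w2, y)
    # Chain rule, flowing from the cost backwards through the network:
    d_yhat = y_hat - y                      # dL/dy_hat
    d_z2 = d_yhat * y_hat * (1 - y_hat)     # through the output sigmoid
    d_w2 = d_z2 * h                         # dL/dw2
    d_h = d_z2 * w2
    d_z1 = d_h * h * (1 - h)                # through the hidden sigmoid
    d_w1 = d_z1 * x                         # dL/dw1
    return d_w1, d_w2

# Numerical check of the analytic gradient for w1:
x, w1, w2, y, eps = 1.0, 0.5, -0.3, 1.0, 1e-6
d_w1, d_w2 = backward(x, w1, w2, y)
num = (forward(x, w1 + eps, w2, y)[2] - forward(x, w1 - eps, w2, y)[2]) / (2 * eps)
print(abs(d_w1 - num) < 1e-6)  # True
```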

84 of 152

Backpropagation


85 of 152

Backpropagation


These are what we need for GD

86 of 152

Backpropagation


These are what we need for GD

87 of 152

Backpropagation


These are what we need for GD

88 of 152

Backpropagation


These are what we need for GD

89 of 152

Backpropagation


These are what we need for GD

90 of 152

Backpropagation


These are what we need for GD

91 of 152

92 of 152

93 of 152

94 of 152

95 of 152

96 of 152

97 of 152

98 of 152

99 of 152

100 of 152

101 of 152

102 of 152

103 of 152

104 of 152

105 of 152

106 of 152

107 of 152

108 of 152

109 of 152

110 of 152

111 of 152

112 of 152

113 of 152

114 of 152

115 of 152

116 of 152

117 of 152

118 of 152

119 of 152

120 of 152

121 of 152

122 of 152

123 of 152

124 of 152

125 of 152

126 of 152


127 of 152

Training Neural Networks with PyTorch


128 of 152

Activation Function

129 of 152

130 of 152

131 of 152

132 of 152

133 of 152

134 of 152

135 of 152

136 of 152

137 of 152

138 of 152

139 of 152

140 of 152

Artificial Neural Networks with PyTorch

141 of 152

MNIST database

  • Modified National Institute of Standards and Technology database
  • Handwritten digit database
  • 28 x 28 grayscale images
  • Each image matrix is flattened into a vector of 28 x 28 = 784 values

142 of 152

Our Neural Network (Model)

  Input image (28 x 28) → Flattened → Input layer (784) → Hidden layer (100) → Output layer (10) → Digit prediction in one-hot encoding

143 of 152

Iterative Optimization

  • We will use
    • Mini-batch gradient descent
    • Adam optimizer
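A sketch of this setup in PyTorch, with the 784-100-10 model described above trained by mini-batch gradient descent with the Adam optimizer. The random batch stands in for MNIST data, and the sigmoid hidden activation is an assumption:

```python
import torch
import torch.nn as nn

# 784-100-10 MLP (hidden activation assumed sigmoid for this sketch)
model = nn.Sequential(
    nn.Linear(784, 100),  # input layer -> hidden layer
    nn.Sigmoid(),
    nn.Linear(100, 10),   # hidden layer -> 10 digit scores
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 784)         # one mini-batch of 32 stand-in "images"
y = torch.randint(0, 10, (32,))  # integer class labels

optimizer.zero_grad()            # clear old gradients
loss = criterion(model(x), y)    # forward pass + loss
loss.backward()                  # backpropagation
optimizer.step()                 # Adam parameter update
print(loss.item() > 0)           # True
```

In a real run this block sits inside a loop over mini-batches drawn from the MNIST training set.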

144 of 152

Implementation in Python

145 of 152

Implementation in Python

146 of 152

Implementation in Python

147 of 152

Implementation in Python

148 of 152

Implementation in Python

149 of 152

Implementation in Python

150 of 152

Next

  • Model Ensemble
  • Regularization (dropout)
  • Deep Neural Network w/ CNN & RNN

151 of 152

152 of 152