Dropout
Deep Learning Seminar
School of Electrical Engineering, Tel Aviv University
Outline
Deep Learning Seminar - Dropout
Motivation
Dropout
Dropout - an equivalent method
Why apply dropout on units, with all their ingoing and outgoing arcs, and not just on the arcs themselves? We will get there later…
Dropout – model description
Dropout – model description
Dropout – model description
Can be a very bad approximation, particularly for the ReLU activation.
Masking matrix: has entire columns of 0's, one zeroed column per dropped unit.
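The two phases described above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: at train time each unit is kept with probability `p_keep` (dropping unit j zeroes column j of W's contribution), and at test time the weights are scaled by `p_keep` as the approximate model-averaging rule. The function name and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, W, b, p_keep=0.5, train=True):
    """One layer a = relu(W x + b) with dropout on the input units.

    Train time: each input unit is kept with probability p_keep;
    dropping unit j is the same as zeroing column j of W.
    Test time: no units are dropped; W is scaled by p_keep instead
    (the approximate averaging over all masked sub-networks).
    """
    if train:
        mask = (rng.random(x.shape) < p_keep).astype(x.dtype)
        return np.maximum(0.0, W @ (x * mask) + b)
    return np.maximum(0.0, (p_keep * W) @ x + b)
```

Note that the scaling trick makes the test-time network deterministic: a single forward pass replaces an average over exponentially many thinned networks.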
Experimental Results
SVHN – Street View House Numbers
Experimental Results
CIFAR-10 and CIFAR-100:
Experimental Results
The effect of dropout on learned features:
MNIST, one hidden layer, 256 ReLUs
No dropout
The units have co-adapted; no single unit detects a meaningful feature on its own.
Experimental Results
Experimental Results
The effect of data set size:
Huge data set
Dropout barely improves the error rate. The data set is big enough that overfitting is not an issue.
Average to large data set
Dropout improves the error rate.
Extremely small data set
Dropout does not improve the error rate, and can even make it worse.
Experimental Results
How good is the approximated averaging technique?
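The quality of the approximation can be checked numerically: sample many dropout masks, average the outputs, and compare against the single weight-scaled pass. The sketch below (NumPy, illustrative names) uses a linear layer, where the two agree up to sampling noise; with a nonlinearity such as ReLU they generally differ.

```python
import numpy as np

rng = np.random.default_rng(1)

W = rng.normal(size=(4, 6))
x = rng.normal(size=6)
p_keep = 0.5

# True model average: sample many unit masks and average the linear outputs.
samples = []
for _ in range(20000):
    mask = (rng.random(6) < p_keep).astype(float)
    samples.append(W @ (x * mask))
mc_avg = np.mean(samples, axis=0)

# Weight-scaling approximation: one pass with W scaled by p_keep.
approx = (p_keep * W) @ x

# For a linear layer E[W (x * m)] = p_keep * W x, so the gap is only
# Monte Carlo noise; a ReLU after the sum would break this equality.
print(np.max(np.abs(mc_avg - approx)))
```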
Weight Decay
Weight Decay
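As a quick reference for this slide, here is a minimal sketch of an SGD step with L2 weight decay, together with the max-norm constraint that the dropout paper pairs with dropout. Function names and hyperparameter values are illustrative, not from the paper.

```python
import numpy as np

def sgd_step_weight_decay(w, grad, lr=0.1, wd=1e-2):
    """SGD on L(w) + (wd/2) * ||w||^2: the extra wd * w term
    shrinks the weights toward 0 on every step."""
    return w - lr * (grad + wd * w)

def max_norm(w, c=3.0):
    """Max-norm constraint: after each update, project w back
    onto the ball ||w|| <= c (used alongside dropout)."""
    n = np.linalg.norm(w)
    return w * (c / n) if n > c else w
```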
DropConnect
Dropout
DropConnect
A special case
DropConnect
Input to the activation function
A weighted sum of Bernoulli variables, which can be approximated by a Gaussian
Statistics of the Gaussian
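The Gaussian moment-matching at test time can be sketched as follows (NumPy, illustrative function name): with mask entries M_ij ~ Bernoulli(p), the preactivation u_i = Σ_j M_ij W_ij v_j has mean p·(Wv)_i and variance p(1−p)·Σ_j (W_ij v_j)², so we sample u from that Gaussian, apply the activation, and average the samples.

```python
import numpy as np

rng = np.random.default_rng(2)

def dropconnect_inference(W, v, p=0.5, n_samples=200):
    """DropConnect test-time rule: approximate the preactivation
    u = (M * W) v per unit by a moment-matched Gaussian, then
    estimate E[relu(u)] by sampling and averaging."""
    mean = p * (W @ v)
    var = p * (1 - p) * ((W ** 2) @ (v ** 2))
    u = rng.normal(mean, np.sqrt(var), size=(n_samples,) + mean.shape)
    return np.maximum(0.0, u).mean(axis=0)
```

Sampling is needed because the Gaussian is averaged *after* the nonlinearity; pushing the mean alone through the activation would reintroduce the same kind of approximation error dropout's weight scaling suffers from.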
DropConnect
No-drop, Dropout and DropConnect comparison:
MNIST
CIFAR-10
SVHN
DropConnect
In DropConnect’s paper, they achieve the lowest error rate recorded so far on MNIST!
These results are the same whether DropConnect is used or no drop at all…
DropConnect
DropConnect’s drawbacks:
Summary