1 of 26

Deep Networks

(DenseNet & ResNet)

Rowel Atienza

rowel@eee.upd.edu.ph

University of the Philippines

2 of 26

DenseNet

Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

3 of 26

Problem of

Vanishing Gradient

Gradients vanish when computed at shallow layers due to the nature of backpropagation, which applies the chain rule

A gradient that depends on another gradient that is small is also small. This smallness propagates to the shallow layers until the gradient vanishes.
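A toy numeric illustration (not from the slides): backprop multiplies per-layer local derivatives via the chain rule, so a chain of factors smaller than 1 drives the gradient reaching the shallow layers toward zero. The numbers below are purely illustrative.

# chain rule: gradient at a shallow layer = product of local derivatives
local_grad = 0.25          # assumed per-layer derivative, e.g. from a saturated activation
grad = 1.0                 # gradient at the output layer
for layer in range(20):    # propagate back through 20 layers
    grad *= local_grad
print(grad)                # ~9.1e-13 -- effectively vanished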

4 of 26

DenseNet (>100 layers)

5 of 26

Solution

  • Encourage information flow down to shallow layers
  • How do we maximize information flow between layers?
    • DenseNet: Connect all layers directly with each other
    • DenseNet: Feature maps are combined by concatenation (see the sketch after this list)
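A minimal Keras sketch of this idea, not the slides' exact code: each layer's output is concatenated with everything that came before it, so every layer sees the feature maps of all preceding layers. The layer count and growth rate below are illustrative.

import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=12):
    """Every layer receives the concatenation of all preceding feature maps."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation('relu')(y)
        y = layers.Conv2D(growth_rate, 3, padding='same')(y)
        x = layers.Concatenate()([x, y])   # combine by concatenation, not addition
    return x

inputs = tf.keras.Input(shape=(32, 32, 24))
outputs = dense_block(inputs)
model = tf.keras.Model(inputs, outputs)    # output has 24 + 4*12 = 72 channels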

6 of 26

DenseNet Input
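From the DenseNet paper, layer l receives as input the concatenation of the feature maps of all preceding layers:

x_l = H_l([x_0, x_1, ..., x_{l-1}])

where [.] denotes channel-wise concatenation and H_l is the composite function (BN, ReLU, Conv).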

 

7 of 26

DenseNet

 

8 of 26

DenseNet vs ResNet
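In equation form, following the comparison in the DenseNet paper: ResNet combines feature maps by addition through the shortcut, while DenseNet combines them by concatenation.

ResNet:   x_l = H_l(x_{l-1}) + x_{l-1}
DenseNet: x_l = H_l([x_0, x_1, ..., x_{l-1}])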

 

9 of 26

DenseNet: 2 Issues to Address

  • How to control the feature map growth rate?
  • How to reduce feature map size?

10 of 26

Feature Maps Growth Rate
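From the DenseNet paper: each layer H_l adds k feature maps (k is the growth rate), so the l-th layer receives

k_0 + k \times (l - 1)

input feature maps, where k_0 is the channel count of the block's input. With k_0 = 24 and k = 12, for example, the 10th layer already receives 24 + 12 × 9 = 132 feature maps, which is why the growth rate has to be controlled.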

 

11 of 26

Bottleneck Layer Controls Growth Rate
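A minimal Keras sketch of a bottleneck (DenseNet-B) layer; the 4k factor follows the paper, while the function name and growth rate are illustrative.

from tensorflow.keras import layers

def bottleneck_layer(x, growth_rate=12):
    """BN-ReLU-Conv(1x1) reduces the input to 4k maps before the 3x3 convolution,
    so each layer only ever adds k feature maps regardless of how wide the
    concatenated input has grown."""
    y = layers.BatchNormalization()(x)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(4 * growth_rate, 1, padding='same')(y)   # 1x1 bottleneck
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(growth_rate, 3, padding='same')(y)       # adds only k maps
    return layers.Concatenate()([x, y])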

 

12 of 26

Transition Layer Bridges Blocks w/ Different Feature Map Sizes

  • A deep DenseNet with decreasing feature map sizes is divided into Dense Blocks
  • A transition layer between 2 Dense Blocks enables the change of feature map size (see the sketch after this list):
    • Transition Layer = Convolution and Pooling
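A minimal Keras sketch of a transition layer, assuming the 0.5 compression factor of DenseNet-C; the function name is illustrative.

from tensorflow.keras import layers

def transition_layer(x, compression=0.5):
    """Bridge between two Dense Blocks: a 1x1 convolution compresses the
    channels and 2x2 average pooling halves the feature map size."""
    filters = int(x.shape[-1] * compression)
    y = layers.BatchNormalization()(x)
    y = layers.Conv2D(filters, 1, padding='same')(y)
    y = layers.AveragePooling2D(pool_size=2)(y)
    return y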

13 of 26

CIFAR10

10 categories

50k train set

10k test set
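For reference, CIFAR10 can be loaded directly in Keras; the shapes below match the counts on this slide.

from tensorflow.keras.datasets import cifar10

# CIFAR10: 50k 32x32 RGB training images, 10k test images, 10 categories
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, y_train.shape)   # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, y_test.shape)     # (10000, 32, 32, 3) (10000, 1)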

14 of 26

100-layer DenseNet Architecture

(Accuracy is > 93.55%)
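A rough sketch only, not the slide's exact architecture: a 100-layer DenseNet-BC for CIFAR10 typically stacks 3 Dense Blocks of bottleneck layers separated by transition layers, ending with global average pooling and a softmax classifier. All hyperparameters below are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def densenet_bc_cifar10(depth=100, growth_rate=12, num_classes=10):
    """3 Dense Blocks of bottleneck layers, separated by transition layers."""
    num_bottlenecks = (depth - 4) // 6          # 2 convs per bottleneck, 3 blocks
    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = layers.Conv2D(2 * growth_rate, 3, padding='same')(inputs)
    for block in range(3):
        for _ in range(num_bottlenecks):        # Dense Block of bottleneck layers
            y = layers.BatchNormalization()(x)
            y = layers.Activation('relu')(y)
            y = layers.Conv2D(4 * growth_rate, 1, padding='same')(y)
            y = layers.BatchNormalization()(y)
            y = layers.Activation('relu')(y)
            y = layers.Conv2D(growth_rate, 3, padding='same')(y)
            x = layers.Concatenate()([x, y])
        if block < 2:                           # transition layer between blocks
            x = layers.BatchNormalization()(x)
            x = layers.Conv2D(int(x.shape[-1]) // 2, 1, padding='same')(x)
            x = layers.AveragePooling2D(2)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)

model = densenet_bc_cifar10()
model.summary()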

15 of 26

DenseNet on CIFAR10

16 of 26

ResNet v1 & v2

He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016a.

He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer International Publishing, 2016b.

17 of 26

ResNet

  • Problem of vanishing/exploding gradients
  • In theory, as the network gets deeper, it should get better since it learns more representations

18 of 26

ResNet

  • However, this is not the case for plain networks
    • In practice, accuracy saturates and then starts to degrade as the plain network is made deeper
    • The problem is not overfitting (the training error degrades as well)

19 of 26

ResNet

 

20 of 26

21 of 26

ResNet Residual Block
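A minimal Keras sketch of a v1 residual block, assuming the CIFAR10 variant with two 3x3 convolutions and an identity shortcut (so the input already has `filters` channels); names and filter count are illustrative.

from tensorflow.keras import layers

def residual_block_v1(x, filters=16):
    """ResNet v1 block: Conv-BN-ReLU, Conv-BN, add the identity shortcut,
    then ReLU after the addition."""
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([x, y])          # shortcut: output = F(x) + x
    return layers.Activation('relu')(y)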

22 of 26

ResNet v1 on CIFAR10

23 of 26

ResNet on CIFAR10

24 of 26

ResNet v2
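ResNet v2 moves batch normalization and ReLU before each convolution (full pre-activation) and applies nothing after the addition, so the shortcut path stays a clean identity mapping. A minimal Keras sketch with illustrative names:

from tensorflow.keras import layers

def residual_block_v2(x, filters=16):
    """ResNet v2 block: BN-ReLU before each convolution (pre-activation),
    identity shortcut added at the end with no activation afterwards."""
    y = layers.BatchNormalization()(x)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    return layers.Add()([x, y])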

25 of 26

Accuracy of ResNet v1/v2 on CIFAR10

26 of 26

END