1 of 26

Deep Networks

(DenseNet & ResNet)

Rowel Atienza

rowel@eee.upd.edu.ph

University of the Philippines

2 of 26

DenseNet

Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

3 of 26

Problem of

Vanishing Gradient

Gradients vanish when computed at shallow layers due to the nature of backpropagation, which applies the chain rule

A gradient that depends on another gradient that is small is also small. This smallness propagates to the shallow layers until the gradient vanishes.
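A toy numeric illustration (not from the slides): backprop multiplies per-layer local derivatives via the chain rule, so a chain of factors smaller than 1 drives the gradient reaching the shallow layers toward zero. The numbers below are purely illustrative.

# chain rule: gradient at a shallow layer = product of local derivatives
local_grad = 0.25          # assumed per-layer derivative, e.g. from a saturated activation
grad = 1.0                 # gradient at the output layer
for layer in range(20):    # propagate back through 20 layers
    grad *= local_grad
print(grad)                # ~9.1e-13 -- effectively vanished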

4 of 26

DenseNet (>100 layers)

5 of 26

Solution

  • Encourage information flow down to shallow layers
  • How do we maximize information flow between layers?
    • DenseNet: Connect all layers directly with each other
    • DenseNet: Feature maps are combined by concatenation (see the sketch after this list)
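A minimal Keras sketch of this idea, not the slides' exact code: each layer's output is concatenated with everything that came before it, so every layer sees the feature maps of all preceding layers. The layer count and growth rate below are illustrative.

import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=12):
    """Every layer receives the concatenation of all preceding feature maps."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation('relu')(y)
        y = layers.Conv2D(growth_rate, 3, padding='same')(y)
        x = layers.Concatenate()([x, y])   # combine by concatenation, not addition
    return x

inputs = tf.keras.Input(shape=(32, 32, 24))
outputs = dense_block(inputs)
model = tf.keras.Model(inputs, outputs)    # output has 24 + 4*12 = 72 channels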

6 of 26

DenseNet Input
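From the DenseNet paper, layer l receives as input the concatenation of the feature maps of all preceding layers:

x_l = H_l([x_0, x_1, ..., x_{l-1}])

where [.] denotes channel-wise concatenation and H_l is the composite function (BN, ReLU, Conv).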

 

7 of 26

DenseNet

 

8 of 26

DenseNet vs ResNet
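In equation form, following the comparison in the DenseNet paper: ResNet combines feature maps by addition through the shortcut, while DenseNet combines them by concatenation.

ResNet:   x_l = H_l(x_{l-1}) + x_{l-1}
DenseNet: x_l = H_l([x_0, x_1, ..., x_{l-1}])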

 

9 of 26

DenseNet: 2 Issues to Address

  • How to control the feature map growth rate?
  • How to reduce feature map size?

10 of 26

Feature Maps Growth Rate
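From the DenseNet paper: each layer H_l adds k feature maps (k is the growth rate), so the l-th layer receives

k_0 + k \times (l - 1)

input feature maps, where k_0 is the channel count of the block's input. With k_0 = 24 and k = 12, for example, the 10th layer already receives 24 + 12 × 9 = 132 feature maps, which is why the growth rate has to be controlled.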

 

11 of 26

Bottleneck Layer Controls Growth Rate
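A minimal Keras sketch of a bottleneck (DenseNet-B) layer; the 4k factor follows the paper, while the function name and growth rate are illustrative.

from tensorflow.keras import layers

def bottleneck_layer(x, growth_rate=12):
    """BN-ReLU-Conv(1x1) reduces the input to 4k maps before the 3x3 convolution,
    so each layer only ever adds k feature maps regardless of how wide the
    concatenated input has grown."""
    y = layers.BatchNormalization()(x)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(4 * growth_rate, 1, padding='same')(y)   # 1x1 bottleneck
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(growth_rate, 3, padding='same')(y)       # adds only k maps
    return layers.Concatenate()([x, y])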

 

12 of 26

Transition Layer Bridges Blocks w/ Different Feature Map Sizes

  • A deep DenseNet with decreasing feature map sizes is divided into Dense Blocks
  • A transition layer between 2 Dense Blocks enables the change of feature map size (see the sketch after this list):
    • Transition Layer = Convolution and Pooling
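A minimal Keras sketch of a transition layer, assuming the 0.5 compression factor of DenseNet-C; the function name is illustrative.

from tensorflow.keras import layers

def transition_layer(x, compression=0.5):
    """Bridge between two Dense Blocks: a 1x1 convolution compresses the
    channels and 2x2 average pooling halves the feature map size."""
    filters = int(x.shape[-1] * compression)
    y = layers.BatchNormalization()(x)
    y = layers.Conv2D(filters, 1, padding='same')(y)
    y = layers.AveragePooling2D(pool_size=2)(y)
    return y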

13 of 26

CIFAR10

10 categories

50k train set

10k test set
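For reference, CIFAR10 can be loaded directly in Keras; the shapes below match the counts on this slide.

from tensorflow.keras.datasets import cifar10

# CIFAR10: 50k 32x32 RGB training images, 10k test images, 10 categories
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, y_train.shape)   # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, y_test.shape)     # (10000, 32, 32, 3) (10000, 1)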

14 of 26

100-layer DenseNet Architecture

(Accuracy is > 93.55%)
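A rough sketch only, not the slide's exact architecture: a 100-layer DenseNet-BC for CIFAR10 typically stacks 3 Dense Blocks of bottleneck layers separated by transition layers, ending with global average pooling and a softmax classifier. All hyperparameters below are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def densenet_bc_cifar10(depth=100, growth_rate=12, num_classes=10):
    """3 Dense Blocks of bottleneck layers, separated by transition layers."""
    num_bottlenecks = (depth - 4) // 6          # 2 convs per bottleneck, 3 blocks
    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = layers.Conv2D(2 * growth_rate, 3, padding='same')(inputs)
    for block in range(3):
        for _ in range(num_bottlenecks):        # Dense Block of bottleneck layers
            y = layers.BatchNormalization()(x)
            y = layers.Activation('relu')(y)
            y = layers.Conv2D(4 * growth_rate, 1, padding='same')(y)
            y = layers.BatchNormalization()(y)
            y = layers.Activation('relu')(y)
            y = layers.Conv2D(growth_rate, 3, padding='same')(y)
            x = layers.Concatenate()([x, y])
        if block < 2:                           # transition layer between blocks
            x = layers.BatchNormalization()(x)
            x = layers.Conv2D(int(x.shape[-1]) // 2, 1, padding='same')(x)
            x = layers.AveragePooling2D(2)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)

model = densenet_bc_cifar10()
model.summary()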

15 of 26

DenseNet on CIFAR10

16 of 26

ResNet v1 & v2

He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016a.

He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer International Publishing, 2016b.

17 of 26

ResNet

  • Problem of vanishing/exploding gradients
  • In theory, as the network gets deeper, it should get better since it learns more representations

18 of 26

ResNet

  • However, this is not the case for plain networks
    • In practice, accuracy saturates and then starts to degrade as the plain network is made deeper
    • The problem is not overfitting (the training error degrades as well)

19 of 26

ResNet

 

20 of 26

21 of 26

ResNet Residual Block
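A minimal Keras sketch of a v1 residual block, assuming the CIFAR10 variant with two 3x3 convolutions and an identity shortcut (so the input already has `filters` channels); names and filter count are illustrative.

from tensorflow.keras import layers

def residual_block_v1(x, filters=16):
    """ResNet v1 block: Conv-BN-ReLU, Conv-BN, add the identity shortcut,
    then ReLU after the addition."""
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([x, y])          # shortcut: output = F(x) + x
    return layers.Activation('relu')(y)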

22 of 26

ResNet v1 on CIFAR10

23 of 26

ResNet on CIFAR10

24 of 26

ResNet v2
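ResNet v2 moves batch normalization and ReLU before each convolution (full pre-activation) and applies nothing after the addition, so the shortcut path stays a clean identity mapping. A minimal Keras sketch with illustrative names:

from tensorflow.keras import layers

def residual_block_v2(x, filters=16):
    """ResNet v2 block: BN-ReLU before each convolution (pre-activation),
    identity shortcut added at the end with no activation afterwards."""
    y = layers.BatchNormalization()(x)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    return layers.Add()([x, y])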

25 of 26

Accuracy of ResNet v1/v2 on CIFAR10

26 of 26

END