1 of 23

DLCV04

Manel Baradad, Míriam Bellver, Martí Cervià, Hector Esteban, Carlos Roig

Visit our GitHub page here


2 of 23

Task 1: Resources

  • Software used:
    • Keras with GPU-accelerated TensorFlow as backend, using Python 2.7.
  • Hardware used:
    • PC with GPU (NVIDIA GTX 970)


3 of 23

Task 1: Architecture

For the first task, we modified a convnet designed for the MNIST dataset.


Input (1x28x28) → ConvLayer_1 (32x26x26) → ReLU → ConvLayer_2 (32x24x24) → ReLU → MaxPooling (2x2 kernel) → FullyConnectedLayer (128) → Softmax (10 classes)

Epoch time: 18 s. Total parameters: 600,810.
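The architecture above can be sketched in Keras as follows. This is a minimal sketch against the current TensorFlow-bundled Keras API (the original used Keras 1 on Python 2.7), and the optimizer choice is an assumption, not taken from the slides:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Input(shape=(28, 28, 1)),                # 1x28x28 grayscale input
    Conv2D(32, (3, 3), activation='relu'),   # -> 32x26x26
    Conv2D(32, (3, 3), activation='relu'),   # -> 32x24x24
    MaxPooling2D(pool_size=(2, 2)),          # 2x2 kernel -> 32x12x12
    Flatten(),
    Dense(128, activation='relu'),           # fully connected, 128 units
    Dense(10, activation='softmax'),         # 10 classes
])
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta', metrics=['accuracy'])
```

With these shapes the model has exactly the 600,810 parameters reported above, most of them in the 4608→128 fully connected layer.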

4 of 23

Task 1: Architecture

Removing one Conv layer: epoch time 11 s, total parameters 693,962

Adding 3 more Conv layers: epoch time 39 s, total parameters 370,506

Adding new Conv layers is expensive in terms of total run time!

5 of 23

Task 1: Architecture

Adding a new FC layer with output size 1024 between the Conv layers and the original FC layer


Epoch time: 28 s

Total parameters: 4.9M

Adding new FC layers is expensive in terms of memory consumption!

6 of 23

Task 2: Batch Size


Test with MNIST

Only 10 classes → no perceptible difference when changing the batch size (the same happened with CIFAR10)

[Training curves for two batch sizes; epoch times: 28 s and 40 s]

7 of 23

Task 2: Data Augmentation


  • Trained with CIFAR10

  • Results were worse with augmentation than without.

Possible causes:

  • Not enough epochs
  • This is real-time augmentation, so images are randomly flipped / translated, etc., but the number of images per epoch stays the same
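The real-time augmentation described above can be sketched as follows. This uses current Keras preprocessing layers; the original experiment used Keras's generator-based augmentation, and the specific flip/translation ranges here are assumptions:

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import RandomFlip, RandomTranslation

# Each batch is randomly flipped/translated on the fly, so the number of
# images per epoch stays the same; only their appearance varies.
augment = Sequential([
    RandomFlip('horizontal'),
    RandomTranslation(height_factor=0.1, width_factor=0.1),
])

batch = np.random.rand(4, 32, 32, 3).astype('float32')  # a CIFAR10-sized batch
augmented = augment(batch, training=True)               # new random views each call
```

Because each epoch sees the same number of (differently distorted) images, more epochs are needed before the augmented model catches up, which matches the "not enough epochs" explanation above.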

8 of 23

Task 2: Batch Normalization


  • Trained with CIFAR10

  • With Batch Normalization the model learns faster, with fewer epochs
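A minimal sketch of where Batch Normalization sits in such a net, assuming illustrative layer sizes (the slides do not give the exact CIFAR10 architecture used):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     Activation, MaxPooling2D, Flatten, Dense)

# BatchNormalization between each convolution and its ReLU normalizes
# activations per mini-batch, which speeds up convergence.
model = Sequential([
    Input(shape=(32, 32, 3)),   # CIFAR10 input
    Conv2D(32, (3, 3)),
    BatchNormalization(),       # normalize, then learn scale/shift
    Activation('relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation='softmax'),
])
```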

9 of 23

Task 2: Overfitting

We trained a network with 2 conv layers and 1 fully connected layer.

Database: Terrassa, only 450 training images, no data augmentation


[Training curves: without dropout vs. with dropout]
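A sketch of such a 2-conv + 1-FC net with dropout added; the layer sizes and dropout rates are assumptions, since the slides do not specify them:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     Flatten, Dense, Dropout)

# Dropout randomly zeroes a fraction of activations during training,
# which combats overfitting on a tiny dataset like Terrassa (450 images).
model = Sequential([
    Input(shape=(32, 32, 3)),
    Conv2D(32, (3, 3), activation='relu'),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.25),                     # drop 25% of conv activations
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),                      # drop 50% before the classifier
    Dense(13, activation='softmax'),   # 13 Terrassa classes
])
```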

10 of 23

Task 3 - Filter Visualization


[Filter visualizations for Conv Layers 1-4]

CIFAR10 NET: Input (32x32 pixels) → ConvLayer_1 → ReLU → ConvLayer_2 → ReLU → MaxPooling → ConvLayer_3 → ReLU → ConvLayer_4 → ReLU → MaxPooling → FC Layer (512) → Softmax (10 classes)

11 of 23

Task 3 - Filter Visualization

Filters of pre-trained VGG16 'conv5_1'

They are calculated by defining a loss function that maximizes the activation of a specific filter in a specific layer; we simply ran the Keras example at the following link
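The gradient-ascent idea behind these visualizations can be sketched as follows. This is written with TensorFlow's GradientTape rather than the original Keras example script, and the default image shape and step size are assumptions:

```python
import tensorflow as tf

def maximize_filter(model, layer_name, filter_index,
                    img_shape=(1, 32, 32, 3), steps=20, lr=1.0):
    """Gradient ascent on a random input so one filter's mean activation grows."""
    # Sub-model that outputs the activations of the chosen layer
    feat = tf.keras.Model(model.input, model.get_layer(layer_name).output)
    img = tf.Variable(tf.random.uniform(img_shape))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            # Loss = mean activation of the chosen filter
            loss = tf.reduce_mean(feat(img)[..., filter_index])
        img.assign_add(lr * tape.gradient(loss, img))  # ascend the gradient
    return img.numpy()[0]
```

The returned image is the pattern that most excites that filter, which is what the 'conv5_1' visualizations above show.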


12 of 23

Task 3 - T-SNE

t-SNE is a tool to visualize high-dimensional data.

It converts similarities between data points into joint probabilities and minimizes the KL divergence between the low-dimensional embedding and the original data.

In this example we used the MNIST dataset with 2500 images.
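A minimal sketch with scikit-learn's TSNE, using random stand-in data instead of the 2500 MNIST images:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in data: in the slides this was 2500 flattened MNIST images (784-dim)
X = np.random.RandomState(0).rand(200, 64)

# t-SNE embeds the points into 2-D while preserving local neighborhoods,
# so same-digit images end up in visible clusters.
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(X)
```

The 2-D `embedding` is what gets scattered-plotted, one point per image, colored by digit class.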


13 of 23

Task 3 - Off-the-shelf VGG-16 Local Classification

[Cropped test images with the displayed probabilities: 0.113, 0.0493, 0.25]

We ran these tests with a VGG16 network trained on the ImageNet dataset; the probability displayed is the value for the class of the original image.

14 of 23

Task 4 - Fine tuning

  1. Train a network on CIFAR10 and fine-tune it for Terrassa Buildings 900


Terrassa database (640 labeled samples)

Preprocessing:

  1. resize to (32, 32)
  2. normalize the color channels

CIFAR model from Keras, pre-trained for 200 epochs (acc: 0.7361)

We removed the last layer and added a dense layer with 13 neurons, then fine-tuned for Terrassa: 3000 epochs, batch size 32 → val acc: 0.7750
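The swap of the last layer can be sketched as follows; the base model here is a tiny stand-in for the actual CIFAR10-pretrained Keras model:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

# Stand-in for the CIFAR10-pretrained model (in the experiment this was
# the Keras CIFAR10 example net trained for 200 epochs)
base = Sequential([
    Input(shape=(32, 32, 3)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation='softmax'),   # original 10-way CIFAR10 softmax
])

base.pop()                                 # remove the 10-class output layer
base.add(Dense(13, activation='softmax'))  # new 13-neuron layer for Terrassa
base.compile(loss='categorical_crossentropy', optimizer='sgd',
             metrics=['accuracy'])
```

After the swap, training continues on the Terrassa images; the earlier layers start from the CIFAR10 features instead of random weights.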

15 of 23

Task 4 - Fine tuning

  • Train a network on CIFAR10 and fine-tune it for Terrassa Buildings 900

[Accuracy and loss curves]

16 of 23

Task 4 - Fine tuning

2. Use pre-trained weights of VGG16 and train a classifier on top for Terrassa Buildings 900


Terrassa database (640 labeled samples)

Preprocessing:

  • resize to (224, 224)
  • subtract the mean of the ImageNet dataset

VGG16 with pre-trained ImageNet weights

We removed the last layer and added a dense layer with 13 neurons, then fine-tuned for Terrassa: 350 epochs → val acc: 0.85
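The VGG16 setup can be sketched as follows, assuming the current Keras applications API; `weights=None` stands in here for `weights='imagenet'` only to avoid the download, and the exact classifier head is an assumption:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# weights='imagenet' reproduces the experiment; None skips the download
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False            # keep the pre-trained features frozen

x = Flatten()(base.output)
out = Dense(13, activation='softmax')(x)   # new 13-neuron Terrassa classifier
model = Model(inputs=base.input, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'])
```

Freezing the convolutional base means only the new classifier is trained, which is why 350 epochs on 640 images suffice.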

17 of 23

Task 4 - Fine tuning

2. Use pre-trained weights of VGG16 and train a classifier on top for Terrassa Buildings 900

[Accuracy and loss curves]

18 of 23

Task 5 - Open Project

Neural Style

Goal: Generate a new image with the content of image1 and the style of image2


[base image | style image]

How to encode content? Layer 'conv4_2' of VGG16.

How to encode style? Gram matrices of layers ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1'] of VGG16.
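The Gram-matrix style encoding can be sketched in NumPy as:

```python
import numpy as np

def gram_matrix(features):
    """Style representation: channel-wise correlations of a conv activation.

    features: array of shape (channels, height, width)
    returns:  (channels, channels) Gram matrix
    """
    c = features.shape[0]
    f = features.reshape(c, -1)   # flatten the spatial dimensions
    return f.dot(f.T)             # inner products between channel maps
```

Because the spatial dimensions are summed out, the Gram matrix captures which filters fire together (texture) while discarding where they fire (content), which is exactly why it encodes style.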

19 of 23

Task 5 - Open Project

Neural Style


[Results with the default style and content features after 1, 5, 10, and 20 iterations, shown next to the base and style images]

20 of 23

Task 5 - Open Project

Test 1: only the first conv layers as style features: feature_layers = ['conv1_1', 'conv2_1']


[Comparison after 1, 5, 10, and 20 iterations: default features vs. first-conv-layer style features]

21 of 23

Task 5 - Open Project

Test 2: only the last conv layers as style features: feature_layers = ['conv4_1', 'conv5_1']


[Comparison after 1, 5, 10, and 20 iterations: default features vs. last-conv-layer style features]

22 of 23

Thanks!! Questions?


23 of 23

References
