DLCV04
1
Task 1: + Resources
2
Task 1: Architecture
For the first task, we modified a convnet designed for the MNIST dataset.
3
Input: 1x28x28
ConvLayer_1: 32x26x26, ReLU
ConvLayer_2: 32x24x24, ReLU
MaxPooling: 2x2 kernel
FullyConnectedLayer: 128
Softmax: 10 classes
Epoch time: 18 s, total parameters: 600,810
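The 600,810 figure can be reproduced by counting parameters layer by layer. A quick check in plain Python, assuming 3x3 kernels and 'valid' padding (consistent with the 26x26 and 24x24 feature-map sizes above):

```python
# Parameter count of the baseline MNIST convnet described above,
# assuming 3x3 kernels with 'valid' padding.

def conv_params(in_ch, out_ch, k=3):
    return (k * k * in_ch + 1) * out_ch   # weights + biases

def dense_params(in_units, out_units):
    return in_units * out_units + out_units

conv1 = conv_params(1, 32)     # 1x28x28 -> 32x26x26
conv2 = conv_params(32, 32)    # 32x26x26 -> 32x24x24
flat  = 32 * 12 * 12           # after 2x2 max-pooling: 32x12x12
fc    = dense_params(flat, 128)
out   = dense_params(128, 10)

total = conv1 + conv2 + fc + out
print(total)  # 600810, matching the slide
```

Note that the single FC layer already holds nearly all of the parameters, which explains the observations on the next slides.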
Task 1: Architecture
Removing one Conv layer: epoch time 11 s, total parameters 693,962
Adding 3 more Conv layers: epoch time 39 s, total parameters 370,506
4
Adding new Conv layers is expensive in terms of total run time!
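The counts look counter-intuitive at first (removing a layer adds parameters), but they follow from the size of the feature map that reaches the FC layer. A quick check, again assuming 3x3 'valid' convolutions:

```python
# Sanity check of the two variants' parameter counts.
def conv_p(cin, cout, k=3):
    return (k * k * cin + 1) * cout

def dense_p(i, o):
    return i * o + o

# Variant A: one conv layer removed -> the pooled map is 32x13x13,
# so the FC layer grows and the total goes UP.
a = conv_p(1, 32) + dense_p(32 * 13 * 13, 128) + dense_p(128, 10)
print(a)  # 693962

# Variant B: three extra conv layers -> the pooled map shrinks to
# 32x9x9, so the FC layer shrinks and the total goes DOWN.
b = conv_p(1, 32) + 4 * conv_p(32, 32) + dense_p(32 * 9 * 9, 128) + dense_p(128, 10)
print(b)  # 370506
```

So the parameter count is dominated by the FC layer, while the run time is dominated by the convolutions.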
Task 1: Architecture
Adding a new FC layer with output size 1024 between the Conv layers and the original FC layer
5
Epoch time: 28 s
Total Parameters: 4.9M
Adding new FC layers is expensive in terms of memory consumption!
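The ~4.9 M figure is dominated by the new FC layer alone. A quick check, assuming the same 3x3 'valid' convolutions as the baseline:

```python
# Parameter count after inserting a Dense(1024) layer between the
# flattened conv output (32x12x12 = 4608) and the original Dense(128).
def conv_p(cin, cout, k=3):
    return (k * k * cin + 1) * cout

def dense_p(i, o):
    return i * o + o

convs  = conv_p(1, 32) + conv_p(32, 32)   # 9,568 as in the baseline
new_fc = dense_p(32 * 12 * 12, 1024)      # 4,719,616 parameters by itself
old_fc = dense_p(1024, 128)
out    = dense_p(128, 10)

total = convs + new_fc + old_fc + out
print(total)  # 4861674, i.e. the ~4.9 M on the slide
```

The inserted layer accounts for roughly 97% of the network's memory, which is why FC layers are the expensive part.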
Task 2: Batch Size
6
Test with MNIST
Only 10 classes → no perceptible difference when changing the batch size (the same happened with CIFAR10)
Epoch times: 28 s and 40 s
Task 2: Data Augmentation
7
Possible causes:
Task 2: Batch Normalization
8
Task 2: Overfitting
We trained a network with 2 conv layers and 1 fully connected layer.
Dataset: Terrassa, with only 450 training images and no data augmentation
9
without dropout
with dropout
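The with/without-dropout comparison above can be illustrated with a minimal sketch of inverted dropout (the variant used by Keras: kept units are scaled up at training time so nothing changes at test time). Inputs here are made up:

```python
import random

def dropout(x, p, training, rng):
    """Inverted dropout: zero each unit with probability p and scale
    survivors by 1/(1-p), so the expected activation is unchanged.
    At test time it is the identity."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]

rng = random.Random(0)
x = [1.0] * 10000
y = dropout(x, 0.5, training=True, rng=rng)
print(sum(y) / len(y))  # close to 1.0: the expectation is preserved
print(dropout(x, 0.5, training=False, rng=rng) == x)  # no-op at test time
```

Randomly zeroing units prevents co-adaptation, which is why the dropout curve on the slide overfits less on the tiny Terrassa training set.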
Task 3 - Filter Visualization
10
Conv Layer 1
Conv Layer 2
Conv Layer 3
Conv Layer 4
CIFAR10 NET:
Input: 32x32 pixels
ConvLayer_1, ReLU
ConvLayer_2, ReLU
MaxPooling
ConvLayer_3, ReLU
ConvLayer_4, ReLU
MaxPooling
FC Layer: 512
Softmax: 10 classes
Task 3 - Filter Visualization
Filters of pre-trained VGG16 'conv5_1'
They are computed by defining a loss function that maximizes the activation of a specific filter in a specific layer. We simply executed the Keras example at the following link
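The gradient-ascent idea behind these visualizations can be sketched on a toy 1-D "activation" (a made-up quadratic, not the real VGG16 loss): start from some input and repeatedly step in the direction of the gradient of the activation, exactly as the Keras example does with an image and a filter's mean output.

```python
# Toy sketch of activation maximization: gradient ascent on the INPUT
# (not the weights) to maximize an "activation". Here the activation
# is a hypothetical 1-D function peaking at x = 3; in the real example
# it is the mean activation of one VGG16 filter and x is an image.

def activation(x):
    return -(x - 3.0) ** 2          # maximal at x = 3

def grad(x, eps=1e-5):
    # numerical gradient (the Keras example uses backprop instead)
    return (activation(x + eps) - activation(x - eps)) / (2 * eps)

x = 0.0                              # starting input
for _ in range(200):
    x += 0.1 * grad(x)               # gradient ascent step
print(round(x, 3))                   # converges to the maximizer, 3.0
```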
11
Task 3 - T-SNE
t-SNE is a tool for visualizing high-dimensional data.
It converts similarities between data points to joint probabilities and tries to minimize the KL divergence between the embedding's and the data's joint probabilities.
In this example we used the MNIST dataset with 2500 images.
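The KL-divergence objective t-SNE minimizes can be illustrated on two small hand-made discrete distributions:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions: the quantity t-SNE
    minimizes between pairwise similarities in the high-dimensional
    data (P) and the low-dimensional embedding (Q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]   # hypothetical "data" similarities
q = [0.5, 0.3, 0.2]   # hypothetical "embedding" similarities
print(kl_divergence(p, p))   # 0.0: identical distributions
print(kl_divergence(p, q))   # > 0, and asymmetric: != kl_divergence(q, p)
```

The divergence is zero only when the embedding reproduces the data's similarity structure, which is what the MNIST plot shows as well-separated digit clusters.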
12
Task 3 - Off-the-shelf VGG-16 Local Classification
prob: 0.113
prob: 0.0493
prob: 0.25
13
We ran these tests with a VGG16 network trained on the ImageNet dataset; the probability displayed is the value for the class of the original image.
Task 4 - Fine tuning
14
CIFAR fine-tuning for Terrassa:
Terrassa database (640 labeled samples) + preprocessing
CIFAR model from Keras, pre-trained for 200 epochs (acc: 0.7361)
Removed the last layer and added a dense layer with 13 neurons
Trained for 3000 epochs, batch size: 32
Validation accuracy: 0.7750
Task 4 - Fine tuning
Accuracy
Losses
15
Task 4 - Fine tuning
2. Use the pre-trained weights of VGG16 and train a classifier on top for Terrassa Buildings 900
16
VGG16 fine-tuning for Terrassa:
Terrassa database (640 labeled samples) + preprocessing
VGG16 with pre-trained ImageNet weights
Removed the last layer and added a dense layer with 13 neurons
Trained for 350 epochs
Validation accuracy: 0.85
Task 4 - Fine tuning
2. Use the pre-trained weights of VGG16 and train a classifier on top for Terrassa Buildings 900
Accuracy
Losses
17
Task 5 - Open Project
Neural Style
Goal: Generate a new image with the content of image1 and the style of image2
18
base image
style
How to encode content? Layer 'conv4_2' of VGG16
How to encode style? Gram matrices of layers ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1'] of VGG16
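The Gram-matrix style encoding can be sketched in plain Python on a hypothetical two-channel feature map (the values below are made up):

```python
def gram_matrix(features):
    """features: C x (H*W) list of flattened channel activations.
    Returns the C x C Gram matrix G[i][j] = <channel_i, channel_j>,
    which captures which channels co-activate (texture/style) while
    discarding spatial layout (content)."""
    C = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(C)]
            for i in range(C)]

# Two channels of a 2x2 feature map, flattened row by row.
F = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 2.0]]
G = gram_matrix(F)
print(G)  # [[6.0, 4.0], [4.0, 6.0]] -- symmetric by construction
```

The style loss then compares the Gram matrices of the style image and the generated image at the listed layers, while the content loss compares raw activations at 'conv4_2'.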
Task 5 - Open Project
Neural Style
19
Base image + style: results after 1, 5, 10, and 20 iterations with the default style and content features
Task 5 - Open Project
Test 1: only the first conv layers as style features (feature_layers = ['conv1_1', 'conv2_1'])
20
Results after 1, 5, 10, and 20 iterations: default features vs. first-conv-layers features
Task 5 - Open Project
Test 2: only the last conv layers as style features (feature_layers = ['conv4_1', 'conv5_1'])
21
Results after 1, 5, 10, and 20 iterations: default features vs. last-conv-layers features
Thanks!! Questions?
22