Lecture 13: Understanding and Visualizing Convolutional Neural Networks, Plus Self-Supervision

21 Oct 2021

Erik Learned-Miller, adapted from slides of Fei-Fei Li, Andrej Karpathy & Justin Johnson


Computer Vision Tasks

[Figure: four tasks compared. Classification: "CAT" (single object). Classification + Localization: "CAT" (single object). Object Detection: "CAT, DOG, DUCK" (multiple objects). Instance Segmentation: "CAT, DOG, DUCK" (multiple objects).]


Understanding ConvNets

  • Visualize the weights
  • Visualize patches that maximally activate neurons
  • Visualize the representation space (e.g. with t-SNE)
  • Occlusion experiments
  • Human experiment comparisons
  • Deconv approaches (single backward pass)
  • Optimization-over-image approaches (iterative optimization on the input)


Visualize the filters/kernels (raw weights)

one-stream AlexNet

conv1

only interpretable on the first layer :(
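A minimal sketch of this kind of visualization, assuming PyTorch/torchvision and a pretrained AlexNet (any ConvNet's first-layer weights can be plotted the same way; the 8x8 layout is just for display):

# Sketch: plot the 64 first-layer (conv1) filters of a pretrained AlexNet as RGB tiles.
import torchvision
import matplotlib.pyplot as plt

model = torchvision.models.alexnet(weights="IMAGENET1K_V1")
w = model.features[0].weight.data.clone()      # conv1 weights: [64, 3, 11, 11]
w = (w - w.min()) / (w.max() - w.min())        # rescale to [0, 1] for display

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(w[i].permute(1, 2, 0).numpy())   # CHW -> HWC so each filter shows as a small RGB image
    ax.axis("off")
plt.show()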



Visualize the filters/kernels (raw weights)

You can still do it for higher layers, it's just not that interesting.
(these are taken from the ConvNetJS CIFAR-10 demo)

layer 1 weights / layer 2 weights / layer 3 weights

Visualize patches that maximally activate neurons

Rich feature hierarchies for accurate object detection and semantic segmentation

[Girshick, Donahue, Darrell, Malik]

one-stream AlexNet

pool5
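In code the recipe is roughly the following (a sketch, assuming PyTorch/torchvision; `dataset` is a placeholder for any iterable of preprocessed images, and the channel index is arbitrary):

# Sketch: find the images (and locations) that maximally activate one pool5 channel.
import torch
import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()
pool5 = model.features            # AlexNet's "features" block ends at the last max-pool
channel = 13                      # arbitrary pool5 channel of interest

best = []                         # (activation, image index, spatial position)
with torch.no_grad():
    for idx, img in enumerate(dataset):            # dataset: assumed iterable of [3,224,224] tensors
        act = pool5(img.unsqueeze(0))[0, channel]  # [6, 6] activation map for this channel
        val, pos = act.flatten().max(0)
        best.append((val.item(), idx, divmod(pos.item(), act.shape[1])))

best.sort(reverse=True)
print(best[:10])   # top entries: which images, and roughly where (via the receptive field), excite the neuron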

Detour #1


Self-supervision vs. full supervision

  • Supervised learning: given pairs (input x, output y) learn to map x to y

[Examples: (image of a flamingo, "flamingo"), (image of hay, "hay")]

  • Self-supervision: y is inherent in the data; learn a proxy task to predict this implicit y
  • Example: inpainting – filling in a missing piece

[Figure: inpainting examples, images with a masked-out region and the missing content to predict]


Self-supervision by colorization

  • Motivation for self-supervision: the model must learn about the visual world to do well on the proxy task
  • Self-supervision by colorization: input = graylevel image, output = color channels

[Figure: example (grayscale input, color image) training pairs]

  • Intuition: to colorize well, the model needs to learn semantics of the world (flamingos are pink, sky is blue, grass is green) without any manual annotation of these concepts.
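A minimal sketch of how such training pairs are built with no labels at all (assuming PyTorch and scikit-image for the Lab conversion; the normalization constants are illustrative, not the paper's exact recipe):

# Sketch: turn any unlabeled photo into a (grayscale input, color target) training pair.
import numpy as np
import torch
from skimage import color

def make_pair(rgb):                      # rgb: H x W x 3 float array in [0, 1]
    lab = color.rgb2lab(rgb)             # L in [0, 100]; a, b roughly in [-110, 110]
    L  = lab[..., :1] / 100.0            # network input: lightness only
    ab = lab[..., 1:] / 110.0            # prediction target: the two color channels
    to_tensor = lambda a: torch.from_numpy(a).float().permute(2, 0, 1)
    return to_tensor(L), to_tensor(ab)

rgb = np.random.rand(64, 64, 3)          # stand-in for a real photo
x, y = make_pair(rgb)                    # x: [1, 64, 64], y: [2, 64, 64]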


Colorization model (Larsson, Maire, Shakhnarovich, ECCV 2016)

  • Our model: a fully convolutional network with zoomout (hypercolumn) features, predicting a color distribution per pixel


Colorization results


  • Colorization pre-training: closes 70% or more of the gap between random init. and supervised ImageNet pretraining (on semantic segmentation, classification, etc.)


Learned Structure from Colorization


[Figure: hidden unit activations arranged along two axes, object-specific vs. object-non-specific and color-specific vs. color-non-specific]

Larsson, Maire, Shakhnarovich, CVPR 2017


End of Detour #1

Visualizing the representation

fc7 layer

4096-dimensional “code” for an image

(layer immediately before the classifier)

can collect the code for many images

Visualizing the representation

t-SNE visualization

[van der Maaten & Hinton]

(t-distributed stochastic neighbor embedding)

Embed high-dimensional points so that, locally, pairwise distances are conserved;
i.e. similar things end up in similar places, dissimilar things end up wherever.

Right: Example embedding of MNIST digits (0-9) in 2D
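A minimal sketch of the pipeline (assuming PyTorch/torchvision and scikit-learn; `images` is a placeholder for an iterable of preprocessed [3, 224, 224] tensors):

# Sketch: collect 4096-d fc7 codes for many images, then embed them in 2D with t-SNE.
import numpy as np
import torch
import torchvision
from sklearn.manifold import TSNE

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()
# fc7 = everything up to (and including) the second 4096-d layer, i.e. drop the final classifier
fc7 = torch.nn.Sequential(model.features, model.avgpool, torch.nn.Flatten(),
                          *list(model.classifier.children())[:-1])

codes = []
with torch.no_grad():
    for img in images:                              # images: assumed iterable of preprocessed tensors
        codes.append(fc7(img.unsqueeze(0)).squeeze(0).numpy())
codes = np.stack(codes)                             # [N, 4096]

xy = TSNE(n_components=2, perplexity=30).fit_transform(codes)   # [N, 2] embedding
# Scatter-plot xy (or paste thumbnails at each location) to get pictures like the ones here.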

t-SNE visualization:

two images are placed nearby if their CNN codes are close. See more:

http://cs.stanford.edu/people/karpathy/cnnembed/

Occlusion experiments

[Zeiler & Fergus 2013]

(plot the probability of the correct class as a function of the position of the square of zeros in the original image)
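A minimal sketch of the experiment (assuming a PyTorch classifier; `img` is a preprocessed [3, H, W] image and `target` the index of the correct class; patch size and stride are arbitrary):

# Sketch: slide a square of zeros over the image and record the correct-class probability
# at every occluder position; low values mark the regions the classifier relies on.
import torch

def occlusion_map(model, img, target, patch=32, stride=16):
    model.eval()
    _, H, W = img.shape
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    heat = torch.zeros(rows, cols)
    with torch.no_grad():
        for i in range(rows):
            for j in range(cols):
                occluded = img.clone()
                y, x = i * stride, j * stride
                occluded[:, y:y + patch, x:x + patch] = 0.0    # the "square of zeros"
                heat[i, j] = model(occluded.unsqueeze(0)).softmax(dim=1)[0, target]
    return heat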


Visualizing Activations

Deconv approaches

  1. Feed image into net

Q: how can we compute the gradient of any arbitrary neuron in the network w.r.t. the image?

Deconv approaches

  1. Feed image into net
  2. Pick a layer, set the gradient there to be all zero except for one 1 for some neuron of interest
  3. Backprop to image

Deconv approaches

  1. Feed image into net
  2. Pick a layer, set the gradient there to be all zero except for one 1 for some neuron of interest
  3. Backprop to image, but use “guided backpropagation” instead of the plain backward pass

Deconv approaches

[Visualizing and Understanding Convolutional Networks, Zeiler and Fergus 2013]

[Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, Simonyan et al., 2014]

[Striving for Simplicity: The all convolutional net, Springenberg, Dosovitskiy, et al., 2015]


Backward pass for a ReLU (will be changed in Guided Backprop)


Visualization of patterns learned by the layer conv6 (top) and layer conv9 (bottom) of the network trained on ImageNet.

Each row corresponds to one filter.

The visualization using “guided backpropagation” is based on the top 10 image patches activating this filter taken from the ImageNet dataset.

[Striving for Simplicity: The all convolutional net, Springenberg, Dosovitskiy, et al., 2015]


[Figure: a toy network, input x → W1 → ReLU → W2 → ReLU → W3, with example activation values shown at each stage]

[Figure: the same toy network, with backward-pass values color-coded as positive gradient, negative gradient, or zero gradient]

In backprop: all positive and negative paths of influence through the graph interfere.

[Figure: the same toy network under guided backprop, with the negative paths of influence crossed out]

In guided backprop: cancel out negative paths of influence at each step (i.e. we only keep positive paths of influence).
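As a minimal sketch of that modified ReLU backward pass (assuming PyTorch; real implementations swap this rule into every ReLU of the network):

# Guided-backprop ReLU: plain backprop passes gradient wherever the forward input was > 0;
# guided backprop additionally zeroes any gradient that is itself negative.
import torch

class GuidedReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        # keep only positive paths of influence: input > 0 AND incoming gradient > 0
        return grad_out * (x > 0).float() * (grad_out > 0).float()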

Visualizing arbitrary neurons along the way to the top...

Visualizing and Understanding Convolutional Networks

Zeiler & Fergus, 2013

Optimization to Image

Q: can we find an image that maximizes some class score?


score for class c (before Softmax)


Optimization to Image

  1. Feed in a zero image.
  2. Set the gradient of the scores vector to be [0,0,...,1,...,0], then backprop to the image
  3. Do a small "image update"
  4. Forward the image through the network
  5. Go back to 2.

(maximizing the score for class c, before the Softmax)
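A minimal sketch of this loop (assuming PyTorch/torchvision; it ascends the pre-softmax score S_c with a small L2 penalty on the image; the class index, step size, and penalty weight are arbitrary):

# Sketch: gradient ascent on the input image to maximize one class score.
import torch
import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()
target_class = 130                                       # some ImageNet class index
img = torch.zeros(1, 3, 224, 224, requires_grad=True)    # 1. feed in zeros

for step in range(200):
    score = model(img)[0, target_class]                  # S_c(I), before the softmax
    obj = score - 1e-3 * (img ** 2).sum()                # small L2 regularizer on the image
    model.zero_grad()
    if img.grad is not None:
        img.grad.zero_()
    obj.backward()                                       # 2. backprop to the image
    with torch.no_grad():
        img += 1.0 * img.grad                            # 3. small "image update", then repeat
# img now (roughly) shows what the network thinks this class looks like.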

  1. Find images that maximize some class score:

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, 2014


2. Visualize the data gradient:

(note that the gradient on the data has three channels; they visualize M such that M_ij = max_c | ∂S_c / ∂I_(i,j,c) |, i.e. at each pixel take the absolute value and max over the channels)

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, 2014
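A minimal sketch of the computation (assuming PyTorch/torchvision; `img` is a preprocessed [1, 3, H, W] input image):

# Sketch: image-specific class saliency map (one backward pass).
import torch
import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()
img.requires_grad_(True)                      # img: assumed preprocessed [1, 3, H, W] tensor
score = model(img)[0].max()                   # pre-softmax score of the predicted class
score.backward()                              # gradient of that score w.r.t. the pixels

# M_ij = max over channels of |dS_c / dI_ijc|
M = img.grad[0].abs().max(dim=0).values       # [H, W] saliency map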


Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, 2014

  • Use GrabCut for segmentation

We can in fact do this for arbitrary neurons along the ConvNet

Repeat:

  1. Forward an image
  2. Set activations in layer of interest to all zero, except for a 1.0 for a neuron of interest
  3. Backprop to image
  4. Do an “image update”

[Understanding Neural Networks Through Deep Visualization, Yosinski et al., 2015]

Proposed a different form of regularizing the image

Repeat:

  • Update the image x with gradient from some unit of interest
  • Blur x a bit
  • Take any pixel with small norm to zero (to encourage sparsity)

More explicit scheme:
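A rough sketch of that loop (assuming numpy/scipy; `grad_of_unit(x)` stands in for the backward pass that returns d(unit)/dx for the current image; all constants are illustrative):

# Sketch of regularized gradient ascent in the style of Yosinski et al.:
# ascend the unit of interest, blur the image a bit, and zero small-norm pixels.
import numpy as np
from scipy.ndimage import gaussian_filter

def regularized_ascent(x, grad_of_unit, steps=200, lr=1.0, blur_sigma=0.5, pixel_thresh=1e-3):
    # x: image as a [3, H, W] float array
    for _ in range(steps):
        x = x + lr * grad_of_unit(x)                               # update with the unit's gradient
        x = gaussian_filter(x, sigma=(0, blur_sigma, blur_sigma))  # blur x a bit (spatially, per channel)
        norm = np.abs(x).max(axis=0, keepdims=True)                # per-pixel magnitude across channels
        x = x * (norm > pixel_thresh)                              # small-norm pixels -> zero (sparsity)
    return x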



Question: Given a CNN code, is it possible to reconstruct the original image?


Find an image such that:

  • Its code is similar to a given code
  • It “looks natural” (image prior regularization)
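A minimal sketch of that optimization (assuming PyTorch; `feat` is the network truncated at the layer whose code we want to invert, `target_code` is the given code, and the total-variation weight plays the role of the image prior):

# Sketch: find an image whose code matches a given code, regularized to look natural.
import torch

def invert_code(feat, target_code, steps=500, lr=0.05, tv_weight=1e-4):
    x = torch.randn(1, 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        code_loss = (feat(x) - target_code).pow(2).sum()           # "code is similar to the given code"
        tv = (x[..., 1:, :] - x[..., :-1, :]).abs().sum() + \
             (x[..., :, 1:] - x[..., :, :-1]).abs().sum()          # total variation: "looks natural"
        (code_loss + tv_weight * tv).backward()
        opt.step()
    return x.detach()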


Understanding Deep Image Representations by Inverting Them

[Mahendran and Vedaldi, 2014]

[Figure: an original image and reconstructions of it from just the 1000 log probabilities for ImageNet (ILSVRC) classes]


Reconstructions from the representation after the last pooling layer

(immediately before the first Fully Connected layer)


Reconstructions from intermediate layers


Next up: DeepDream


DeepDream modifies the image in a way that “boosts” all activations, at any layer

this creates a feedback loop: e.g. any slightly detected dog face will be made more and more dog-like over time

inception_4c/output



The same, applied at layer inception_3b/5x5_reduce


DeepDream: set dx = x :)

(i.e. in the "image update", the backward signal from the chosen layer is just its own activations; a jitter regularizer randomly shifts the image each step)
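A minimal sketch of one such update (assuming PyTorch; `layer(img)` stands for the forward pass up to the chosen layer, and the jitter amount and step size are arbitrary):

# Sketch of a DeepDream step: maximizing the sum of squared activations of a layer
# makes the backward signal at that layer equal to the activations themselves (dx = x).
import torch

def deepdream_step(img, layer, lr=0.01, jitter=16):
    ox, oy = torch.randint(-jitter, jitter + 1, (2,)).tolist()
    img = torch.roll(img, shifts=(ox, oy), dims=(2, 3))             # jitter regularizer: random shift
    img = img.detach().requires_grad_(True)
    acts = layer(img)
    loss = (acts ** 2).sum() / 2                                    # d(loss)/d(acts) = acts, i.e. "dx = x"
    loss.backward()
    with torch.no_grad():
        img = img + lr * img.grad / (img.grad.abs().mean() + 1e-8)  # the "image update"
    return torch.roll(img.detach(), shifts=(-ox, -oy), dims=(2, 3)) # undo the jitter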



Bonus videos

Deep Dream Grocery Trip

https://www.youtube.com/watch?v=DgPaCWJL7XI

Deep Dreaming Fear & Loathing in Las Vegas: the Great San Francisco Acid Wave

https://www.youtube.com/watch?v=oyxSerkkP4o


NeuralStyle

[A Neural Algorithm of Artistic Style by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, 2015]

good implementation by Justin in Torch:

https://github.com/jcjohnson/neural-style


make your own easily on deepart.io


Step 1: Extract content targets (ConvNet activations of all layers for the given content image)

content activations: e.g. at the CONV5_1 layer we would have a [14x14x512] array of target activations


Step 2: Extract style targets (Gram matrices of ConvNet activations of all layers for the given style image)

style Gram matrices: e.g. the CONV1 layer (with [224x224x64] activations) would give a [64x64] Gram matrix of all pairwise activation covariances (summed across spatial locations)


Step 3: Optimize over the image to have:

  • The content of the content image (its activations match the content targets)
  • The style of the style image (the Gram matrices of its activations match the style targets)

(+ Total Variation regularization, optionally)
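A minimal sketch of the two loss terms (assuming PyTorch; `gen_acts`, `content_acts`, and `style_acts` are activation tensors of shape [C, H, W] taken from the same layer for the generated, content, and style images):

# Sketch: content loss matches raw activations; style loss matches Gram matrices.
import torch

def gram(acts):                        # acts: [C, H, W]
    C, H, W = acts.shape
    f = acts.reshape(C, H * W)
    return f @ f.t() / (H * W)         # [C, C] pairwise channel covariances, summed over locations

def content_loss(gen_acts, content_acts):
    return (gen_acts - content_acts).pow(2).mean()

def style_loss(gen_acts, style_acts):
    return (gram(gen_acts) - gram(style_acts)).pow(2).mean()

# The generated image is then optimized (exactly as in the earlier "optimization to image"
# slides) to minimize a weighted sum of content losses, style losses over several layers,
# and optionally a total-variation term.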


We can pose an optimization over the input image to maximize any class score.

That seems useful.

Question: Can we use this to “fool” ConvNets?

spoiler alert: yeah


[Intriguing properties of neural networks, Szegedy et al., 2013]

[Figure: correctly classified images, plus an imperceptible distortion, are classified as "ostrich"]


[Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, Nguyen, Yosinski, Clune, 2014]

>99.6% confidences



These kinds of results were around even before ConvNets…

[Exploring the Representation Capabilities of the HOG Descriptor, Tatu et al., 2011]

Identical HOG representation


EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES

[Goodfellow, Shlens & Szegedy, 2014]

“primary cause of neural networks’ vulnerability to adversarial perturbation is their linear nature”


Let's fool a binary linear classifier:

(logistic regression)




Let's fool a binary linear classifier:

x (input example):   2,  -1,   3,  -2,   2,   2,   1,  -4,   5,   1
w (weights):        -1,  -1,   1,  -1,   1,  -1,   1,   1,  -1,   1

class 1 score = dot product:
= -2 + 1 + 3 + 2 + 2 - 2 + 1 - 4 - 5 + 1 = -3
=> probability of class 1 is 1/(1+e^(-(-3))) = 0.0474

i.e. the classifier is 95% certain that this is a class 0 example.


adversarial x:        ?,   ?,   ?,   ?,   ?,   ?,   ?,   ?,   ?,   ?


Let's fool a binary linear classifier:

x (input example):   2,   -1,    3,   -2,   2,    2,   1,   -4,    5,   1
w (weights):        -1,   -1,    1,   -1,   1,   -1,   1,    1,   -1,   1
adversarial x:       1.5, -1.5,  3.5, -2.5, 2.5,  1.5, 1.5, -3.5,  4.5, 1.5

class 1 score before:
-2 + 1 + 3 + 2 + 2 - 2 + 1 - 4 - 5 + 1 = -3
=> probability of class 1 is 1/(1+e^(-(-3))) = 0.0474

class 1 score after:
-1.5 + 1.5 + 3.5 + 2.5 + 2.5 - 1.5 + 1.5 - 3.5 - 4.5 + 1.5 = 2
=> probability of class 1 is now 1/(1+e^(-2)) = 0.88

i.e. we improved the class 1 probability from 5% to 88%, by nudging each input dimension by only 0.5 in the direction of the sign of its weight.



This was only with 10 input dimensions. A 224x224 color image has 150,528.

(It's significantly easier with more dimensions: each one needs only a smaller nudge.)
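The whole worked example fits in a few lines of numpy (the 0.5 nudge matches the slide; with 150,528 dimensions the same trick works with a far smaller nudge per dimension):

# Sketch: fool a binary logistic-regression classifier by nudging every input
# dimension by 0.5 in the direction of the sign of its weight.
import numpy as np

x = np.array([ 2, -1,  3, -2, 2,  2, 1, -4,  5, 1], dtype=float)
w = np.array([-1, -1,  1, -1, 1, -1, 1,  1, -1, 1], dtype=float)

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))
print(sigmoid(w @ x))            # ~0.047: 95% certain it's class 0

x_adv = x + 0.5 * np.sign(w)     # the adversarial x from the slide
print(x_adv)                     # [ 1.5 -1.5  3.5 -2.5  2.5  1.5  1.5 -3.5  4.5  1.5]
print(sigmoid(w @ x_adv))        # ~0.88: now confidently "class 1"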
