1 of 71

  • Learning Representations III
    • (Mixtures & GANs)
  • Understanding Representations

CS331B: Representation Learning in Computer Vision

Amir R. Zamir

Silvio Savarese

2 of 71

(class logistics)

  • Make-up class. Thursday, Hewlett 201, 5:30-7:00 PM.
  • Wednesday Class:
    • M Huh, P Agrawal, AA Efros, What makes ImageNet good for transfer learning? arXiv 2016.
    • C Vondrick, H Pirsiavash, A Torralba, Generating Videos with Scene Dynamics, NIPS 2016.
    • P Agrawal, R Girshick, J Malik, Analyzing the performance of multilayer neural networks for object recognition, ECCV 2014.


3 of 71

What we talked about so far...


4 of 71

(Figure: varied inputs, such as a speech “Transcript”, the label “Cat”, the sentence “Macbeth was guilty.”, or the feature vector [81 20 84 64 58 39 17 54 72 15], feed into a Representation, which in turn feeds a Mathematical Model, e.g., a classifier.)

5 of 71

Some basic concepts related to representations

  • Ill-posedness
  • Readout Non-linearity
  • Dimensionality
  • Computational Complexity
  • Encoding power (i.e., performance)
  • Narrowness of application domain (vertical vs horizontal representations)


6 of 71

Handcrafting Representations


Color Histograms

Deformable Part based Models (DPM)

Histogram of Oriented Gradients

(HOG)

Shape-based Models

Felzenszwalb et al., 2010.

Dalal and Triggs, 2005.

Beis and Lowe, 1997.

7 of 71

Learning Representations

  • Supervised
    • Representation constrained on task(s).
  • Unsupervised
    • Representation constrained on reconstruction.


LeCun et al. 1998.

Hinton et al. 2006.

8 of 71

Unsupervised representation learning

  • Sparse Coding
  • Basic Autoencoders


9 of 71

Supervised representation learning

(Figures from Stanford CS231n: a single neuron, a neural net, and a convolutional architecture.)

10 of 71

Supervised Low-level Matching


Zagoruyko & Komodakis. 2015.

11 of 71

Lectures 4&6

  • 3D, activities, segmentations, layout, BoW, etc.


12 of 71

Objects-based Representations (ImageNet)


Krizhevsky et al. 2012.

Deng et al. 2009.

Zeiler & Fergus. 2014.

Escorcia et al. 2015.

(Figure: query images and their nearest neighbors in feature space.)

13 of 71

Scene-based Representations (MIT-Places)


Zhou et al. 2014.

14 of 71

(~static) Video Representations


Simonyan & Zisserman. 2014.

Karpathy et al. 2015.

15 of 71

Recurrent Models & Structured Prediction


Jain et al. 2016.

16 of 71

Today

  • Mixing Representations
  • Generative Adversarial Networks (GAN)
  • Understanding and Probing Representations I


17 of 71

Methods of Mixing Representations


18 of 71

Mixing Representations

  • Sometimes you wish to mix two or more tasks/representations:
    • To expand an existing representation
    • To transfer information or labeled data across tasks
    • To form a multi-task representation
    • To form a better single-task representation

19 of 71

Mixing Representations - How?


Li & Hoiem. 2016.

20 of 71

Mixing Representations - fine tuning


Li & Hoiem. 2016.

21 of 71

Mixing Representations - joint (multi-task) training


Li & Hoiem. 2016.

22 of 71

Mixing Representations - feature extraction


Li & Hoiem. 2016.
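The regimes above can be contrasted in a small sketch. Everything below (the toy two-layer model, data, and names) is hypothetical rather than Li & Hoiem's code; the point is only which parameters receive gradient updates: feature extraction freezes the shared backbone and trains the new head, while fine-tuning (and joint training) update the backbone as well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretrained pieces: shared backbone W0 and a new-task head V0.
W0 = rng.normal(size=(5, 8)) * 0.5   # "pretrained" backbone weights
V0 = rng.normal(size=(8, 3)) * 0.1   # freshly initialized new-task head

X = rng.normal(size=(32, 5))         # new-task inputs
Y = rng.normal(size=(32, 3))         # new-task regression targets

def loss_and_grads(X, Y, W, V):
    h = np.tanh(X @ W)               # shared features
    err = h @ V - Y                  # new-task prediction error
    d_out = 2 * err / len(X)
    dV = h.T @ d_out                                 # head gradient
    dW = X.T @ (d_out @ V.T * (1 - h ** 2))          # backbone gradient
    return float(np.mean(err ** 2)), dW, dV

def train(update_backbone, lr=0.05, steps=300):
    W, V = W0.copy(), V0.copy()
    for _ in range(steps):
        _, dW, dV = loss_and_grads(X, Y, W, V)
        V -= lr * dV                 # the head is always trained
        if update_backbone:          # fine-tuning / joint training only;
            W -= lr * dW             # joint training would also backprop
                                     # the old task's loss through W (omitted)
    return loss_and_grads(X, Y, W, V)[0]

loss_feature_extraction = train(update_backbone=False)
loss_fine_tuning = train(update_backbone=True)
```

Both regimes reduce the new-task loss; fine-tuning optimizes more parameters and typically fits the new task better, at the cost of drifting away from the original representation.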

23 of 71

Mixing Representations - LwF


Li & Hoiem. 2016.

ECCV’16
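The "without forgetting" part rests on a distillation term: the original network's outputs on the new-task images are recorded first, and the updated network is then penalized for drifting from them on the old task. A minimal NumPy sketch of a temperature-softened cross-entropy of this flavor (the helper names are mine, and T = 2.0 is just a typical choice):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(old_logits, new_logits, T=2.0):
    """Cross-entropy between temperature-softened old and new outputs.

    old_logits: recorded responses of the original network on new-task data.
    new_logits: the current network's old-task head on the same inputs.
    Raising T softens both distributions, so small responses also matter.
    """
    p_old = softmax(old_logits, T)           # soft targets (fixed)
    p_new = softmax(new_logits, T)
    return float(-(p_old * np.log(p_new + 1e-12)).sum(axis=-1).mean())
```

This term is minimized (down to the soft targets' entropy) when the new network reproduces the old responses exactly, which is what keeps the old task from being forgotten.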

24 of 71

Mixing Representations

  • Pros and cons of each method:


Li & Hoiem. 2016.

25 of 71

Mixing Representations

  • Empirical study:


Li & Hoiem. 2016.

26 of 71

Curriculum Learning

For faster convergence, better minima (and mixing representations)


Bengio et al. 2009.

27 of 71

Curriculum Learning

Guided learning helps train humans and animals:

  • Shaping: start from simpler examples / easier tasks (Piaget 1952, Skinner 1958)
  • Education

Bengio et al. 2009. slides and paper credit.

28 of 71

The Dogma in question

It is best to learn from a training set of examples sampled from the same distribution as the test set. Really?

Bengio et al. 2009. slides and paper credit.

29 of 71

Question

Can machine learning algorithms benefit from a curriculum strategy?

Cognition journal:

(Elman 1993) vs (Rohde & Plaut 1999),

(Krueger & Dayan 2009)

Bengio et al. 2009. slides and paper credit.

30 of 71

Convex vs Non-Convex Criteria

  • Convex criteria: the order of presentation of examples should not matter to the convergence point, but can influence convergence speed
  • Non-convex criteria: the order and selection of examples can yield a better local minimum

Bengio et al. 2009. slides and paper credit.

31 of 71

Deep Architectures

  • Theoretical arguments: deep architectures can be exponentially more compact than shallow ones representing the same function
  • Cognitive and neuroscience arguments
  • Many local minima
  • Good candidate for testing curriculum ideas

Bengio et al. 2009. slides and paper credit.

32 of 71

Deep Training Trajectories

(Figure: deep training trajectories under random initialization vs. unsupervised guidance; Erhan et al., AISTATS 2009.)

Bengio et al. 2009. slides and paper credit.

33 of 71

Starting from Easy Examples

(Figure: stages 1 through 3, from the easiest examples with lower-level abstractions up to the most difficult examples with higher-level abstractions.)

Bengio et al. 2009. slides and paper credit.

34 of 71

Continuation Methods

(Figure: a heavily smoothed objective, the surrogate criterion, has an easy-to-find minimum; tracking local minima while annealing toward the target objective yields the final solution.)

Bengio et al. 2009. slides and paper credit.

35 of 71

Curriculum Learning as Continuation

  • Sequence of training distributions
  • Initially peaked on easier / simpler examples
  • Gradually give more weight to more difficult examples until reaching the target distribution

(Figure: stages 1 through 3, from the easiest examples with lower-level abstractions up to the most difficult examples with higher-level abstractions.)

Bengio et al. 2009. slides and paper credit.
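As a toy sketch of that schedule (hypothetical data and difficulty score, not the paper's experiments), each stage below trains on a growing, easiest-first prefix of the data and ends on the full target distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary problem; "difficulty" = closeness to the decision boundary.
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
difficulty = -np.abs(X[:, 0] + X[:, 1])   # near the boundary = harder
order = np.argsort(difficulty)            # easiest examples first

def sgd_pass(idx, w, lr=0.1):
    """One pass of logistic-regression SGD over the given example indices."""
    for i in idx:
        p = 1 / (1 + np.exp(-X[i] @ w))   # predicted probability
        w = w + lr * (y[i] - p) * X[i]    # gradient step on example i
    return w

w = np.zeros(2)
# Curriculum: widen the training distribution stage by stage,
# finishing on the full (target) distribution.
for frac in (0.25, 0.5, 1.0):
    subset = order[: int(frac * n)]
    w = sgd_pass(subset, w)

acc = float(((X @ w > 0) == (y == 1)).mean())
```

The anti-curriculum (hardest first) or a single pass over a random order can be dropped into the same loop for comparison.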

36 of 71

How to order examples?

  • The right order is not known
  • Three series of experiments:
    • Toy experiments with a simple order
      • Larger margin first
      • Less noisy inputs first
    • Simpler shapes first, more varied ones later
    • Smaller vocabulary first

Bengio et al. 2009. slides and paper credit.

37 of 71

Larger Margin First: Faster Convergence

Bengio et al. 2009. slides and paper credit.

38 of 71

Cleaner First: Faster Convergence

Bengio et al. 2009. slides and paper credit.

39 of 71

Shape Recognition

First: easier, basic shapes

Second = target: more varied geometric shapes

Bengio et al. 2009. slides and paper credit.

40 of 71

Shape Recognition Experiment

  • 3-hidden-layer deep net known to involve local minima (unsupervised pre-training finds much better solutions)
  • 10,000 training / 5,000 validation / 5,000 test examples
  • Procedure:
    • Train for k epochs on the easier shapes
    • Switch to the target training set (more variations)

Bengio et al. 2009. slides and paper credit.

41 of 71

Shape Recognition Results

(Figure: results as a function of k, the number of epochs spent on the easier shapes before switching.)

Bengio et al. 2009. slides and paper credit.

42 of 71

Why?

  • Faster convergence to a minimum
  • Wasting less time on noisy or harder-to-predict examples
  • Convergence to better local minima

Curriculum = a particular continuation method

    • Finds better local minima of a non-convex training criterion
    • Acts like a regularizer, with its main effect on the test set

Bengio et al. 2009. slides and paper credit.

43 of 71

This lecture

  • Finishing up GANs
  • Brief overview of representation understanding methods
  • Generic Representations


44 of 71

Energy-Based GANs

  • An energy-based formulation for the discriminator
    • Instantiated as an auto-encoder

Zhao et al. 2016.

45 of 71

Generative Adversarial Networks


Goodfellow et al. 2014.
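The two-player game at the heart of the method can be written out; G maps noise z to samples and D outputs the probability that its input is real:

```latex
\min_G \max_D \; V(D,G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

For a fixed G, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), and the generator's optimum of the resulting criterion is reached exactly when p_g = p_data (Goodfellow et al., 2014).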

46 of 71

Generative Adversarial Networks

Goodfellow et al. 2014.

Kevin McGuinness. 2016.

47 of 71

Generative Adversarial Networks

Goodfellow et al. 2014.

48 of 71

Generative Adversarial Networks

Goodfellow et al. 2014.

(Generated samples on MNIST, CIFAR-10, CIFAR-100, and TFD.)

49 of 71

Generative Adversarial Networks

  • Why does this matter?


Goodfellow et al. 2014.

(Generated samples on MNIST, CIFAR-10, CIFAR-100, and TFD.)

50 of 71

Problems with GANs

  • Diversity
    • The generator may overfit to certain modes in the data
  • Stability
    • Hard to train; sensitive to the choice of hyperparameters and to the relative state of the discriminator and generator
  • Evaluation
    • How should the generated results be evaluated?


Goodfellow et al. 2014.

51 of 71

Energy-Based GANs

  • An energy-based formulation for the discriminator
    • Instantiated as an auto-encoder
  • Repelling regularizer (pull-away term); minibatch discrimination is an alternative
  • Better behaved, more diverse results

Zhao et al. 2016.
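The repelling regularizer can be sketched directly from its definition in Zhao et al.: the mean squared cosine similarity between latent codes of distinct generated samples in a minibatch, which pushes the batch apart and discourages mode collapse. A NumPy sketch:

```python
import numpy as np

def pull_away_term(S, eps=1e-12):
    """Repelling regularizer (PT) over a batch of latent codes.

    S: (N, d) array, one latent representation per generated sample.
    Averages the squared cosine similarity over all distinct pairs,
    so identical samples score 1 and mutually orthogonal ones score 0.
    """
    N = S.shape[0]
    Sn = S / (np.linalg.norm(S, axis=1, keepdims=True) + eps)  # unit rows
    C = Sn @ Sn.T                     # pairwise cosine similarities
    off_diag = C ** 2 - np.eye(N)     # drop the i == j terms (cosine = 1)
    return float(off_diag.sum() / (N * (N - 1)))
```

Added to the generator's loss with a small weight, this term trades a bit of per-sample fidelity for batch-level diversity.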

52 of 71

EBGAN vs GAN (on MNIST)


Zhao et al. 2016.

(Generated samples: GAN vs. EBGAN vs. EBGAN with the PT term.)

53 of 71

EBGAN vs GAN (on LSUN)


Zhao et al. 2016.

(Generated samples: GAN vs. EBGAN.)

54 of 71

EBGAN vs GAN (on CelebA)


Zhao et al. 2016.

(Generated samples: GAN vs. EBGAN.)

55 of 71

EBGAN (on ImageNet)


Zhao et al. 2016.

56 of 71

EBGAN (on ImageNet)

  • Remember the main objective: learning an arbitrarily complex distribution of pixels (i.e., visual worlds)


Zhao et al. 2016.

57 of 71

A use case of such distribution: “Generative Visual Manipulation on the Natural Image Manifold”

  • Editing images while remaining on the natural image manifold
    • i.e., preserving realism
  • Learns the distribution of real data using a GAN, defines a set of edits, and constrains the output to fall on the manifold


Zhu et al. 2016.
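The key operation is projecting an edited image back onto the generator's output manifold by optimizing the latent code, roughly min over z of ||G(z) - x||². With a toy linear stand-in for G this projection reduces to least squares; the real system instead optimizes z by gradient descent through a trained GAN with perceptual losses, so the sketch below only illustrates the idea:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained generator: a fixed linear map G(z) = A @ z.
d_img, d_z = 20, 4
A = rng.normal(size=(d_img, d_z))

def project_to_manifold(x):
    """Find z minimizing ||G(z) - x||^2 and return the manifold point G(z).

    Exact least squares for the linear toy generator; with a real GAN
    one would run gradient descent on z instead.
    """
    z, *_ = np.linalg.lstsq(A, x, rcond=None)
    return A @ z

x_on = A @ rng.normal(size=d_z)            # an image already on the manifold
x_edit = x_on + rng.normal(size=d_img)     # a user "edit" pushes it off
x_proj = project_to_manifold(x_edit)       # snap the edit back to "realism"
```

The projection returns the closest point on the manifold, so realism is restored while as much of the edit as possible is preserved.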

58 of 71


Zhu et al. 2016.

59 of 71


Zhu et al. 2016.

60 of 71

Understanding and Probing Representations

(very brief executive summary)


61 of 71

Understanding Representations

  • Why?!
  • Tools:
    • Nearest neighbors in full dimensional space
    • Low-dimensional embeddings
    • Readout function
    • Inverting the representation (remember HOGgles?)
    • (Discussed in upcoming lectures:)
      • Minimal Image (what matters in an image)
      • Receptive field (what matters to a neuron)
      • Images maximally activating a neuron
      • Neuron activation maps
      • Visualizing learned filters
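The readout-function idea can be made concrete: freeze the representation, train only a simple (often linear) classifier on top, and use its accuracy to measure how explicitly the representation encodes the quantity of interest. A toy sketch (data, feature map, and names are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the concept is a "ring", nonlinear in the raw inputs.
X = rng.normal(size=(500, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.2).astype(float)

def features(X):
    """The frozen 'representation' being probed (a fixed feature map here)."""
    return np.column_stack([X, X ** 2])   # raw inputs plus their squares

def linear_probe(F, y, lr=0.5, steps=1000):
    """Logistic-regression readout trained on top of frozen features."""
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(F @ w + b)))   # probe predictions
        g = p - y                            # d(cross-entropy)/d(logits)
        w -= lr * (F.T @ g) / len(y)         # full-batch gradient descent
        b -= lr * g.mean()
    return w, b

F = features(X)
w, b = linear_probe(F, y)
probe_acc = float((((F @ w + b) > 0) == (y == 1)).mean())
```

Here the squared features make the ring concept linearly readable, so the probe scores highly; probing the raw inputs alone would not, which is exactly the kind of contrast readout functions are used to expose.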


62 of 71

Nearest neighbors in full dimensional space

(Figure: query images and their nearest neighbors in feature space.)

Krizhevsky et al. 2012.

Deng et al. 2009.
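The mechanics of this probe are simple: embed a query and a gallery with the network, then rank the gallery by similarity in feature space. A NumPy sketch with stand-in features (cosine similarity is one common choice):

```python
import numpy as np

def nearest_neighbors(query, gallery, k=5):
    """Indices of the k gallery features most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    G = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = G @ q                      # cosine similarity to every item
    return np.argsort(-sims)[:k]      # best matches first

rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 64))             # stand-in CNN features
query = gallery[7] + 0.01 * rng.normal(size=64)  # near-duplicate of item 7
top = nearest_neighbors(query, gallery, k=5)
```

What the retrieved neighbors share (object class, scene type, pose, color) tells you what the representation considers "similar", without any training.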

63 of 71

Nearest neighbors in full dimensional space

Zamir et al. 2016.

Wang & Gupta. 2015.

Agrawal et al. 2015.

Krizhevsky et al. (ImageNet), 2012.

64 of 71

Low-dimensional embeddings

  • 6,000 MNIST digits
    • t-SNE
    • Isomap
    • Sammon Mapping
    • LLE

Van der Maaten & Hinton. 2008
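Concretely, t-SNE matches pairwise similarities: Gaussian affinities p_ij in the original space against Student-t similarities q_ij in the embedding, with q_ij = (1 + ||y_i - y_j||²)^(-1) / Σ_{k≠l} (1 + ||y_k - y_l||²)^(-1). A sketch of the low-dimensional side:

```python
import numpy as np

def tsne_q(Y, eps=1e-12):
    """Low-dimensional pairwise similarities q_ij used by t-SNE.

    Student-t kernel (one degree of freedom) on embedding distances,
    normalized over all distinct pairs. The heavy tails let moderately
    dissimilar points sit far apart in the map, avoiding crowding.
    """
    D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared distances
    K = 1.0 / (1.0 + D2)              # Student-t kernel
    np.fill_diagonal(K, 0.0)          # the i == j terms are excluded
    return K / (K.sum() + eps)

# Three embedded points: two close together, one far away.
Y = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
Q = tsne_q(Y)
```

The optimization then moves the points Y to minimize the KL divergence from p_ij to q_ij.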

65 of 71

Low-dimensional embeddings

  • t-SNE

Van der Maaten & Hinton. 2008

66 of 71

Van der Maaten & Hinton. 2008

67 of 71

Inverting the representation


Simonyan et al. 2014

Class appearance models (ImageNet)

(Figure: input image, top-1 class saliency map, thresholded saliency map for segmentation, and the resulting foreground segment.)
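The saliency map of Simonyan et al. is the gradient of the class score with respect to the input pixels. A toy sketch with a linear stand-in scorer and a finite-difference gradient (a real implementation backprops through the trained ConvNet instead):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a ConvNet's class logit: a fixed linear scorer.
w = rng.normal(size=(8, 8))

def class_score(img):
    return float((img * w).sum())

def saliency_map(img, eps=1e-4):
    """|d(score)/d(pixel)| for every pixel, via central finite differences."""
    S = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            d = np.zeros_like(img)
            d[i, j] = eps              # perturb a single pixel
            S[i, j] = abs(class_score(img + d) - class_score(img - d)) / (2 * eps)
    return S

img = rng.normal(size=(8, 8))
S = saliency_map(img)
```

Pixels with a large gradient magnitude are the ones the scorer is most sensitive to; thresholding such a map gives the rough foreground mask shown on the slide.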

68 of 71

Inverting the representation

  • Supervised

Dalal & Triggs. 2005.

Vondrick et al. 2013.

Mahendran & Vedaldi. 2016.

Dosovitskiy & Brox. 2016.

(Figure: an image, its HOG representation, and HOG inversions (HOG^-1) by HOGgles, Mahendran & Vedaldi, and Dosovitskiy & Brox.)

69 of 71

Inverting the representation

  • Supervised

Dosovitskiy & Brox. 2016.

Mahendran & Vedaldi. 2016.

Hinton et al. 2006.

(Figure: inversions by Dosovitskiy & Brox, Mahendran & Vedaldi, and an auto-encoder (AE).)

70 of 71

Understanding Representations

    • Nearest neighbors in full dimensional space
    • Low-dimensional embeddings
    • Read-out function
    • Inverting the representation
    • Minimal Image (what matters in an image)
    • Receptive field (what matters to a neuron)
    • Images maximally activating a neuron
    • Neuron activation maps
    • Visualizing learned filters


71 of 71