1 of 71

  • Learning Representations III
    • (Mixtures & GANs)
  • Understanding Representations

CS331B: Representation Learning in Computer Vision

Amir R. Zamir

Silvio Savarese

2 of 71

(class logistics)

  • Make-up class. Thursday, Hewlett 201, 5:30-7:00 PM.
  • Wednesday Class:
    • M Huh, P Agrawal, AA Efros, What makes ImageNet good for transfer learning? arXiv 2016.
    • C Vondrick, H Pirsiavash, A Torralba, Generating Videos with Scene Dynamics, NIPS 2016.
    • P Agrawal, R Girshick, J Malik, Analyzing the performance of multilayer neural networks for object recognition, ECCV 2014.


3 of 71

What we talked about so far...


4 of 71

(Figure: varied inputs, such as a speech “Transcript”, the label “Cat”, the sentence “Macbeth was guilty.”, or the feature vector [81 20 84 64 58 39 17 54 72 15], feed into a Representation, which in turn feeds a Mathematical Model, e.g., a classifier.)

5 of 71

Some basic concepts related to representations

  • Ill-posedness
  • Readout Non-linearity
  • Dimensionality
  • Computational Complexity
  • Encoding power (i.e., performance)
  • Narrowness of application domain (vertical vs horizontal representations)


6 of 71

Handcrafting Representations


Color Histograms

Deformable Part based Models (DPM)

Histogram of Oriented Gradients

(HOG)

Shape-based Models

Felzenszwalb et al., 2010.

Dalal and Triggs, 2005.

Beis and Lowe, 1997.

7 of 71

Learning Representations

  • Supervised
    • Representation constrained on task(s).
  • Unsupervised
    • Representation constrained on reconstruction.


LeCun et al. 1998.

Hinton et al. 2006.

8 of 71

Unsupervised representation learning

  • Sparse Coding
  • Basic Autoencoders


9 of 71

Supervised representation learning

(Figures from Stanford CS231n: a single neuron, a neural net, and a convolutional architecture.)

10 of 71

Supervised Low-level Matching


Zagoruyko & Komodakis. 2015.

11 of 71

Lectures 4&6

  • 3D, activities, segmentations, layout, BoW, etc.


12 of 71

Objects-based Representations (ImageNet)


Krizhevsky et al. 2012.

Deng et al. 2009.

Zeiler & Fergus. 2014.

Escorcia et al. 2015.

(Figure: query images and their nearest neighbors in feature space.)

13 of 71

Scene-based Representations (MIT-Places)


Zhou et al. 2014.

14 of 71

(~static) Video Representations


Simonyan & Zisserman. 2014.

Karpathy et al. 2015.

15 of 71

Recurrent Models & Structured Prediction


Jain et al. 2016.

16 of 71

Today

  • Mixing Representations
  • Generative Adversarial Networks (GAN)
  • Understanding and Probing Representations I


17 of 71

Methods of Mixing Representations


18 of 71

Mixing Representations

  • Sometimes you wish to mix two or more tasks/representations:
    • To expand an existing representation
    • To transfer information or labeled data across tasks
    • To form a multi-task representation
    • To form a better single-task representation

19 of 71

Mixing Representations - How?


Li & Hoiem. 2016.

20 of 71

Mixing Representations - fine tuning


Li & Hoiem. 2016.

21 of 71

Mixing Representations - joint (multi-task) training


Li & Hoiem. 2016.

22 of 71

Mixing Representations - feature extraction


Li & Hoiem. 2016.
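The regimes above can be contrasted in a small sketch. Everything below (the toy two-layer model, data, and names) is hypothetical rather than Li & Hoiem's code; the point is only which parameters receive gradient updates: feature extraction freezes the shared backbone and trains the new head, while fine-tuning (and joint training) update the backbone as well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretrained pieces: shared backbone W0 and a new-task head V0.
W0 = rng.normal(size=(5, 8)) * 0.5   # "pretrained" backbone weights
V0 = rng.normal(size=(8, 3)) * 0.1   # freshly initialized new-task head

X = rng.normal(size=(32, 5))         # new-task inputs
Y = rng.normal(size=(32, 3))         # new-task regression targets

def loss_and_grads(X, Y, W, V):
    h = np.tanh(X @ W)               # shared features
    err = h @ V - Y                  # new-task prediction error
    d_out = 2 * err / len(X)
    dV = h.T @ d_out                                 # head gradient
    dW = X.T @ (d_out @ V.T * (1 - h ** 2))          # backbone gradient
    return float(np.mean(err ** 2)), dW, dV

def train(update_backbone, lr=0.05, steps=300):
    W, V = W0.copy(), V0.copy()
    for _ in range(steps):
        _, dW, dV = loss_and_grads(X, Y, W, V)
        V -= lr * dV                 # the head is always trained
        if update_backbone:          # fine-tuning / joint training only;
            W -= lr * dW             # joint training would also backprop
                                     # the old task's loss through W (omitted)
    return loss_and_grads(X, Y, W, V)[0]

loss_feature_extraction = train(update_backbone=False)
loss_fine_tuning = train(update_backbone=True)
```

Both regimes reduce the new-task loss; fine-tuning optimizes more parameters and typically fits the new task better, at the cost of drifting away from the original representation.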

23 of 71

Mixing Representations - LwF


Li & Hoiem. 2016.

ECCV’16
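The "without forgetting" part rests on a distillation term: the original network's outputs on the new-task images are recorded first, and the updated network is then penalized for drifting from them on the old task. A minimal NumPy sketch of a temperature-softened cross-entropy of this flavor (the helper names are mine, and T = 2.0 is just a typical choice):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(old_logits, new_logits, T=2.0):
    """Cross-entropy between temperature-softened old and new outputs.

    old_logits: recorded responses of the original network on new-task data.
    new_logits: the current network's old-task head on the same inputs.
    Raising T softens both distributions, so small responses also matter.
    """
    p_old = softmax(old_logits, T)           # soft targets (fixed)
    p_new = softmax(new_logits, T)
    return float(-(p_old * np.log(p_new + 1e-12)).sum(axis=-1).mean())
```

This term is minimized (down to the soft targets' entropy) when the new network reproduces the old responses exactly, which is what keeps the old task from being forgotten.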

24 of 71

Mixing Representations

  • Pros and cons of each method:


Li & Hoiem. 2016.

25 of 71

Mixing Representations

  • Empirical study:


Li & Hoiem. 2016.

26 of 71

Curriculum Learning

For faster convergence, better minima (and mixing representations)


Bengio et al. 2009.

27 of 71

Curriculum Learning

Guided learning helps train humans and animals:

  • Shaping: start from simpler examples / easier tasks (Piaget 1952, Skinner 1958)
  • Education

Bengio et al. 2009. slides and paper credit.

28 of 71

The Dogma in question

It is best to learn from a training set of examples sampled from the same distribution as the test set. Really?

Bengio et al. 2009. slides and paper credit.

29 of 71

Question

Can machine learning algorithms benefit from a curriculum strategy?

Cognition journal:

(Elman 1993) vs (Rohde & Plaut 1999),

(Krueger & Dayan 2009)

Bengio et al. 2009. slides and paper credit.

30 of 71

Convex vs Non-Convex Criteria

  • Convex criteria: the order of presentation of examples should not matter to the convergence point, but can influence convergence speed
  • Non-convex criteria: the order and selection of examples can yield a better local minimum

Bengio et al. 2009. slides and paper credit.

31 of 71

Deep Architectures

  • Theoretical arguments: deep architectures can be exponentially more compact than shallow ones representing the same function
  • Cognitive and neuroscience arguments
  • Many local minima
  • Good candidate for testing curriculum ideas

Bengio et al. 2009. slides and paper credit.

32 of 71

Deep Training Trajectories

(Figure: deep training trajectories under random initialization vs. unsupervised guidance; Erhan et al., AISTATS 2009.)

Bengio et al. 2009. slides and paper credit.

33 of 71

Starting from Easy Examples

(Figure: stages 1 through 3, from the easiest examples with lower-level abstractions up to the most difficult examples with higher-level abstractions.)

Bengio et al. 2009. slides and paper credit.

34 of 71

Continuation Methods

(Figure: a heavily smoothed objective, the surrogate criterion, has an easy-to-find minimum; tracking local minima while annealing toward the target objective yields the final solution.)

Bengio et al. 2009. slides and paper credit.

35 of 71

Curriculum Learning as Continuation

  • Sequence of training distributions
  • Initially peaked on easier / simpler examples
  • Gradually give more weight to more difficult examples until reaching the target distribution

(Figure: stages 1 through 3, from the easiest examples with lower-level abstractions up to the most difficult examples with higher-level abstractions.)

Bengio et al. 2009. slides and paper credit.
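As a toy sketch of that schedule (hypothetical data and difficulty score, not the paper's experiments), each stage below trains on a growing, easiest-first prefix of the data and ends on the full target distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary problem; "difficulty" = closeness to the decision boundary.
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
difficulty = -np.abs(X[:, 0] + X[:, 1])   # near the boundary = harder
order = np.argsort(difficulty)            # easiest examples first

def sgd_pass(idx, w, lr=0.1):
    """One pass of logistic-regression SGD over the given example indices."""
    for i in idx:
        p = 1 / (1 + np.exp(-X[i] @ w))   # predicted probability
        w = w + lr * (y[i] - p) * X[i]    # gradient step on example i
    return w

w = np.zeros(2)
# Curriculum: widen the training distribution stage by stage,
# finishing on the full (target) distribution.
for frac in (0.25, 0.5, 1.0):
    subset = order[: int(frac * n)]
    w = sgd_pass(subset, w)

acc = float(((X @ w > 0) == (y == 1)).mean())
```

The anti-curriculum (hardest first) or a single pass over a random order can be dropped into the same loop for comparison.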

36 of 71

How to order examples?

  • The right order is not known
  • Three series of experiments:
    • Toy experiments with a simple order
      • Larger margin first
      • Less noisy inputs first
    • Simpler shapes first, more varied ones later
    • Smaller vocabulary first

Bengio et al. 2009. slides and paper credit.

37 of 71

Larger Margin First: Faster Convergence

Bengio et al. 2009. slides and paper credit.

38 of 71

Cleaner First: Faster Convergence

Bengio et al. 2009. slides and paper credit.

39 of 71

Shape Recognition

First: easier, basic shapes

Second = target: more varied geometric shapes

Bengio et al. 2009. slides and paper credit.

40 of 71

Shape Recognition Experiment

  • 3-hidden-layer deep net known to involve local minima (unsupervised pre-training finds much better solutions)
  • 10,000 training / 5,000 validation / 5,000 test examples
  • Procedure:
    • Train for k epochs on the easier shapes
    • Switch to the target training set (more variations)

Bengio et al. 2009. slides and paper credit.

41 of 71

Shape Recognition Results

(Figure: results as a function of k, the number of epochs spent on the easier shapes before switching.)

Bengio et al. 2009. slides and paper credit.

42 of 71

Why?

  • Faster convergence to a minimum
  • Wasting less time on noisy or harder-to-predict examples
  • Convergence to better local minima

Curriculum = a particular continuation method

    • Finds better local minima of a non-convex training criterion
    • Acts like a regularizer, with its main effect on the test set

Bengio et al. 2009. slides and paper credit.

43 of 71

This lecture

  • Finishing up GANs
  • Brief overview of representation understanding methods
  • Generic Representations


44 of 71

Energy-Based GANs

  • An energy-based formulation for the discriminator
    • Instantiated as an auto-encoder

Zhao et al. 2016.

45 of 71

Generative Adversarial Networks


Goodfellow et al. 2014.
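The two-player game at the heart of the method can be written out; G maps noise z to samples and D outputs the probability that its input is real:

```latex
\min_G \max_D \; V(D,G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

For a fixed G, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), and the generator's optimum of the resulting criterion is reached exactly when p_g = p_data (Goodfellow et al., 2014).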

46 of 71

Generative Adversarial Networks

Goodfellow et al. 2014.

Kevin McGuinness. 2016.

47 of 71

Generative Adversarial Networks

Goodfellow et al. 2014.

48 of 71

Generative Adversarial Networks

Goodfellow et al. 2014.

(Generated samples on MNIST, CIFAR-10, CIFAR-100, and TFD.)

49 of 71

Generative Adversarial Networks

  • Why does this matter?


Goodfellow et al. 2014.

(Generated samples on MNIST, CIFAR-10, CIFAR-100, and TFD.)

50 of 71

Problems with GANs

  • Diversity
    • The generator may overfit to certain modes in the data
  • Stability
    • Hard to train; sensitive to the choice of hyperparameters and to the relative state of the discriminator and generator
  • Evaluation
    • How should the generated results be evaluated?


Goodfellow et al. 2014.

51 of 71

Energy-Based GANs

  • An energy-based formulation for the discriminator
    • Instantiated as an auto-encoder
  • Repelling regularizer (pull-away term); minibatch discrimination is an alternative
  • Better behaved, more diverse results

Zhao et al. 2016.
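The repelling regularizer can be sketched directly from its definition in Zhao et al.: the mean squared cosine similarity between latent codes of distinct generated samples in a minibatch, which pushes the batch apart and discourages mode collapse. A NumPy sketch:

```python
import numpy as np

def pull_away_term(S, eps=1e-12):
    """Repelling regularizer (PT) over a batch of latent codes.

    S: (N, d) array, one latent representation per generated sample.
    Averages the squared cosine similarity over all distinct pairs,
    so identical samples score 1 and mutually orthogonal ones score 0.
    """
    N = S.shape[0]
    Sn = S / (np.linalg.norm(S, axis=1, keepdims=True) + eps)  # unit rows
    C = Sn @ Sn.T                     # pairwise cosine similarities
    off_diag = C ** 2 - np.eye(N)     # drop the i == j terms (cosine = 1)
    return float(off_diag.sum() / (N * (N - 1)))
```

Added to the generator's loss with a small weight, this term trades a bit of per-sample fidelity for batch-level diversity.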

52 of 71

EBGAN vs GAN (on MNIST)


Zhao et al. 2016.

(Generated samples: GAN vs. EBGAN vs. EBGAN with the PT term.)

53 of 71

EBGAN vs GAN (on LSUN)


Zhao et al. 2016.

(Generated samples: GAN vs. EBGAN.)

54 of 71

EBGAN vs GAN (on CelebA)


Zhao et al. 2016.

(Generated samples: GAN vs. EBGAN.)

55 of 71

EBGAN (on ImageNet)


Zhao et al. 2016.

56 of 71

EBGAN (on ImageNet)

  • Remember the main objective: learning an arbitrarily complex distribution of pixels (i.e., visual worlds)


Zhao et al. 2016.

57 of 71

A use case of such distribution: “Generative Visual Manipulation on the Natural Image Manifold”

  • Editing images while remaining on the natural image manifold
    • i.e., preserving realism
  • Learns the distribution of real data using a GAN, defines a set of edits, and constrains the output to fall on the manifold


Zhu et al. 2016.
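The key operation is projecting an edited image back onto the generator's output manifold by optimizing the latent code, roughly min over z of ||G(z) - x||². With a toy linear stand-in for G this projection reduces to least squares; the real system instead optimizes z by gradient descent through a trained GAN with perceptual losses, so the sketch below only illustrates the idea:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained generator: a fixed linear map G(z) = A @ z.
d_img, d_z = 20, 4
A = rng.normal(size=(d_img, d_z))

def project_to_manifold(x):
    """Find z minimizing ||G(z) - x||^2 and return the manifold point G(z).

    Exact least squares for the linear toy generator; with a real GAN
    one would run gradient descent on z instead.
    """
    z, *_ = np.linalg.lstsq(A, x, rcond=None)
    return A @ z

x_on = A @ rng.normal(size=d_z)            # an image already on the manifold
x_edit = x_on + rng.normal(size=d_img)     # a user "edit" pushes it off
x_proj = project_to_manifold(x_edit)       # snap the edit back to "realism"
```

The projection returns the closest point on the manifold, so realism is restored while as much of the edit as possible is preserved.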

58 of 71


Zhu et al. 2016.

59 of 71


Zhu et al. 2016.

60 of 71

Understanding and Probing Representations

(very brief executive summary)


61 of 71

Understanding Representations

  • Why?!
  • Tools:
    • Nearest neighbors in full dimensional space
    • Low-dimensional embeddings
    • Readout function
    • Inverting the representation (remember HOGgles?)
    • (Discussed in upcoming lectures:)
      • Minimal Image (what matters in an image)
      • Receptive field (what matters to a neuron)
      • Images maximally activating a neuron
      • Neuron activation maps
      • Visualizing learned filters
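The readout-function idea can be made concrete: freeze the representation, train only a simple (often linear) classifier on top, and use its accuracy to measure how explicitly the representation encodes the quantity of interest. A toy sketch (data, feature map, and names are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the concept is a "ring", nonlinear in the raw inputs.
X = rng.normal(size=(500, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.2).astype(float)

def features(X):
    """The frozen 'representation' being probed (a fixed feature map here)."""
    return np.column_stack([X, X ** 2])   # raw inputs plus their squares

def linear_probe(F, y, lr=0.5, steps=1000):
    """Logistic-regression readout trained on top of frozen features."""
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(F @ w + b)))   # probe predictions
        g = p - y                            # d(cross-entropy)/d(logits)
        w -= lr * (F.T @ g) / len(y)         # full-batch gradient descent
        b -= lr * g.mean()
    return w, b

F = features(X)
w, b = linear_probe(F, y)
probe_acc = float((((F @ w + b) > 0) == (y == 1)).mean())
```

Here the squared features make the ring concept linearly readable, so the probe scores highly; probing the raw inputs alone would not, which is exactly the kind of contrast readout functions are used to expose.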


62 of 71

Nearest neighbors in full dimensional space

(Figure: query images and their nearest neighbors in feature space.)

Krizhevsky et al. 2012.

Deng et al. 2009.
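The mechanics of this probe are simple: embed a query and a gallery with the network, then rank the gallery by similarity in feature space. A NumPy sketch with stand-in features (cosine similarity is one common choice):

```python
import numpy as np

def nearest_neighbors(query, gallery, k=5):
    """Indices of the k gallery features most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    G = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = G @ q                      # cosine similarity to every item
    return np.argsort(-sims)[:k]      # best matches first

rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 64))             # stand-in CNN features
query = gallery[7] + 0.01 * rng.normal(size=64)  # near-duplicate of item 7
top = nearest_neighbors(query, gallery, k=5)
```

What the retrieved neighbors share (object class, scene type, pose, color) tells you what the representation considers "similar", without any training.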

63 of 71

Nearest neighbors in full dimensional space

Zamir et al. 2016.

Wang & Gupta. 2015.

Agrawal et al. 2015.

Krizhevsky et al. (ImageNet), 2012.

64 of 71

Low-dimensional embeddings

  • 6,000 MNIST digits
    • t-SNE
    • Isomap
    • Sammon Mapping
    • LLE

Van der Maaten & Hinton. 2008
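Concretely, t-SNE matches pairwise similarities: Gaussian affinities p_ij in the original space against Student-t similarities q_ij in the embedding, with q_ij = (1 + ||y_i - y_j||²)^(-1) / Σ_{k≠l} (1 + ||y_k - y_l||²)^(-1). A sketch of the low-dimensional side:

```python
import numpy as np

def tsne_q(Y, eps=1e-12):
    """Low-dimensional pairwise similarities q_ij used by t-SNE.

    Student-t kernel (one degree of freedom) on embedding distances,
    normalized over all distinct pairs. The heavy tails let moderately
    dissimilar points sit far apart in the map, avoiding crowding.
    """
    D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared distances
    K = 1.0 / (1.0 + D2)              # Student-t kernel
    np.fill_diagonal(K, 0.0)          # the i == j terms are excluded
    return K / (K.sum() + eps)

# Three embedded points: two close together, one far away.
Y = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
Q = tsne_q(Y)
```

The optimization then moves the points Y to minimize the KL divergence from p_ij to q_ij.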

65 of 71

Low-dimensional embeddings

  • t-SNE

Van der Maaten & Hinton. 2008

66 of 71

Van der Maaten & Hinton. 2008

67 of 71

Inverting the representation


Simonyan et al. 2014

Class appearance models (ImageNet)

(Figure: input image, top-1 class saliency map, thresholded saliency map for segmentation, and the resulting foreground segment.)
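The saliency map of Simonyan et al. is the gradient of the class score with respect to the input pixels. A toy sketch with a linear stand-in scorer and a finite-difference gradient (a real implementation backprops through the trained ConvNet instead):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a ConvNet's class logit: a fixed linear scorer.
w = rng.normal(size=(8, 8))

def class_score(img):
    return float((img * w).sum())

def saliency_map(img, eps=1e-4):
    """|d(score)/d(pixel)| for every pixel, via central finite differences."""
    S = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            d = np.zeros_like(img)
            d[i, j] = eps              # perturb a single pixel
            S[i, j] = abs(class_score(img + d) - class_score(img - d)) / (2 * eps)
    return S

img = rng.normal(size=(8, 8))
S = saliency_map(img)
```

Pixels with a large gradient magnitude are the ones the scorer is most sensitive to; thresholding such a map gives the rough foreground mask shown on the slide.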

68 of 71

Inverting the representation

  • Supervised

Dalal & Triggs. 2005.

Vondrick et al. 2013.

Mahendran & Vedaldi. 2016.

Dosovitskiy & Brox. 2016.

(Figure: an image, its HOG representation, and HOG inversions (HOG^-1) by HOGgles, Mahendran & Vedaldi, and Dosovitskiy & Brox.)

69 of 71

Inverting the representation

  • Supervised

Dosovitskiy & Brox. 2016.

Mahendran & Vedaldi. 2016.

Hinton et al. 2006.

(Figure: inversions by Dosovitskiy & Brox, Mahendran & Vedaldi, and an auto-encoder (AE).)

70 of 71

Understanding Representations

    • Nearest neighbors in full dimensional space
    • Low-dimensional embeddings
    • Read-out function
    • Inverting the representation
    • Minimal Image (what matters in an image)
    • Receptive field (what matters to a neuron)
    • Images maximally activating a neuron
    • Neuron activation maps
    • Visualizing learned filters


71 of 71