1 of 89

CSE 5539: Domain Adaptation

2 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

3 of 89

Recap of challenges

Problems:

  • Limited “labeled” training data
  • Mismatch between training and test data

4 of 89

Mismatch between training/test data

  • Supervised learning
  • (Unknown) joint distribution of (x, y): P(x, y) = P(y) P(x | y)
  • Goal: find the model h: X → Y to minimize E_P(x, y)[loss(h(x), y)]

  • Training data: Dtr = {(xn, yn)} ~ P(x, y) → learn h from Dtr
  • Test data: Dte = {(xm, ym)} ~ P(x, y) → evaluate Σm loss(h(xm), ym)
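When training and test data come from the same P(x, y), the empirical risk on Dtr transfers to Dte; when the distribution shifts, it does not. A minimal numpy sketch with a toy 1-D threshold classifier (all names hypothetical, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D binary problem: training label is y = 1[x > 0].
x_tr = rng.normal(0.0, 1.0, 1000)
y_tr = (x_tr > 0).astype(int)

# "Learn" h: pick the threshold minimizing empirical 0-1 loss on Dtr.
thresholds = np.linspace(-2, 2, 81)
emp_risk = [np.mean((x_tr > t).astype(int) != y_tr) for t in thresholds]
t_hat = thresholds[int(np.argmin(emp_risk))]

# Test data from the SAME P(x, y): test error stays low.
x_te = rng.normal(0.0, 1.0, 1000)
y_te = (x_te > 0).astype(int)
err_same = np.mean((x_te > t_hat).astype(int) != y_te)

# Shifted joint distribution: x ~ N(3, 1) and the label rule moved too.
x_sh = rng.normal(3.0, 1.0, 1000)
y_sh = (x_sh > 2).astype(int)
err_shift = np.mean((x_sh > t_hat).astype(int) != y_sh)

print(err_same, err_shift)  # the shifted error is much larger
```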

5 of 89

Mismatch between training/test data

[Figure: example image domains — product images, ImageNet, web images]

6 of 89

Data collection bias

Credits: Rogerio Feris, ICCV-2019 slides

7 of 89

Recap of challenges

Problems:

  • Limited “labeled” training data
  • Mismatch between training and test data

Potential solutions:

  • Transfer learning, meta learning (among data resources, modalities)
  • Domain adaptation (adapt the model/training data to test data)
  • Imbalanced learning (re-balancing; transfer among classes)
  • Semi-supervised learning (leverage unlabeled data)
  • Self-supervised learning (learn from unlabeled data)
  • Generating pseudo data

8 of 89

Transfer learning

  • In general, Ptr(x, y) ≠ Pte(x, y) and Ptr(y) Ptr(x | y) ≠ Pte(y) Pte(x | y)
  • We are given “limited” information (e.g., data) from the test task
  • We cannot train accurate models using such limited information

  • Different joint distributions
  • Different tasks
  • Different output spaces (e.g., vehicles vs. animals)
  • Different output distributions
  • Different input modalities
  • Different input spaces (e.g., product images vs. daily images)

9 of 89

Transfer learning

  • Fine-tuning:

[Figure: fine-tuning an ImageNet-pretrained model to new classes such as anchor and weather reporter]
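As a rough sketch of the fine-tuning idea (not the slide's actual setup): freeze a "pretrained" feature extractor — here just a fixed random projection standing in for an ImageNet backbone — and train only a new linear head on a small labeled set for the new task. All names below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a frozen pretrained backbone: a fixed nonlinear projection.
W_backbone = rng.normal(size=(64, 16))

def features(x):
    return np.tanh(x @ W_backbone)  # frozen "pretrained" features

# Small labeled set for the new task (e.g., anchor vs. reporter frames).
x_new = rng.normal(size=(40, 64))
f_new = features(x_new)
w_true = rng.normal(size=16)          # hidden rule generating toy labels
y_new = (f_new @ w_true > 0).astype(float)

# Fine-tune only a new linear head via logistic-regression gradient descent.
w_head = np.zeros(16)
for _ in range(500):
    p = 1 / (1 + np.exp(-f_new @ w_head))
    w_head -= 0.5 * f_new.T @ (p - y_new) / len(y_new)

acc = np.mean((f_new @ w_head > 0) == (y_new > 0.5))
print(acc)
```

In practice one would also unfreeze some backbone layers with a small learning rate; this sketch shows only the "linear probe" extreme.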

10 of 89

Transfer learning

  • Between classes and modalities

[Figure: related classes for transfer — leopard cat, leopard, cheetah, bobcat, clouded leopard, cat, wildcat]

11 of 89

Transfer learning

  • Between classes and modalities

[Figure: shared attribute descriptions — leopard cat: Felidae, small sized, black spot, …; cat: Felidae, small sized, stripe, …; leopard: Felidae, large sized, black spot, …]

12 of 89

Domain adaptation

  • Mainly about the input data; basic version: Ptr(x) ≠ Pte(x) or Ptr(x | y) ≠ Pte(x | y)
  • Conventional naming
    • Training: source domain
    • Testing: target domain
  • Mostly unsupervised in the “target”
  • Some supervised

[You et al., 2019]

13 of 89

Domain adaptation

Data reweighting
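One common way to realize data reweighting (an assumption here, not necessarily the slide's method) is the density-ratio trick: train a logistic domain classifier to distinguish source from target, and use its odds as importance weights w(x) = p_te(x)/p_tr(x):

```python
import numpy as np

rng = np.random.default_rng(2)

# 1-D source and target samples with different means.
x_src = rng.normal(0.0, 1.0, 2000)
x_tgt = rng.normal(1.0, 1.0, 2000)

# Train a logistic domain classifier: d = 1 for target, 0 for source.
x = np.concatenate([x_src, x_tgt])
d = np.concatenate([np.zeros(2000), np.ones(2000)])
a, b = 0.0, 0.0  # logit(x) = a * x + b
for _ in range(2000):
    p = 1 / (1 + np.exp(-(a * x + b)))
    g = p - d
    a -= 0.1 * np.mean(g * x)
    b -= 0.1 * np.mean(g)

# For equal sample sizes, w(x) = p(d=1|x)/p(d=0|x) = exp(a*x + b).
# Analytically log(N(1,1)/N(0,1)) = x - 0.5, so a ≈ 1, b ≈ -0.5.
w = np.exp(a * x_src + b)

# The reweighted source mean should move toward the target mean.
src_mean = float(np.mean(x_src))
rew_mean = float(np.sum(w * x_src) / np.sum(w))
print(src_mean, rew_mean, float(np.mean(x_tgt)))
```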

14 of 89

Domain adaptation

Feature transform

15 of 89

Domain adaptation

[Saenko et al., 2019]

16 of 89

Key words

  • Inductive vs. Transductive setting

17 of 89

Questions?

18 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

19 of 89

Basic setup of domain adaptation

20 of 89

Basic setup of domain adaptation

Credits: Hoffman 2019 ICCV tutorial

21 of 89

Domain adaptation (DA)

  • Need to know something about the “test” data
  • Source domain (SD):
  • Ample labeled data

  • Target domain (TD):
  • Ample unlabeled data (UDA) + limited labeled data (SDA)

  • Testing:

Inductive or Transductive

22 of 89

Worth thinking about

  • How much can “limited” labeled data from TD help?

  • How much difference is there between the inductive and transductive settings?

23 of 89

Questions?

24 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

25 of 89

Theoretical perspective

[Ben-David et al., 2010]

Hypothesis class, problem difficulty

Error in the source domain

Discrepancy in distribution
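The bound sketched on this slide is presumably the classic result of Ben-David et al. (2010); written out (notation assumed, not taken from the slide), the target error of a hypothesis h from class H is bounded by the three pieces the slide names:

```latex
\epsilon_T(h) \;\le\;
\underbrace{\epsilon_S(h)}_{\text{error in the source domain}}
\;+\;
\underbrace{\tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)}_{\text{discrepancy in distribution}}
\;+\;
\underbrace{\lambda}_{\text{hypothesis class / problem difficulty}}
\qquad
\lambda = \min_{h' \in \mathcal{H}} \big[\epsilon_S(h') + \epsilon_T(h')\big]
```

So adaptation algorithms that shrink the middle term (MMD, adversarial alignment, etc.) directly tighten this bound, as long as λ stays small.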

26 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

27 of 89

Algorithm

  • Goal: minimize the domain mismatch
  • How to quantify the domain mismatch?
  • How to match? Feature transform or re-weighting

28 of 89

Algorithm

  • Feature transform or re-weighting

29 of 89

Quantifying domain mismatch

  • First thoughts: estimate the distributions and then their distance

  • Estimating the distribution is challenging, especially in high dimensionality.

  • Can we avoid estimating the distribution/density?
  • Moment matching
  • Matching the mean and variance
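Matching the first two moments needs no density estimate and can be done in closed form; a minimal sketch (per-dimension mean/std matching, a deliberately simple variant):

```python
import numpy as np

rng = np.random.default_rng(3)

# Source and target features with different first and second moments.
src = rng.normal(2.0, 3.0, size=(1000, 4))
tgt = rng.normal(-1.0, 0.5, size=(1000, 4))

# Standardize source per dimension, then rescale/shift to target statistics.
src_matched = (src - src.mean(0)) / src.std(0) * tgt.std(0) + tgt.mean(0)

print(np.abs(src_matched.mean(0) - tgt.mean(0)).max())  # ~0 by construction
```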

30 of 89

Maximum Mean Discrepancy (MMD)

MMD: Fortet and Mourier, 1953

Let’s assume we know

We want to minimize MMD
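An empirical (biased) estimator of squared MMD with an RBF kernel can be sketched as follows; the bandwidth `gamma` is an assumed choice, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(4)

def mmd2_rbf(x, y, gamma=0.5):
    """Biased estimate of squared MMD with an RBF kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

same_a = rng.normal(0, 1, size=(200, 2))
same_b = rng.normal(0, 1, size=(200, 2))
shifted = rng.normal(2, 1, size=(200, 2))

m_same = mmd2_rbf(same_a, same_b)   # near 0: same distribution
m_diff = mmd2_rbf(same_a, shifted)  # clearly positive: shifted distribution
print(m_same, m_diff)
```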

31 of 89

Maximum Mean Discrepancy (MMD)


32 of 89

Re-weighting to “minimize” MMD

33 of 89

Transforms to “minimize” MMD

34 of 89

Why?

35 of 89

Recap: MMD


36 of 89

Questions?

37 of 89

MMD: before DNN

Credits: Hoffman 2019 ICCV tutorial

38 of 89

MMD: DNN transform

Credits: Hoffman 2019 ICCV tutorial

39 of 89

Credits: Hoffman 2019 ICCV tutorial

Can we train with domain loss first then classification loss?

40 of 89

Credits: Hoffman 2019 ICCV tutorial

(Deep) CORAL: Correlation Alignment
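A sketch of the CORAL recipe: whiten the source covariance, then re-color with the target covariance. The eigendecomposition route to matrix square roots below is one valid choice, not necessarily the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(5)

# Correlated source features vs. near-isotropic target features.
src = rng.normal(size=(2000, 3)) @ np.array([[2., 1., 0.],
                                             [0., 1., 0.],
                                             [0., 0., .5]])
tgt = rng.normal(size=(2000, 3))

def coral(src, tgt, eps=1e-5):
    """Whiten source covariance, then re-color with target covariance."""
    cs = np.cov(src, rowvar=False) + eps * np.eye(src.shape[1])
    ct = np.cov(tgt, rowvar=False) + eps * np.eye(tgt.shape[1])

    def sqrtm(m, power):
        # Matrix (inverse) square root via symmetric eigendecomposition.
        vals, vecs = np.linalg.eigh(m)
        return vecs @ np.diag(vals ** power) @ vecs.T

    return src @ sqrtm(cs, -0.5) @ sqrtm(ct, 0.5)

src_aligned = coral(src - src.mean(0), tgt - tgt.mean(0))
gap = np.linalg.norm(np.cov(src_aligned, rowvar=False)
                     - np.cov(tgt, rowvar=False))
print(gap)  # source covariance now nearly matches the target's
```

Deep CORAL uses the same second-order alignment as a differentiable loss between source and target feature batches instead of this closed-form transform.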

41 of 89

Geodesic flow kernel (GFK)

Gong et al, CVPR 2012

Kernel machines

42 of 89

Questions?

43 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

44 of 89

Domain mismatch

  • So far, we are not directly optimizing the discrepancy measure itself

  • We can do so by adversarial learning!
  • We can even go beyond certain forms of “discrepancy” by learning a domain classifier

45 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

46 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

47 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

48 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

Binary classification: SD: +1, TD: -1

49 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

Binary classification: SD: +1, TD: -1

50 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

The devil is in the details!

Be aware of trivial solutions

51 of 89

Recap: Adversarial learning

  • Task-specific classification loss (source):

  • Domain discriminator (source vs. target):

  • Domain matching/alignment (source vs. target):

Credits: Hoffman 2019 ICCV tutorial
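The three objectives above can be sketched numerically. The probabilities below are random stand-ins for classifier outputs, and the "inverted label" form of the matching loss is one common GAN-style choice, not necessarily the tutorial's exact formulation:

```python
import numpy as np

def cross_entropy(p, y):
    """Mean negative log-likelihood of integer labels y under probabilities p."""
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

rng = np.random.default_rng(6)

# Stand-in outputs of a 3-class task classifier (source batch of 8)
# and a source-vs-target domain discriminator.
p_task = rng.dirichlet(np.ones(3), size=8)
y_task = rng.integers(0, 3, size=8)
p_dom_src = rng.dirichlet(np.ones(2), size=8)  # discriminator on source
p_dom_tgt = rng.dirichlet(np.ones(2), size=8)  # discriminator on target

# 1) Task-specific classification loss (source labels only).
loss_task = cross_entropy(p_task, y_task)

# 2) Domain discriminator loss: source = class 0, target = class 1.
loss_disc = 0.5 * (cross_entropy(p_dom_src, np.zeros(8, int))
                   + cross_entropy(p_dom_tgt, np.ones(8, int)))

# 3) Domain matching loss for the feature extractor: fool the
#    discriminator by optimizing against inverted domain labels.
loss_match = 0.5 * (cross_entropy(p_dom_src, np.ones(8, int))
                    + cross_entropy(p_dom_tgt, np.zeros(8, int)))

print(loss_task, loss_disc, loss_match)
```

In training, losses 1 and 2 update the classifier/discriminator while loss 3 updates the feature extractor, alternating (or via a gradient-reversal layer).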

52 of 89

Additional objectives

  • Adversarial + reconstruction: domain-invariant and domain-specific features

  • Preventing losing too much information in feature matching

[Li et al., CVPR 2018]

53 of 89

Challenges

  • How to do model selection, architecture/hyper-parameter tuning?
  • In UDA, no labeled training data is available from TD

54 of 89

Questions?

55 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

56 of 89

Problems

[Saito et al., CVPR 2018] Credits: Hoffman 2019 ICCV tutorial

  • After domain alignment, target instances are still misclassified.

57 of 89

Alignment is not aware of the task-specific boundary

[Saito et al., CVPR 2018] Credits: Saenko 2019 ICCV tutorial

58 of 89

Solution: change the alignment objective

[Saito et al., CVPR 2018] Credits: Saenko 2019 ICCV tutorial

  • Find target instances that are far from source instances and near the task-specific decision boundary, which are likely to be misaligned and misclassified.
  • Train the feature generator to push target instances away from these regions.

59 of 89

Maximum classifier discrepancy (MCD)


[Saito et al., CVPR 2018]

  • Train multiple task-specific classifiers and use their disagreement to detect uncertain target instances that need to be adapted
  • Train the feature generator to minimize the disagreement
  • Entangle the task-specific classifier with the domain classifier

60 of 89

Maximum classifier discrepancy (MCD)

[Saito et al., CVPR 2018]

61 of 89

Maximum classifier discrepancy (MCD)

  • Training steps:
  • Step A: train the feature generator G and both classifiers F1, F2 to minimize the classification loss on labeled source data
  • Step B: fix G; update F1 and F2 to maximize their discrepancy on target data while keeping the source classification loss low
  • Step C: fix F1 and F2; update G to minimize the discrepancy on target data
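The discrepancy that the two adversarial steps fight over is typically the L1 distance between the two classifiers' class-probability outputs on the same target inputs; a minimal sketch:

```python
import numpy as np

def discrepancy(p1, p2):
    """MCD-style discrepancy: mean L1 distance between two classifiers'
    probability outputs on the same (target) batch."""
    return np.mean(np.abs(p1 - p2))

# Two classifiers that agree on one target point and disagree on another.
p1 = np.array([[0.9, 0.1], [0.8, 0.2]])
p2 = np.array([[0.9, 0.1], [0.2, 0.8]])
print(discrepancy(p1, p2))  # 0.3
```

Maximizing this over F1, F2 sharpens disagreement near the decision boundary; minimizing it over G pushes target features away from those regions.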

62 of 89

Recap: Theoretical perspective

[Ben-David et al., 2010]

Hypothesis class, problem difficulty

Training loss

Discrepancy in distribution

63 of 89

Maximum classifier discrepancy (MCD)


[Saito et al., CVPR 2018]

64 of 89

Experimental results

65 of 89

Questions?

66 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

67 of 89

[Zhu et al., ICCV 2017]

maximize w.r.t. DY

minimize w.r.t. G

68 of 89

To encourage meaningful transformation!
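The cycle-consistency idea can be sketched with toy 1-D linear "generators" (all names hypothetical): F(G(x)) should return x, so the L1 cycle loss is near zero only for an (approximately) inverse pair:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy linear generators between 1-D domains X and Y.
def G(x, a):  # forward generator X -> Y, slope a
    return a * x

def F(y, b):  # backward generator Y -> X, slope b
    return b * y

x = rng.normal(size=100)
y = rng.normal(size=100)

def cycle_loss(a, b):
    """L1 cycle-consistency loss in both directions."""
    return (np.mean(np.abs(F(G(x, a), b) - x))
            + np.mean(np.abs(G(F(y, b), a) - y)))

# Inverse pair (a=2, b=0.5) is cycle-consistent; (a=2, b=2) is not.
print(cycle_loss(2.0, 0.5), cycle_loss(2.0, 2.0))
```

CycleGAN adds this term to the two adversarial losses so the learned translation preserves content rather than collapsing to any distribution-matching map.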

69 of 89

70 of 89

Questions?

71 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

72 of 89

Domain adaptation (DA)

  • Need to know something about the “test” data
  • Source domain (SD):
  • Ample labeled data

  • Target domain (TD):
  • Ample unlabeled data (UDA) + limited labeled data (SDA)

  • Testing:

73 of 89

Self-training


74 of 89

Self-training

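A minimal self-training round, using a nearest-centroid classifier as a stand-in for the slides' actual model: predict on target, keep only confident pseudo-labels, and re-fit on them:

```python
import numpy as np

rng = np.random.default_rng(8)

# Labeled source: two Gaussian classes. Unlabeled target: same classes, shifted.
src0 = rng.normal(-2, 1, size=(100, 2))
src1 = rng.normal(+2, 1, size=(100, 2))
tgt = np.vstack([rng.normal(-1.3, 1, size=(100, 2)),
                 rng.normal(+2.7, 1, size=(100, 2))])
tgt_y = np.array([0] * 100 + [1] * 100)  # held out, only for evaluation

def predict(x, c0, c1):
    return (np.linalg.norm(x - c1, axis=1)
            < np.linalg.norm(x - c0, axis=1)).astype(int)

# Round 0: class centroids from source labels only.
c0, c1 = src0.mean(0), src1.mean(0)

# Self-training round: pseudo-label confident target points
# (distance margin above a threshold), then re-estimate centroids.
d0 = np.linalg.norm(tgt - c0, axis=1)
d1 = np.linalg.norm(tgt - c1, axis=1)
confident = np.abs(d0 - d1) > 1.0
pseudo = (d1 < d0).astype(int)
c0 = tgt[confident & (pseudo == 0)].mean(0)
c1 = tgt[confident & (pseudo == 1)].mean(0)

acc = np.mean(predict(tgt, c0, c1) == tgt_y)
print(acc)  # centroids adapted to the target; accuracy improves
```

The confidence threshold is doing the "soft to hard label" sharpening the next slide discusses; too low a threshold lets noisy pseudo-labels corrupt the model.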

75 of 89

Results

76 of 89

Results

77 of 89

Why does self-training work?

  • Sharpening the label (from soft to hard labels)
    • Like the clustering assumption!

  • Fine-tuning the feature extractor to the target domain in a supervised way

  • Learned models can discover the underlying patterns

78 of 89

Questions?

79 of 89

Recap: Domain adaptation

Components:

  • Quantify the domain mismatch
  • Minimize the mismatch

Algorithms:

  • MMD (maximum mean discrepancy)
  • Domain adversarial learning
  • MCD (maximum classifier discrepancy)
  • Cycle-consistency

80 of 89

Deep MMD

Credits: Hoffman 2019 ICCV tutorial

81 of 89

Domain adversarial learning

Credits: Hoffman 2019 ICCV tutorial

82 of 89

Maximum classifier discrepancy (MCD)


[Saito et al., CVPR 2018]

83 of 89

Maximum classifier discrepancy (MCD)

[Saito et al., CVPR 2018]

84 of 89

Cycle consistency

[Zhu et al., ICCV 2017]

Can we mess up the correspondence between SD and TD?

85 of 89

Semantic correspondence

86 of 89

Semantic correspondence

87 of 89

Semantic correspondence

Machines may find unreasonable shortcuts to minimize the training loss!

[Tzeng et al., 2017]

88 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training
  • Beyond classification

89 of 89

Domain adaptation for semantic segmentation