1 of 89

CSE 5539: Domain Adaptation

2 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

3 of 89

Recap of challenges

Problems:

  • Limited “labeled” training data
  • Mismatch between training and test data

4 of 89

Mismatch between training/test data

  • Supervised learning
  • (Unknown) joint distribution of (x, y): P(x, y) = P(y) P(x | y)
  • Goal: find the model h: X → Y to minimize E_P(x, y)[loss(h(x), y)]

  • Training data: Dtr = {(xn, yn)} ~ P(x, y) → learn h from Dtr
  • Test data: Dte = {(xm, ym)} ~ P(x, y) → evaluate Σm loss(h(xm), ym)
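When training and test data come from the same P(x, y), the empirical risk on Dtr transfers to Dte; when the distribution shifts, it does not. A minimal numpy sketch with a toy 1-D threshold classifier (all names hypothetical, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D binary problem: training label is y = 1[x > 0].
x_tr = rng.normal(0.0, 1.0, 1000)
y_tr = (x_tr > 0).astype(int)

# "Learn" h: pick the threshold minimizing empirical 0-1 loss on Dtr.
thresholds = np.linspace(-2, 2, 81)
emp_risk = [np.mean((x_tr > t).astype(int) != y_tr) for t in thresholds]
t_hat = thresholds[int(np.argmin(emp_risk))]

# Test data from the SAME P(x, y): test error stays low.
x_te = rng.normal(0.0, 1.0, 1000)
y_te = (x_te > 0).astype(int)
err_same = np.mean((x_te > t_hat).astype(int) != y_te)

# Shifted joint distribution: x ~ N(3, 1) and the label rule moved too.
x_sh = rng.normal(3.0, 1.0, 1000)
y_sh = (x_sh > 2).astype(int)
err_shift = np.mean((x_sh > t_hat).astype(int) != y_sh)

print(err_same, err_shift)  # the shifted error is much larger
```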

5 of 89

Mismatch between training/test data

[Figure: example image domains — product images, ImageNet, web images]

6 of 89

Data collection bias

Credits: Rogerio Feris, ICCV-2019 slides

7 of 89

Recap of challenges

Problems:

  • Limited “labeled” training data
  • Mismatch between training and test data

Potential solutions:

  • Transfer learning, meta learning (among data resources, modalities)
  • Domain adaptation (adapt the model/training data to test data)
  • Imbalanced learning (re-balancing; transfer among classes)
  • Semi-supervised learning (leverage unlabeled data)
  • Self-supervised learning (learn from unlabeled data)
  • Generating pseudo data

8 of 89

Transfer learning

  • In general, Ptr(x, y) ≠ Pte(x, y) and Ptr(y) Ptr(x | y) ≠ Pte(y) Pte(x | y)
  • We are given “limited” information (e.g., data) from the test task
  • We cannot train accurate models using such limited information

  • Different joint distributions
  • Different tasks
  • Different output spaces (e.g., vehicles vs. animals)
  • Different output distributions
  • Different input modalities
  • Different input spaces (e.g., product images vs. daily images)

9 of 89

Transfer learning

  • Fine-tuning:

[Figure: fine-tuning an ImageNet-pretrained model to new classes such as anchor and weather reporter]
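As a rough sketch of the fine-tuning idea (not the slide's actual setup): freeze a "pretrained" feature extractor — here just a fixed random projection standing in for an ImageNet backbone — and train only a new linear head on a small labeled set for the new task. All names below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a frozen pretrained backbone: a fixed nonlinear projection.
W_backbone = rng.normal(size=(64, 16))

def features(x):
    return np.tanh(x @ W_backbone)  # frozen "pretrained" features

# Small labeled set for the new task (e.g., anchor vs. reporter frames).
x_new = rng.normal(size=(40, 64))
f_new = features(x_new)
w_true = rng.normal(size=16)          # hidden rule generating toy labels
y_new = (f_new @ w_true > 0).astype(float)

# Fine-tune only a new linear head via logistic-regression gradient descent.
w_head = np.zeros(16)
for _ in range(500):
    p = 1 / (1 + np.exp(-f_new @ w_head))
    w_head -= 0.5 * f_new.T @ (p - y_new) / len(y_new)

acc = np.mean((f_new @ w_head > 0) == (y_new > 0.5))
print(acc)
```

In practice one would also unfreeze some backbone layers with a small learning rate; this sketch shows only the "linear probe" extreme.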

10 of 89

Transfer learning

  • Between classes and modalities

[Figure: related classes for transfer — leopard cat, leopard, cheetah, bobcat, clouded leopard, cat, wildcat]

11 of 89

Transfer learning

  • Between classes and modalities

[Figure: shared attribute descriptions — leopard cat: Felidae, small sized, black spot, …; cat: Felidae, small sized, stripe, …; leopard: Felidae, large sized, black spot, …]

12 of 89

Domain adaptation

  • Mainly about the input data; basic version: Ptr(x) ≠ Pte(x) or Ptr(x | y) ≠ Pte(x | y)
  • Conventional naming
    • Training: source domain
    • Testing: target domain
  • Mostly unsupervised in the “target”
  • Some supervised

[You et al., 2019]

13 of 89

Domain adaptation

Data reweighting
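One common way to realize data reweighting (an assumption here, not necessarily the slide's method) is the density-ratio trick: train a logistic domain classifier to distinguish source from target, and use its odds as importance weights w(x) = p_te(x)/p_tr(x):

```python
import numpy as np

rng = np.random.default_rng(2)

# 1-D source and target samples with different means.
x_src = rng.normal(0.0, 1.0, 2000)
x_tgt = rng.normal(1.0, 1.0, 2000)

# Train a logistic domain classifier: d = 1 for target, 0 for source.
x = np.concatenate([x_src, x_tgt])
d = np.concatenate([np.zeros(2000), np.ones(2000)])
a, b = 0.0, 0.0  # logit(x) = a * x + b
for _ in range(2000):
    p = 1 / (1 + np.exp(-(a * x + b)))
    g = p - d
    a -= 0.1 * np.mean(g * x)
    b -= 0.1 * np.mean(g)

# For equal sample sizes, w(x) = p(d=1|x)/p(d=0|x) = exp(a*x + b).
# Analytically log(N(1,1)/N(0,1)) = x - 0.5, so a ≈ 1, b ≈ -0.5.
w = np.exp(a * x_src + b)

# The reweighted source mean should move toward the target mean.
src_mean = float(np.mean(x_src))
rew_mean = float(np.sum(w * x_src) / np.sum(w))
print(src_mean, rew_mean, float(np.mean(x_tgt)))
```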

14 of 89

Domain adaptation

Feature transform

15 of 89

Domain adaptation

[Saenko et al., 2019]

16 of 89

Key words

  • Inductive vs. Transductive setting

17 of 89

Questions?

18 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

19 of 89

Basic setup of domain adaptation

20 of 89

Basic setup of domain adaptation

Credits: Hoffman 2019 ICCV tutorial

21 of 89

Domain adaptation (DA)

  • Need to know something about the “test” data
  • Source domain (SD):
  • Ample labeled data

  • Target domain (TD):
  • Ample unlabeled data (UDA) + limited labeled data (SDA)

  • Testing:

Inductive or Transductive

22 of 89

Worth thinking about

  • How much can “limited” labeled data from TD help?

  • How much difference is there between the inductive and transductive settings?

23 of 89

Questions?

24 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

25 of 89

Theoretical perspective

[Ben-David et al., 2010]

Hypothesis class, problem difficulty

Error in the source domain

Discrepancy in distribution
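The bound sketched on this slide is presumably the classic result of Ben-David et al. (2010); written out (notation assumed, not taken from the slide), the target error of a hypothesis h from class H is bounded by the three pieces the slide names:

```latex
\epsilon_T(h) \;\le\;
\underbrace{\epsilon_S(h)}_{\text{error in the source domain}}
\;+\;
\underbrace{\tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)}_{\text{discrepancy in distribution}}
\;+\;
\underbrace{\lambda}_{\text{hypothesis class / problem difficulty}}
\qquad
\lambda = \min_{h' \in \mathcal{H}} \big[\epsilon_S(h') + \epsilon_T(h')\big]
```

So adaptation algorithms that shrink the middle term (MMD, adversarial alignment, etc.) directly tighten this bound, as long as λ stays small.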

26 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

27 of 89

Algorithm

  • Goal: minimize the domain mismatch
  • How to quantify the domain mismatch?
  • How to match? Feature transform or re-weighting

28 of 89

Algorithm

  • Feature transform or re-weighting

29 of 89

Quantifying domain mismatch

  • First thoughts: estimate the distributions and then their distance

  • Estimating the distribution is challenging, especially in high dimensionality.

  • Can we avoid estimating the distribution/density?
  • Moment matching
  • Matching the mean and variance
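Matching the first two moments needs no density estimate and can be done in closed form; a minimal sketch (per-dimension mean/std matching, a deliberately simple variant):

```python
import numpy as np

rng = np.random.default_rng(3)

# Source and target features with different first and second moments.
src = rng.normal(2.0, 3.0, size=(1000, 4))
tgt = rng.normal(-1.0, 0.5, size=(1000, 4))

# Standardize source per dimension, then rescale/shift to target statistics.
src_matched = (src - src.mean(0)) / src.std(0) * tgt.std(0) + tgt.mean(0)

print(np.abs(src_matched.mean(0) - tgt.mean(0)).max())  # ~0 by construction
```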

30 of 89

Maximum Mean Discrepancy (MMD)

MMD: Fortet and Mourier, 1953

Let’s assume we know

We want to minimize MMD
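An empirical (biased) estimator of squared MMD with an RBF kernel can be sketched as follows; the bandwidth `gamma` is an assumed choice, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(4)

def mmd2_rbf(x, y, gamma=0.5):
    """Biased estimate of squared MMD with an RBF kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

same_a = rng.normal(0, 1, size=(200, 2))
same_b = rng.normal(0, 1, size=(200, 2))
shifted = rng.normal(2, 1, size=(200, 2))

m_same = mmd2_rbf(same_a, same_b)   # near 0: same distribution
m_diff = mmd2_rbf(same_a, shifted)  # clearly positive: shifted distribution
print(m_same, m_diff)
```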

31 of 89

Maximum Mean Discrepancy (MMD)


32 of 89

Re-weighting to “minimize” MMD

33 of 89

Transforms to “minimize” MMD

34 of 89

Why?

35 of 89

Recap: MMD


36 of 89

Questions?

37 of 89

MMD: before DNN

Credits: Hoffman 2019 ICCV tutorial

38 of 89

MMD: DNN transform

Credits: Hoffman 2019 ICCV tutorial

39 of 89

Credits: Hoffman 2019 ICCV tutorial

Can we train with domain loss first then classification loss?

40 of 89

Credits: Hoffman 2019 ICCV tutorial

(Deep) CORAL: Correlation Alignment
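A sketch of the CORAL recipe: whiten the source covariance, then re-color with the target covariance. The eigendecomposition route to matrix square roots below is one valid choice, not necessarily the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(5)

# Correlated source features vs. near-isotropic target features.
src = rng.normal(size=(2000, 3)) @ np.array([[2., 1., 0.],
                                             [0., 1., 0.],
                                             [0., 0., .5]])
tgt = rng.normal(size=(2000, 3))

def coral(src, tgt, eps=1e-5):
    """Whiten source covariance, then re-color with target covariance."""
    cs = np.cov(src, rowvar=False) + eps * np.eye(src.shape[1])
    ct = np.cov(tgt, rowvar=False) + eps * np.eye(tgt.shape[1])

    def sqrtm(m, power):
        # Matrix (inverse) square root via symmetric eigendecomposition.
        vals, vecs = np.linalg.eigh(m)
        return vecs @ np.diag(vals ** power) @ vecs.T

    return src @ sqrtm(cs, -0.5) @ sqrtm(ct, 0.5)

src_aligned = coral(src - src.mean(0), tgt - tgt.mean(0))
gap = np.linalg.norm(np.cov(src_aligned, rowvar=False)
                     - np.cov(tgt, rowvar=False))
print(gap)  # source covariance now nearly matches the target's
```

Deep CORAL uses the same second-order alignment as a differentiable loss between source and target feature batches instead of this closed-form transform.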

41 of 89

Geodesic flow kernel (GFK)

Gong et al, CVPR 2012

Kernel machines

42 of 89

Questions?

43 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

44 of 89

Domain mismatch

  • So far, we are not directly optimizing the discrepancy measure itself

  • We can do so by adversarial learning!
  • We can even go beyond certain forms of “discrepancy” by learning a domain classifier

45 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

46 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

47 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

48 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

Binary classification: SD: +1, TD: -1

49 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

Binary classification: SD: +1, TD: -1

50 of 89

Adversarial learning

Credits: Hoffman 2019 ICCV tutorial

The devil is in the details!

Be aware of trivial solutions

51 of 89

Recap: Adversarial learning

  • Task-specific classification loss (source):

  • Domain discriminator (source vs. target):

  • Domain matching/alignment (source vs. target):

Credits: Hoffman 2019 ICCV tutorial
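The three objectives above can be sketched numerically. The probabilities below are random stand-ins for classifier outputs, and the "inverted label" form of the matching loss is one common GAN-style choice, not necessarily the tutorial's exact formulation:

```python
import numpy as np

def cross_entropy(p, y):
    """Mean negative log-likelihood of integer labels y under probabilities p."""
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

rng = np.random.default_rng(6)

# Stand-in outputs of a 3-class task classifier (source batch of 8)
# and a source-vs-target domain discriminator.
p_task = rng.dirichlet(np.ones(3), size=8)
y_task = rng.integers(0, 3, size=8)
p_dom_src = rng.dirichlet(np.ones(2), size=8)  # discriminator on source
p_dom_tgt = rng.dirichlet(np.ones(2), size=8)  # discriminator on target

# 1) Task-specific classification loss (source labels only).
loss_task = cross_entropy(p_task, y_task)

# 2) Domain discriminator loss: source = class 0, target = class 1.
loss_disc = 0.5 * (cross_entropy(p_dom_src, np.zeros(8, int))
                   + cross_entropy(p_dom_tgt, np.ones(8, int)))

# 3) Domain matching loss for the feature extractor: fool the
#    discriminator by optimizing against inverted domain labels.
loss_match = 0.5 * (cross_entropy(p_dom_src, np.ones(8, int))
                    + cross_entropy(p_dom_tgt, np.zeros(8, int)))

print(loss_task, loss_disc, loss_match)
```

In training, losses 1 and 2 update the classifier/discriminator while loss 3 updates the feature extractor, alternating (or via a gradient-reversal layer).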

52 of 89

Additional objectives

  • Adversarial + reconstruction: domain-invariant and domain-specific features

  • Preventing losing too much information in feature matching

[Li et al., CVPR 2018]

53 of 89

Challenges

  • How to do model selection, architecture/hyper-parameter tuning?
  • In UDA, no labeled training data is available from TD

54 of 89

Questions?

55 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

56 of 89

Problems

[Saito et al., CVPR 2018] Credits: Hoffman 2019 ICCV tutorial

  • After domain alignment, target instances are still misclassified.

57 of 89

Alignment is not aware of the task-specific boundary

[Saito et al., CVPR 2018] Credits: Saenko 2019 ICCV tutorial

58 of 89

Solution: change the alignment objective

[Saito et al., CVPR 2018] Credits: Saenko 2019 ICCV tutorial

  • Find target instances that are far from source instances and near the task-specific decision boundary, which are likely to be misaligned and misclassified.
  • Train the feature generator to push target instances away from these regions.

59 of 89

Maximum classifier discrepancy (MCD)


[Saito et al., CVPR 2018]

  • Train multiple task-specific classifiers and use their disagreement to detect uncertain target instances that need to be adapted
  • Train the feature generator to minimize the disagreement
  • Entangle the task-specific classifier with the domain classifier

60 of 89

Maximum classifier discrepancy (MCD)

[Saito et al., CVPR 2018]

61 of 89

Maximum classifier discrepancy (MCD)

  • Training steps:
  • Step A: train the feature generator G and both classifiers F1, F2 to minimize the classification loss on labeled source data
  • Step B: fix G; update F1 and F2 to maximize their discrepancy on target data while keeping the source classification loss low
  • Step C: fix F1 and F2; update G to minimize the discrepancy on target data
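The discrepancy that the two adversarial steps fight over is typically the L1 distance between the two classifiers' class-probability outputs on the same target inputs; a minimal sketch:

```python
import numpy as np

def discrepancy(p1, p2):
    """MCD-style discrepancy: mean L1 distance between two classifiers'
    probability outputs on the same (target) batch."""
    return np.mean(np.abs(p1 - p2))

# Two classifiers that agree on one target point and disagree on another.
p1 = np.array([[0.9, 0.1], [0.8, 0.2]])
p2 = np.array([[0.9, 0.1], [0.2, 0.8]])
print(discrepancy(p1, p2))  # 0.3
```

Maximizing this over F1, F2 sharpens disagreement near the decision boundary; minimizing it over G pushes target features away from those regions.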

62 of 89

Recap: Theoretical perspective

[Ben-David et al., 2010]

Hypothesis class, problem difficulty

Training loss

Discrepancy in distribution

63 of 89

Maximum classifier discrepancy (MCD)


[Saito et al., CVPR 2018]

64 of 89

Experimental results

65 of 89

Questions?

66 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

67 of 89

[Zhu et al., ICCV 2017]

maximize w.r.t. DY

minimize w.r.t. G

68 of 89

To encourage meaningful transformation!
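The cycle-consistency idea can be sketched with toy 1-D linear "generators" (all names hypothetical): F(G(x)) should return x, so the L1 cycle loss is near zero only for an (approximately) inverse pair:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy linear generators between 1-D domains X and Y.
def G(x, a):  # forward generator X -> Y, slope a
    return a * x

def F(y, b):  # backward generator Y -> X, slope b
    return b * y

x = rng.normal(size=100)
y = rng.normal(size=100)

def cycle_loss(a, b):
    """L1 cycle-consistency loss in both directions."""
    return (np.mean(np.abs(F(G(x, a), b) - x))
            + np.mean(np.abs(G(F(y, b), a) - y)))

# Inverse pair (a=2, b=0.5) is cycle-consistent; (a=2, b=2) is not.
print(cycle_loss(2.0, 0.5), cycle_loss(2.0, 2.0))
```

CycleGAN adds this term to the two adversarial losses so the learned translation preserves content rather than collapsing to any distribution-matching map.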

69 of 89

70 of 89

Questions?

71 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training

72 of 89

Domain adaptation (DA)

  • Need to know something about the “test” data
  • Source domain (SD):
  • Ample labeled data

  • Target domain (TD):
  • Ample unlabeled data (UDA) + limited labeled data (SDA)

  • Testing:

73 of 89

Self-training


74 of 89

Self-training

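A minimal self-training round, using a nearest-centroid classifier as a stand-in for the slides' actual model: predict on target, keep only confident pseudo-labels, and re-fit on them:

```python
import numpy as np

rng = np.random.default_rng(8)

# Labeled source: two Gaussian classes. Unlabeled target: same classes, shifted.
src0 = rng.normal(-2, 1, size=(100, 2))
src1 = rng.normal(+2, 1, size=(100, 2))
tgt = np.vstack([rng.normal(-1.3, 1, size=(100, 2)),
                 rng.normal(+2.7, 1, size=(100, 2))])
tgt_y = np.array([0] * 100 + [1] * 100)  # held out, only for evaluation

def predict(x, c0, c1):
    return (np.linalg.norm(x - c1, axis=1)
            < np.linalg.norm(x - c0, axis=1)).astype(int)

# Round 0: class centroids from source labels only.
c0, c1 = src0.mean(0), src1.mean(0)

# Self-training round: pseudo-label confident target points
# (distance margin above a threshold), then re-estimate centroids.
d0 = np.linalg.norm(tgt - c0, axis=1)
d1 = np.linalg.norm(tgt - c1, axis=1)
confident = np.abs(d0 - d1) > 1.0
pseudo = (d1 < d0).astype(int)
c0 = tgt[confident & (pseudo == 0)].mean(0)
c1 = tgt[confident & (pseudo == 1)].mean(0)

acc = np.mean(predict(tgt, c0, c1) == tgt_y)
print(acc)  # centroids adapted to the target; accuracy improves
```

The confidence threshold is doing the "soft to hard label" sharpening the next slide discusses; too low a threshold lets noisy pseudo-labels corrupt the model.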

75 of 89

Results

76 of 89

Results

77 of 89

Why does self-training work?

  • Sharpening the label (from soft to hard labels)
    • Like the clustering assumption!

  • Fine-tuning the feature extractor to the target domain in a supervised way

  • Learned models can discover the underlying patterns

78 of 89

Questions?

79 of 89

Recap: Domain adaptation

Components:

  • Quantify the domain mismatch
  • Minimize the mismatch

Algorithms:

  • MMD (maximum mean discrepancy)
  • Domain adversarial learning
  • MCD (maximum classifier discrepancy)
  • Cycle-consistency

80 of 89

Deep MMD

Credits: Hoffman 2019 ICCV tutorial

81 of 89

Domain adversarial learning

Credits: Hoffman 2019 ICCV tutorial

82 of 89

Maximum classifier discrepancy (MCD)


[Saito et al., CVPR 2018]

83 of 89

Maximum classifier discrepancy (MCD)

[Saito et al., CVPR 2018]

84 of 89

Cycle consistency

[Zhu et al., ICCV 2017]

Can we mess up the correspondence between SD and TD?

85 of 89

Semantic correspondence

86 of 89

Semantic correspondence

87 of 89

Semantic correspondence

Machines may find unreasonable shortcuts to minimize the training loss!

[Tzeng et al., 2017]

88 of 89

Today

  • Recap: challenges of ML and CV
  • Overview of domain adaptation (DA)
  • Theoretical points of view
  • Algorithms
    • MMD
    • Adversarial learning
    • Beyond marginal distribution alignment: classifier discrepancy
    • Cycle-consistency
    • Self-training
  • Beyond classification

89 of 89

Domain adaptation for semantic segmentation