CSE 5524: Transfer learning & Stereo
HW 3 & HW 4 & quizzes
Today (37 & 40)
Recap: Domain gap
Domain gap
Recap: Data augmentation
Data is important …
The existence of domain gaps implies that we need to “re-collect” training data
product images
ImageNet
web images
Data is important …
The existence of new tasks implies that we need to “re-collect” training data
Bird species
Dog breeds
Car brands & styles
Neural networks are data hungry…
Sufficient labeled data
ImageNet-1K (ILSVRC)
1,000 object classes
1,000 training images per class
Humans do not re-learn from scratch …
Transfer learning
How to achieve “adaptation”?
Different “distributions”
Different “labels”
Pre-training and fine-tuning
Probably don’t need to change a lot!
CNN revisit
Algorithm
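The pre-train-then-fine-tune recipe can be sketched as follows (a minimal NumPy illustration; the linear encoder, data, and labels are hypothetical stand-ins for a pre-trained backbone and a target task):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" encoder: a fixed linear map standing in for a frozen backbone.
W_enc = rng.normal(size=(8, 4))                # 8-dim input -> 4-dim feature

# Target-task data: labels follow a linear rule in feature space (so a
# linear head suffices), mimicking a task the pre-trained features transfer to.
X = rng.normal(size=(64, 8))
feats = X @ W_enc                              # frozen encoder = feature extraction
y = (feats @ rng.normal(size=4) > 0).astype(float)

# Fine-tuning: keep the encoder fixed, train only a freshly initialized head.
W_head = np.zeros(4)
lr = 0.05
for _ in range(500):
    p = 1 / (1 + np.exp(-(feats @ W_head)))    # sigmoid predictions
    W_head -= lr * feats.T @ (p - y) / len(y)  # logistic-loss gradient, head only

acc = np.mean(((feats @ W_head) > 0) == (y > 0))
```

In practice the encoder is a deep network, and it may also be unfrozen with a small learning rate; the structure of the loop is the same.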
Questions?
This paradigm “pre-training + fine-tuning” is everywhere
Fine-grained recognition example: classifying plants along the taxonomy Kingdom | Phylum | Class | Order | Family | Genus | Species:
Onoclea sensibilis: Plantae | Tracheophyta | Polypodiopsida | Polypodiales | Onocleaceae | Onoclea | sensibilis
Onoclea hintonii: Plantae | Tracheophyta | Polypodiopsida | Polypodiales | Onocleaceae | Onoclea | hintonii
Ficus auriculata: Plantae | Tracheophyta | Rosids | Rosales | Moraceae | Ficus | F. auriculata
CLIP (Contrastive Language-Image Pretraining) matches images to text: a vision encoder embeds images and an autoregressive text encoder embeds captions, and the two are trained so that matched image–caption pairs have similar embeddings.
(Figure: vision encoder and autoregressive text representation aligned in a shared embedding space.)
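A sketch of the matching idea (NumPy; the embeddings below are synthetic stand-ins for encoder outputs, not a real CLIP model):

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Synthetic "image" embeddings and their "caption" embeddings: each caption
# embedding is a slightly perturbed copy of its image embedding, mimicking
# a trained CLIP model where matched pairs end up close together.
img_emb = l2_normalize(rng.normal(size=(4, 16)))
txt_emb = l2_normalize(img_emb + 0.01 * rng.normal(size=(4, 16)))

# Cosine-similarity matrix; contrastive training pushes the diagonal
# (matched pairs) up and the off-diagonal (mismatched pairs) down.
sim = img_emb @ txt_emb.T
best = sim.argmax(axis=1)      # each image retrieves its own caption
```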
Fine-tuning a subset of neural networks
(Figure: image → feature encoder → feature vector → prediction head → label.)
Parameter efficient fine-tuning
(Figure: new MLP modules ("MLP - new") are inserted alongside the original frozen MLPs; only the new modules are trained.)
(Figure: attention's learnable key, query, and value matrices K, Q, V; LoRA adapts K into an updated K'.)
LoRA: Low-Rank Adaptation
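LoRA freezes the pre-trained weight W and learns a low-rank update BA, so the adapted weight is W' = W + BA. A minimal NumPy sketch (the dimension and rank are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                          # hidden size and LoRA rank (hypothetical)

W = rng.normal(size=(d, d))            # frozen pre-trained weight (e.g. the K projection)
A = rng.normal(size=(r, d)) * 0.01     # trainable down-projection
B = np.zeros((d, r))                   # trainable up-projection, zero-initialized
                                       # so the adapted weight starts equal to W
W_adapted = W + B @ A                  # K' = K + BA

full_params = d * d                    # 262,144 if W itself were fine-tuned
lora_params = 2 * d * r                # 8,192 trained by LoRA instead
```

Only A and B receive gradients, which is why this counts as parameter-efficient fine-tuning.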
Questions?
Learning from a teacher
Learning from a teacher: knowledge distillation
Teachers could:
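The standard distillation loss compares temperature-softened teacher and student distributions. A minimal NumPy sketch (the logits and temperature are hypothetical):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T                         # temperature > 1 softens the distribution
    z = z - z.max()                   # numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([4.0, 1.0, 0.5])   # hypothetical teacher outputs
student_logits = np.array([2.0, 1.5, 0.2])   # hypothetical student outputs

T = 4.0
p_teacher = softmax(teacher_logits, T)       # soft targets
p_student = softmax(student_logits, T)

# Distillation loss: KL divergence from the student to the teacher's
# softened distribution (often combined with the usual hard-label loss).
kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
```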
Questions?
Prompting
Visual Prompt Tuning
Visual Prompt Tuning [Menglin Jia et al.]
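Visual Prompt Tuning keeps the backbone frozen and prepends a few learnable prompt tokens to the input token sequence. A minimal NumPy sketch of just the token manipulation (sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n_patches, dim = 196, 768            # ViT-style patch tokens (hypothetical sizes)
n_prompts = 5                        # number of learnable prompt tokens

patch_tokens = rng.normal(size=(n_patches, dim))    # from the frozen patch embedding
prompts = 0.01 * rng.normal(size=(n_prompts, dim))  # the only new trainable parameters

# VPT prepends the prompts to the token sequence before the (frozen)
# transformer blocks; only the prompts (and the head) receive gradients.
tokens = np.concatenate([prompts, patch_tokens], axis=0)
```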
Questions?
3D reconstruction
Depth estimation and 3D reconstruction
Stereo depth estimation
Stereo geometry relates disparity and depth: with focal length f and baseline B, a point whose projections in the left image I_l and right image I_r are offset by disparity D has depth Z = f · B / D.
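Converting disparity to depth with Z = f · B / D (a minimal NumPy sketch; the calibration values are hypothetical):

```python
import numpy as np

def depth_from_disparity(disparity, focal_length, baseline):
    """Z = f * B / D: larger disparity means the point is closer."""
    return focal_length * baseline / disparity

# Hypothetical calibration: f = 700 px, baseline B = 0.5 m.
d = np.array([70.0, 35.0, 7.0])            # disparities in pixels
z = depth_from_disparity(d, 700.0, 0.5)    # depths of 5 m, 10 m, 50 m
```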
(Figure: left and right input images, the estimated disparity map, and the corresponding depth map.)
(Figure: disparity is estimated by comparing the similarity between the left and right images across candidate disparities.)
(Figure: a neural network computes the similarity between left and right features and outputs a probability over candidate disparities.)
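Classical stereo matching does this with a hand-crafted similarity: for each left pixel, compare a small patch against right-image patches at every candidate disparity and keep the best match. A minimal 1-D NumPy sketch using sum of absolute differences (the synthetic image row is hypothetical):

```python
import numpy as np

def match_row(left_row, right_row, max_disp, patch=3):
    """For each left pixel, pick the disparity whose right-image patch is
    most similar (lowest sum of absolute differences)."""
    w, half = len(left_row), patch // 2
    disp = np.zeros(w, dtype=int)
    for x in range(half, w - half):
        lp = left_row[x - half:x + half + 1]
        best_cost, best_d = np.inf, 0
        for d in range(min(max_disp, x - half) + 1):
            rp = right_row[x - d - half:x - d + half + 1]
            cost = np.abs(lp - rp).sum()        # lower cost = higher similarity
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp

# Synthetic example: the right row is the left row shifted by 2 pixels,
# so the true disparity is 2 wherever it is observable.
left = np.arange(20, dtype=float) ** 1.5
right = np.roll(left, -2)
disp = match_row(left, right, max_disp=5)
```

Real pipelines do this over 2-D patches with smarter costs and regularization; the neural-network version replaces the hand-crafted cost with learned features.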
Pyramid stereo matching network (PSMNet)
[Chang et al., Pyramid stereo matching network, 2018]
General architecture
(Figure: left and right feature maps are combined into a cost volume.)
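A cost volume stacks a matching score for every candidate disparity at every pixel. PSMNet itself concatenates left and right features and processes the volume with 3D convolutions; the simpler correlation-style volume below just illustrates the structure (shapes are hypothetical):

```python
import numpy as np

def build_cost_volume(feat_l, feat_r, max_disp):
    """C[d, y, x] scores matching the left feature at (y, x) against the
    right feature at (y, x - d)."""
    c, h, w = feat_l.shape
    volume = np.zeros((max_disp + 1, h, w))
    for d in range(max_disp + 1):
        # Correlate left features with right features shifted by d.
        volume[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :w - d]).sum(axis=0)
    return volume

# Tiny example: 2-channel features on a 1x4 image.
feat_l = np.ones((2, 1, 4))
feat_r = np.ones((2, 1, 4))
vol = build_cost_volume(feat_l, feat_r, max_disp=2)
```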
Continuous Disparity Network
(Figure: the network maps the left and right images to a probability distribution over candidate disparities.)
[Garg et al., Wasserstein Distances for Stereo Disparity Estimation, 2020]
(Figure: the network additionally predicts per-candidate offsets; the output disparity is the shifted mode of the distribution.)
(Figure: without the offset, the output disparity is the mode of the probability distribution; with the offset, it is the shifted mode, giving continuous, sub-pixel disparity estimates.)
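Reading out the disparity as a shifted mode can be sketched as follows (a minimal NumPy illustration; the distribution and offsets are hypothetical):

```python
import numpy as np

disparities = np.arange(5)                        # candidate integer disparities
prob = np.array([0.05, 0.10, 0.50, 0.25, 0.10])   # predicted distribution
offset = np.array([0.0, -0.1, 0.3, -0.2, 0.1])    # predicted sub-pixel offsets

k = np.argmax(prob)
mode = disparities[k]                 # plain mode: an integer disparity
shifted_mode = mode + offset[k]       # shifted mode: continuous-valued output

# For contrast, the naive expectation over-smooths multi-modal distributions:
expectation = (prob * disparities).sum()
```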
Questions?
What if we have more images?
Can we synthesize images from other views?