CSE 5524: Representation learning
HW 3
Final project (30%)
Today: representation learning (Chapter 30 of the textbook)
Representation learning
The usefulness of representations
Retrieval, image-to-image search
The representation learning setup
[Figure: an image is fed to a neural-net encoder, which outputs a representation]
What makes a good representation?
An overview
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Autoencoder
Objective function: minimize the reconstruction error ‖x − g(f(x))‖², where z = f(x) is the code produced by the encoder and g is the decoder
Design principle: give z lower dimensionality than x, so the network must learn a compressed representation
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
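The objective above can be sketched numerically. Below is a minimal linear autoencoder in NumPy; the toy data, dimensions, and learning rate are made up for illustration. The encoder maps each 8-D input x to a 2-D code z, the decoder maps z back, and both are trained by gradient descent on the reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))  # toy correlated 8-D data

d, k = 8, 2                                  # input dim, bottleneck dim (k < d)
We = rng.normal(scale=0.1, size=(d, k))      # encoder weights: z = x @ We
Wd = rng.normal(scale=0.1, size=(k, d))      # decoder weights: x_hat = z @ Wd

def recon_loss(X, We, Wd):
    # mean squared reconstruction error ||x - g(f(x))||^2
    return np.mean((X - (X @ We) @ Wd) ** 2)

lr = 0.01
loss_before = recon_loss(X, We, Wd)
for _ in range(500):
    Z = X @ We                               # codes
    G = 2 * ((Z @ Wd) - X) / X.size          # dL/dX_hat
    gWd = Z.T @ G                            # gradient w.r.t. decoder weights
    gWe = X.T @ (G @ Wd.T)                   # gradient w.r.t. encoder weights
    Wd -= lr * gWd
    We -= lr * gWe
loss_after = recon_loss(X, We, Wd)
```

Because k < d, the network cannot copy its input; it must discover the directions of the data that matter most for reconstruction.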
Example
Training data
Nearest neighbor search
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
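Image-to-image search then reduces to nearest-neighbor lookup in the code space: encode the query, compare its code against the codes of a gallery. A minimal sketch, with random vectors standing in for encoder outputs (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.normal(size=(100, 16))              # hypothetical codes z for 100 gallery images
query = db[7] + 0.01 * rng.normal(size=16)   # a query very close to gallery item 7

def nearest(query, db, k=3):
    # cosine similarity between the query code and every gallery code
    q = query / np.linalg.norm(query)
    D = db / np.linalg.norm(db, axis=1, keepdims=True)
    sims = D @ q
    return np.argsort(-sims)[:k]             # indices of the k most similar items

top = nearest(query, db)                     # top[0] should be gallery item 7
```

If the representation is good, semantically similar images land near each other in this space, so the neighbors are meaningful matches rather than pixel-level ones.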
Predictive encoding
Pretext tasks: prediction problems constructed from the data itself, requiring no human labels
A good representation should solve these tasks well
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Predictive encoding
[Figure: an encoder trained to solve jigsaw puzzles, predicting the correct arrangement of shuffled image patches]
[Noroozi et al., 2016]
Predictive encoding
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Object detectors emerge from scene recognition
Self-supervised learning
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Self-supervised learning: context encoder
[Pathak et al., CVPR 2016]
Self-supervised learning: masked autoencoder (MAE)
[He et al., CVPR 2022]
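The core mechanics of an MAE-style objective (in the spirit of He et al., 2022) can be sketched as follows. The "model" here is a trivial stand-in (a real MAE runs a ViT encoder on the visible patches only); the point is the masking bookkeeping: hide a large fraction of patches, predict them from the visible ones, and score the loss only on the masked positions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patches, dim = 16, 4                        # a toy "image" of 16 patch vectors
patches = rng.normal(size=(n_patches, dim))

mask_ratio = 0.75                             # MAE masks a large fraction (75% in He et al.)
n_masked = int(mask_ratio * n_patches)
perm = rng.permutation(n_patches)
masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]

# stand-in predictor: guess each masked patch as the mean of the visible ones
pred = np.tile(patches[visible_idx].mean(axis=0), (n_masked, 1))

# the reconstruction loss is computed on the *masked* patches only
loss = np.mean((pred - patches[masked_idx]) ** 2)
```

The high mask ratio is what makes the task hard enough to force a semantic representation: with 75% of the image hidden, low-level interpolation no longer suffices.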
Imputation
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Clustering
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
K-means
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
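For reference, Lloyd's algorithm for k-means fits in a few lines of NumPy. This is a minimal sketch (random initialization from data points, fixed iteration count), not an optimized or robust implementation:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]      # init from data points
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # update step: each center moves to the mean of its points
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels, centers

# usage on two well-separated 1-D toy blobs
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 0.5, (20, 1)), rng.normal(10, 0.5, (20, 1))])
labels, centers = kmeans(X, k=2)
```

The two steps monotonically decrease the within-cluster squared distance, so the algorithm always converges, though only to a local optimum that depends on initialization.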
K-means + deep learning
Question: what do different algorithms learn?
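One concrete recipe for "k-means + deep learning" (in the spirit of DeepCluster, Caron et al., 2018) alternates two steps: cluster the current features to get pseudo-labels, then train the network to predict those pseudo-labels. The sketch below runs one round of the alternation with a toy fixed backbone and trains only the classifier head; a real method repeats the two steps and backpropagates into the backbone as well.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))                  # toy "images"
W = rng.normal(scale=0.5, size=(8, 4))        # toy backbone (fixed in this sketch)

def feats(X):
    return np.maximum(X @ W, 0.0)             # encoder: linear + ReLU

def kmeans_labels(Z, k=4, iters=10):
    centers = Z[:k].copy()
    for _ in range(iters):
        a = ((Z[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (a == j).any():
                centers[j] = Z[a == j].mean(0)
    return a

# step 1: cluster the current features into pseudo-labels
Z = feats(X)
y = kmeans_labels(Z)

# step 2: train a softmax classifier head to predict the pseudo-labels
V = np.zeros((4, 4))
def xent(V):
    logits = Z @ V
    logits = logits - logits.max(1, keepdims=True)
    p = np.exp(logits); p /= p.sum(1, keepdims=True)
    return -np.mean(np.log(p[np.arange(len(y)), y])), p

loss_start, _ = xent(V)
for _ in range(200):
    _, p = xent(V)
    V -= 0.1 * Z.T @ (p - np.eye(4)[y]) / len(y)   # cross-entropy gradient step
loss_end, _ = xent(V)
```

The learning signal comes entirely from the clustering: the labels are invented by k-means, yet fitting them still shapes the features.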
Contrastive learning
Data transformation
[Chen et al., A Simple Framework for Contrastive Learning of Visual Representations, ICML 2020]
Contrastive learning: transformation
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Contrastive learning: co-occurrence
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Contrastive learning
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Contrastive learning: exemplar losses
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
[Google AI blog, Advancing Self-Supervised and Semi-Supervised Learning with SimCLR]
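The contrastive objective can be made concrete with an InfoNCE-style loss, similar in spirit to SimCLR's NT-Xent: for each item, the similarity to its positive partner (e.g., another augmentation of the same image) is pushed up relative to all other items in the batch. The embeddings below are random stand-ins for encoder outputs:

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE-style loss for N positive pairs (z1[i], z2[i]).
    Every other item in the batch serves as a negative."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sims = z1 @ z2.T / tau                     # temperature-scaled cosine similarities
    # cross-entropy: the positive pair (the diagonal) should beat all negatives
    logsumexp = np.log(np.exp(sims).sum(axis=1))
    return np.mean(logsumexp - np.diag(sims))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
# aligned positives (two views of the "same image") give a low loss;
# unrelated pairs give a high one
loss_aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))
loss_random = info_nce(z, rng.normal(size=(8, 16)))
```

The temperature tau controls how sharply the loss focuses on the hardest negatives; SimCLR found it to be an important hyperparameter.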
Potential final project
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Section 30.10.2
Figure 30.13
Research example
Rethinking pre-training for object detectors
Goal: Bridge the input point cloud and output bounding boxes
Our idea: Help the backbone cluster points into object instances or parts
[Diagram: point cloud → feature backbone → detection head → bounding boxes]
Our proposal: leverage color information
Motivation: each object instance or part often has a coherent color and contrasts sharply with the background
Hypothesis: Learning to colorize LiDAR point clouds would equip the backbone with the semantic cues to segment points
Region growing [Preetha et al., 2012]
Superpixel [Liu et al., 2011]
Our solution: “grounded” point colorization (GPC)
Provide colors of a subset of points as hints
GPC Pre-training + fine-tuning
[Diagram: N input points with color hints; the backbone produces per-point features, which are concatenated with the hints and fed to a color decoder]
Output embeddings indicate which points should be colored/segmented together
Pre-training = Point-wise color regression problem
GPC Pre-training + fine-tuning
[Diagram: pre-training path: backbone features concatenated with hints → color decoder; fine-tuning path: backbone features → detection head]
Pre-Training LiDAR-Based 3D Object Detectors Through Colorization, ICLR 2024
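The GPC pre-training objective can be sketched schematically. Everything below is an illustrative stand-in (toy shapes, a single linear layer instead of a 3D backbone and decoder, and scoring only the un-hinted points is one plausible choice, not necessarily the paper's exact loss): colors of a random subset of points are provided as hints alongside the coordinates, and the network regresses a color for every point.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                        # points in a toy LiDAR sweep
xyz = rng.normal(size=(N, 3))                  # point coordinates
color = rng.uniform(size=(N, 3))               # ground-truth RGB (from a paired camera image)

# "hints": reveal the true color for a random subset of points,
# zeros plus a validity flag elsewhere
hint_mask = rng.uniform(size=N) < 0.2
hints = np.where(hint_mask[:, None], color, 0.0)
inputs = np.concatenate([xyz, hints, hint_mask[:, None].astype(float)], axis=1)

# stand-in backbone + color decoder: a single linear map
W = rng.normal(scale=0.1, size=(inputs.shape[1], 3))
pred = inputs @ W

# pre-training = point-wise color regression, scored on the un-hinted points
loss = np.mean((pred[~hint_mask] - color[~hint_mask]) ** 2)
```

The hints are what "ground" the task: they resolve the ambiguity of guessing absolute colors from geometry alone, so the network can focus on propagating color within coherent object instances.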
What does GPC really learn?
Predict the exact color of each point
Predict which points share the same color and should be segmented together