CS294-158 Deep Unsupervised Learning
Lecture 7 Self-Supervised Learning
Pieter Abbeel, Wilson Yan, Kevin Frans, Philipp Wu
Reminder: Representations Matter
Goodfellow
UC Berkeley -- Spring 2024 -- Deep Unsupervised Learning -- Pieter Abbeel, Kevin Frans, Philipp Wu, Wilson Yan -- L7 Self-Supervised Learning
Depth often refines representations
Goodfellow
Today
What is Self-Supervised Learning?
Motivation: LeCake
Yann LeCun’s cake
Slide: LeCun
Outline
Denoising Autoencoder
Vincent et al 2010
Emphasizing corrupted dimensions
Vincent et al 2010
Stacked Denoising Autoencoder
Vincent et al 2010
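The denoising autoencoder slides above (Vincent et al 2010) can be made concrete with a small sketch. It assumes masking noise as the corruption and a squared-error loss against the clean input, with a larger weight on the corrupted dimensions. The names `mask_corrupt`, `emphasized_squared_error`, `alpha`, and `beta` and all numeric values are illustrative choices, not taken from the slides, and the "autoencoder" here is a stand-in that echoes its input.

```python
import random


def mask_corrupt(x, corruption_prob, rng):
    """Masking noise: zero each input dimension with probability corruption_prob."""
    corrupted = [rng.random() < corruption_prob for _ in x]
    x_tilde = [0.0 if flag else value for value, flag in zip(x, corrupted)]
    return x_tilde, corrupted


def emphasized_squared_error(x, reconstruction, corrupted, alpha=1.0, beta=0.5):
    """Squared error against the clean input x.

    Corrupted dimensions are weighted by alpha, untouched ones by beta
    (alpha > beta puts the emphasis on the corrupted dimensions).
    """
    total = 0.0
    for target, output, flag in zip(x, reconstruction, corrupted):
        weight = alpha if flag else beta
        total += weight * (target - output) ** 2
    return total


rng = random.Random(0)
x = [0.2, 0.9, 0.4, 0.7]  # hypothetical clean input
x_tilde, corrupted = mask_corrupt(x, corruption_prob=0.5, rng=rng)

# Stand-in "autoencoder" that simply echoes its corrupted input, so the only
# error is on the dimensions that were zeroed out.
echo_loss = emphasized_squared_error(x, x_tilde, corrupted)
# A perfect reconstruction of the clean input has zero loss.
perfect_loss = emphasized_squared_error(x, x, corrupted)
print(corrupted, echo_loss, perfect_loss)
```

The key point is that the target is always the clean `x`, never the corrupted `x_tilde`, so copying the input is not a solution.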
Diffusion Models
EmerDiff: pixel-level segmentation masks from diffusion models, using query vectors as features (see Lecture 6)
Predict missing pieces
Pathak et al 2016
Context Encoders
Pathak et al 2016
Predicting one view from another
Ground Truth
L2 regression
Pixelwise classification
Slide: Richard Zhang
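A toy illustration of why the slide contrasts L2 regression with pixelwise classification for colorization. It assumes a single pixel whose plausible color values are bimodal. The sample values, bin range, and function names are hypothetical. The L2-optimal prediction is the mean of the plausible values, which falls between the modes (a washed-out color), while classifying over quantized bins can return the most likely mode.

```python
def l2_optimal(values):
    """The prediction minimizing expected squared error is the mean."""
    return sum(values) / len(values)


def classification_mode(values, low=-1.0, high=1.0, num_bins=5):
    """Quantize values into bins and return the center of the most frequent bin."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    best = counts.index(max(counts))
    return low + (best + 0.5) * width


# Hypothetical plausible chroma values for one pixel: two distinct modes.
plausible = [-0.8, -0.8, -0.8, 0.8, 0.8]

l2_prediction = l2_optimal(plausible)              # close to 0: desaturated
class_prediction = classification_mode(plausible)  # center of the dominant bin
print(l2_prediction, class_prediction)
```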
Temporal coherence of color
Slide: Zisserman
Tracking emerges from colorization
GIFs from Google AI Blog post
MAE
Nov, 2021
Architecture: Vision Transformer (ViT)
BIG
small
MAE on ImageNet validation images
MAE on COCO validation images
Masking Ratio
Comparison with Prior SOTA
MAE Cousins / Derivatives
BEiT
[June 2021 / Sep 2022]
BEiT Architecture
discrete VAE tokenizer
VideoMAE
[Oct, 2022]
VideoMAE Architecture
Observations
– High masking ratio: 90% to 95%
– Impressive results even on very small datasets, e.g. 3k videos
– Data quality matters more than data quantity for self-supervised video pre-training; domain shift between the pre-training and target datasets is an important factor.
– VideoMAE with a vanilla ViT backbone achieves 87.4% on Kinetics-400, 75.4% on Something-Something V2, 91.3% on UCF101, and 62.6% on HMDB51 without using any extra data.
Experiments on Something-Something V2
Experiments on Kinetics-400
SiamMAE
[May 2023]
SiamMAE: Architecture
SiamMAE: Key idea
– By masking a large fraction (95%) of patches in the future frame while leaving the past frame unchanged, SiamMAE encourages the network to focus on object motion and learn object-centric representations.
– SiamMAE outperforms state-of-the-art self-supervised methods on video object segmentation, pose keypoint propagation, and semantic part propagation tasks.
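The asymmetric masking described above can be sketched as follows. The 95% ratio on the future frame and the unmasked past frame come from the slide. The 14 x 14 patch grid, the rounding rule, and the helper name `sample_visible_patches` are assumptions made for illustration.

```python
import random


def sample_visible_patches(num_patches, mask_ratio, rng):
    """Return the sorted indices of patches that stay visible after random masking."""
    num_masked = round(num_patches * mask_ratio)
    num_visible = num_patches - num_masked
    return sorted(rng.sample(range(num_patches), num_visible))


rng = random.Random(0)
num_patches = 196  # hypothetical 14 x 14 patch grid per frame

# Past frame: left unchanged, so every patch is visible.
past_visible = sample_visible_patches(num_patches, mask_ratio=0.0, rng=rng)
# Future frame: 95% of patches are masked, only a handful remain visible.
future_visible = sample_visible_patches(num_patches, mask_ratio=0.95, rng=rng)

print(len(past_visible), len(future_visible))
```

With so few visible patches in the future frame, reconstructing it requires using the past frame, which is what pushes the model toward motion and object correspondence.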
Audio-MAE
– Encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.
– The decoder then re-orders and decodes the encoded context padded with mask tokens, in order to reconstruct the input spectrogram.
– Local window attention in the decoder, as audio spectrograms are highly correlated in local time and frequency bands.
– Fine-tune the encoder with a lower masking ratio on target datasets.
– Empirically, Audio-MAE sets new state-of-the-art performance on six audio and speech classification tasks, outperforming other recent models that use external supervised pre-training.
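The first two bullets above (encode only the non-masked tokens, then re-order and pad with mask tokens for the decoder) can be sketched with plain lists. Everything here is illustrative: the patches are strings, the "encoder" is a stand-in that upper-cases them, and the 16-patch input, the 0.75 ratio, and the names `split_visible`, `restore_order`, and `MASK_TOKEN` are assumptions rather than details from the slides.

```python
import random

MASK_TOKEN = "<mask>"  # placeholder for the learned mask token


def split_visible(tokens, mask_ratio, rng):
    """Randomly keep a subset of tokens; return them with their original positions."""
    order = list(range(len(tokens)))
    rng.shuffle(order)
    num_visible = len(tokens) - round(len(tokens) * mask_ratio)
    keep_ids = order[:num_visible]
    return [tokens[i] for i in keep_ids], keep_ids


def restore_order(encoded_visible, keep_ids, num_tokens):
    """Put encoded tokens back at their original positions, padding with mask tokens."""
    full = [MASK_TOKEN] * num_tokens
    for token, position in zip(encoded_visible, keep_ids):
        full[position] = token
    return full


rng = random.Random(0)
patches = [f"p{i}" for i in range(16)]  # hypothetical spectrogram patches

visible, keep_ids = split_visible(patches, mask_ratio=0.75, rng=rng)
encoded = [token.upper() for token in visible]  # stand-in for the encoder
decoder_input = restore_order(encoded, keep_ids, len(patches))
print(decoder_input)
```

The encoder only ever sees `len(visible)` tokens, which is where the compute saving of a high masking ratio comes from. The decoder sees the full-length sequence.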
Audio-MAE: Architecture
MultiMAE
MultiMAE observations
– like MAE, encoder only processes non-masked tokens
– like MAE, shallow decoders
– pseudolabels for non-RGB modalities
MultiMAE Experiments
M3AE: MultiModal MAE
[Oct 2022]
M3AE Contributions
– Until M3AE: dominant multi-modal representation learning paradigm was contrastive learning (CLIP, ALIGN)
– Downside of cross-modal contrastive: only works with paired data
– Multimodal pre-training of M3AE on CC12M achieves significantly higher performance on the ImageNet-1k linear classification benchmark than pre-training on images only (MAE).
– M3AE performs best when we apply a high mask ratio (75%) on language, while in contrast, language models like BERT conventionally use a low mask ratio (15%)
– Encoder: image patches and language tokens, ViT
– Decoder: light weight, following MAE
M3AE: Architecture
Comparison with MAE
Multiview MWM
[covered in a later section of lecture]
Outline
Relative Position of Image Patches
Task: Predict the relative position of the second patch with respect to the first
Slide: Zisserman
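The relative-position task above can be sketched as an 8-way classification problem: take the center cell of a 3 x 3 grid as the first patch, pick one of its eight neighbors as the second patch, and use the neighbor's index as the label. This is a minimal sketch with hypothetical names (`NEIGHBOR_OFFSETS`, `sample_patch_pair`) on a toy "image" whose pixels store their own coordinates. Practical details such as gaps or jitter between patches are left out.

```python
import random

# The eight neighbor positions around the center patch, indexed by class label.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def crop(image, top, left, size):
    """Cut a size x size patch out of a 2D list."""
    return [row[left:left + size] for row in image[top:top + size]]


def sample_patch_pair(image, patch_size, rng):
    """Return (first patch, second patch, label) for a 3 x 3 grid anchored at (0, 0)."""
    label = rng.randrange(len(NEIGHBOR_OFFSETS))
    d_row, d_col = NEIGHBOR_OFFSETS[label]
    center = patch_size  # top-left corner of the middle grid cell
    first = crop(image, center, center, patch_size)
    second = crop(image, center + d_row * patch_size, center + d_col * patch_size, patch_size)
    return first, second, label


patch_size = 2
# Toy image: each "pixel" stores its own (row, col) so crops are easy to inspect.
image = [[(r, c) for c in range(3 * patch_size)] for r in range(3 * patch_size)]
first, second, label = sample_patch_pair(image, patch_size, random.Random(0))
print(label, first[0][0], second[0][0])
```

A network trained on `(first, second)` pairs to predict `label` gets its supervision from the image layout alone, with no human annotation.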
Doersch, Gupta, Efros
Solving Jigsaw Puzzles
Rotation
Outline
Contrastive Predictive Coding
July 2018
Figure from Alex Graves
Don't directly predict x
Bilinear dot product
InfoNCE
Can be viewed as categorical cross-entropy of classifying the positive sample correctly
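The cross-entropy view of InfoNCE can be checked numerically. The sketch below scores each candidate against the context with a bilinear product and then takes the negative log-softmax of the positive candidate. The vectors, the identity matrix `W`, and the function names are illustrative assumptions, not values from the lecture.

```python
import math


def bilinear_score(z, W, c):
    """Bilinear product z^T W c between a candidate encoding z and a context c."""
    return sum(z[i] * W[i][j] * c[j] for i in range(len(z)) for j in range(len(c)))


def info_nce(scores, positive_index=0):
    """Softmax cross-entropy where the 'class' is the index of the positive sample."""
    max_score = max(scores)  # subtract the max for numerical stability
    log_partition = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_partition - scores[positive_index]


context = [1.0, 0.0]
W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical bilinear weights (identity here)
candidates = [
    [4.0, 0.0],   # positive sample
    [0.0, 1.0],   # negative
    [-1.0, 0.0],  # negative
]

scores = [bilinear_score(z, W, context) for z in candidates]
loss = info_nce(scores, positive_index=0)
print(scores, loss)
```

When all candidates score equally, the loss equals log N for N candidates (chance level). The loss goes toward zero as the positive's score pulls away from the negatives.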
CPC - Speech
CPC - ImageNet
CPC - Natural Language Processing
Oord, Li, Vinyals 2018
CPC - Reinforcement Learning
Figure from Aaron Van den Oord
CPCv2 - Large Scale CPC on ImageNet
May 2019
Figure from Aaron Van den Oord
ResNet-161
Negatives:
1. Other patches within image
2. Patches from other images
InfoNCE Loss
Parallel Implementation with PixelCNN (masked conv) and 1x1 conv
CPCv2 - Linear Classification
Linear Classifier Score (ImageNet)
CPCv1 ---> CPCv2
CPCv2 - Data-Efficient Image Recognition
Instance Discrimination
attract
repel
Momentum Contrast (MoCo)
Nov 2019
SimCLR
MoCov2 vs SimCLR
MoCo v3
BYOL
Normalize features
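One way to read "Normalize features" is sketched below, on the assumption that the slide refers to L2-normalizing the online network's prediction and the target network's projection before a squared-error loss. Under that assumption the loss depends only on the angle between the two vectors and equals 2 - 2 * cosine similarity. The function names and example vectors are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def normalized_regression_loss(prediction, target):
    """Squared error between the L2-normalized prediction and target."""
    p = l2_normalize(prediction)
    t = l2_normalize(target)
    return sum((a - b) ** 2 for a, b in zip(p, t))


def cosine_similarity(u, v):
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


prediction = [3.0, 4.0]  # hypothetical online-network output
target = [4.0, 3.0]      # hypothetical target-network output
loss = normalized_regression_loss(prediction, target)
print(loss, 2 - 2 * cosine_similarity(prediction, target))
```

Because of the normalization, rescaling either vector leaves the loss unchanged, so the network cannot reduce the loss just by shrinking its outputs.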
Another perspective
Summary
Outline
DINO
Consider knowledge distillation
Self-supervised learning as knowledge distillation
Apply centering to avoid collapse; maintain the center as an EMA so it works across different batch sizes
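A minimal sketch of centering with an EMA, assuming the center is a running mean of the teacher's outputs that is subtracted from the teacher logits before a temperature softmax. The momentum, temperature, batch values, and function names are illustrative assumptions.

```python
import math


def softmax(logits, temperature):
    """Temperature-scaled softmax with max subtraction for stability."""
    scaled = [value / temperature for value in logits]
    top = max(scaled)
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def update_center(center, teacher_outputs, momentum=0.9):
    """EMA of the per-dimension batch mean of teacher outputs."""
    batch_mean = [sum(column) / len(teacher_outputs) for column in zip(*teacher_outputs)]
    return [momentum * c + (1.0 - momentum) * m for c, m in zip(center, batch_mean)]


def teacher_distribution(logits, center, temperature=0.04):
    """Subtract the center from the teacher logits, then apply the softmax."""
    return softmax([l - c for l, c in zip(logits, center)], temperature)


center = [0.0, 0.0]
batch = [[1.0, 3.0], [3.0, 5.0]]  # hypothetical teacher outputs for two images
center = update_center(center, batch)
probs = teacher_distribution(batch[0], center)
print(center, probs)
```

Since the center averages over many past batches and not just the current one, the correction does not depend strongly on how many samples a single batch contains.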
Threshold the attention map to get a mask
Compare its similarity to the ground-truth mask
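The evaluation described above can be sketched in a few lines. The sketch assumes a fixed threshold on attention values and uses Jaccard similarity (intersection over union) as the similarity measure. Both choices, the toy 2 x 2 maps, and the function names are assumptions for illustration.

```python
def threshold_mask(attention, threshold):
    """Binary mask: True where the attention value reaches the threshold."""
    return [[value >= threshold for value in row] for row in attention]


def jaccard(mask_a, mask_b):
    """Intersection over union of two binary masks of the same shape."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    return intersection / union if union else 1.0


attention = [[0.9, 0.1],
             [0.6, 0.2]]           # hypothetical attention map
ground_truth = [[True, True],
                [False, False]]    # hypothetical ground-truth mask

predicted = threshold_mask(attention, threshold=0.5)
score = jaccard(predicted, ground_truth)
print(predicted, score)
```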
iBOT
DINOv2
ViT-L
Feature matching
Image retrieval
Segmentation
Depth prediction
JEPA
I-JEPA
Context Encoder, Target Encoder and Predictor are ViTs
Predictor
I-JEPA
Context and Target Selection
Freeze context encoder and predictor
Train an RCDM (representation-conditioned diffusion model) to visualize predictions
V-JEPA
Short-range masks: union of 8 randomly sampled target blocks covering 15% of each frame
Long-range masks: union of 2 randomly sampled target blocks covering 70% of each frame
~90% mask ratio
Train on a large dataset of 2 million videos from publicly available datasets
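A rough sketch of the two mask types for a single frame, assuming the frame is a 14 x 14 grid of patches and that each sampled block covers the stated fraction of the frame (one reading of the bullets above). The grid size, the aspect-ratio range, and the function names are hypothetical.

```python
import math
import random


def sample_block(grid_h, grid_w, frac, rng):
    """Sample one rectangular block of patches covering roughly `frac` of the grid."""
    area = frac * grid_h * grid_w
    aspect = rng.uniform(0.75, 1.5)  # assumed aspect-ratio range
    h = max(1, min(grid_h, round(math.sqrt(area * aspect))))
    w = max(1, min(grid_w, round(math.sqrt(area / aspect))))
    top = rng.randint(0, grid_h - h)
    left = rng.randint(0, grid_w - w)
    return {(r, c) for r in range(top, top + h) for c in range(left, left + w)}


def union_mask(grid_h, grid_w, n_blocks, frac, rng):
    """Union of `n_blocks` independently sampled blocks."""
    masked = set()
    for _ in range(n_blocks):
        masked |= sample_block(grid_h, grid_w, frac, rng)
    return masked


rng = random.Random(0)
short_range = union_mask(14, 14, n_blocks=8, frac=0.15, rng=rng)
long_range = union_mask(14, 14, n_blocks=2, frac=0.70, rng=rng)
short_ratio = len(short_range) / (14 * 14)
long_ratio = len(long_range) / (14 * 14)
```

Taking the union of several blocks is what pushes the masked fraction well above the size of any single block.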
Outline
CLIP
Dataset
CLIP
CLIP learns features useful for other models
unCLIP
LLaVA
LiT
SigLIP
FLIP
SLIP
CoCa
ImageBind
Train with (Image, Modality) pairs
Transformer for all modality encoders
Train with InfoNCE loss
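A minimal sketch of an InfoNCE loss over a batch of aligned (image, modality) embedding pairs, written in plain Python. The symmetric form, the temperature of 0.07, and all function names are assumptions for illustration; the encoders that would produce the embeddings are not shown.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(image_embs, other_embs, temperature=0.07):
    """Symmetric InfoNCE: pair i is the positive, all other pairings are negatives."""
    q = [normalize(v) for v in image_embs]
    k = [normalize(v) for v in other_embs]
    logits = [[sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k] for qi in q]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))
```

The loss is small when each image embedding is most similar to its own paired embedding and large when it is more similar to another item in the batch.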
BLIP-2
30 Jan 2023
BLIP-2
ITC (Image-Text Contrastive Learning)
[Figure labels: Image Encoder on X (Frozen); Image Transformer; Input Queries; Output Queries; Text Transformer; Text token Y; CLS token; CLS embedding; Text embedding; Max. similarity]
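One way to read the "Max. similarity" label is that every output query is compared with the text CLS embedding and the highest score is kept as the image-text similarity. A small sketch under that assumption, using cosine similarity and hypothetical function names:

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def itc_similarity(query_outputs, text_cls):
    """Image-text score: best match between any output query and the text CLS embedding."""
    return max(cosine(q, text_cls) for q in query_outputs)


score = itc_similarity([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 2.0])
```

This score would then play the role of the pairwise similarity inside a contrastive loss over the batch.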
BLIP-2
ITG (Image-Grounded Text Generation)
[Figure labels: Image Encoder on X (Frozen); Image Transformer; Input Queries; Output Queries; Text Transformer; DEC token; Autoregressive text output Y]
BLIP-2
ITG (Image-Grounded Text Generation)
[Figure labels: Image Encoder on X (Frozen); Image Transformer; Input Queries; Output Queries; Text Transformer; DEC token; Autoregressive text Y]
BLIP-2
ITM (Image-Text Matching)
[Figure labels: Image Encoder on X (Frozen); Image Transformer; Input Queries; Output Queries; Text Transformer; Text token Y; Classifier, Averaged]
BLIP-2
Training
Outline
CURL
R3M
Motivation: Plug-and-play, general Visual Representations for Robotics must contain Three main ingredients…
R3M
Ego4D is diverse, in-the-wild, and language-annotated
Contains 3,500 hours of data from 70 locations across the globe
R3M
Time Contrastive Learning encodes temporal dynamics into the representation
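A minimal sketch of one InfoNCE-style time-contrastive loss: an anchor frame embedding should be closer to a temporally nearby frame than to a later frame or to a frame from a different video. Using negative L2 distance as the similarity and this particular choice of negatives are assumptions, not details from the slide.

```python
import math


def neg_l2(a, b):
    """Similarity defined as the negative Euclidean distance."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def time_contrastive_loss(z_anchor, z_near, z_far, z_other_video):
    """Low when z_near is the closest of the three candidates to the anchor."""
    pos = math.exp(neg_l2(z_anchor, z_near))
    negs = math.exp(neg_l2(z_anchor, z_far)) + math.exp(neg_l2(z_anchor, z_other_video))
    return -math.log(pos / (pos + negs))


good = time_contrastive_loss([0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [5.0, 5.0])
bad = time_contrastive_loss([0.0, 0.0], [2.0, 0.0], [0.1, 0.0], [5.0, 5.0])
```

Minimizing this loss orders embeddings by temporal proximity, which is one way the representation can pick up temporal dynamics.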
R3M
Video-Language alignment encourages F to capture semantically relevant features
R3M
Joint Optimization - Simple regularizations encourage sparsity of representations
L1 reduces representations to only critical features
L2 probably has more of a regularizing effect than anything
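A small sketch of how the two penalties could enter a joint objective, assuming both are applied to the representation vector `z` and added to the time-contrastive and language-alignment losses. The weights `w_l1` and `w_l2` and the function names are hypothetical.

```python
import math


def l1_penalty(z):
    """Sum of absolute values; pushes individual features toward exactly zero."""
    return sum(abs(x) for x in z)


def l2_penalty(z):
    """Euclidean norm; shrinks the overall magnitude of the representation."""
    return math.sqrt(sum(x * x for x in z))


def joint_loss(time_contrastive, language_alignment, z, w_l1=1e-5, w_l2=1e-5):
    """Weighted sum of the two learning signals and the two regularizers."""
    return (time_contrastive + language_alignment
            + w_l1 * l1_penalty(z) + w_l2 * l2_penalty(z))


total = joint_loss(1.0, 2.0, [3.0, -4.0], w_l1=0.1, w_l2=0.1)
```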
MVP
Use MAE learned features for robotic control
MVP
Establishes a benchmark dataset and evaluation suite
Train on Human-Object-Interaction dataset of 700k images:
Evaluate on new PixMC Benchmark - Train with PPO
MVP
Evaluate with highly parallelized PPO
MVP
Self-supervised on large data is better than supervised on smaller data (ImageNet)
Oracle has access to hand-engineered state: object locations, 3D poses, and direction-to-goal vectors
MVP
Representations are robust to distractors and generalize to different object types
Egocentric human videos (e.g. Ego4D, Epic-Kitchens, etc.)
+ Large volume and diversity
- Lacks modalities important for EAI (e.g. proprioception, actions, etc.)
Robot execution trajectories (e.g. BAIR Robot Dataset, BC-Z, etc.)
+ Matching modality and embodiments
- Lacks volume and diversity (physical system, lab setup, etc.)
Simulators (e.g. Habitat, MuJoCo, etc.)
+ Side information available (e.g. joint sensors, object poses)
- Inaccurate physics for transfer
Goal: Develop a unified learning paradigm for trajectories that are multi-modal and heterogeneous
MTM
Trajectory is a generic sequence of elements.
Element x_t (time t)
Modalities: Joint Sensors, Vision (RGB), Vision (Depth), Action
Tokens: q_t^1, q_t^2, q_t^3, ..., q_t^K
Lift to common embedding space with modality-specific encoders.
MTM
Missing modalities ⇔ Masked as a constraint
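A minimal sketch of the idea, assuming a trajectory is a list of per-timestep dictionaries and that a modality absent from a timestep is simply marked as a masked token. The token layout and the function name are hypothetical.

```python
def tokenize_trajectory(trajectory, modalities):
    """Flatten a trajectory into tokens; absent modalities become masked tokens."""
    tokens = []
    for t, element in enumerate(trajectory):
        for name in modalities:
            value = element.get(name)  # None when this modality was not recorded
            tokens.append({"t": t, "modality": name, "value": value,
                           "masked": value is None})
    return tokens


trajectory = [
    {"rgb": [0.1, 0.2], "action": [1.0]},
    {"rgb": [0.3, 0.4]},  # e.g. a human video frame with no action label
]
tokens = tokenize_trajectory(trajectory, ["rgb", "action"])
masked_flags = [tok["masked"] for tok in tokens]
```

A model trained to reconstruct masked tokens can then treat a dataset that lacks a modality the same way as a training example where that modality was masked on purpose.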
MTM
Summary
MTM
Setup: (1) Pretrain the MTM model on an offline dataset. (2) Take the state encoder of MTM and feed its output to a standard RL algorithm (TD3)
Masked World Models for Visual Control
Main idea: Decouple visual representation learning and dynamics learning
Visual Representation Learning
Dynamics learning
(From Younggyo)
Multi-View MAE
[Akkaya et al., 2019]
[Jangir et al., 2022]
(From Younggyo)
Multi-View MAE
Main Idea: Reconstruct masked viewpoints to learn cross-view information
Multi-View MAE
MV-MAE can extract both multi-view and single-view representations
Visual robotic manipulation with multi-view or single-view data
Multi-View MAE
Motivation:
Camera calibration is a tedious procedure
Viewpoint randomization
Multi-View MAE
Rotation
Shake
Translation
Zoom
Outline
Predicting neighbouring context
Word Embeddings
(From 224n Stanford)
SVD approach suffers from:
n-gram Language Models
(From 224n Stanford)
Unigram
Bigram
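A small sketch of count-based unigram and bigram estimates on a toy corpus, using maximum-likelihood counts with no smoothing. The function names and the example sentence are illustrative assumptions.

```python
from collections import Counter


def unigram_probs(tokens):
    """P(w) estimated as count(w) / total number of tokens."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}


def bigram_probs(tokens):
    """P(w2 | w1) estimated as count(w1, w2) / count(w1 as a first word)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    first_counts = Counter(tokens[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}


tokens = "the cat sat on the mat".split()
uni = unigram_probs(tokens)
bi = bigram_probs(tokens)
```

The unigram model ignores context entirely, while the bigram model conditions each word on the one word before it.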
word2vec
(From 224n Stanford)
Continuous Bag Of Words (CBOW)
Skip Gram
word2vec - CBOW
Continuous Bag Of Words (CBOW)
word2vec - Skip Gram
Skip Gram
Skip-gram model
Don’t have to have the denominator over all words in the vocabulary
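One common way to avoid the full-vocabulary denominator is negative sampling, where the softmax is replaced by a handful of binary classifications. A minimal sketch under that assumption; the function names and the toy vectors are illustrative.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def negative_sampling_loss(center_vec, context_vec, negative_vecs):
    """Pull the true context word toward the center word, push sampled words away."""
    loss = -log_sigmoid(dot(context_vec, center_vec))
    for neg in negative_vecs:
        loss -= log_sigmoid(-dot(neg, center_vec))
    return loss


aligned = negative_sampling_loss([1.0, 0.0], [2.0, 0.0], [[-2.0, 0.0]])
misaligned = negative_sampling_loss([1.0, 0.0], [-2.0, 0.0], [[2.0, 0.0]])
```

The cost per training pair depends on the number of sampled negatives rather than on the vocabulary size.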
GloVe
Consider counting-based statistical approaches
Word co-occurrence counts X_ij, where X_ij is the number of times word j occurs in the context of word i
Ratios of co-occurrence probabilities can encode meaning
GloVe
Want the dot product of two word vectors to be similar to the likelihood of their co-occurrence
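For reference, the weighted least-squares objective commonly associated with GloVe can be written as below. The symbols $w_i$, $\tilde{w}_j$, $b_i$, $\tilde{b}_j$, $f$, $x_{\max}$ and $\alpha$ are notation introduced here and do not appear on the slide; $X_{ij}$ is the co-occurrence count and $V$ the vocabulary size.

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
(x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
1 & \text{otherwise}
\end{cases}
```

The dot product is fit to the logarithm of the co-occurrence count, and the weighting $f$ keeps very frequent pairs from dominating while assigning zero weight to pairs that never co-occur.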
BERT
Oct 2018
BERT
Pre-training data:
Fine Tuning
BERT
Feature based
RoBERTa
Jul 2019
RoBERTa
Greatly simplifies the process of training BERT
T5
Cast all tasks as language input, language output
Explore different architectures and pre-training tasks
T5
Finetuning
Fine-tune with the same input-output format. Add a task-specific text prefix to the model input, e.g.:
Input: translate English to German: That is good.
Output: Das ist gut.
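The text-to-text format above can be sketched as a small helper that turns any task into an (input, output) string pair. The function name and the dictionary layout are hypothetical; only the prefix string and the sentences come from the slide.

```python
def to_text_to_text(task_prefix, source, target):
    """Format one training example as plain input and output strings."""
    return {"input": f"{task_prefix}: {source}", "output": target}


example = to_text_to_text("translate English to German", "That is good.", "Das ist gut.")
```

Because every task shares this format, the same model, loss, and decoding procedure can be reused across tasks, with only the prefix changing.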
T5
Denoising Objective
Sentinel token to delineate removed spans
(unique ids that are added to the token vocab)
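A minimal sketch of span corruption with sentinel tokens, assuming the spans to drop are given as sorted, non-overlapping half-open index pairs. The `<extra_id_N>` naming for sentinels and the function name are assumptions for illustration; in practice the spans would be sampled at random.

```python
def span_corrupt(tokens, spans):
    """Replace each span with a sentinel in the input; the target lists the removed spans."""
    inputs, targets = [], []
    cursor = 0
    for sentinel_id, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{sentinel_id}>"
        inputs.extend(tokens[cursor:start])  # keep text before the span
        inputs.append(sentinel)              # mark where the span was removed
        targets.append(sentinel)             # same sentinel introduces the span in the target
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel closes the target
    return inputs, targets


tokens = "Thank you for inviting me to your party last week .".split()
inputs, targets = span_corrupt(tokens, [(2, 4), (8, 9)])
```

Each removed span is matched between input and target by its unique sentinel, so the target only needs to contain the dropped tokens rather than the whole sequence.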
T5
T5’s effect in Imagen - using T5’s text encoder
UL2
Does training on different pre-training tasks help?
Formulate 3 types of pre-training
Outline