Contrastive Pre-training for Unsupervised Representation Learning
Oğuz Şerbetçi
Why Representation Learning
Better representations:
→ Faster, easier, and more data-efficient machine learning
→ Transfer learning
Deep Learning = Representation Learning
Learn a supervised image classifier from labeled data.
→ Transfer learning: reuse its activations as a representation for another task or dataset.
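To make the transfer recipe concrete, here is a minimal sketch (not from the slides) of using a pretrained classifier as a feature extractor; the torchvision ResNet-50 backbone and the dummy batch are illustrative assumptions:

```python
# Minimal sketch: reuse a supervised ImageNet classifier as a feature
# extractor. ResNet-50 is an assumed backbone; the slides name none.
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True)
model.fc = torch.nn.Identity()  # drop the classification head
model.eval()

with torch.no_grad():
    images = torch.randn(8, 3, 224, 224)  # stand-in for a real batch
    features = model(images)              # (8, 2048) representations

# `features` can now train a small model for a different task or dataset.
```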
Unsupervised Representation Learning with Generative Models
Generative models
! Reconstructing every detail of the data is usually required, even though most details are irrelevant to downstream tasks.
Unsupervised Representation Learning with Pretext Tasks
Word2Vec and recent attention-based methods (e.g. BERT) learn representations from raw text via a pretext task: predicting words from their surrounding context.
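As a tiny illustration of why no labels are needed, here is how skip-gram-style (center, context) training pairs fall straight out of raw text; the window size and sentence are made up for the example:

```python
# Word2Vec's pretext task: predict nearby words. The (center, context)
# training pairs come directly from raw text -- no labels required.
tokens = "the quick brown fox jumps over the lazy dog".split()
window = 2  # illustrative context window

pairs = []
for i, center in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            pairs.append((center, tokens[j]))

print(pairs[:4])
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown')]
```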
Unsupervised Representation Learning with Pretext Tasks
Image colourization
Jigsaw puzzles
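For instance, the jigsaw pretext task can be set up in a few lines; the 3×3 grid and patch size below are illustrative assumptions:

```python
# Jigsaw pretext sketch: shuffle image patches and train a network to
# predict which permutation was applied. Grid/patch sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((96, 96, 3))  # stand-in image

# Cut into a 3x3 grid of 32x32 patches.
patches = [image[r:r + 32, c:c + 32]
           for r in range(0, 96, 32) for c in range(0, 96, 32)]

perm = rng.permutation(9)                 # the pretext "label" to predict
shuffled = [patches[i] for i in perm]

# A classifier sees `shuffled` and must recover `perm`; in practice a
# fixed subset of the 9! permutations serves as the class set.
```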
Supervised vs. Unsupervised
Supervised: learn from labels. Unsupervised: learn from context.
Representation Learning with Contrastive Predictive Coding
Aaron van den Oord, Yazhe Li, Oriol Vinyals
Predictive Coding
Contrastive Predictive Coding
An encoder maps inputs to latents, $z_t = g_{\mathrm{enc}}(x_t)$, and an autoregressive model summarizes the past into a context, $c_t = g_{\mathrm{ar}}(z_{\le t})$. Instead of reconstructing $x_{t+k}$, the model scores candidates against the context:

$$f_k(x_{t+k}, c_t) = \exp\!\left(z_{t+k}^{\top} W_k \, c_t\right),$$

where $W_k c_t$ is the prediction of the model for the latent $k$ steps ahead.
InfoNCE Loss

$$\mathcal{L}_N = -\,\mathbb{E}_X \left[ \log \frac{f_k(x_{t+k}, c_t)}{\sum_{x_j \in X} f_k(x_j, c_t)} \right]$$

Minimizing $\mathcal{L}_N$ maximizes a lower bound on the mutual information between the context $c_t$ and the future observation $x_{t+k}$.
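In code, InfoNCE reduces to cross-entropy over similarity scores, with the rest of the batch acting as negatives. A minimal PyTorch sketch, assuming a bilinear score $f_k$ and in-batch negatives (sizes are illustrative):

```python
# Minimal InfoNCE sketch (PyTorch): the positive pair must win a softmax
# over scores against the other samples in the batch (the negatives).
import torch
import torch.nn.functional as F

def info_nce(c, z, W):
    """c: (B, d_c) contexts; z: (B, d_z) latents, z[i] positive for c[i]."""
    pred = c @ W.T                    # (B, d_z): the prediction W_k c_t
    logits = pred @ z.T               # (B, B): f_k scores for every pair
    labels = torch.arange(c.size(0))  # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

B, d_c, d_z = 32, 128, 64             # illustrative sizes
c, z = torch.randn(B, d_c), torch.randn(B, d_z)
W = torch.randn(d_z, d_c, requires_grad=True)
info_nce(c, z, W).backward()
```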
Experiments in 4 domains
Image
Every row shows image patches that activate a certain neuron in the CPC architecture.
The context model is a PixelCNN.
Audio
Text
Reinforcement Learning
No need for memory: a reactive policy solves the task.
Summary
Data-Efficient Image Recognition with Contrastive Predictive Coding
Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord
CPC-v2
Buying data efficiency with compute:
28 million → 305 million parameters
💸 🌏 🔥
Talking points
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
SimCLR: Contrastive Learning with Only Data Augmentations
Augmentations instead of context: two augmented views of the same image form the positive pair (see the sketch below).
16× the parameters
💸 🌏 🔥
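A minimal sketch of SimCLR's NT-Xent loss, assuming projections z1/z2 of two augmented views and a temperature of 0.5; both are illustrative choices:

```python
# NT-Xent sketch: two augmented views per image; each view's positive is
# the other view, and all remaining samples in the batch are negatives.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """z1, z2: (B, d) projections of two views of the same B images."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)  # (2B, d), unit norm
    sim = z @ z.T / tau                          # temperature-scaled cosine
    sim.fill_diagonal_(float("-inf"))            # mask self-similarity
    B = z1.size(0)
    # Row i's positive is the other view of the same image.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(16, 128), torch.randn(16, 128))
```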
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick
Memory Bank for the Negative Samples
Store past mini-batches in a queue for later use as negative samples.
BUT embeddings change quickly during training, so stored negatives go stale.
→ Use momentum updates with a second encoder: a key encoder that updates slowly keeps the stored negative embeddings consistent ("smoother").
→ Effect: many more negative samples than a mini-batch could hold: 65,536 (see the sketch below).
[Figure: the query encoder is updated by gradients; the key encoder by momentum updates.]
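The two mechanisms fit in a few lines. A sketch with simple linear stand-in encoders; the momentum value 0.999 and the queue size of 65,536 follow the paper, everything else is illustrative:

```python
# MoCo sketch: a slowly trailing key encoder plus a queue of past keys.
import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """theta_k <- m * theta_k + (1 - m) * theta_q (no gradients)."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1 - m)

encoder_q = torch.nn.Linear(32, 128)               # stand-in encoders
encoder_k = torch.nn.Linear(32, 128)
encoder_k.load_state_dict(encoder_q.state_dict())  # start identical
momentum_update(encoder_q, encoder_k)              # after each grad step

# Negatives come from a queue of past keys, decoupled from batch size.
queue = torch.randn(65536, 128)
new_keys = encoder_k(torch.randn(4, 32)).detach()
queue = torch.cat([new_keys, queue])[: queue.size(0)]  # enqueue + dequeue
```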
Transfer Learning
Figure & results: Chen et al. 2020, Improved Baselines with Momentum Contrastive Learning
More papers:
Contrastive Multiview Coding: https://arxiv.org/abs/1906.05849
Learning Representations by Maximizing Mutual Information Across Views: https://arxiv.org/abs/1906.00910
Self-Supervised Learning of Pretext-Invariant Representations: https://arxiv.org/abs/1912.01991
Thanks!