1 of 19

Adversarial Masking for Self-Supervised Learning

Yuge Shi, N. Siddharth, Philip H.S. Torr, Adam R. Kosiorek

Presenter: Aarash

2 of 19

Self-Supervised Learning

Images from https://generallyintelligent.ai/blog/2020-08-24-understanding-self-supervised-contrastive-learning/

3 of 19

Masking

  • Masked Image Modeling (MIM) is becoming a common pretext task

  • Masks are placed on random regions; the masked image is then used to
    • Reconstruct the missing content, or
    • Drive contrastive learning

  • Used for pretraining (analogous to masked-token prediction in BERT)

Image from “Masked Autoencoders Are Scalable Vision Learners”

4 of 19

Motivation

  1. Random masks do not cover whole semantic “entities”, unlike token masking in language models such as BERT
  2. Existing MIM methods work mostly on transformers

5 of 19

Proposed Idea

[Figure: ADIOS schematic — the occlusion model generates masks for the input image, and the inference model encodes the resulting masked views]
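The two models are trained adversarially: the inference model minimizes an SSL loss computed between the image and its masked views, while the occlusion model maximizes it. A sketch of the objective, in notation of my own choosing (not the paper's exact symbols):

```latex
% I_\theta: inference model; m_\phi^1,\dots,m_\phi^N: masks from the
% occlusion model; x \odot (1 - m): the image with mask m occluded.
\min_{\theta}\,\max_{\phi}\;
  \frac{1}{N}\sum_{n=1}^{N}
  \mathcal{L}_{\mathrm{SSL}}\!\Big( I_\theta(x),\; I_\theta\big(x \odot (1 - m_\phi^{n})\big) \Big)
```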

6 of 19

Inference Model

Two proposed formulations (sketched below):

  • Distance in pixel space
    • consists of an encoder and a decoder (an auto-encoder)
    • Where does the imputation happen?
  • Distance in representation space
    • consists of an encoder only (no decoder needed)

Image from “Masked Autoencoders Are Scalable Vision Learners”
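A rough sketch of the two formulations (function names and the mask convention, 1 = keep / 0 = occlude, are my assumptions):

```python
import torch.nn.functional as F

# Two ways to measure how well the inference model handles occlusion.

def pixel_space_loss(encoder, decoder, x, mask):
    """Auto-encoder variant: impute the occluded pixels, then measure the
    distance to the original image in pixel space."""
    recon = decoder(encoder(x * mask))        # reconstruct from masked view
    return F.mse_loss(recon, x)

def representation_space_loss(encoder, x, mask):
    """Encoder-only variant: compare representations of the original and
    the masked image (negative cosine similarity as one common choice)."""
    z_full, z_masked = encoder(x), encoder(x * mask)
    return -F.cosine_similarity(z_masked, z_full, dim=-1).mean()
```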

7 of 19

Inference Model with SimCLR
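With SimCLR as the inference model, the contrastive (NT-Xent) loss is computed between an augmented view and its masked counterpart. A minimal sketch of that loss (shapes and the pairing convention are assumptions):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """SimCLR's NT-Xent loss between two batches of embeddings:
    z1[i] and z2[i] are positives; every other row is a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                   # (2B, D)
    sim = z @ z.t() / temperature                    # scaled cosine sims
    sim.fill_diagonal_(float('-inf'))                # exclude self-pairs
    B = z1.size(0)
    targets = torch.cat([torch.arange(B, 2 * B),     # positive of z1[i] is z2[i]
                         torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

# ADIOS + SimCLR (sketch): one view is augmented, the other is masked.
# loss = nt_xent(encoder(augment(x)), encoder(x * mask))
```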

8 of 19

Occlusion Model

  • Learn N different masks
  • Learnable neural network with pixelwise softmax applied across N masks
  • A U-Net is used (often used for segmentation)
  • Binarising pixels didn’t improve performance
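A sketch of the mask head described above (class and variable names are mine):

```python
import torch
import torch.nn as nn

class OcclusionModel(nn.Module):
    """Wraps a U-Net whose output has N channels per pixel; a pixelwise
    softmax across those channels yields N soft masks summing to 1 at
    every pixel (no binarisation, per the slide)."""

    def __init__(self, unet: nn.Module, n_masks: int):
        super().__init__()
        self.unet = unet          # expected to output (B, n_masks, H, W)
        self.n_masks = n_masks

    def forward(self, x):
        logits = self.unet(x)                  # (B, N, H, W)
        return torch.softmax(logits, dim=1)    # softmax across the N masks

# Usage (sketch): occlude the image with the n-th mask.
# x_masked = x * (1 - masks[:, n:n+1])         # broadcasts over channels
```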

9 of 19

Adversarial Inference-Occlusion Self-supervision (ADIOS)
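Putting the pieces together, a minimal sketch of one adversarial training step (optimizer handling and names are illustrative; `ssl_loss` stands for either distance formulation from slide 6):

```python
import torch

def adios_step(x, inference, occlusion, opt_inf, opt_occ, ssl_loss):
    """One adversarial update (sketch): the inference model descends on
    the SSL loss while the occlusion model ascends on it. The sparsity
    penalty from the next slide would be added on the occlusion side."""
    masks = occlusion(x)                          # (B, N, H, W) soft masks
    loss = torch.stack([ssl_loss(inference, x, masks[:, n:n + 1])
                        for n in range(masks.size(1))]).mean()

    opt_inf.zero_grad()
    opt_occ.zero_grad()
    loss.backward()
    for p in occlusion.parameters():              # flip sign: maximise loss
        if p.grad is not None:
            p.grad.neg_()
    opt_inf.step()
    opt_occ.step()
    return loss.item()
```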

10 of 19

Sparsity Penalty

Add a sparsity penalty to prevent trivial solutions (e.g. masks that cover everything or nothing):
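A sketch of a penalty with the required behaviour (this exact functional form is an assumption, not necessarily the paper's): it diverges as a mask's mean coverage approaches 0 or 1, so the occlusion model cannot win by masking nothing or everything.

```python
import math
import torch

def sparsity_penalty(masks, eps=1e-6):
    """Assumed penalty form (illustrative): 1/sin(pi * coverage) blows up
    as the mean mask coverage approaches 0 or 1, ruling out all-zero and
    all-one masks."""
    # masks: (B, N, H, W) soft masks with values in [0, 1]
    coverage = masks.mean(dim=(2, 3))                        # (B, N)
    return (1.0 / torch.sin(math.pi * coverage).clamp(min=eps)).mean()
```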

11 of 19

Final Objective

  • Original version (ADIOS)
    • Requires N forward passes through the inference model (one per mask)

  • Light-weight version (ADIOS-s)
    • Requires only a single forward pass (sketch below)
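A sketch of how the light-weight variant could get away with one pass, assuming (my reading of "light-weight") that a single one of the N masks is sampled per training step:

```python
import torch

# ADIOS-s (sketch): sample one of the N mask slots per step instead of
# looping over all of them, so the inference model runs only once.
# The single-sampled-mask reading is an assumption.

def adios_s_loss(x, inference, occlusion, ssl_loss):
    masks = occlusion(x)                               # (B, N, H, W)
    n = torch.randint(masks.size(1), (1,)).item()      # pick one mask slot
    return ssl_loss(inference, x, masks[:, n:n+1])     # one forward pass
```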

12 of 19

Evaluation

  • Evaluation types:
    • Classification
      • Datasets: ImageNet100[-s], STL10
    • Transfer Learning
      • Datasets: CIFAR10, CIFAR100, Flowers102, iNaturalist
    • Robustness
      • Datasets: Backgrounds challenge (Xiao et al., 2021)

13 of 19

Evaluation: Classification

  • Methods (see the sketch at the end of the slide):
    • k-NN
    • Linear probing
    • Clustering
  • Models:
    • ViT-Tiny
    • ResNet-18
  • Datasets:
    • ImageNet100[-s]
    • STL10
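For reference, a minimal sketch of the frozen-feature protocols named above (hyperparameters such as k are illustrative, not the paper's exact setup): features are extracted once with the pretrained encoder, then a simple classifier is fit on top.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Frozen-feature evaluation (sketch): `*_feats` are encoder outputs,
# computed once with the pretrained model in eval mode.

def linear_probe(train_feats, train_labels, test_feats, test_labels):
    clf = LogisticRegression(max_iter=1000).fit(train_feats, train_labels)
    return clf.score(test_feats, test_labels)           # top-1 accuracy

def knn_eval(train_feats, train_labels, test_feats, test_labels, k=20):
    knn = KNeighborsClassifier(n_neighbors=k).fit(train_feats, train_labels)
    return knn.score(test_feats, test_labels)           # top-1 accuracy
```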

14 of 19

Evaluation: Transfer Learning

  • Model:
    • ResNet-18 pretrained on ImageNet100-S
  • Methods:
    • Fine-tuning (F.T.)
    • Linear probing (Lin.)
  • Datasets:
    • CIFAR10
    • CIFAR100
    • Flowers102
    • iNaturalist

15 of 19

Evaluation: Robustness

  • Model:
    • ResNet-18 pretrained on ImageNet100-S
  • Datasets:
    • Backgrounds challenge (Xiao et al., 2021)
    • Seven types of foreground/background (FG/BG) variation applied to a subset of ImageNet, used to measure how strongly models rely on background signal

16 of 19

Evaluation: Robustness

17 of 19

Evaluation: Robustness

18 of 19

Analysis on Learned Masks

19 of 19

Summary

  • Core contributions:
    • Proposed an adversarial MIM scheme for SSL (ADIOS)
    • Produces semantically meaningful masks
    • Improves every SSL method it was combined with
  • Thoughts:
    • Compare the masks learned by convolutional vs. transformer models
    • Combine the different masks (rather than applying them one by one)