1 of 19

Adversarial Masking for Self-Supervised Learning

Yuge Shi, N. Siddharth, Philip H.S. Torr, Adam R. Kosiorek

Presenter: Aarash

2 of 19

Self-Supervised Learning

Images from https://generallyintelligent.ai/blog/2020-08-24-understanding-self-supervised-contrastive-learning/

3 of 19

Masking

  • Masked Image Modeling (MIM) is becoming a common pretext task

  • Masks are placed on random regions; the masked image is then used to
    • Reconstruct the missing content, or
    • Drive contrastive learning

  • Used for pretraining (analogous to masked-token prediction in BERT)

Image from “Masked Autoencoders Are Scalable Vision Learners”

4 of 19

Motivation

  1. Random masks do not cover whole semantic “entities”, unlike token masking in language models such as BERT
  2. Existing MIM methods work mostly on transformers

5 of 19

Proposed Idea

[Figure: ADIOS schematic — the occlusion model generates masks for the input image, and the inference model encodes the resulting masked views]
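The two models are trained adversarially: the inference model minimizes an SSL loss computed between the image and its masked views, while the occlusion model maximizes it. A sketch of the objective, in notation of my own choosing (not the paper's exact symbols):

```latex
% I_\theta: inference model; m_\phi^1,\dots,m_\phi^N: masks from the
% occlusion model; x \odot (1 - m): the image with mask m occluded.
\min_{\theta}\,\max_{\phi}\;
  \frac{1}{N}\sum_{n=1}^{N}
  \mathcal{L}_{\mathrm{SSL}}\!\Big( I_\theta(x),\; I_\theta\big(x \odot (1 - m_\phi^{n})\big) \Big)
```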

6 of 19

Inference Model

Two proposed formulations (sketched below):

  • Distance in pixel space
    • consists of an encoder and a decoder (an auto-encoder)
    • Where does the imputation happen?
  • Distance in representation space
    • consists of an encoder only (no decoder needed)

Image from “Masked Autoencoders Are Scalable Vision Learners”
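A rough sketch of the two formulations (function names and the mask convention, 1 = keep / 0 = occlude, are my assumptions):

```python
import torch.nn.functional as F

# Two ways to measure how well the inference model handles occlusion.

def pixel_space_loss(encoder, decoder, x, mask):
    """Auto-encoder variant: impute the occluded pixels, then measure the
    distance to the original image in pixel space."""
    recon = decoder(encoder(x * mask))        # reconstruct from masked view
    return F.mse_loss(recon, x)

def representation_space_loss(encoder, x, mask):
    """Encoder-only variant: compare representations of the original and
    the masked image (negative cosine similarity as one common choice)."""
    z_full, z_masked = encoder(x), encoder(x * mask)
    return -F.cosine_similarity(z_masked, z_full, dim=-1).mean()
```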

7 of 19

Inference Model with SimCLR
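With SimCLR as the inference model, the contrastive (NT-Xent) loss is computed between an augmented view and its masked counterpart. A minimal sketch of that loss (shapes and the pairing convention are assumptions):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """SimCLR's NT-Xent loss between two batches of embeddings:
    z1[i] and z2[i] are positives; every other row is a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                   # (2B, D)
    sim = z @ z.t() / temperature                    # scaled cosine sims
    sim.fill_diagonal_(float('-inf'))                # exclude self-pairs
    B = z1.size(0)
    targets = torch.cat([torch.arange(B, 2 * B),     # positive of z1[i] is z2[i]
                         torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

# ADIOS + SimCLR (sketch): one view is augmented, the other is masked.
# loss = nt_xent(encoder(augment(x)), encoder(x * mask))
```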

8 of 19

Occlusion Model

  • Learn N different masks
  • Learnable neural network with pixelwise softmax applied across N masks
  • A U-Net is used (often used for segmentation)
  • Binarising pixels didn’t improve performance
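A sketch of the mask head described above (class and variable names are mine):

```python
import torch
import torch.nn as nn

class OcclusionModel(nn.Module):
    """Wraps a U-Net whose output has N channels per pixel; a pixelwise
    softmax across those channels yields N soft masks summing to 1 at
    every pixel (no binarisation, per the slide)."""

    def __init__(self, unet: nn.Module, n_masks: int):
        super().__init__()
        self.unet = unet          # expected to output (B, n_masks, H, W)
        self.n_masks = n_masks

    def forward(self, x):
        logits = self.unet(x)                  # (B, N, H, W)
        return torch.softmax(logits, dim=1)    # softmax across the N masks

# Usage (sketch): occlude the image with the n-th mask.
# x_masked = x * (1 - masks[:, n:n+1])         # broadcasts over channels
```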

9 of 19

Adversarial Inference-Occlusion Self-supervision (ADIOS)
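Putting the pieces together, a minimal sketch of one adversarial training step (optimizer handling and names are illustrative; `ssl_loss` stands for either distance formulation from slide 6):

```python
import torch

def adios_step(x, inference, occlusion, opt_inf, opt_occ, ssl_loss):
    """One adversarial update (sketch): the inference model descends on
    the SSL loss while the occlusion model ascends on it. The sparsity
    penalty from the next slide would be added on the occlusion side."""
    masks = occlusion(x)                          # (B, N, H, W) soft masks
    loss = torch.stack([ssl_loss(inference, x, masks[:, n:n + 1])
                        for n in range(masks.size(1))]).mean()

    opt_inf.zero_grad()
    opt_occ.zero_grad()
    loss.backward()
    for p in occlusion.parameters():              # flip sign: maximise loss
        if p.grad is not None:
            p.grad.neg_()
    opt_inf.step()
    opt_occ.step()
    return loss.item()
```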

10 of 19

Sparsity Penalty

Add a sparsity penalty to prevent trivial solutions (e.g. masks that cover everything or nothing):
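A sketch of a penalty with the required behaviour (this exact functional form is an assumption, not necessarily the paper's): it diverges as a mask's mean coverage approaches 0 or 1, so the occlusion model cannot win by masking nothing or everything.

```python
import math
import torch

def sparsity_penalty(masks, eps=1e-6):
    """Assumed penalty form (illustrative): 1/sin(pi * coverage) blows up
    as the mean mask coverage approaches 0 or 1, ruling out all-zero and
    all-one masks."""
    # masks: (B, N, H, W) soft masks with values in [0, 1]
    coverage = masks.mean(dim=(2, 3))                        # (B, N)
    return (1.0 / torch.sin(math.pi * coverage).clamp(min=eps)).mean()
```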

11 of 19

Final Objective

  • Original version (ADIOS)
    • Requires N forward passes through the inference model (one per mask)

  • Light-weight version (ADIOS-s)
    • Requires only a single forward pass (sketch below)
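A sketch of how the light-weight variant could get away with one pass, assuming (my reading of "light-weight") that a single one of the N masks is sampled per training step:

```python
import torch

# ADIOS-s (sketch): sample one of the N mask slots per step instead of
# looping over all of them, so the inference model runs only once.
# The single-sampled-mask reading is an assumption.

def adios_s_loss(x, inference, occlusion, ssl_loss):
    masks = occlusion(x)                               # (B, N, H, W)
    n = torch.randint(masks.size(1), (1,)).item()      # pick one mask slot
    return ssl_loss(inference, x, masks[:, n:n+1])     # one forward pass
```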

12 of 19

Evaluation

  • Evaluation types:
    • Classification
      • Datasets: ImageNet100[-s], STL10
    • Transfer Learning
      • Datasets: CIFAR10, CIFAR100, Flowers102, iNaturalist
    • Robustness
      • Datasets: Backgrounds challenge (Xiao et al., 2021)

13 of 19

Evaluation: Classification

  • Methods (see the sketch at the end of the slide):
    • k-NN
    • Linear probing
    • Clustering
  • Models:
    • ViT-Tiny
    • ResNet-18
  • Datasets:
    • ImageNet100[-s]
    • STL10
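For reference, a minimal sketch of the frozen-feature protocols named above (hyperparameters such as k are illustrative, not the paper's exact setup): features are extracted once with the pretrained encoder, then a simple classifier is fit on top.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Frozen-feature evaluation (sketch): `*_feats` are encoder outputs,
# computed once with the pretrained model in eval mode.

def linear_probe(train_feats, train_labels, test_feats, test_labels):
    clf = LogisticRegression(max_iter=1000).fit(train_feats, train_labels)
    return clf.score(test_feats, test_labels)           # top-1 accuracy

def knn_eval(train_feats, train_labels, test_feats, test_labels, k=20):
    knn = KNeighborsClassifier(n_neighbors=k).fit(train_feats, train_labels)
    return knn.score(test_feats, test_labels)           # top-1 accuracy
```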

14 of 19

Evaluation: Transfer Learning

  • Model:
    • ResNet-18 pretrained on ImageNet100-S
  • Methods:
    • Fine-tuning (F.T.)
    • Linear probing (Lin.)
  • Datasets:
    • CIFAR10
    • CIFAR100
    • Flowers102
    • iNaturalist

15 of 19

Evaluation: Robustness

  • Model:
    • ResNet-18 pretrained on ImageNet100-S
  • Datasets:
    • Backgrounds challenge (Xiao et al., 2021)
    • Seven types of foreground/background (FG/BG) variation applied to a subset of ImageNet, used to measure how strongly models rely on background signal

16 of 19

Evaluation: Robustness

17 of 19

Evaluation: Robustness

18 of 19

Analysis on Learned Masks

19 of 19

Summary

  • Core contributions:
    • Proposed an adversarial MIM scheme for SSL (ADIOS)
    • Produces semantically meaningful masks
    • Improves every SSL method it was combined with
  • Thoughts:
    • Compare the masks learned by convolutional vs. transformer models
    • Combine the different masks (rather than applying them one by one)