TUE-AM-335

Object Detection

Credit: Alberto Rizzoli

Recent Object Detectors

R-CNN, Girshick et al., CVPR 2014

Fast R-CNN, Girshick, ICCV 2015

Faster R-CNN, Ren et al., NeurIPS 2015

YOLO, Redmon et al., CVPR 2016

Detectron, Girshick et al., 2018
https://github.com/facebookresearch/Detectron

Domain Shift

“What you saw is not what you get”

Training data: “what you saw”

Deployment: “what you get”

  • “Domain adaptation for deep learning” - Kate Saenko

Domain Shifts

  • Oza, Poojan, Vishwanath A. Sindagi, Vibashan VS, and Vishal M. Patel. "Unsupervised Domain Adaptation of Object Detectors: A Survey." arXiv preprint arXiv:2105.13502, 2021.

Source domain: Labeled samples (Cityscapes, Synthetic, Visible)

Target domain: Unlabeled samples (FoggyCityscapes, Real-world, Thermal)

How to tackle a domain shift?

Solutions:

  • Include annotated samples for the target domain: labor intensive, time consuming, and expensive.
  • Domain adaptation: increase model generalization capability and robustness.

Unsupervised Domain Adaptation (UDA)

Setting:

  • We have labeled samples from the source domain.
  • We have unlabeled samples from the target domain.

(Figure: class A/B samples in the source domain vs. the target domain.)

UDA Drawbacks

  • Conventional UDA methods assume that source data is accessible during the adaptation process.
  • In real-world applications, source data is often restricted due to privacy regulations, data-transmission constraints, or proprietary-data concerns.
  • The Source-Free Domain Adaptation (SFDA) setting specifically aims to alleviate these concerns by adapting source-trained models to the target domain without requiring access to the source data.

Source-Free Domain Adaptation (SFDA)

Setting:

  • We have access to a source-trained model, but we do not have access to labeled samples from the source domain.
  • We have unlabeled samples from the target domain.

Challenges

(Figure: source-model predictions vs. ground truth on target-domain images.)

  • Two major challenges in SFDA training:
    • Effectively distilling target-domain information into the source-trained model.
    • Enhancing the target-domain feature representations.

Knowledge Distillation of Target to Source model

To effectively distill target-domain knowledge into a source-trained model, we employ a student-teacher framework.
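A minimal sketch of the teacher update in such a student-teacher framework. The function name and the decay value `alpha=0.999` are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """Update teacher weights as an exponential moving average (EMA)
    of the student weights: teacher <- alpha * teacher + (1 - alpha) * student."""
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}

# The slowly updated teacher produces pseudo labels on target images,
# which supervise the student; the student's gradient updates then feed
# back into the teacher through the EMA.
```

Because the teacher averages over many student checkpoints, its pseudo labels are more stable than those of the rapidly changing student.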

Enhancing Target Domain Feature Representations

  • RPN proposals provide multiple contrastive views of an object instance.
  • We exploit this to improve target-domain feature representations through RPN-view-based contrastive learning.
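One way to see why overlapping RPN proposals act as "views" of the same instance is to pair proposals by spatial overlap. Note this IoU-based pairing is only an illustrative stand-in: the paper derives pairwise labels from the IRG network instead.

```python
import numpy as np

def iou_matrix(boxes):
    """Pairwise IoU for boxes given as (x1, y1, x2, y2) rows."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / np.maximum(union, 1e-9)

def pairwise_labels(boxes, pos_iou=0.5):
    """Proposals that overlap heavily are treated as positive pairs
    (two RPN views of the same object); all other pairs are negatives."""
    return (iou_matrix(boxes) >= pos_iou).astype(np.float32)
```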

Motivation and Key Idea

  • Top: supervised training of the detection model on the source domain.
  • Bottom: our source-free adaptation setup, which uses a novel graph-based contrastive loss to enhance representations by exploiting instance relations within a target-domain input.

Proposed Network

  • The student network is trained with pseudo labels generated by the teacher network, and the teacher is updated with an exponential moving average (EMA) of the student weights, following a student-teacher detection framework.
  • The Instance Relation Graph (IRG) network models relations between object proposals in the detector.
  • The inter-proposal relations learned by the IRG are used to generate pairwise labels, identifying positive/negative pairs for contrastive learning.

Instance Relation Graph Network

  • Given proposal RoI features, the instance relation graph refines the similarity relations between proposals.
  • Using these relations, we generate pairwise labels for proposals to obtain positive (white) / negative (black) pairs for computing the contrastive loss.
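A minimal sketch of a contrastive loss driven by such pairwise labels. This is a generic supervised-contrastive form; the paper's exact formulation and the temperature `tau` may differ:

```python
import numpy as np

def graph_contrastive_loss(feats, pair_labels, tau=0.1):
    """Contrastive loss over proposal features.

    feats: (N, D) RoI features; pair_labels: (N, N), 1.0 for positive pairs.
    Positive pairs are pulled together; all other pairs are pushed apart."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize
    sim = f @ f.T / tau
    np.fill_diagonal(sim, -np.inf)              # exclude self-similarity
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = pair_labels.copy()
    np.fill_diagonal(pos, 0.0)                  # a proposal is not its own pair
    contrib = np.where(pos > 0, log_prob, 0.0)  # keep only positive-pair terms
    n_pos = np.maximum(pos.sum(axis=1), 1.0)
    return float((-contrib.sum(axis=1) / n_pos).mean())
```

The loss decreases as positive pairs (same instance, per the pairwise labels) become more similar than negative pairs.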

Loss functions

Pseudo-label Loss

Graph Contrastive Loss

Overall Loss
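The slide lists the loss names without formulas. A plausible reconstruction, consistent with the components described on the previous slides (the symbols, the temperature \(\tau\), and the weight \(\lambda\) are notational assumptions):

```latex
% Pseudo-label loss: standard detection losses on the target image x_t
% against the teacher's pseudo labels \hat{y}_t
\mathcal{L}_{pl} = \mathcal{L}_{rpn}(x_t, \hat{y}_t) + \mathcal{L}_{roi}(x_t, \hat{y}_t)

% Graph contrastive loss over proposal features f_i, with P(i) the set of
% positive partners of proposal i given by the IRG pairwise labels
\mathcal{L}_{GC} = -\frac{1}{N} \sum_{i=1}^{N} \frac{1}{|P(i)|} \sum_{j \in P(i)}
    \log \frac{\exp(f_i \cdot f_j / \tau)}{\sum_{k \neq i} \exp(f_i \cdot f_k / \tau)}

% Overall loss, with \lambda weighting the contrastive term
\mathcal{L} = \mathcal{L}_{pl} + \lambda \, \mathcal{L}_{GC}
```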

Quantitative Results

IRG Proposal Relation

  • Relation-matrix analysis for 25 proposal RoI features before and after passing through the IRG network.
  • The IRG network models the relationships between proposals, maximizing the similarity between similar proposals and minimizing it for dissimilar ones.

Qualitative Results

(Qualitative comparison: plain student-teacher baseline vs. the proposed method.)

Acknowledgement

Vision and Image Understanding (VIU) Lab @JHU

Vibashan VS

Poojan Oza

Vishal M Patel