UniCat: Crafting a Stronger Fusion Baseline for Multimodal Re-Identification

Jennifer Crawford, Haoli Yin, Luke McDermott, Daniel Cummings

December 5, 2023

Background: Unimodal ReID

  • Re-Identification (ReID): match a query image/embedding to the most similar object in a gallery (vector database), based on feature comparison
    • A supervised, multi-view retrieval problem
    • Trained via metric learning & contrastive learning
    • Metrics: mean Average Precision (mAP) and Rank-k accuracy from the CMC curve (see the sketch below)
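
To make the retrieval setup and the metrics concrete, here is a minimal evaluation sketch, assuming embeddings compared by cosine similarity and integer identity labels stored as NumPy arrays; the function name `reid_eval` is illustrative, and the usual same-camera filtering of the gallery is omitted for brevity.

```python
import numpy as np

def reid_eval(query_emb, query_ids, gallery_emb, gallery_ids, max_rank=10):
    """Toy ReID evaluation: rank the gallery by cosine similarity, report CMC and mAP."""
    # L2-normalize so a plain dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sim = q @ g.T                                     # (num_query, num_gallery)

    cmc = np.zeros(max_rank)
    average_precisions = []
    for i in range(len(q)):
        order = np.argsort(-sim[i])                   # gallery indices, best match first
        matches = (gallery_ids[order] == query_ids[i]).astype(float)
        if matches.sum() == 0:                        # identity absent from the gallery
            continue
        first_hit = int(np.argmax(matches))           # rank of the first correct match
        if first_hit < max_rank:
            cmc[first_hit:] += 1                      # query is a hit at every rank >= first_hit
        hits = np.cumsum(matches)
        precision_at_k = hits / (np.arange(len(matches)) + 1)
        average_precisions.append((precision_at_k * matches).sum() / matches.sum())

    num_valid = len(average_precisions)
    return cmc / num_valid, float(np.mean(average_precisions))   # CMC curve, mAP
```

Rank-k is then just the CMC value at position k: the fraction of queries whose first correct match appears within the top-k gallery entries.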

Problem: Unimodal ReID

Unimodal ReID struggles in out-of-distribution (OOD) scenarios like night, fog, etc.

Background: Multimodal ReID

  • Multimodal ReID: fuses multiple modalities (e.g. RGB, NIR, TIR) into a more robust object representation

  • How should modalities be fused to achieve the strongest representation? (A minimal late-fusion sketch follows.)
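
As one concrete (and hedged) picture of late fusion, the sketch below keeps a separate encoder per modality and concatenates the per-modality embeddings into the final descriptor. The module name, the dictionary layout, and treating NIR/TIR as 3-channel inputs are illustrative assumptions, not the exact architecture used in the paper.

```python
import torch
import torch.nn as nn

class LateFusionReID(nn.Module):
    """One backbone per modality; per-modality embeddings are concatenated into the descriptor."""

    def __init__(self, encoders: dict):
        super().__init__()
        # e.g. encoders = {"rgb": ..., "nir": ..., "tir": ...}, each mapping images -> (B, D)
        self.encoders = nn.ModuleDict(encoders)

    def forward(self, inputs: dict) -> torch.Tensor:
        # inputs maps modality name -> image batch, e.g. {"rgb": x_rgb, "nir": x_nir, "tir": x_tir}
        feats = [self.encoders[m](inputs[m]) for m in sorted(self.encoders)]
        return torch.cat(feats, dim=1)                # (B, num_modalities * D)

# Toy usage with tiny CNN stand-ins for real ReID backbones.
def toy_encoder():
    return nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 512))

model = LateFusionReID({m: toy_encoder() for m in ["rgb", "nir", "tir"]})
emb = model({m: torch.randn(4, 3, 256, 128) for m in ["rgb", "nir", "tir"]})  # shape (4, 1536)
```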

Datasets

  • RGBN300: Vehicle ReID; RGB and Near IR (NIR); 150 classes each for train/test
  • RGBNT100: Vehicle ReID; subset of RGBN300 with RGB, NIR, and Thermal IR (TIR); 50 classes each for train/test
  • RGBNT201: Person ReID; RGB, NIR, and TIR; 100 classes each for train/test

Methods: Investigating Late Fusion
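
To make the training-time comparison concrete, here is a hedged sketch of the two late-fusion setups contrasted in this work: a single loss on the concatenated multimodal embedding versus one loss per modality with no fusion during training (the UniCat setup, with embeddings concatenated only at inference). `criterion` stands in for any embedding-level ReID loss (e.g. a batch-hard triplet loss), the function names are illustrative, and the `LateFusionReID` module from the earlier sketch is reused.

```python
import torch

def fused_loss(model, batch, labels, criterion):
    """Standard late fusion: a single loss on the concatenated multimodal embedding."""
    fused = model(batch)                        # (B, num_modalities * D), as in LateFusionReID
    return criterion(fused, labels)

def unicat_loss(model, batch, labels, criterion):
    """UniCat-style training: one loss per modality, no fusion in the training objective.
    At inference the unimodal embeddings are still concatenated, exactly as above."""
    losses = []
    for name, encoder in model.encoders.items():
        emb = encoder(batch[name])              # (B, D) unimodal embedding
        losses.append(criterion(emb, labels))   # each encoder is supervised on its own
    return torch.stack(losses).sum()
```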

Results: Multimodal ReID

  • Our simple fusion baselines match or exceed SOTA results on numerous benchmark datasets.

  • Fusion strategy emerges as a leading factor in multimodal ReID performance.

Discussion: Multimodal -> Unimodal ReID

  • Unimodal representational quality is strongly linked to multimodal performance

  • Training without fusion (UniCat) benefits all modalities, with one small exception (see the per-modality probe below)
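
One way to probe this, sketched under the same assumptions as the earlier examples: embed the query and gallery sets with a single modality's encoder and score retrieval for that modality alone, reusing `reid_eval` from above. `unimodal_quality` is an illustrative name, not a function from the paper.

```python
import torch

def unimodal_quality(model, query, gallery, query_ids, gallery_ids):
    """Score each modality on its own: embed with one encoder at a time, then reuse reid_eval."""
    results = {}
    with torch.no_grad():
        for name, encoder in model.encoders.items():
            q = encoder(query[name]).cpu().numpy()     # unimodal query embeddings
            g = encoder(gallery[name]).cpu().numpy()   # unimodal gallery embeddings
            cmc, mean_ap = reid_eval(q, query_ids, g, gallery_ids)
            results[name] = {"Rank-1": float(cmc[0]), "mAP": mean_ap}
    return results
```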

Discussion: Modality Laziness

  • Modality laziness is a phenomenon in multimodal learning in which individual modalities are underutilized when trained jointly with fusion [1].

  • Con: fusion relaxes the training pressure on individual modalities, which often leads to weaker unimodal representations.

  • Pro: we observe that modality laziness can be useful when a modality is especially unreliable (NIR in the RGBNT201 dataset).

[1] Chenzhuang Du, Jiaye Teng, Tingle Li, Yichen Liu, Yue Wang, Yang Yuan, and Hang Zhao. Modality laziness: Everybody’s business is nobody’s business. 2021.

Discussion: Connection to Ensemble Learning

  • We show modality laziness can arise even when the input modalities have equal or comparable predictive strength.

  • Even for ensembles within a single modality, jointly training the members is often inferior to bagging individually trained models [2] (see the toy sketch below).

[2] Jeffares, A., Liu, T., Crabbé, J., and van der Schaar, M. Joint training of deep ensembles fails due to learner collusion. NeurIPS, 2023. arXiv:2301.11323.
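
A toy sketch of that analogy, with throwaway classifiers and illustrative names: joint training places a single loss on the combined (averaged) prediction, so members can compensate for one another, while bagging-style training gives each member its own loss and combines outputs only at inference, mirroring UniCat.

```python
import torch
import torch.nn as nn

def joint_step(members, x, y, criterion):
    """Joint ensemble training: one loss on the combined (averaged) prediction.
    Members can 'collude' -- one learner's errors are absorbed by the others [2]."""
    logits = torch.stack([m(x) for m in members]).mean(dim=0)
    return criterion(logits, y)

def independent_step(members, x, y, criterion):
    """Bagging-style training: every member optimizes its own loss; outputs are
    combined only at inference -- the analogue of UniCat for unimodal ensembles."""
    return [criterion(m(x), y) for m in members]

# Toy usage with random data.
members = [nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10)) for _ in range(3)]
x, y = torch.randn(64, 16), torch.randint(0, 10, (64,))
loss_joint = joint_step(members, x, y, nn.CrossEntropyLoss())
losses_indep = independent_step(members, x, y, nn.CrossEntropyLoss())
```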

Takeaways

UniReps Workshop @ NeurIPS 2023

UniCat: a simple strategy to bypass modality laziness in late fusion architectures

  • SOTA results on common multimodal ReID benchmarks

  • Modality laziness in late fusion architectures leads to suboptimal learning

  • Demonstrated similarity between modality laziness in late fusion and joint training issues in unimodal ensembles.
