1 of 10

Computer Vision Papers

By Michel Liao

2 of 10

Project 1

By Michel Liao

3 of 10

A ConvNet for the 2020s, Liu et al. 2022

4 of 10

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, Wang et al. 2021

Conclusion: As a backbone for dense prediction, PVT performs slightly better than a ResNet of comparable size.

5 of 10

PVT v2: Improved Baselines with Pyramid Vision Transformer, Wang et al. 2022

Conclusion: Patch overlap and positional embedding are important.

6 of 10

Trans4Trans: Efficient Transformer for Transparent Object and Semantic Scene Segmentation in Real-World Navigation Assistance, Zhang et al. 2022

Conclusion: Trans4Trans (T4T) performs well on transparent object segmentation. Reflections and edges are useful cues for segmentation.

7 of 10

A Computer Vision Paper Canon

By Michel Liao

Not the canon but a canon

8 of 10

Attention Is All You Need, Vaswani et al. 2017

9 of 10

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Dosovitskiy et al. 2021

10 of 10

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers, Zheng et al. 2021