Computer Vision Papers
By Michel Liao
Project 1
A ConvNet for the 2020s, Liu et al. 2022
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, Wang et al. 2021
Conclusion: As a backbone for dense prediction, PVT performs slightly better than ResNet.
PVT v2: Improved baselines with Pyramid Vision Transformer, Wang et al. 2022
Conclusion: Overlapping patch embedding and positional embedding are both important.
Trans4Trans: Efficient Transformer for Transparent Object and Semantic Scene Segmentation in Real-World Navigation Assistance, Zhang et al. 2022
Conclusion: Trans4Trans (T4T) performs well on both transparent object segmentation and semantic scene segmentation. Reflections and edges are useful cues for segmentation.
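The PVT v2 conclusion about patch overlap comes down to how the image is cut into tokens. The following is a minimal plain-Python sketch, not the PVT v2 code. It compares a non-overlapping 4x4 patch embedding (kernel = stride = 4) with an overlapping 7x7 window at stride 4 and padding 3. These settings are assumptions chosen to resemble PVT v1 and the first stage of PVT v2, and the helper names `patch_grid`, `patch_spans` and `overlap` are hypothetical.

```python
def patch_grid(size: int, kernel: int, stride: int, padding: int) -> int:
    """Patches along one axis for a strided patch embedding.

    Same formula as a convolution's output size."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_spans(size: int, kernel: int, stride: int, padding: int):
    """Pixel span [start, end) of each patch along one axis.

    Coordinates refer to the unpadded image, so a start below 0 or an
    end above `size` means the window covers zero padding."""
    n = patch_grid(size, kernel, stride, padding)
    return [(i * stride - padding, i * stride - padding + kernel) for i in range(n)]


def overlap(spans) -> int:
    """Pixels shared by the first two neighbouring patches (0 = no overlap)."""
    (_, a_end), (b_start, _) = spans[0], spans[1]
    return max(0, a_end - b_start)


if __name__ == "__main__":
    H = 224
    # Non-overlapping patches (assumed PVT v1 style: 4x4, stride 4, no padding)
    print(patch_grid(H, 4, 4, 0), overlap(patch_spans(H, 4, 4, 0)))  # 56 0
    # Overlapping patches (assumed PVT v2 first stage: 7x7, stride 4, padding 3)
    print(patch_grid(H, 7, 4, 3), overlap(patch_spans(H, 7, 4, 3)))  # 56 3
```

Both settings give the same 56 x 56 token grid for a 224-pixel input. Overlap therefore leaves the sequence length unchanged. The difference is that adjacent windows share `kernel - stride = 3` pixels, so local continuity across patch borders is not cut off.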
A Computer Vision Paper Canon
Not the canon, but a canon
Attention Is All You Need, Vaswani et al. 2017
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Dosovitskiy et al. 2021
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers, Zheng et al. 2021
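The ViT title ("An Image is Worth 16x16 Words") names its tokenization step directly: the image is split into non-overlapping 16x16 patches, and each patch is flattened into one input vector. The sketch below illustrates only that splitting step in plain Python on nested lists. The function name `patchify` is hypothetical. It omits what ViT does next, namely the learned linear projection of each flattened patch, the prepended class token, and the position embeddings.

```python
def patchify(image, patch: int):
    """Split an H x W x C image (nested lists) into non-overlapping
    patch x patch tiles, each flattened row-major into one token vector."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):           # patch rows, top to bottom
        for left in range(0, w, patch):      # patch columns, left to right
            token = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    token.extend(image[y][x])  # append all C channels of this pixel
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    # A blank 224 x 224 RGB image, built only to check the shapes
    image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
    tokens = patchify(image, 16)
    print(len(tokens), len(tokens[0]))  # 196 768
```

A 224 x 224 RGB input yields 14 x 14 = 196 tokens of 16 x 16 x 3 = 768 values each. The patches here do not overlap, which is the design choice that PVT v2 later revisits.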