Computer Vision Papers

By Michel Liao

Project 1

A ConvNet for the 2020s, Liu et al. 2022

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, Wang et al. 2021

Conclusion: As a backbone, PVT is slightly better than ResNet.

PVT v2: Improved Baselines with Pyramid Vision Transformer, Wang et al. 2022

Conclusion: Patch overlap and positional embedding are important.

Trans4Trans: Efficient Transformer for Transparent Object and Semantic Scene Segmentation in Real-World Navigation Assistance, Zhang et al. 2022

Conclusion: Trans4Trans (T4T) performs well. Reflections and edges are useful cues for segmentation.

A Computer Vision Paper Canon

Not the canon but a canon

Attention Is All You Need, Vaswani et al. 2017

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Dosovitskiy et al. 2021

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers, Zheng et al. 2021