Masked Autoencoders
Are Scalable Vision Learners
Paper Discussion 13.07.2022
Transformers Rule!!!
Give Convs a chance
It was another day at the office in ML City for Huggy.
He had just finished creating a nice new space and was ready for his lunch in the park.
He was looking forward to a nice sandwich out on a sunny summer day.
But he soon realized that it was getting way too hot.
38°C
That was the moment he decided to have a nice beach vacation.
He had heard about this nice place, Neural Network Island, where ML folks go with their image pets to have a good time.
Google ViT Training today
Bring your image for free TPU credits!!!
ViT
But when he got to the beach he was extremely disappointed - it was packed with images.
He wondered why they were all hanging around at this one beach.
Then he saw a sign.
FAIR MAE Training today
Bring your image for free Instagram followers!!!
Luckily Huggy got a tip from a local about another nearby beach.
When he got there, he was relieved: way fewer images.
But something was strange about them.
He looked around and saw a sign, again.
MAE
Then he saw a booth offering free decoder glasses. He put them on and was amazed.
With his new decoder glasses, Huggy spent a nice day at the beach.
In the afternoon there was a classification competition between the vanilla ViT and MAE.
Huggy was sure that MAE had no chance.
It was a close result, but in the end MAE managed to outperform vanilla ViT.
Vision Transformer (ViT) Recap
Masked Autoencoders
[Diagram: Encoder, Latent, Decoder]
MAE Architecture
Masking
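The masking step can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: it assumes a 224x224 RGB image, 16x16 patches, and a 75% masking ratio, and the function names `patchify` and `random_masking` are illustrative.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def random_masking(patches, mask_ratio, rng):
    """Keep a random subset of patches.

    Returns the visible patches, a binary mask (1 = masked, 0 = visible),
    and the indices needed to undo the shuffle later.
    """
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    noise = rng.random(n)
    ids_shuffle = np.argsort(noise)        # random permutation of patch indices
    ids_restore = np.argsort(ids_shuffle)  # inverse permutation
    ids_keep = ids_shuffle[:n_keep]
    mask = np.ones(n, dtype=np.float32)
    mask[ids_keep] = 0.0
    return patches[ids_keep], mask, ids_restore

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))  # stand-in for a real image
patches = patchify(image, 16)      # 14 x 14 = 196 patches
visible, mask, ids_restore = random_masking(patches, 0.75, rng)
```

Only `visible` would be passed to the encoder, which is where the compute savings of a high masking ratio come from.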
[Diagram: Encoder, Layer Normalization, Linear Projection, Decoder, Layer Normalization]
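The hand-off between the encoder and decoder stages listed above can be sketched at the level of array shapes. This is a rough NumPy illustration under several assumptions: the encoder output is replaced by random values, the widths 1024 and 512 are illustrative, the layer normalization has no learnable scale or shift, the mask token is a zero vector rather than a learned one, and positional embeddings are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, n_keep = 196, 49
enc_dim, dec_dim = 1024, 512  # illustrative widths, not taken from the slides

# Random shuffle as in the masking step: the first n_keep shuffled
# indices are the visible patches.
ids_shuffle = rng.permutation(num_patches)
ids_restore = np.argsort(ids_shuffle)

def layer_norm(x, eps=1e-6):
    """Normalize each token over its feature dimension (no learned parameters)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Stand-in for the encoder output, computed on visible patches only.
latent = layer_norm(rng.standard_normal((n_keep, enc_dim)))

# Linear projection from encoder width to decoder width.
w_proj = rng.standard_normal((enc_dim, dec_dim)) * 0.02
tokens = latent @ w_proj  # (49, 512)

# One shared mask token fills every masked position.
mask_token = np.zeros((1, dec_dim))
mask_tokens = np.repeat(mask_token, num_patches - n_keep, axis=0)

# Concatenate in shuffled order, then undo the shuffle so the decoder
# sees tokens in the original patch order.
full = np.concatenate([tokens, mask_tokens], axis=0)
decoder_input = full[ids_restore]  # (196, 512)
```

The decoder therefore works on the full sequence of 196 tokens, while the encoder only ever saw 49 of them.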
Reconstruction
[Figure: Original, Reconstruction]
Mean Squared Error
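A small worked example of the reconstruction loss. It assumes the loss is averaged over masked patches only, which is how the MAE paper describes it. The function name `masked_mse` and the toy values are illustrative.

```python
import numpy as np

def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked patches only.

    pred, target: (num_patches, patch_dim); mask: (num_patches,), 1 = masked.
    """
    per_patch = ((pred - target) ** 2).mean(axis=-1)
    return (per_patch * mask).sum() / mask.sum()

# Four toy patches with two pixel values each; patches 1 and 2 were masked.
target = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
pred = np.array([[9.0, 9.0], [1.0, 1.0], [2.0, 4.0], [3.0, 3.0]])
mask = np.array([0.0, 1.0, 1.0, 0.0])

# Per-patch errors are [81, 0, 2, 0]; only patches 1 and 2 count,
# so the loss is (0 + 2) / 2 = 1.0.
loss = masked_mse(pred, target, mask)
```

The large error on patch 0 does not affect the loss because that patch was visible to the encoder.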
Classification
[Diagram: Encoder, Latent, Decoder, Classification Head]
Reconstruction
Performance
Linear Probing
Fine Tuning
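The difference between the two evaluation protocols can be shown with a toy linear probe. In linear probing only a linear classifier is trained on top of frozen encoder features, while fine-tuning would also update the encoder weights. The sketch below is hypothetical throughout: `frozen_encoder` is a fixed random projection standing in for a pre-trained encoder, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen encoder: a fixed projection that is never updated.
w_enc = rng.standard_normal((20, 16)) / np.sqrt(20)

def frozen_encoder(x):
    return np.tanh(x @ w_enc)

# Synthetic two-class data.
x = rng.standard_normal((200, 20))
y = (x[:, 0] > 0).astype(int)

# Features are computed once because the encoder stays fixed.
feats = frozen_encoder(x)

# The linear head is the only trainable part.
w_head = np.zeros((16, 2))
b_head = np.zeros(2)

def cross_entropy(logits, labels):
    """Return the mean cross-entropy loss and the softmax probabilities."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    return loss, np.exp(log_probs)

initial_loss, _ = cross_entropy(feats @ w_head + b_head, y)
learning_rate = 0.1
for _ in range(300):
    _, probs = cross_entropy(feats @ w_head + b_head, y)
    grad = probs.copy()
    grad[np.arange(len(y)), y] -= 1.0  # gradient of the loss w.r.t. logits
    grad /= len(y)
    w_head -= learning_rate * feats.T @ grad
    b_head -= learning_rate * grad.sum(axis=0)
final_loss, _ = cross_entropy(feats @ w_head + b_head, y)
```

In a fine-tuning setup, `w_enc` would receive gradient updates as well, usually with a smaller learning rate than the head.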
Performance - Comparison
Performance - Mask Sampling
Performance - Masking Ratio
Performance - Decoder Size
Performance - Augmentation
Object Detection and Segmentation
Semantic Segmentation
Thank you!