Efficient Visual Self-Attention
Efficiency, Generalization, and Paradigm Shift
Shen Zhuoran, Nov. 1, 2021
Overview
Motivation for attention
Dot-product attention
Efficient attention
Efficient attention
Interpretation
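To make the contrast concrete, here is a minimal PyTorch sketch (my own, not code from the paper) of the two formulations: dot-product attention D(Q, K, V) = softmax(Q K^T) V versus efficient attention E(Q, K, V) = ρ_q(Q) (ρ_k(K)^T V), where ρ_q is a softmax over channels and ρ_k a softmax over positions. Shapes are single-head and unbatched for readability.

```python
import torch
import torch.nn.functional as F

def dot_product_attention(q, k, v):
    """Standard dot-product attention: softmax(Q K^T) V.
    q, k: (n, d_k); v: (n, d_v). Forms an n x n attention map,
    so memory and computation scale quadratically in n."""
    attn = F.softmax(q @ k.transpose(0, 1), dim=-1)   # (n, n) attention map
    return attn @ v                                   # (n, d_v)

def efficient_attention(q, k, v):
    """Efficient attention: softmax_row(Q) (softmax_col(K)^T V).
    Same shapes as above, but the n x n map is never materialized;
    cost scales with n * d_k * d_v instead of n^2."""
    q = F.softmax(q, dim=-1)          # normalize each query over channels
    k = F.softmax(k, dim=0)           # normalize each key channel over positions
    context = k.transpose(0, 1) @ v   # (d_k, d_v) global context
    return q @ context                # (n, d_v)
```

The second form first aggregates the values into a small d_k by d_v context and only then distributes it to the queries, which is the source of the linear complexity in the number of positions.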
Empirical comparison vs. the non-local module
Additional results
Stereo depth estimation (Scene Flow)
Temporal action localization (THUMOS14)
Global context module
Semi-supervised video object segmentation
Deep learning approaches
Space-time memory module
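For reference, a simplified sketch of the STM read operation (names and shapes are illustrative; this is not the authors' code): memory keys and values from all past frames are concatenated, so both tensors, and the affinity map computed against them, grow with the number of memorized frames.

```python
import torch
import torch.nn.functional as F

def stm_read(mem_k, mem_v, query_k):
    """Space-time memory read (schematic).
    mem_k: (T*H*W, d_k), mem_v: (T*H*W, d_v); both grow with the number
    of memorized frames T. query_k: (H*W, d_k) for the current frame."""
    affinity = F.softmax(query_k @ mem_k.transpose(0, 1), dim=-1)  # (H*W, T*H*W)
    return affinity @ mem_v                                        # (H*W, d_v)
```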
Global context module
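By contrast, the hypothetical `GlobalContext` class below sketches the idea behind the global context module: each frame is distilled into a fixed-size context matrix using the same factorization as efficient attention, so memory and per-frame computation stay constant regardless of video length. The exact write/read rules and normalization follow the paper; this is only a schematic.

```python
import torch
import torch.nn.functional as F

class GlobalContext:
    """Constant-size alternative to the STM read (schematic sketch).
    The entire memory is a single (d_k, d_v) context matrix, so its size
    does not depend on how many frames have been processed."""
    def __init__(self, d_k, d_v):
        self.context = torch.zeros(d_k, d_v)

    def write(self, k, v):
        """Distill one frame (k: (H*W, d_k), v: (H*W, d_v)) into the context."""
        k = F.softmax(k, dim=0)        # normalize each key channel over positions
        self.context = self.context + k.transpose(0, 1) @ v

    def read(self, q):
        """q: (H*W, d_k) from the current frame -> (H*W, d_v) features."""
        q = F.softmax(q, dim=-1)
        return q @ self.context
```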
GC module vs. STM module
| Complexity  | STM                                     | GC                 |
|-------------|-----------------------------------------|--------------------|
| Memory      | Grows linearly with the number of frames | Constant           |
| Computation | Per-frame cost grows with the number of frames | Constant per frame |
Empirical results
Visualizations
Global Self-Attention Networks
Motivations for fully-attentional modeling
Goal
Bottleneck block
Attention bottleneck block
Efficient attention
Encoding positional information
Encoding relative positions for self-attention
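A generic sketch of relative position encoding for 1D self-attention (not the exact GSA parameterization): each query-key pair contributes an extra logit from the query dotted with a learned embedding of their relative offset. The name `rel_emb` and the function are illustrative.

```python
import torch
import torch.nn.functional as F

def attention_with_relative_positions(q, k, v, rel_emb):
    """1D self-attention whose logits include a relative-position term.
    q, k: (n, d); v: (n, d_v); rel_emb: (2n - 1, d), one learned embedding
    per relative offset in [-(n-1), n-1]."""
    n, d = q.shape
    content_logits = q @ k.transpose(0, 1)                   # (n, n)
    # Gather r_{j-i} for every (i, j) pair and dot it with the query.
    offsets = torch.arange(n)[None, :] - torch.arange(n)[:, None] + (n - 1)  # (n, n)
    r = rel_emb[offsets]                                      # (n, n, d)
    position_logits = torch.einsum('id,ijd->ij', q, r)        # (n, n)
    attn = F.softmax(content_logits + position_logits, dim=-1)
    return attn @ v
```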
Axial attention
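A sketch of the axial factorization, assuming an (H, W, d) feature map and omitting positional terms and per-axis projections: attention is run along the height axis and then along the width axis, so the cost drops from O((HW)^2) to O(HW(H + W)).

```python
import torch
import torch.nn.functional as F

def attend_along_axis(q, k, v, axis):
    """Self-attention restricted to one spatial axis of an (H, W, d) map;
    the other axis is treated as a batch dimension."""
    if axis == 0:                      # attend along height: batch over columns
        q, k, v = q.transpose(0, 1), k.transpose(0, 1), v.transpose(0, 1)
    logits = torch.einsum('bid,bjd->bij', q, k)
    out = torch.einsum('bij,bjd->bid', F.softmax(logits, dim=-1), v)
    return out.transpose(0, 1) if axis == 0 else out

def axial_attention(q, k, v):
    """Global 2D attention factorized into a height pass then a width pass.
    In a real model each pass has its own projections and positional terms;
    this sketch simply reuses the output of the first pass for the second."""
    h_out = attend_along_axis(q, k, v, axis=0)
    return attend_along_axis(h_out, h_out, h_out, axis=1)
```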
GSA module
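A schematic combination of the two branches of the GSA module as described on these slides: a content branch (the efficient-attention factorization, which ignores positions) plus a positional branch (axis-wise attention driven by relative position embeddings), with the outputs summed. How the paper combines the two axes and normalizes each branch may differ in detail; all names and shapes here are illustrative.

```python
import torch
import torch.nn.functional as F

def gsa_module(q, k, v, rel_h, rel_w):
    """Schematic GSA module: content branch + positional branch, summed.
    q, k, v: (H, W, d); rel_h: (2H-1, d); rel_w: (2W-1, d).
    Projections, multi-head handling, and normalization are omitted."""
    H, W, d = q.shape
    # Content branch: efficient attention over all H*W positions, no positions.
    qf, kf, vf = q.reshape(-1, d), k.reshape(-1, d), v.reshape(-1, d)
    content = F.softmax(qf, dim=-1) @ (F.softmax(kf, dim=0).T @ vf)
    content = content.reshape(H, W, d)

    # Positional branch: relative-position logits along one axis at a time.
    def axis_positional(q, v, rel, axis):
        if axis == 0:                  # batch over columns, attend along height
            q, v = q.transpose(0, 1), v.transpose(0, 1)
        n = q.shape[1]
        offsets = torch.arange(n)[None, :] - torch.arange(n)[:, None] + (n - 1)
        logits = torch.einsum('bid,ijd->bij', q, rel[offsets])
        out = torch.einsum('bij,bjd->bid', F.softmax(logits, dim=-1), v)
        return out.transpose(0, 1) if axis == 0 else out

    positional = (axis_positional(q, v, rel_h, axis=0)
                  + axis_positional(q, v, rel_w, axis=1))
    return content + positional
```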
GSA-Net
Bottleneck block
GSA-Bottleneck block
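A sketch of the swap that defines the GSA bottleneck, assuming a standard ResNet bottleneck layout: the spatial 3x3 convolution is replaced by a global self-attention module, passed in here as a placeholder `gsa_module` with a Conv2d-like (B, C, H, W) interface, while the 1x1 projections and the residual connection are unchanged.

```python
import torch
from torch import nn

class GSABottleneck(nn.Module):
    """Schematic GSA bottleneck block: a ResNet bottleneck whose spatial
    3x3 convolution is replaced by a global self-attention module."""
    def __init__(self, channels, bottleneck_channels, gsa_module):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, bottleneck_channels, 1, bias=False),
            nn.BatchNorm2d(bottleneck_channels), nn.ReLU(inplace=True))
        self.spatial = gsa_module          # replaces the 3x3 convolution
        self.expand = nn.Sequential(
            nn.Conv2d(bottleneck_channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.reduce(x)               # 1x1 channel reduction
        out = self.spatial(out)            # global spatial interaction
        out = self.expand(out)             # 1x1 channel expansion
        return self.relu(out + x)          # residual connection
```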
Results on ImageNet
Discussion
Thank you