Attention-based Dual-stream Vision Transformer for Radar Gait Recognition
Shiliang Chen1, Wentao He1, Jianfeng Ren1, Xudong Jiang2
1University of Nottingham, Ningbo China
2Nanyang Technological University, Singapore
Outline
Introduction
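[Figure: a radar signal is passed to a supervised learning model, which classifies it among known identities (…, ID: 39, ID: 40, ID: 41, ID: 42, …) and returns the match: "You are #42!"]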
Contributions
Methodology
[Figure (a): Attention-based Two-Stream Architecture]
Feature Generation: Raw signal → STFT → Spectrogram → FFT along time axis → CVD
Spectrogram Stream: Spectrogram → Position and Patch Embedding → Transformer Encoder → MLP Head
CVD Stream: CVD → Position and Patch Embedding → Transformer Encoder → MLP Head
Attention-based Fusion: Kernel q → softmax → Element-wise multiplication → MLP Head → Predicted Label
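The feature generation and fusion steps named in figure (a) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the window length, hop size, and function names are hypothetical, and the slide does not say how the weighted stream features are combined, so the weighted sum used here is an assumption.

```python
import numpy as np

def spectrogram(signal, win_len=64, hop=16):
    """Magnitude STFT: rows are Doppler bins, columns are time frames.
    win_len and hop are illustrative values, not taken from the slides."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)],
        axis=1,
    )  # shape (win_len, n_frames)
    return np.abs(np.fft.fft(frames, axis=0))

def cvd(spec):
    """Cadence velocity diagram: FFT of the spectrogram along the time axis."""
    return np.abs(np.fft.fft(spec, axis=1))

def attention_fusion(f_spec, f_cvd, q):
    """Score each stream feature against kernel q, softmax the two scores,
    and scale each feature by its weight (element-wise multiplication).
    Summing the two weighted features is an assumption."""
    feats = np.stack([f_spec, f_cvd])        # shape (2, d)
    scores = feats @ q                       # one score per stream
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    fused = (weights[:, None] * feats).sum(axis=0)
    return fused, weights

# Toy input: a single complex tone standing in for a radar return.
sig = np.exp(2j * np.pi * 0.1 * np.arange(512))
S = spectrogram(sig)
C = cvd(S)
fused, w = attention_fusion(np.ones(4), np.ones(4), np.ones(4))
```

In the full model the two features passed to the fusion step would come from the spectrogram and CVD streams rather than from constant vectors.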
[Figure (b): Architecture of Vision Transformer]
Position and Patch Embedding (PPE): Input Image → Flatten → Concat(·) with Class Token → add Position Embedding
Transformer Encoder (TE), repeated 12×: Layer Norm → Multi-head Attention → Dropout → Layer Norm → MLP Block → Dropout
Output: Extract Class Token
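The blocks in figure (b) can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions rather than the authors' model: the image size, patch size, embedding width, and head count are hypothetical, weights are random, residual connections follow the standard ViT layout (the slide does not show them), layer norm has no learnable scale or shift, ReLU stands in for the MLP activation, and dropout is omitted because it is the identity at inference time. Only the depth of 12 comes from the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_embed(image, patch, w_proj, cls_token, pos_embed):
    """Split an (H, W) image into patch x patch tiles, flatten and project them,
    prepend the class token, and add position embeddings."""
    h, w = image.shape
    tiles = image.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    tokens = tiles.reshape(-1, patch * patch) @ w_proj     # (n_patches, d)
    tokens = np.concatenate([cls_token[None, :], tokens])  # (n_patches + 1, d)
    return tokens + pos_embed

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, w_qkv, w_out, n_heads):
    n, d = x.shape
    d_h = d // n_heads
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)  # each (n, d)
    # Reshape each of q, k, v to (heads, n, d_h).
    q, k, v = (t.reshape(n, n_heads, d_h).swapaxes(0, 1) for t in (q, k, v))
    attn = softmax(q @ k.swapaxes(1, 2) / np.sqrt(d_h))  # (heads, n, n)
    out = (attn @ v).swapaxes(0, 1).reshape(n, d)        # merge heads
    return out @ w_out

def encoder_block(x, p, n_heads):
    """Layer Norm -> attention -> residual, then Layer Norm -> MLP -> residual."""
    x = x + multi_head_attention(layer_norm(x), p["w_qkv"], p["w_out"], n_heads)
    hidden = np.maximum(layer_norm(x) @ p["w1"], 0.0)
    return x + hidden @ p["w2"]

# Hypothetical sizes for illustration only.
d, patch, n_heads, depth = 32, 8, 4, 12
image = rng.standard_normal((32, 32))
n_patches = (32 // patch) ** 2
tokens = patch_embed(
    image,
    patch,
    rng.standard_normal((patch * patch, d)) * 0.02,
    np.zeros(d),
    rng.standard_normal((n_patches + 1, d)) * 0.02,
)
for _ in range(depth):
    params = {
        "w_qkv": rng.standard_normal((d, 3 * d)) * 0.02,
        "w_out": rng.standard_normal((d, d)) * 0.02,
        "w1": rng.standard_normal((d, 4 * d)) * 0.02,
        "w2": rng.standard_normal((4 * d, d)) * 0.02,
    }
    tokens = encoder_block(tokens, params, n_heads)

class_feature = tokens[0]  # "Extract Class Token": the first token feeds the MLP head
```

The class token sits at index 0 because it is concatenated in front of the patch tokens, which is why extracting it after the last encoder block is a single row lookup.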
Results
[Figure: bar chart of the values below; vertical axis from 60.00 to 100.00]
Method | Spectrogram | CVD | Spectrogram + CVD |
DCNN-AlexNet | 71.56 | 72.44 | |
VGG16 | 69.24 | 79.83 | |
ResNet18 | 85.56 | 80.73 | |
Ours | | | 91.02 |
Conclusion
Thank you!
Shiliang Chen
scysc1@nottingham.edu.cn
Wentao He
scxwh1@nottingham.edu.cn
Jianfeng Ren
jianfeng.ren@nottingham.edu.cn
Xudong Jiang
exdjiang@ntu.edu.sg