1 of 15

Attention-based Dual-stream Vision Transformer for Radar Gait Recognition

Shiliang Chen1, Wentao He1, Jianfeng Ren1, Xudong Jiang2

1University of Nottingham Ningbo China

2Nanyang Technological University, Singapore

2 of 15

Outline

  1. Introduction
  2. The proposed Attention-based Dual-Stream Vision Transformer (ADS-ViT)
  3. Experiments and Results
  4. Conclusion


4 of 15

Introduction

  • Side-view human gait recognition depends heavily on lighting conditions and may infringe on privacy.
  • Radar can capture the micro-Doppler signatures (mDS) of front-view gait features from a moving target.

[Figure: a radar signal captured from a walking subject is classified by supervised learning into a subject ID (e.g., IDs 39-42, "You are #42!").]
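The two spectral representations used throughout this deck can be illustrated with a minimal NumPy sketch. This is a toy example, not the authors' pipeline: the synthetic 50 Hz tone, window length, and hop size are all assumptions made here for illustration.

```python
import numpy as np

def spectrogram(signal, win=64, hop=32):
    # Short-time Fourier transform: windowed FFTs sliding along the time axis
    frames = [signal[i:i + win] * np.hanning(win)
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq, time)

t = np.linspace(0.0, 1.0, 1024, endpoint=False)
sig = np.cos(2 * np.pi * 50 * t)        # toy stand-in for a radar return
spec = spectrogram(sig)                 # spectrogram: frequency x time
cvd = np.abs(np.fft.fft(spec, axis=1))  # CVD: FFT of the spectrogram along the time axis
```

In the real system the input would be the raw radar return rather than a synthetic tone, but the two transforms are the same: an STFT produces the spectrogram, and an FFT along its time axis produces the CVD.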

5 of 15

Contributions

  • The proposed ADS-ViT effectively extracts and fuses features from the spectrogram and the CVD for radar gait recognition.
  • Both streams of ADS-ViT utilize the patch-processing ability of ViT to effectively capture the gait characteristics embedded in patches corresponding to frequency bands of the spectrogram and CVD.


7 of 15

Methodology

[Figure (a): Attention-based two-stream architecture. Feature generation: an STFT converts the raw signal into a spectrogram, and an FFT along the time axis yields the CVD. Each stream (spectrogram stream and CVD stream) passes through position and patch embedding, a Transformer encoder, and an MLP head. Attention-based fusion applies a kernel q and a softmax to weight the two streams, combines them by element-wise multiplication, and a final MLP head outputs the predicted label.]
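The attention-based fusion in Figure (a) (kernel q, softmax, element-wise multiplication) can be sketched as follows. This is a minimal illustration under assumed shapes: in the actual model the kernel q is learned, and all names and values here are illustrative.

```python
import numpy as np

def attention_fusion(f_spec, f_cvd, q):
    # Stack the two stream features into shape (2, d)
    feats = np.stack([f_spec, f_cvd])
    # Score each stream against the fusion kernel q
    scores = feats @ q                     # shape (2,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over the two streams
    # Weight each stream element-wise, then sum the streams
    return (weights[:, None] * feats).sum(axis=0)

f_spec = np.array([1.0, 2.0, 3.0])   # illustrative spectrogram-stream feature
f_cvd = np.array([3.0, 2.0, 1.0])    # illustrative CVD-stream feature
q = np.array([0.1, 0.0, -0.1])       # stand-in for the learned kernel
fused = attention_fusion(f_spec, f_cvd, q)
```

The softmax guarantees the two stream weights are positive and sum to one, so the fused feature is a convex combination of the two streams.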
8 of 15

Methodology

  • Feature extractor: Vision Transformer (ViT)
  • In spectral representations, the spatial coordinates correspond to unique physical interpretations.
  • ViT can deeply exploit the gait characteristics embedded in patches of spectrogram and CVD.

[Figure (b): Architecture of the Vision Transformer. Position and Patch Embedding (PPE): the input image is split into patches, flattened, concatenated with a class token, and combined with position embeddings. Transformer Encoder (TE), repeated 12×: LayerNorm, multi-head attention, dropout, LayerNorm, MLP block, dropout. The class token is then extracted for classification.]
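The PPE stage of Figure (b) can be sketched in NumPy. This is a toy illustration under assumed sizes (8×8 input, 4×4 patches, embedding dimension 4); in the real model, the projection, class token, and position embeddings are learned parameters.

```python
import numpy as np

def patch_embed(image, patch, w_proj, cls_token, pos_embed):
    # Split the image into non-overlapping patches and flatten each one
    h, w = image.shape
    flat = [image[i:i + patch, j:j + patch].ravel()
            for i in range(0, h, patch)
            for j in range(0, w, patch)]
    x = np.stack(flat) @ w_proj        # linear projection: (n_patches, d)
    x = np.vstack([cls_token, x])      # prepend the class token
    return x + pos_embed               # add position embeddings

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))                  # toy "input image"
tokens = patch_embed(img, 4,
                     rng.standard_normal((16, 4)),  # projection (patch_dim, d)
                     np.zeros((1, 4)),              # class token
                     rng.standard_normal((5, 4)))   # position embeddings
```

The resulting token sequence (class token plus one token per patch) is what the Transformer encoder consumes; for a spectrogram or CVD input, each patch covers a specific frequency-time region, which is why patch processing suits these representations.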

11 of 15

Results

Recognition accuracy (%) by input representation:

  Method          Spectrogram   CVD
  DCNN-AlexNet    71.56         72.44
  VGG16           69.24         79.83
  ResNet18        85.56         80.73
  Ours (ADS-ViT)  91.02

13 of 15

Conclusion

  • Contributions
    • The proposed ADS-ViT effectively extracts and fuses features from the spectrogram and CVD for radar gait recognition.
    • Both streams of ADS-ViT utilize the patch-processing ability of ViT to effectively capture the gait characteristics embedded in patches corresponding to frequency bands of the spectrogram and CVD.
  • Experiments
    • Outperforms AlexNet, VGG, and ResNet.
    • Performs best in the ablation study.

14 of 15

Thank you!

Shiliang Chen

scysc1@nottingham.edu.cn

Wentao He

scxwh1@nottingham.edu.cn

Jianfeng Ren

jianfeng.ren@nottingham.edu.cn

Xudong Jiang

exdjiang@ntu.edu.sg
