Continuous Sign Language Recognition
Group 18 - 費群安, 馬莎琳, 齊婉平, 杜威
Department of Computer Science & Information Engineering,
National Central University, Taiwan.
Introduction
Introduction & Motivation
Contributions
Proposed Architecture
Whole-body Pose Estimation
Pros and Cons of Skeleton-based SLR
Pros:
• Accuracy is high.
• No interference from the background.
• Signer-invariant.
• Lightweight network, easy to train.
Cons:
• Finger key point estimation may be inaccurate.
Solution:
• Inaccurate key points may be corrected using another modality (full-frame images).
[Figure annotation: finger that wasn't captured]
Dataset
Spatial Module
Pretrained Weights
Pixel-Map Weight Training
(56, 56, 256)
16 Samples Visualized
Preprocessing
Full-frame Feature
(n, 56, 56, 256)
64 Samples Visualized
(n, 224, 224, 3)
Cropped & Resized
Full-frame
input
*n = frame length
Keypoint Feature
(n, 1, 27, 3)
*n = frame length
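The keypoint input shape above can be illustrated with a minimal sketch. It assumes each frame arrives as a flat list of 27 key points with 3 values each; treating those 3 values as (x, y, confidence) is an assumption, as are the helper names `pack_keypoints` and `nested_shape`, none of which come from the slides.

```python
NUM_KEYPOINTS = 27          # key points kept per frame (from the slide)
VALUES_PER_KEYPOINT = 3     # assumed to be (x, y, confidence)


def pack_keypoints(frames):
    """Pack n flat per-frame lists into a nested list of shape (n, 1, 27, 3)."""
    expected = NUM_KEYPOINTS * VALUES_PER_KEYPOINT
    packed = []
    for index, flat in enumerate(frames):
        if len(flat) != expected:
            raise ValueError(f"frame {index}: expected {expected} values, got {len(flat)}")
        # Split the flat list into 27 groups of 3 values.
        points = [flat[i:i + VALUES_PER_KEYPOINT]
                  for i in range(0, expected, VALUES_PER_KEYPOINT)]
        # The extra singleton axis matches the (n, 1, 27, 3) layout.
        packed.append([points])
    return packed


def nested_shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Hypothetical clip with n = 4 frames of dummy values.
clip = [[0.0] * (NUM_KEYPOINTS * VALUES_PER_KEYPOINT) for _ in range(4)]
keypoint_feature = pack_keypoints(clip)
```

Here `nested_shape(keypoint_feature)` gives `(4, 1, 27, 3)`, i.e. n = 4 frames.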
Model Input
(n, 56, 56, 256)
64 Samples Visualized
(n, 1, 27, 3)
Keypoint Visualized
*n = frame length
Temporal Module
Sequence Learning Module
Sequence Learning Overview
Bidirectional Long Short-Term Memory (BLSTM)
Connectionist Temporal Classification (CTC)
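To make the role of CTC concrete, here is a minimal sketch of greedy (best-path) CTC decoding: pick the top label per frame, merge consecutive repeats, then drop blanks. This illustrates the general technique and is not necessarily the decoder used in this project; the blank index `0`, the function name `ctc_greedy_decode`, and the toy scores are assumptions.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Turn per-frame label scores into a label sequence (best-path decoding)."""
    # 1. Pick the highest-scoring label at every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    # 2. Merge consecutive repeats, then 3. drop blank labels.
    return [label for label, _ in groupby(best_path) if label != blank]


# Toy per-frame scores over 3 labels (0 = blank); values are made up.
scores = [
    [0.10, 0.80, 0.10],  # label 1
    [0.20, 0.70, 0.10],  # label 1 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.60, 0.30],  # label 1 (kept, because a blank separates it)
    [0.10, 0.20, 0.70],  # label 2
]
decoded = ctc_greedy_decode(scores)
```

The blank is what lets the same gloss appear twice in a row: the frame path `1, 1, blank, 1, 2` decodes to `[1, 1, 2]`.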
BLSTM Input
Training
Overview
Result
Training Result
Comparison with Other Results
Current Problems for Future Work
Future Work
(n, 7, 7, 512)
64 Samples Visualized
*n = frame length
Conclusion
In this project, we proposed a novel approach that combines full-frame images with key point features to achieve better translation. Our approach combines SMF and TMF, BiLSTM, and CTC. With this approach, we achieve a word error rate (WER) of 3%, which is competitive with existing solutions.
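For reference, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below shows that standard definition; the function name `word_error_rate` and the example glosses are hypothetical and are not taken from the project's data or evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical gloss sequences: one substitution and one deletion over 4 words.
wer = word_error_rate("TOMORROW RAIN NORTH STRONG", "TOMORROW RAIN SOUTH")
```

In this made-up example the WER is 2 / 4 = 0.5; a WER of 3% corresponds to roughly 3 word errors per 100 reference words.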
Demo
Q & A ?
Thank You!