EScALation: A Framework for Efficient and Scalable Spatio-temporal Action Localization
Bo Chen, Klara Nahrstedt
University of Illinois at Urbana-Champaign, USA
Email: boc2@illinois.edu
Introduction
[Figure: example actions: Biking, Ice dancing, Walking dog]
Motivation
| Approach | Backbone Network | Hardware | Speed (fps) | mAP (%) |
|---|---|---|---|---|
| Peng et al. [1] | VGG16 | | Offline | 32.1 |
| Saha et al. [2] | VGG16 | | Offline | 36.4 |
| Singh et al. [3] | VGG16 | NVIDIA Titan X | 40 | 40.9 |
| YOWO [4] | 3D-ResNeXt-101 + Darknet-19 | NVIDIA Titan Xp | 34 | 48.8 |
[1] X. Peng et al., "Multi-region two-stream R-CNN for action detection," ECCV, 2016.
[2] S. Saha et al., "Deep learning for detecting multiple space-time action tubes in videos," arXiv:1608.01529, 2016.
[3] G. Singh et al., "Online real-time multiple spatiotemporal action localisation and prediction," ICCV, 2017.
[4] O. Kopuklu et al., "You only watch once: A unified CNN architecture for real-time spatiotemporal action localization," arXiv:1911.06644, 2019.
EScALation
Frame Sampling
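The slides do not spell out the sampling policy itself. As a minimal sketch, assuming a fixed stride (the names `stride`, `sampled_indices`, `detect_with_sampling`, and `detect` are hypothetical and not taken from EScALation), frame sampling can be illustrated as running the expensive detector only on every `stride`-th frame and reusing the most recent result for the skipped frames:

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def sampled_indices(num_frames: int, stride: int) -> List[int]:
    """Indices of the frames sent to the detector: every `stride`-th frame."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(range(0, num_frames, stride))


def detect_with_sampling(
    num_frames: int, stride: int, detect: Callable[[int], T]
) -> List[Optional[T]]:
    """Run `detect` only on sampled frames; skipped frames reuse the latest result.

    Hypothetical fixed-stride policy for illustration only.
    """
    results: List[Optional[T]] = []
    last: Optional[T] = None
    sampled = set(sampled_indices(num_frames, stride))
    for i in range(num_frames):
        if i in sampled:
            last = detect(i)  # expensive detector call on a sampled frame
        results.append(last)  # cheap reuse of the latest result on skipped frames
    return results


if __name__ == "__main__":
    calls: List[int] = []

    def fake_detect(i: int) -> str:
        calls.append(i)
        return f"det@{i}"

    print(detect_with_sampling(5, 2, fake_detect))
    # ['det@0', 'det@0', 'det@2', 'det@2', 'det@4']
    print(calls)
    # [0, 2, 4]
```

With a stride of k, the detector runs on roughly 1/k of the frames, so per-frame detection cost drops about k-fold (ignoring fixed overheads), at the price of stale results on the skipped frames.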
Action Detection Metrics
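The comparison table reports mAP, but the slides do not restate the matching protocol. The sketch below shows frame-level average precision for a single action class, assuming the common convention that a detection counts as correct when its IoU with an unmatched ground-truth box in the same frame is at least 0.5. The names `iou`, `frame_ap`, and the tuple layouts are hypothetical; this is not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def frame_ap(
    dets: Sequence[Tuple[int, float, Box]],
    gts: Sequence[Tuple[int, Box]],
    thr: float = 0.5,
) -> float:
    """Frame-level AP for ONE class. dets: (frame_id, score, box); gts: (frame_id, box)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    flags: List[int] = []  # 1 = true positive, 0 = false positive
    for frame, _score, box in sorted(dets, key=lambda d: d[1], reverse=True):
        # Find the best-overlapping ground-truth box in the same frame.
        best, best_j = 0.0, -1
        for j, (g_frame, g_box) in enumerate(gts):
            if g_frame == frame:
                o = iou(box, g_box)
                if o > best:
                    best, best_j = o, j
        if best_j >= 0 and best >= thr and not matched[best_j]:
            matched[best_j] = True
            flags.append(1)
        else:
            flags.append(0)  # low overlap, or a duplicate of an already matched box

    # Precision and recall after each detection, in descending score order.
    recalls: List[float] = []
    precisions: List[float] = []
    tp = 0
    for k, f in enumerate(flags, start=1):
        tp += f
        recalls.append(tp / len(gts))
        precisions.append(tp / k)

    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # All-point interpolated AP: area under the precision envelope.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

Frame-mAP is then the mean of `frame_ap` over action classes. Video-mAP, by contrast, matches whole spatio-temporal tubes using a tube-level overlap and is not covered by this sketch.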
Frame Sampling
Class Filtering
[Figure: Class Filtering results; values shown: 94.5%, 84%]
Evaluation
Overall Performance
Conclusion
Future Work
Q&A
Thank you for listening!