(FYI) Talk by Yanzhi Wang, Associate Professor at Northeastern University, USA
Talk Title
Towards Best Possible Deep Learning Acceleration on the Edge – A
Compression-Compilation Co-Design Framework
Abstract
Mobile and embedded computing devices have become key carriers of
deep learning, facilitating the widespread adoption of machine
intelligence. However, achieving real-time DNN inference on edge
devices remains a widely recognized challenge, due to the limited
computation and storage resources of such devices. Model compression
of DNNs, including weight pruning and weight quantization, has been
investigated to overcome this challenge. However, current work on
DNN compression suffers from the limitation that accuracy and
hardware performance are somewhat conflicting goals, difficult to
satisfy simultaneously.
We present our recent work CoCoPIE, which stands for
Compression-Compilation Co-Design, to overcome this limitation and
move towards the best possible DNN acceleration on edge devices. We
propose novel fine-grained structured pruning schemes, including
pattern-based pruning and block-based pruning. With the help of a
compiler, these schemes simultaneously achieve high hardware
performance (similar to filter/channel pruning) while maintaining
zero accuracy loss, which is beyond the capability of prior work.
Similarly, we present a novel quantization scheme that achieves
ultra-high hardware performance, close to that of 2-bit weight
quantization, with almost no accuracy loss. Through the CoCoPIE
framework, we are able to achieve real-time on-device execution of a
number of DNN tasks, including object detection, pose estimation,
activity detection, and speech recognition, using just an
off-the-shelf mobile device, with up to 180X speedup compared with
prior work. Our comprehensive demonstrations are at
https://www.youtube.com/channel/UCCKVDtg2eheRTEuqIJ5cD8A
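As a rough illustration of the pattern-based pruning idea (a
hypothetical sketch; the pattern set below is invented for this
example and is not CoCoPIE's actual pattern library), each 3x3
convolution kernel can be masked with the best-fitting pattern from
a small, fixed set:

    import numpy as np

    # Hypothetical 4-entry patterns for 3x3 kernels (1 = keep, 0 = prune).
    # Real pattern-based pruning derives its pattern library from trained
    # models; these two patterns are made up purely for illustration.
    PATTERNS = [
        np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 0, 0]]),
        np.array([[0, 0, 0],
                  [1, 1, 1],
                  [0, 1, 0]]),
    ]

    def prune_kernel(kernel):
        # Keep the pattern that preserves the most weight magnitude.
        scores = [np.abs(kernel * p).sum() for p in PATTERNS]
        return kernel * PATTERNS[int(np.argmax(scores))]

    # Prune every 3x3 kernel of a conv layer with weights of shape
    # (out_channels, in_channels, 3, 3).
    weights = np.random.randn(64, 32, 3, 3)
    pruned = np.stack([np.stack([prune_kernel(k) for k in filt])
                       for filt in weights])

Because every kernel then carries one of a few known sparsity
patterns, a compiler can generate specialized, regular code per
pattern rather than handling arbitrary sparsity, which is where the
hardware efficiency described in the abstract comes from.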
Biography
Yanzhi Wang is currently an associate professor and faculty fellow
in the Department of ECE at Northeastern University, Boston, MA. He
received his B.S. from Tsinghua University in 2009 and his Ph.D.
from the University of Southern California in 2014. His research
interests focus on model compression and platform-specific
acceleration of deep learning applications. His work has been
published broadly in top conference and journal venues (e.g., DAC,
ICCAD, ASPLOS, ISCA, MICRO, HPCA, PLDI, ICS, PACT, ISSCC, AAAI,
ICML, NeurIPS, CVPR, ICLR, IJCAI, ECCV, ICDM, ACM MM, FPGA, LCTES,
CCS, VLDB, ICDCS, RTAS, INFOCOM, C-ACM, JSSC, TComputer, TCAS-I,
TCAD, JSAC, TNNLS, etc.), and has been cited more than 12,000
times. He has received six Best Paper and Top Paper Awards,
and one Communications of the ACM cover-featured article. He has
another 12 Best Paper Nominations and four Popular Paper Awards. He
has received the U.S. Army Young Investigator Program (YIP) Award,
the IEEE TC-SDM Early Career Award, the Massachusetts Acorn
Innovation Award, the Martin Essigmann Excellence in Teaching Award,
the Ming Hsieh Scholar Award, and other research awards from Google,
MathWorks, etc. He has received 22 federal grants from NSF, DARPA,
IARPA, ARO, AFRL/AFOSR, etc. He has
participated in a total of $40M in funding, with a personal share of
$8.5M. Six of his former Ph.D. students and postdocs have become
tenure-track faculty at the University of Connecticut, Clemson
University, Chongqing University, Beijing University of Technology,
Texas A&M University–Corpus Christi, and Cleveland State University.