Let's Make Neural Networks Efficient
Sathyaprakash Narayanan
ECE-x83: Special Topics in Engineering [Brain Inspired Machine Learning]
Jason Eshraghian
So what’s the catch?
Scaling these models up is a huge pain:
- binarized activations are hard to deal with; you are drastically limiting the ability of individual neurons to represent information
- For equivalent tasks, non-spiking networks are easier to train and often converge to a better loss
- recurrent, time-varying neurons are expensive to train using BPTT (linear memory complexity with time)
- Sparsity only makes sense if your hardware knows to skip “0” operations. GPUs, by default, do not (see the sketch below).
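A minimal PyTorch illustration of that last point (sizes and sparsity level here are arbitrary): a dense kernel multiplies the zeros anyway; you only skip work if you hand the hardware an explicitly sparse format, and even then sparse kernels only pay off at high sparsity on real GPUs.

```python
import torch

# A 90%-sparse activation matrix, stored densely: the matmul below still
# performs every multiply, zeros included.
x = torch.randn(1024, 1024)
x[torch.rand_like(x) < 0.9] = 0.0
w = torch.randn(1024, 1024)
y_dense = x @ w

# The same matrix in a sparse (COO) format: only the ~10% nonzeros are touched.
x_sparse = x.to_sparse()
y_sparse = torch.sparse.mm(x_sparse, w)

print(torch.allclose(y_dense, y_sparse, atol=1e-3))  # same result, different work
```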
Solutions?
- Computation is cheap; memory access is expensive. Maybe we focus on sparsity instead of binarization: silicon is already optimized for computation, so use it.
- Real-time learning techniques?
- Dynamical weights
Major takeaway: don’t trust me. Go build cooler shit.
Stolen from Class slides: Week 2 :P
Von Neumann Bottleneck “Memory is Expensive”
Data Movement → More Memory References → More Energy
How should we make deep learning more efficient?
Sparsity is the Key!!!!!!!!
| Operation | Energy [pJ] |
|---|---|
| 32-bit int ADD | 0.1 |
| 32-bit float ADD | 0.9 |
| 32-bit Register File | 1 |
| 32-bit int MULT | 3.1 |
| 32-bit float MULT | 3.7 |
| 32-bit SRAM Cache | 5 |
| 32-bit DRAM Memory | 640 |
[Chart: “Rough Energy Cost For Various Operations in 45nm 0.9V”, relative energy cost on a log scale from 1 to 10,000; the annotation marks a 32-bit DRAM access at roughly 200x the cost of the arithmetic itself.]
Computing's Energy Problem (and What We Can Do About it) [Horowitz, M., IEEE ISSCC 2014]
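To make the table concrete, a quick back-of-envelope in plain Python (the layer size is made up): if every operand of a billion-MAC layer had to come from DRAM, the memory energy would sit two orders of magnitude above the arithmetic. This is why pruning and data reuse target memory traffic first.

```python
# Back-of-envelope using the Horowitz numbers above (energies in pJ).
E_MAC_FP32 = 3.7 + 0.9   # one float MAC = one MULT + one ADD
E_SRAM     = 5.0         # operand fetched from on-chip SRAM
E_DRAM     = 640.0       # operand fetched from off-chip DRAM

macs = 1e9               # a ~1 GMAC layer (illustrative size)

pJ_to_mJ = 1e-9
print(f"compute only:   {macs * E_MAC_FP32 * pJ_to_mJ:6.1f} mJ")
print(f"operands, SRAM: {macs * E_SRAM * pJ_to_mJ:6.1f} mJ")
print(f"operands, DRAM: {macs * E_DRAM * pJ_to_mJ:6.1f} mJ")
print(f"DRAM fetch vs. one MAC: {E_DRAM / E_MAC_FP32:.0f}x")
```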
What’s the first step towards Sparsity?
How should we make deep learning more efficient?
Pruning Happens in the Human Brain
[Figure: number of synapses per neuron across development; roughly 2,500 per neuron in newborns [1], peaking near 15,000 at 2-4 years old [1], and pruned back to about 7,000 in adults [2]. Slide inspiration: Alila Medical Media]
Data sources:
[1] Do We Have Brain to Spare? [Drachman DA, Neurology 2004]
[2] Peter Huttenlocher (1931–2013) [Walsh, C. A., Nature 2013]
Neural Network Pruning
Make the neural network smaller by removing synapses and neurons
Optimal Brain Damage [LeCun et al., NeurIPS 1989]
Learning Both Weights and Connections for Efficient Neural Network [Han et al., NeurIPS 2015]
[Figure: dense model → pruned sparse model]
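Magnitude pruning in the Han et al. style is a one-liner in stock PyTorch via torch.nn.utils.prune (the layer and the 70% pruning ratio here are just for illustration):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 256)

# Zero out the 70% of weights with the smallest |magnitude| (L1 criterion).
prune.l1_unstructured(layer, name="weight", amount=0.7)

sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.0%}")   # ~70%

# Fold the mask into the weight tensor once fine-tuning is done.
prune.remove(layer, "weight")
```

The mask is applied through a reparameterization, so you can fine-tune with it in place: train, prune, retrain, exactly the loop from Han et al.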
Let’s make SNNs (and NNs in general) actually sparse!!
I wrote a package to fix this :P
SNN notebook: https://colab.research.google.com/drive/1nNil4aj0GJxGfFQka8By9dTESVjLlied?usp=sharing
Tutorial: see the DL comparison in sconce/tutorials.
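From my reading of the sconce README, wiring up the pipeline looks roughly like this; treat the attribute names and values as assumptions and defer to the repo (satabios/sconce) for the real API. Random tensors stand in for a real dataset.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sconce import sconce  # pip install sconce

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
dummy = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))

sconces = sconce()                      # configuration object
sconces.model = model                   # any torch.nn.Module
sconces.criterion = nn.CrossEntropyLoss()
sconces.optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
sconces.dataloader = {"train": DataLoader(dummy, batch_size=64),
                      "test": DataLoader(dummy, batch_size=64)}
sconces.epochs = 2
sconces.experiment_name = "prune-demo"  # hypothetical name
sconces.prune_mode = "GMP"              # granular magnitude pruning

sconces.compress()                      # train → sensitivity scan → prune → fine-tune
```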
Is that all we could do?
- Quantization
- Knowledge-Distillation (see the sketch after this list)
- Sparsity-Aware Engine/Model Computations
- CUDA Optimizations
- Hardware-Aware Optimizations
- Neural-Architecture Search
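Knowledge distillation, for instance, is just one extra loss term. Here is a minimal sketch of the classic Hinton-style formulation (the temperature and weighting are illustrative defaults, not tuned values):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    # Soft targets: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients to be temperature-independent
    # Hard targets: standard cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```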
Sconce (note: any PyTorch model can be used)
Go and smash the star on the GitHub repo (satabios/sconce)… I can’t get you extra credit, but I promise you’ll be in my stargazers list 🥹
Didn’t want to bore you further
Possible Additions
sconce v0.99:
- Auto-Sensitivity Scan
sconce v1.1 (future features; I too have only 24 hrs in a day :P):
- Channel-Based Activation-Aware Pruning (but channel-wise!!; see the sketch below)
- Activation-Aware QAT (but channel-wise!!)
- Make kernels aware of the future kernel spaces. Altruism is all you need!!! Don’t just be self-attentive.
Complete flow and code: https://github.com/satabios/sconce/blob/main/tutorials/Pruning.ipynb
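The channel-wise, activation-aware idea can be prototyped in a few lines of plain PyTorch. This is my sketch of the concept, not sconce's implementation: score each output channel by its mean absolute activation over a calibration batch (random data here), then zero the weakest filters, giving structured sparsity a dense engine can actually exploit.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, 3, padding=1)
calib = torch.randn(32, 3, 32, 32)           # calibration batch (random stand-in)

# Score each output channel by its mean absolute activation.
with torch.no_grad():
    acts = conv(calib)                       # (N, C, H, W)
    scores = acts.abs().mean(dim=(0, 2, 3))  # one score per output channel

# Zero the filters of the 50% weakest channels (structured pruning).
k = conv.out_channels // 2
weak = scores.argsort()[:k]                  # ascending: weakest channels first
with torch.no_grad():
    conv.weight[weak] = 0.0
    conv.bias[weak] = 0.0
```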