Data Geometry and DL - Lecture 8
Network pruning and the "Lottery Ticket Hypothesis"
Survey of the main approaches to pruning (paper)
Network pruning – biological parallels, computational motivations
(concrete ideas: CNNs/equivariance, curvature, Neural Architecture Search, ...)
Pruning works well → The Lottery Ticket Hypothesis
Pruning basics
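As a concrete anchor for the pruning basics and the lottery-ticket procedure above, here is a minimal sketch of one round of iterative magnitude pruning with rewinding to the original initialization (the way winning tickets are usually searched for). The architecture, pruning fraction, and the omitted training loop are placeholder assumptions, not the lecture's setup.

```python
# Minimal sketch: one round of iterative magnitude pruning + rewind.
import copy
import torch
import torch.nn as nn

def magnitude_mask(model, masks, prune_frac=0.2):
    """Mask out the `prune_frac` smallest-magnitude surviving weights, per layer."""
    new_masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:                      # leave biases alone
            continue
        mask = masks.get(name, torch.ones_like(p))
        alive = p[mask.bool()].abs()         # magnitudes of still-active weights
        k = int(prune_frac * alive.numel())
        if k == 0:
            new_masks[name] = mask
            continue
        threshold = alive.kthvalue(k).values
        new_masks[name] = mask * (p.abs() > threshold).float()
    return new_masks

def apply_masks(model, masks):
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
init_state = copy.deepcopy(model.state_dict())        # theta_0, kept for rewinding
masks = {}

# ... train `model` to convergence here (ordinary SGD loop, omitted) ...

masks = magnitude_mask(model, masks, prune_frac=0.2)  # prune 20% of what is left
model.load_state_dict(init_state)                     # rewind to initialization
apply_masks(model, masks)                             # candidate "winning ticket"
# ... retrain with the mask held fixed, then repeat for further rounds ...
```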
Time effects in pruning – More biological parallels
Recall Lecture 7: first I(Y;T) grows (fitting phase), then I(X;T) diminishes (compression phase)
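Since the recall above is compressed, here is the information-bottleneck objective it refers to, in standard notation (X = input, Y = label, T = hidden representation; this notation is an assumption, the original slides may differ):

```latex
% Lecture-7 reminder (standard IB notation assumed: X = input, Y = label,
% T = hidden representation). Representations are scored by the bottleneck objective
\min_{p(t \mid x)}\; I(X;T) \;-\; \beta\, I(T;Y),
% and training is observed in two phases:
% fitting (I(T;Y) increases), then compression (I(X;T) decreases).
```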
Time effects in pruning – Early-Bird lottery tickets
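The idea behind early-bird tickets is to recompute the pruning mask every epoch and stop the expensive full training as soon as successive masks stop changing. The original criterion is built on channel-level masks (e.g. batch-norm scales); in the sketch below plain weight magnitudes stand in for simplicity, and the keep fraction and stopping threshold are placeholder assumptions.

```python
# Minimal sketch: detect an "early-bird" ticket via mask stability.
import torch

def topk_mask(scores, keep_frac=0.5):
    """Binary mask keeping the top `keep_frac` entries by score."""
    k = max(1, int(keep_frac * scores.numel()))
    threshold = scores.flatten().topk(k).values.min()
    return (scores >= threshold).float()

def mask_distance(m1, m2):
    """Normalized Hamming distance between two binary masks."""
    return (m1 != m2).float().mean().item()

# Inside a training loop (training step omitted; `model`, `num_epochs` are placeholders):
# prev_mask = None
# for epoch in range(num_epochs):
#     ...train one epoch...
#     mask = topk_mask(torch.cat([p.abs().flatten() for p in model.parameters()]))
#     if prev_mask is not None and mask_distance(mask, prev_mask) < 0.01:
#         print(f"early-bird ticket found at epoch {epoch}")
#         break
#     prev_mask = mask
```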
Data-oblivious pruning (summary)
Data-dependent pruning. Saliency measures, incomplete list (1/2)
Data-dependent pruning. Saliency measures, incomplete list (2/2)
F. Hebbian learning: correlated neurons stay connected; connections between uncorrelated neurons can be cut out
G. Remove redundant neuron populations (especially impactful at the "fine-tuning" stage)
H. Use a gradient-based scheme with respect to a binary gating function
→ during training, differentiate through the gates to determine each weight's importance (see the sketch after this list)
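One common way to realize measure H is to attach a multiplicative gate c to every weight, hold it at c = 1, and rank weights by |∂L/∂c|; by the chain rule this equals |w · ∂L/∂w|. The sketch below computes this score on a toy model and prunes the globally least-salient weights; the architecture, data, loss, and pruning fraction are placeholders, not the lecture's choices.

```python
# Minimal sketch: gate-gradient saliency, i.e. |dL/dc| for a gate c = 1 on each weight.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gate_saliency(model, x, y):
    """Return |w * dL/dw| per weight, which equals |dL/dc| for a unit gate c on each weight."""
    loss = F.cross_entropy(model(x), y)
    names, params = zip(*[(n, p) for n, p in model.named_parameters() if p.dim() >= 2])
    grads = torch.autograd.grad(loss, params)
    return {n: (p * g).abs() for n, p, g in zip(names, params, grads)}

# Toy usage: score weights on one batch, then prune the least salient half globally.
model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 3))
x, y = torch.randn(64, 20), torch.randint(0, 3, (64,))
scores = gate_saliency(model, x, y)

threshold = torch.cat([s.flatten() for s in scores.values()]).quantile(0.5)
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in scores:
            p.mul_((scores[name] > threshold).float())
```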
Sparsity versus depth: how to measure “global” saliency?
Sparse Evolutionary Training
(paper)
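Sparse Evolutionary Training alternates ordinary training epochs with a prune-and-regrow step: a fraction of the smallest-magnitude remaining connections is removed, and the same number of new connections is added at random empty positions. Below is a minimal NumPy sketch of that step on a single weight matrix; the regrow fraction and the re-initialization scale are placeholder assumptions.

```python
# Minimal sketch: one SET-style prune-and-regrow step on one sparse weight matrix.
import numpy as np

def set_step(weights, mask, zeta=0.3, rng=None):
    """Drop the zeta fraction of smallest-magnitude live weights,
    then regrow the same number of connections at random empty positions."""
    if rng is None:
        rng = np.random.default_rng()
    live = np.flatnonzero(mask)
    k = int(zeta * live.size)
    if k == 0:
        return weights, mask

    # Prune: remove the k live weights closest to zero.
    order = np.argsort(np.abs(weights.flat[live]))
    dropped = live[order[:k]]
    mask.flat[dropped] = 0
    weights.flat[dropped] = 0.0

    # Regrow: activate k currently-empty positions, chosen uniformly at random.
    empty = np.flatnonzero(mask == 0)
    regrown = rng.choice(empty, size=k, replace=False)
    mask.flat[regrown] = 1
    weights.flat[regrown] = rng.normal(scale=0.01, size=k)  # fresh small weights
    return weights, mask

# Usage between training epochs (the training loop itself is omitted).
rng = np.random.default_rng(1)
W = rng.normal(size=(300, 784))
M = (rng.random(W.shape) < 0.1).astype(np.int64)   # ~10% initial density
W *= M
W, M = set_step(W, M, zeta=0.3, rng=rng)
```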
Towards sparse neural networks
(paper)
Extending the sparsity paradigm: Novelty-oriented search (paper)
Final comments
DNN dimension at minimum = sloppiness of A
Summary of important directions