CSE 5524: Foundations of Learning - 3
Homework assignment - 1
Today
Recap: Key ingredients
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Case study – 1: regression
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Case study – 2: classification
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Gradient-based learning algorithm
Move along the negative derivative direction.
GOAL: reach the minimum error.
Basic gradient descent
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
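To make the update rule concrete, here is a minimal sketch of basic gradient descent on a toy 1-D error; the quadratic error, initial guess, and step size are illustrative choices, not from the slides.

```python
# Minimal sketch of basic gradient descent (toy example): minimize the
# error E(w) = (w - 3)^2 using its derivative dE/dw = 2 * (w - 3).

def dE_dw(w):
    return 2.0 * (w - 3.0)

w = 0.0    # initial guess
lr = 0.1   # step size (learning rate), a hand-picked hyperparameter

for _ in range(100):
    w -= lr * dE_dw(w)   # step along the NEGATIVE derivative

print(w)   # ~= 3.0, where the error is minimal
```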
Today
Training vs. testing
In training, we only see training data!
Choosing a more complicated hypothesis class does not necessarily lead to lower test errors!
Under-fitting vs. over-fitting
K too small: the model is too simple (under-fitting).
K too large: the model is too complicated (over-fitting; it can drive the training error to 0).
[Slides: from USC CSCI567]
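A small sketch of the "training error = 0" behavior, reading K as the polynomial degree; the toy data below is made up for illustration.

```python
import numpy as np

# Toy data (made up for illustration): 10 noisy samples of a smooth curve.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

# As the degree K grows, the training error shrinks toward 0:
for K in [1, 3, 9]:
    coeffs = np.polyfit(x, y, deg=K)   # fit a degree-K polynomial
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(K, train_err)  # K = 9 interpolates all 10 points: error ~ 0
# (np.polyfit may warn that the K = 9 fit is poorly conditioned; that
#  warning is itself a symptom of an overly complicated hypothesis.)
```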
Under-fitting vs. over-fitting
[Slides: from USC CSCI567]
Under-fitting vs. over-fitting
Split the training data into "Train" and "Val".
Treating "Val" as the "pseudo" test data! (See the sketch below.)
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
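A minimal sketch of the Train/Val split (the toy data and split ratio are assumptions): fit each candidate K on Train, then pick the K with the lowest Val error.

```python
import numpy as np

# Toy data (made up): fit polynomials of degree K on "Train" and pick
# the K with the lowest error on "Val", our pseudo test data.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)

idx = rng.permutation(30)
tr, va = idx[:20], idx[20:]   # Train / Val split (ratio is an assumption)

best = None
for K in range(1, 10):
    coeffs = np.polyfit(x[tr], y[tr], deg=K)
    val_err = np.mean((np.polyval(coeffs, x[va]) - y[va]) ** 2)
    if best is None or val_err < best[1]:
        best = (K, val_err)
print(best)   # the K we would deploy
```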
Questions?
Regularization
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
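As one concrete (assumed) instance of a regularizer, an L2 penalty on the weights turns least squares into ridge regression; the regularizer on the slide may differ.

```python
import numpy as np

# Ridge regression sketch: minimize ||X w - y||^2 + lam * ||w||^2.
# The L2 penalty lam * ||w||^2 is the regularizer; larger lam pulls
# the weights toward 0, favoring simpler functions.
def ridge_fit(X, y, lam=1e-2):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(50)
print(ridge_fit(X, y, lam=0.1))   # close to the true weights
```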
Finding a good regularizer is not always easy
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Occam’s razor principle
All things being equal, the simplest explanation is usually the best!
Three tools in the search for Truth
Data (likelihood), priors, and the hypothesis space
Three tools in the search for Truth
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Effect of data
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
More data, less over-fitting
[Figure: Bishop, PRML]
Green: true data distribution
Blue: training data
Red: learned model
Effect of priors
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Effect of hypothesis space
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Remark & Plan
Today
The progress of deep learning for classification
ImageNet-1K (ILSVRC)
Metric: Top-k accuracy
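A short sketch of the metric (the array shapes are assumptions): a prediction counts as correct if the true class is among the k highest-scoring classes.

```python
import numpy as np

# Top-k accuracy: a prediction counts as correct if the true class is
# among the k highest-scoring classes.
def top_k_accuracy(scores, labels, k=5):
    # scores: (N, num_classes); labels: (N,) integer class indices
    topk = np.argsort(scores, axis=1)[:, -k:]   # k best classes per row
    return (topk == labels[:, None]).any(axis=1).mean()

scores = np.array([[0.1, 0.5, 0.3, 0.1],
                   [0.7, 0.1, 0.1, 0.1]])
labels = np.array([2, 0])
print(top_k_accuracy(scores, labels, k=2))  # 1.0: both labels in the top 2
```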
The progress of deep learning for classification
[Figure: ImageNet top-5 error rate by year, for AlexNet [Krizhevsky et al., 2012], VGG [Simonyan et al., 2015], GoogLeNet [Szegedy et al., 2015], ResNet [He et al., 2016], and DenseNet [Huang et al., 2017]]
General formulation for all these variants
Image (pixels)
Recap: classification
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Deep neural networks (DNN)
Convolution
A special computation between layers
Convolution
Feature map (nodes) at layer t (5-by-5):
0 0 0 0 1
0 0 0 1 1
0 0 1 1 1
0 1 1 1 1
1 1 1 1 1

"Filter" weights (3-by-3):
0 0 1
0 1 1
1 1 1

Inner product: element-wise multiplication and sum of the filter with a 3-by-3 patch.
Top-left patch: the inner product is 1, the top-left entry of the feature map at layer t+1.
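The same computation in a few lines of numpy, reproducing the slide's grids; like most deep learning libraries, it slides the filter without flipping it (cross-correlation).

```python
import numpy as np

# Sliding-window "inner product" from the slides: element-wise
# multiplication and sum at every filter position.
x = np.array([[0,0,0,0,1],
              [0,0,0,1,1],
              [0,0,1,1,1],
              [0,1,1,1,1],
              [1,1,1,1,1]])
w = np.array([[0,0,1],
              [0,1,1],
              [1,1,1]])

H, W = x.shape
k = w.shape[0]
out = np.zeros((H - k + 1, W - k + 1))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        out[i, j] = np.sum(x[i:i+k, j:j+k] * w)   # inner product

print(out[0, 0])  # 1.0, the top-left value computed on the slide
```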
Convolution
Sliding the same 3-by-3 filter to the next position and taking the inner product gives the next entry of the feature map at layer t+1. For example, a 3-by-3 patch that matches the filter exactly gives an inner product of 6 (the filter contains six 1s).
Convolution
Near the border, part of the 3-by-3 window falls outside the feature map at layer t.
Zero-padding: set the missing values to 0, so the inner product is still defined at border positions (the border position shown evaluates to 1).
With zero-padding, the feature map at layer t+1 keeps the same size as layer t.
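A one-line way to realize zero-padding in numpy (`np.pad` is a standard utility, not necessarily how the slides implement it):

```python
import numpy as np

# Zero-padding: surround the 5x5 feature map with a border of 0s so the
# 3x3 filter can also be centered on edge positions; output stays 5x5.
x = np.array([[0,0,0,0,1],
              [0,0,0,1,1],
              [0,0,1,1,1],
              [0,1,1,1,1],
              [1,1,1,1,1]])
x_pad = np.pad(x, pad_width=1)   # 7x7, zeros around the border
print(x_pad.shape)               # (7, 7)
```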
Convolution example
[Figure: convolution example with horizontally striped patterns (alternating rows of 0 0 0 and 1 1 1)]
Convolution
When the feature map at layer t has 2 channels (two 5-by-5 grids), the "filter" weights become 3-by-3-by-"2": one 3-by-3 slice per input channel, e.g.,

0 0 1
0 1 1
1 1 1

for the first channel. The inner product now multiplies and sums over both channels at once, producing a single entry of the feature map at layer t+1.
Convolution
A second 3-by-3-by-"2" filter, e.g., one whose slices include

1 1 1
0 0 0
1 1 1

produces a second output channel in the same way, by inner products with the 2-channel feature map at layer t.
One filter per output "channel", each capturing a different "pattern" (e.g., edges, circles, eyes, etc.). (See the sketch below.)
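A sketch of the multi-channel case with several filters (the toy shapes and random values are assumptions): each filter spans all input channels, and each filter yields one output channel.

```python
import numpy as np

# Multi-channel convolution sketch: with a 2-channel input, each filter
# is 3x3x"2" (one 3x3 slice per input channel), and the inner product
# sums over BOTH channels. Several filters give several output channels.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(2, 5, 5))            # (channels, H, W)
filters = rng.integers(0, 2, size=(3, 2, 3, 3))   # 3 filters, each 2x3x3

C_out, C_in, k, _ = filters.shape
out = np.zeros((C_out, 3, 3))                     # valid 3x3 output per filter
for f in range(C_out):
    for i in range(3):
        for j in range(3):
            out[f, i, j] = np.sum(x[:, i:i+k, j:j+k] * filters[f])

print(out.shape)  # (3, 3, 3): one output "channel" ("pattern") per filter
```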
Convolution: properties
[Figure: convolution responses on an image preserve spatial layout]
Top-left, top-right: has ears
Middle: has eyes
Convolutional neural networks (CNN)
Shared weights (convolution)
Max pooling + down-sampling
Vectorization + FC layers
(See the sketch below.)
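Putting the three pieces together, a minimal PyTorch sketch (assuming PyTorch; the layer sizes are illustrative, not a specific published network):

```python
import torch
import torch.nn as nn

# Minimal CNN: shared-weight convolutions, max pooling for
# down-sampling, then vectorization (flatten) + FC layers.
class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # zero-padding keeps 32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                             # down-sample to 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # down-sample to 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # vectorization
            nn.Linear(32 * 8 * 8, num_classes),          # FC layer -> class scores
        )

    def forward(self, x):
        return self.classifier(self.features(x))

scores = TinyCNN()(torch.randn(1, 3, 32, 32))
print(scores.shape)  # torch.Size([1, 10])
```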
Representative CNN networks
AlexNet [Krizhevsky et al., 2012] and VGG [Simonyan et al., 2015]
Representative CNN networks
ResNet [He et al., 2016] and DenseNet [Huang et al., 2017]
Advantages:
Representative CNN networks
A general architecture involves stacked convolution layers (shared weights), max pooling + down-sampling, and vectorization + FC layers at the end.
Training a DNN for classification
Label example: class 100 = "elephant".
Minimize the empirical risk: the average loss over all training examples. (See the sketch below.)
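A compact sketch of empirical risk minimization, i.e., minimizing (1/N) * sum_i loss(f(x_i), y_i) over the training set; the toy data, model size, and learning rate are assumptions, and the linear model stands in for a DNN.

```python
import torch
import torch.nn as nn

# Empirical risk minimization for classification on toy data.
model = nn.Linear(64, 10)                      # stand-in for a DNN
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()                # classification loss

x = torch.randn(256, 64)                       # toy inputs
y = torch.randint(0, 10, (256,))               # toy labels (e.g., "elephant")

for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(x), y)                # empirical risk on the batch
    loss.backward()                            # gradients via backprop
    opt.step()                                 # gradient descent update
```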
Four factors behind deep learning developments
Accessibility to large amounts of data
Flexible neural networks for modeling
Visual transformers [Liu et al., 2021]
Graph neural networks [Battaglia et al., 2018]
PointNet [Qi et al., 2017]
ConvNet [Huang et al., 2017]
Recurrent neural networks [Gu et al., 2024]
Powerful algorithms + losses to learn from data
Bi-level optimization [Finn et al., 2017]
Adversarial learning [Ganin et al., 2016]
Contrastive learning [He et al., 2020]
Diffusion (denoising) [Ho et al., 2020]
Autoregressive [El-Nouby et al., 2024]
Preference learning [Rafailov et al., 2023]
Computational resources
Today
Deep neural networks (DNN): Re-introduction
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Perceptron
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Perceptron as classifiers
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Learning a classifier
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Multi-layer perceptron (MLP)
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
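A minimal numpy sketch of both models (the sizes and the ReLU choice are illustrative): a perceptron thresholds a weighted sum of its inputs, while an MLP stacks such layers with a nonlinearity between them.

```python
import numpy as np

# Perceptron: weighted sum + threshold, giving a binary class decision.
def perceptron(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

# Two-layer MLP: hidden layer with a nonlinearity, then output scores.
def mlp(x, W1, b1, W2, b2):
    h = np.maximum(0, W1 @ x + b1)   # ReLU nonlinearity
    return W2 @ h + b2               # output scores

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
print(perceptron(x, rng.standard_normal(4), 0.0))
print(mlp(x, rng.standard_normal((8, 4)), np.zeros(8),
          rng.standard_normal((2, 8)), np.zeros(2)))
```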
Activations vs. Parameters
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Fast activations and slow parameters
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]