CSE 5524: Foundations of Learning - 4
Homework assignment & midterm
Problem overview
Given the input, the goal is to generate:
Y: 3D height
Z: 3D depth
Convention: locations are written (i, j), with y the 2D vertical axis; for implementation, locations are indexed by [i, j].
Cue 1: edges (white pixels mean edges)
All edges
Contact edges
Vertical edges
Horizontal edges
You need to find edge locations!
Put all the constraints together
Least squares solution!
For example, A is roughly 2300-by-1681
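Stacking all the constraints into a matrix A and vector b, the least-squares solution minimizes ||Ax - b||^2. A minimal pure-Python sketch of the idea (a hypothetical 3-equation toy system, not the ~2300-by-1681 system from the slide), solving the normal equations A^T A x = A^T b:

```python
# Minimal least-squares sketch (hypothetical toy system): solve
# min_x ||Ax - b||^2 via the normal equations A^T A x = A^T b.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solve(M, y):
    # Gauss-Jordan elimination with partial pivoting on the square system M x = y.
    n = len(M)
    A = [row[:] + [y[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

def least_squares(A, b):
    At = transpose(A)
    AtA = matmul(At, A)
    Atb = [sum(a * v for a, v in zip(row, b)) for row in At]
    return solve(AtA, Atb)

# Overdetermined toy system: fit a line y = m*x + c to three points.
A = [[0, 1], [1, 1], [2, 1]]   # columns: x, 1
b = [1, 3, 5]                  # points lie exactly on y = 2x + 1
m, c = least_squares(A, b)
```

For a full-size system, a library routine (e.g., a QR-based solver) would be preferable to the normal equations, which square the condition number.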
Why a linear system? Try this toy example: a 3-by-3 grid (rows indexed by i, columns by j) with one unknown entry:

1 | 4 | 7
2 | ? | 8
3 | 6 | 9
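One way to read the toy example (an assumption about the slide's intent): the missing cell is determined by a linear smoothness constraint, e.g., it equals the average of its four neighbors, which is one linear equation in one unknown:

```python
# Hypothetical reading of the toy grid: the unknown center cell x
# satisfies a smoothness constraint (discrete Laplace equation):
# 4*x - (up + down + left + right) = 0, one linear equation in x.
grid = [[1, 4, 7],
        [2, None, 8],
        [3, 6, 9]]

up, down = grid[0][1], grid[2][1]
left, right = grid[1][0], grid[1][2]
x = (up + down + left + right) / 4   # solve the single equation
```

With many unknown cells, each cell contributes one such row, and the stacked rows form exactly the kind of sparse linear system the slide describes.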
How to read a textbook?
My vision of this course
My experiences …
Today
Three tools in the search for Truth
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
More data, less over-fitting
[Figure: Bishop, PRML]
Green: true data distribution
Blue: training data
Red: learned model
Today
General formulation for all these variants
Image (pixels)
Convolution
A special computation between layers
17
Convolution
"Filter" weights: 3-by-3 (or 3-by-3-by-"2" with two input channels)

Feature map (nodes) at layer t (5-by-5 input):
0 | 0 | 0 | 0 | 1
0 | 0 | 0 | 1 | 1
0 | 0 | 1 | 1 | 1
0 | 1 | 1 | 1 | 1
1 | 1 | 1 | 1 | 1

Filter weights (3-by-3):
0 | 0 | 1
0 | 1 | 1
1 | 1 | 1

The inner product between the filter and each local window gives the feature map at layer t+1.
Convolution
"Filter" weights (3-by-3-by-"2")

Feature map (nodes) at layer t (5-by-5 input):
0 | 0 | 0 | 0 | 1
0 | 0 | 0 | 1 | 1
0 | 0 | 1 | 1 | 1
0 | 1 | 1 | 1 | 1
1 | 1 | 1 | 1 | 1

First filter (3-by-3):
0 | 0 | 1
0 | 1 | 1
1 | 1 | 1

Second filter (3-by-3):
1 | 1 | 1
0 | 0 | 0
1 | 1 | 1

The inner product between each filter and each local window gives the feature map at layer t+1.
One filter for one output "channel" to capture a different "pattern" (e.g., edges, circles, eyes, etc.)
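The sliding inner product can be sketched in a few lines (valid-range cross-correlation, which is what deep-learning "convolution" computes; input and filter taken from the slide):

```python
# Sketch of the slide's convolution (strictly, cross-correlation):
# slide the 3x3 filter over the 5x5 input and take inner products.

def conv2d_valid(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Inner product between the filter and the window at (i, j).
            out[i][j] = sum(kernel[di][dj] * image[i + di][j + dj]
                            for di in range(kh) for dj in range(kw))
    return out

image = [[0, 0, 0, 0, 1],
         [0, 0, 0, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 1, 1, 1, 1],
         [1, 1, 1, 1, 1]]

kernel = [[0, 0, 1],
          [0, 1, 1],
          [1, 1, 1]]

# 3x3 output: the feature map at layer t+1.
feature_map = conv2d_valid(image, kernel)
```

With two input channels (the 3-by-3-by-"2" case), the inner product simply extends over the channel dimension as well, summing both channels' contributions into one output value.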
Convolution: properties
Top-left, top-right: has ears
Middle: has eyes
Convolutional neural networks (CNN)
Shared weights
Vectorization + FC layers
Max pooling + down-sampling
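Of the building blocks above, max pooling + down-sampling is easy to sketch (a minimal 2x2, stride-2 example, not tied to any particular network):

```python
# Hedged sketch of 2x2 max pooling with stride 2: each output value
# is the maximum of a non-overlapping 2x2 window, halving the
# spatial resolution (down-sampling).

def max_pool_2x2(fmap):
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 3]]

pooled = max_pool_2x2(fmap)
```

Pooling has no learned parameters; only the convolutional filters and the FC layers carry the shared weights.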
Representative CNN networks
[Krizhevsky et al., 2012]
[Simonyan et al., 2015]
Representative CNN networks
[He et al, 2016]
[Huang et al, 2017]
Advantages:
Representative CNN networks
A general architecture involves
Training a DNN for classification
Class label 100: "elephant"
Minimize the empirical risk
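The standard form of this objective (generic notation, not necessarily the slide's symbols): minimize the empirical risk over N training pairs, e.g., with the cross-entropy loss over class scores:

```latex
% Empirical risk minimization for classification (standard formulation):
\min_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \ell\big(f_\theta(x_n),\, y_n\big),
\qquad
\ell\big(f_\theta(x),\, y\big)
  = -\log \frac{\exp\!\big(f_\theta(x)_y\big)}{\sum_{c} \exp\!\big(f_\theta(x)_c\big)}
```

Here f_theta(x) is the vector of class scores (logits) produced by the network, and y is the ground-truth class index (e.g., "elephant").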
Four factors behind deep learning developments
Access to large amounts of data
Flexible neural networks for modeling
ConvNet [Huang et al., 2017]
Vision transformers [Liu et al., 2021]
Graph neural networks [Battaglia et al., 2018]
PointNet [Qi et al., 2017]
Recurrent neural networks [Gu et al., 2024]
Powerful algorithms + losses to learn from data
Bi-level optimization [Finn et al., 2017]
Adversarial learning [Ganin et al., 2016]
Contrastive learning [He et al., 2020]
Diffusion (denoising) [Ho et al., 2020]
Autoregressive [El-Nouby et al., 2024]
Preference learning [Rafailov et al., 2023]
Computational resources
Today
Deep neural networks (DNN)
Re-introduction
Perceptron
Perceptron as classifiers
Learning a classifier
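The classic perceptron learning rule can be sketched as follows (a generic implementation on a toy OR dataset, not the book's exact figure): on each mistake, nudge the weights toward the misclassified example.

```python
# Perceptron learning rule: w <- w + lr * (y - y_hat) * x,
# b <- b + lr * (y - y_hat), applied only when the prediction is wrong.

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data, lr=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)       # 0 when correct
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Linearly separable toy data: logical OR.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(data)
```

On linearly separable data like this, the rule is guaranteed to converge; on non-separable data (e.g., XOR) a single perceptron cannot succeed, which motivates the multi-layer networks that follow.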
Multi-layer perceptron (MLP)
Activations vs. Parameters
Fast activation and slow parameters
Why do we need activation functions?
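One way to see the need for nonlinearity: without activations, stacked linear layers collapse into a single linear map, since W2 (W1 x) = (W2 W1) x. A tiny numeric check (hypothetical 2x2 weights):

```python
# Without activations, a "deep" linear net equals one linear layer:
# applying W1 then W2 gives the same result as applying (W2 @ W1) once.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

W1 = [[1.0, 2.0], [0.0, 1.0]]
W2 = [[3.0, 0.0], [1.0, 1.0]]
x = [1.0, -1.0]

two_layers = matvec(W2, matvec(W1, x))   # layer-by-layer
one_layer = matvec(matmul(W2, W1), x)    # single collapsed layer
```

Inserting a nonlinearity (ReLU, sigmoid, etc.) between the layers breaks this collapse and is what lets depth add representational power.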
Deep Nets
Deep Nets
Deep nets are universal approximators
Deep learning: learning with neural nets
Data structure
Data structure
Layers
Reading