1 of 49

CSE 5524: Foundation of Learning - 4

2 of 49

Homework assignment & midterm

  • Homework 1
    • Released on 2/4 morning
    • Due on 2/18 at midnight

  • Homework 2
    • Released around 2/14
    • Due on 2/28 at midnight

  • Midterm
    • 3/4, in class

3 of 49

Problem overview

Given

Goal: Generate

Y: 3D height

Z: 3D depth

4 of 49

Convention

  • In this homework, given a map (or a matrix), say I
    • I[i, j] means the i-th horizontal index (left-to-right) and the j-th vertical index (bottom-up)
    • i >= 0, j >= 0

[Figure: axes i (horizontal) and j (vertical)]

5 of 49

For implementation, locations are indexed by [i, j]
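As a hedged illustration (the mapping below is an assumption for this sketch, not part of the assignment): if the map is stored as a row-major numpy array, the slide's bottom-up [i, j] convention can be translated like this:

```python
import numpy as np

# Hypothetical illustration of the slide's convention: I[i, j] uses
# i as the horizontal index (left-to-right) and j as the vertical index
# (bottom-up), while numpy arrays are indexed [row, col] top-down.
def get_at(arr, i, j):
    """Read arr at horizontal index i, vertical (bottom-up) index j."""
    H = arr.shape[0]          # number of rows
    return arr[H - 1 - j, i]  # flip the vertical axis

A = np.array([[7, 8, 9],
              [4, 5, 6],
              [1, 2, 3]])
# Under the convention, (i=0, j=0) is the bottom-left element.
print(get_at(A, 0, 0))  # 1
print(get_at(A, 2, 2))  # 9
```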

[Figure: location (i, j) on the image plane; y: 2D vertical, Y: 3D height, Z: 3D depth]

6 of 49

Cue 1: edges (white pixels mean edges)

[Figure panels: all edges, contact edges, vertical edges, horizontal edges]

You need to find edge locations!

7 of 49

Put all the constraints together

 

Least-squares solution!

For example, A may be around 2300-by-1681
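The stacked constraints form an overdetermined linear system A x = b, solved in the least-squares sense. A minimal numpy sketch with toy sizes (the sizes below are illustrative, not the homework's actual system):

```python
import numpy as np

# Stack linear constraints into A x = b and solve in the least-squares
# sense; sizes here are toy, not the ~2300-by-1681 system on the slide.
rng = np.random.default_rng(0)
x_true = rng.standard_normal(5)
A = rng.standard_normal((20, 5))   # overdetermined: more rows than unknowns
b = A @ x_true                     # consistent right-hand side

x_hat, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_true))  # True: exact recovery for a consistent system
```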

8 of 49

Why a linear system? Try this toy example

1  4  7
2  ?  8
3  6  9

(i: horizontal index, j: vertical index)
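Assuming the missing cell should equal the average of its four neighbors (a hypothetical smoothness constraint for illustration, not necessarily the one used in class), the unknown follows from a single linear equation:

```python
# A hypothetical smoothness constraint for the toy grid: the unknown
# center x should equal the average of its four neighbors, i.e.
# 4x - (up + down + left + right) = 0, one row of a linear system A x = b.
up, down, left, right = 4, 6, 2, 8   # neighbors of '?' in the 3x3 grid
A = [[4.0]]                           # coefficient of the single unknown
b = [up + down + left + right]

x = b[0] / A[0][0]
print(x)  # 5.0
```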

9 of 49

How to read a textbook?

  • Key takeaways
    • Don’t do a “linear” pass
    • Skip the math equations on a first pass if you cannot grasp them right away
    • Focus on their goal, not their details
    • Understanding the topic, context, and scope is more important than understanding the equations, at least on the first pass

10 of 49

My vision of this course

  • You learn breadth and depth
  • You read the textbook to expand your understanding and scope
  • You are comfortable chatting with people about what computer vision is and what it can do

My experiences

  • If CV is what you are interested in …
  • If you simply need the credits …

11 of 49

Today

  • Recap
  • Neural networks overview (continued)
  • Neural networks


12 of 49

Three tools in the search for Truth

  • Data: what we observe

  • Prior: what we prefer & believe

  • Hypotheses: what the true function may be

13 of 49

Three tools in the search for Truth

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

14 of 49

More data, less over-fitting

 

 

 

[Figure: Bishop, PRML]

Green: true data distribution

Blue: training data

Red: learned model

15 of 49

Today

  • Recap
  • Neural networks overview (continued)
  • Neural networks


16 of 49

General formulation for all these variants

Image (pixels)

17 of 49

Convolution

A special computation between layers

  • A node is affected only by a local neighborhood, not by “all nodes in the previous layer”
  • The network “weights” on the edges are “re-used” (shared) across locations
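A minimal sketch of this computation, assuming stride 1 and no padding (`conv2d` below is an illustrative helper, not a library function):

```python
import numpy as np

# Minimal sketch of the layer-to-layer computation described above:
# each output node takes the inner product of a shared 3x3 filter with
# the 3x3 patch of the previous layer's feature map around its location.
def conv2d(fmap, filt):
    H, W = fmap.shape
    k = filt.shape[0]                  # square filter, no padding, stride 1
    out = np.zeros((H - k + 1, W - k + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = fmap[r:r + k, c:c + k]
            out[r, c] = np.sum(patch * filt)   # inner product, weights re-used
    return out

fmap = np.arange(25, dtype=float).reshape(5, 5)   # feature map at layer t
filt = np.zeros((3, 3))
filt[1, 1] = 1.0                                  # identity-like filter
out = conv2d(fmap, filt)                          # feature map at layer t+1
print(out.shape)   # (3, 3)
```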


18 of 49

Convolution

“Filter” weights (3-by-3), or (3-by-3-by-“2”) for a 2-channel input

[Figure: the filter slides over the feature map (nodes) at layer t; each entry of the feature map at layer t+1 is the inner product of the filter weights with the corresponding patch of the input.]

19 of 49

Convolution

“Filter” weights (3-by-3-by-“2”)

[Figure: the filter is applied to a 2-channel feature map (nodes) at layer t; the inner product is taken over all channels to produce one entry of the feature map at layer t+1.]

One filter per output “channel”, each capturing a different “pattern” (e.g., edges, circles, eyes)
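A sketch of the multi-channel case, assuming a 2-channel input and one 3-by-3-by-2 filter (`conv2d_multi` is an illustrative helper):

```python
import numpy as np

# Sketch of the "3-by-3-by-2" filter on the slide: when the input feature
# map has 2 channels, one filter spans all input channels and produces
# one output channel; using C_out filters gives C_out output channels.
def conv2d_multi(fmap, filt):
    H, W, C = fmap.shape               # input: H x W x C_in
    k = filt.shape[0]                  # filter: k x k x C_in
    out = np.zeros((H - k + 1, W - k + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(fmap[r:r + k, c:c + k, :] * filt)
    return out

fmap = np.ones((5, 5, 2))              # 2-channel feature map at layer t
filt = np.ones((3, 3, 2))              # one filter, spanning both channels
out = conv2d_multi(fmap, filt)
print(out[0, 0])   # 18.0 = sum over 3*3*2 ones
```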

20 of 49

Convolution: properties

  • Process nearby pixels together
  • Translation invariant: “local patterns” can show up at different pixel locations
  • Can process arbitrary-size images


Top-left, top-right: has ears

Middle: has eyes

21 of 49

Convolutional neural networks (CNN)


Shared weights

Vectorization + FC layers

Max pooling + down-sampling

  • Remove redundancy
  • Translation-invariant
  • Enlarge the receptive field
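The pooling step can be sketched as follows, assuming 2-by-2 windows with stride 2 (`max_pool2x2` is an illustrative helper):

```python
import numpy as np

# Minimal sketch of 2x2 max pooling with stride 2: keep the strongest
# response in each window, halving the spatial size and enlarging the
# receptive field of later layers.
def max_pool2x2(fmap):
    H, W = fmap.shape
    trimmed = fmap[:H - H % 2, :W - W % 2]           # drop odd edge rows/cols
    return trimmed.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [0., 0., 1., 0.],
                 [9., 0., 0., 2.]])
print(max_pool2x2(fmap))
# [[4. 8.]
#  [9. 2.]]
```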

22 of 49

Representative CNN networks

  • AlexNet

[Krizhevsky et al., 2012]

  • VGGnet

[Simonyan et al., 2015]


  • A block: parameters/compute
  • Edge: activation/tensors

23 of 49

Representative CNN networks

  • GoogLeNet [Szegedy et al., 2014]
  • Inception

24 of 49

Representative CNN networks

  • ResNet

[He et al, 2016]

  • DenseNet

[Huang et al, 2017]


 

 

 

Advantages:

  • Easier optimization (shortcut connections help gradients flow)
  • Collect more information from earlier layers
  • A block: parameters/compute
  • Edge: activation/tensors
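The shortcut idea behind ResNet-style blocks can be sketched as y = x + f(x); the residual branch f below is a toy choice for illustration, not the actual ResNet block:

```python
import numpy as np

# Sketch of the skip connection behind ResNet-style blocks: the block
# computes y = x + f(x), so the identity path lets information (and
# gradients) flow even when the learned residual f is near zero.
def residual_block(x, W):
    f = np.maximum(0.0, W @ x)     # a toy residual branch: ReLU(Wx)
    return x + f                   # shortcut adds the input back

x = np.array([1.0, -2.0, 3.0])
W = np.zeros((3, 3))               # residual branch contributes nothing
print(residual_block(x, W))        # [ 1. -2.  3.]  the identity is preserved
```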

25 of 49

Representative CNN networks

A general architecture involves

  • Multiple layers of convolution + ReLU (nonlinearity) + pooling + striding
  • These produce a (final) feature map
    • Positions on the map correspond to positions in the image
  • The map then goes through FC layers (an MLP)
  • Usually, we keep the network only up to the feature map
    • For feature extraction
    • For downstream tasks
    • For image-to-image search


26 of 49

Training a DNN for classification

  •  


100: elephant

 

Minimize the empirical risk
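A hedged sketch of this objective: empirical risk as the average cross-entropy between the network's softmax outputs and ground-truth class labels (the logits and labels below are made up for illustration; "class 100 = elephant" is from the slide):

```python
import numpy as np

# Empirical risk for classification: average the cross-entropy loss
# between the softmax of the network's logits and the true labels
# (e.g., class 100 = "elephant" on the slide), then minimize it.
def cross_entropy(logits, label):
    z = logits - logits.max()                 # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())   # log softmax
    return -log_probs[label]

logits = np.array([[2.0, 0.5, 0.1],           # one row per training image
                   [0.2, 3.0, 0.3]])
labels = [0, 1]                                # ground-truth classes
risk = np.mean([cross_entropy(l, y) for l, y in zip(logits, labels)])
print(risk > 0)   # True: the risk shrinks toward 0 as the net fits the labels
```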

27 of 49

Four factors behind deep learning developments

  • Data

  • Neural network architecture

  • Powerful “learning” algorithms and losses

  • Computational resources

28 of 49

Access to large amounts of data


29 of 49

Flexible neural networks for modeling

  • Vision transformers [Liu et al., 2021]
  • Graph neural networks [Battaglia et al., 2018]
  • PointNet [Qi et al., 2017]
  • ConvNet [Huang et al., 2017]
  • Recurrent neural networks [Gu et al., 2024]

30 of 49

Powerful algorithms + losses to learn from data

  • Bi-level optimization [Finn et al., 2017]
  • Adversarial learning [Ganin et al., 2016]
  • Contrastive learning [He et al., 2020]
  • Diffusion (denoising) [Ho et al., 2020]
  • Autoregressive [El-Nouby et al., 2024]
  • Preference learning [Rafailov et al., 2023]

31 of 49

Computational resources

32 of 49

Today

  • Recap
  • Neural networks overview (continued)
  • Neural networks


33 of 49

Deep neural networks (DNN)

34 of 49

Re-Introduction

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

35 of 49

Perceptron

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

36 of 49

Perceptron as classifiers

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
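A minimal sketch of a perceptron acting as a binary classifier, with hand-picked (not learned) weights for illustration:

```python
import numpy as np

# A perceptron as a classifier: a weighted sum of the inputs plus a
# bias, thresholded at zero. The weights here are hand-picked for
# illustration, not learned from data.
def perceptron(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([1.0, 1.0])   # decision boundary: x1 + x2 - 1 = 0
b = -1.0
print(perceptron(np.array([2.0, 2.0]), w, b))  # 1: above the line
print(perceptron(np.array([0.0, 0.0]), w, b))  # 0: below the line
```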

37 of 49

Learning a classifier

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

38 of 49

Multi-layer perceptron (MLP)

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

39 of 49

Activations vs. Parameters

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

40 of 49

Fast activation and slow parameters

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

41 of 49

Why do we need activation?

  •  
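One concrete way to see it: without a nonlinearity, stacked linear layers collapse into a single linear map, so depth adds no expressive power (the matrices below are hand-picked for illustration):

```python
import numpy as np

# Why nonlinear activations matter: without them, two stacked linear
# layers equal one linear layer; inserting a ReLU breaks the collapse.
W1 = np.array([[1., -1.],
               [-1., 1.]])
W2 = np.array([[1., 1.]])
x = np.array([1., 0.])

two_linear = W2 @ (W1 @ x)                   # two layers, no activation
one_linear = (W2 @ W1) @ x                   # the same single linear map
print(np.allclose(two_linear, one_linear))   # True: depth collapsed

with_relu = W2 @ np.maximum(0.0, W1 @ x)     # insert a ReLU between layers
print(float(with_relu[0]))                   # 1.0, differs from the linear 0.0
```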

42 of 49

Deep Nets

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

43 of 49

Deep Nets

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

44 of 49

Deep nets are universal approximators

  • Given enough depth or width
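A tiny concrete instance of this claim: ReLU networks build piecewise-linear functions, and two hidden units already represent |x| exactly, since relu(x) + relu(-x) = |x| (a toy network for illustration):

```python
import numpy as np

# Two hidden ReLU units are enough to represent |x| exactly:
# relu(x) + relu(-x) = |x|. Wider/deeper nets compose many such
# pieces to approximate arbitrary continuous functions.
def tiny_relu_net(x):
    hidden = np.maximum(0.0, np.array([x, -x]))   # hidden layer: relu(x), relu(-x)
    return np.array([1.0, 1.0]) @ hidden          # output weights sum them

for x in (-2.0, 0.0, 3.5):
    print(tiny_relu_net(x) == abs(x))   # True for every input
```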

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

45 of 49

Deep learning: learning with neural nets

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

46 of 49

Data structure

  • Classification problem

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

47 of 49

Data structure

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

48 of 49

Layers

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

49 of 49

Reading

  • Read 12.7.3
  • Read 12.7.4
  • Read 12.8