1 of 49

CSE 5524: Foundation of Learning - 4

2 of 49

Homework assignment & midterm

  • Homework 1
    • Released on 2/4 morning
    • Due on 2/18 at midnight

  • Homework 2
    • Released around 2/14
    • Due on 2/28 at midnight

  • Midterm
    • 3/4, in class

3 of 49

Problem overview

Given

Goal: Generate

Y: 3D height

Z: 3D depth

4 of 49

Convention

  • In this homework, given a map (or a matrix), say I
    • I[i, j] means the i-th horizontal index (left-to-right) and the j-th vertical index (bottom-up)
    • i >= 0, j >= 0

[Figure: axes i (horizontal) and j (vertical)]

5 of 49

For implementation, locations are indexed by [i, j]
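As a hedged illustration (the mapping below is an assumption for this sketch, not part of the assignment): if the map is stored as a row-major numpy array, the slide's bottom-up [i, j] convention can be translated like this:

```python
import numpy as np

# Hypothetical illustration of the slide's convention: I[i, j] uses
# i as the horizontal index (left-to-right) and j as the vertical index
# (bottom-up), while numpy arrays are indexed [row, col] top-down.
def get_at(arr, i, j):
    """Read arr at horizontal index i, vertical (bottom-up) index j."""
    H = arr.shape[0]          # number of rows
    return arr[H - 1 - j, i]  # flip the vertical axis

A = np.array([[7, 8, 9],
              [4, 5, 6],
              [1, 2, 3]])
# Under the convention, (i=0, j=0) is the bottom-left element.
print(get_at(A, 0, 0))  # 1
print(get_at(A, 2, 2))  # 9
```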

[Figure: location (i, j) on the image plane; y: 2D vertical, Y: 3D height, Z: 3D depth]

6 of 49

Cue 1: edges (white pixels mean edges)

[Figure panels: all edges, contact edges, vertical edges, horizontal edges]

You need to find edge locations!

7 of 49

Put all the constraints together

 

Least-squares solution!

For example, A may be around 2300-by-1681
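The stacked constraints form an overdetermined linear system A x = b, solved in the least-squares sense. A minimal numpy sketch with toy sizes (the sizes below are illustrative, not the homework's actual system):

```python
import numpy as np

# Stack linear constraints into A x = b and solve in the least-squares
# sense; sizes here are toy, not the ~2300-by-1681 system on the slide.
rng = np.random.default_rng(0)
x_true = rng.standard_normal(5)
A = rng.standard_normal((20, 5))   # overdetermined: more rows than unknowns
b = A @ x_true                     # consistent right-hand side

x_hat, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_true))  # True: exact recovery for a consistent system
```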

8 of 49

Why a linear system? Try this toy example

1  4  7
2  ?  8
3  6  9

(i: horizontal index, j: vertical index)
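Assuming the missing cell should equal the average of its four neighbors (a hypothetical smoothness constraint for illustration, not necessarily the one used in class), the unknown follows from a single linear equation:

```python
# A hypothetical smoothness constraint for the toy grid: the unknown
# center x should equal the average of its four neighbors, i.e.
# 4x - (up + down + left + right) = 0, one row of a linear system A x = b.
up, down, left, right = 4, 6, 2, 8   # neighbors of '?' in the 3x3 grid
A = [[4.0]]                           # coefficient of the single unknown
b = [up + down + left + right]

x = b[0] / A[0][0]
print(x)  # 5.0
```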

9 of 49

How to read a textbook?

  • Key takeaways
    • Don’t do a “linear” pass
    • Skip the math equations on a first pass if you cannot grasp them right away
    • Focus on their goal, not their details
    • Understanding the topic, context, and scope is more important than understanding the equations, at least on the first pass

10 of 49

My vision of this course

  • You learn breadth and depth
  • You read the textbook to expand your understanding and scope
  • You are comfortable chatting with people about what computer vision is and what it can do

My experiences

  • If CV is what you are interested in …
  • If you simply need the credits …

11 of 49

Today

  • Recap
  • Neural networks overview (continued)
  • Neural networks


12 of 49

Three tools in the search for Truth

  • Data: what we observe

  • Prior: what we prefer & believe

  • Hypotheses: what the true function may be

13 of 49

Three tools in the search for Truth

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

14 of 49

More data, less over-fitting

 

 

 

[Figure: Bishop, PRML]

Green: true data distribution

Blue: training data

Red: learned model

15 of 49

Today

  • Recap
  • Neural networks overview (continued)
  • Neural networks


16 of 49

General formulation for all these variants

Image (pixels)

17 of 49

Convolution

A special computation between layers

  • A node is affected only by a local neighborhood, not by “all nodes in the previous layer”
  • The network “weights” on the edges are “re-used” (shared) across locations
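A minimal sketch of this computation, assuming stride 1 and no padding (`conv2d` below is an illustrative helper, not a library function):

```python
import numpy as np

# Minimal sketch of the layer-to-layer computation described above:
# each output node takes the inner product of a shared 3x3 filter with
# the 3x3 patch of the previous layer's feature map around its location.
def conv2d(fmap, filt):
    H, W = fmap.shape
    k = filt.shape[0]                  # square filter, no padding, stride 1
    out = np.zeros((H - k + 1, W - k + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = fmap[r:r + k, c:c + k]
            out[r, c] = np.sum(patch * filt)   # inner product, weights re-used
    return out

fmap = np.arange(25, dtype=float).reshape(5, 5)   # feature map at layer t
filt = np.zeros((3, 3))
filt[1, 1] = 1.0                                  # identity-like filter
out = conv2d(fmap, filt)                          # feature map at layer t+1
print(out.shape)   # (3, 3)
```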


18 of 49

Convolution

“Filter” weights (3-by-3), or (3-by-3-by-“2”) for a 2-channel input

[Figure: the filter slides over the feature map (nodes) at layer t; each entry of the feature map at layer t+1 is the inner product of the filter weights with the corresponding patch of the input.]

19 of 49

Convolution

“Filter” weights (3-by-3-by-“2”)

[Figure: the filter is applied to a 2-channel feature map (nodes) at layer t; the inner product is taken over all channels to produce one entry of the feature map at layer t+1.]

One filter per output “channel”, each capturing a different “pattern” (e.g., edges, circles, eyes)
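A sketch of the multi-channel case, assuming a 2-channel input and one 3-by-3-by-2 filter (`conv2d_multi` is an illustrative helper):

```python
import numpy as np

# Sketch of the "3-by-3-by-2" filter on the slide: when the input feature
# map has 2 channels, one filter spans all input channels and produces
# one output channel; using C_out filters gives C_out output channels.
def conv2d_multi(fmap, filt):
    H, W, C = fmap.shape               # input: H x W x C_in
    k = filt.shape[0]                  # filter: k x k x C_in
    out = np.zeros((H - k + 1, W - k + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(fmap[r:r + k, c:c + k, :] * filt)
    return out

fmap = np.ones((5, 5, 2))              # 2-channel feature map at layer t
filt = np.ones((3, 3, 2))              # one filter, spanning both channels
out = conv2d_multi(fmap, filt)
print(out[0, 0])   # 18.0 = sum over 3*3*2 ones
```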

20 of 49

Convolution: properties

  • Process nearby pixels together
  • Translation invariant: “local patterns” can show up at different pixel locations
  • Can process arbitrary-size images


Top-left, top-right: has ears

Middle: has eyes

21 of 49

Convolutional neural networks (CNN)


Shared weights

Vectorization + FC layers

Max pooling + down-sampling

  • Remove redundancy
  • Translation-invariant
  • Enlarge the receptive field
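The pooling step can be sketched as follows, assuming 2-by-2 windows with stride 2 (`max_pool2x2` is an illustrative helper):

```python
import numpy as np

# Minimal sketch of 2x2 max pooling with stride 2: keep the strongest
# response in each window, halving the spatial size and enlarging the
# receptive field of later layers.
def max_pool2x2(fmap):
    H, W = fmap.shape
    trimmed = fmap[:H - H % 2, :W - W % 2]           # drop odd edge rows/cols
    return trimmed.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [0., 0., 1., 0.],
                 [9., 0., 0., 2.]])
print(max_pool2x2(fmap))
# [[4. 8.]
#  [9. 2.]]
```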

22 of 49

Representative CNN networks

  • AlexNet

[Krizhevsky et al., 2012]

  • VGGnet

[Simonyan et al., 2015]


  • A block: parameters/compute
  • Edge: activation/tensors

23 of 49

Representative CNN networks

  • GoogLeNet [Szegedy et al., 2014]
  • Inception

24 of 49

Representative CNN networks

  • ResNet

[He et al, 2016]

  • DenseNet

[Huang et al, 2017]


 

 

 

Advantages:

  • Easier optimization (shortcut connections help gradients flow)
  • Collect more information from earlier layers
  • A block: parameters/compute
  • Edge: activation/tensors
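The shortcut idea behind ResNet-style blocks can be sketched as y = x + f(x); the residual branch f below is a toy choice for illustration, not the actual ResNet block:

```python
import numpy as np

# Sketch of the skip connection behind ResNet-style blocks: the block
# computes y = x + f(x), so the identity path lets information (and
# gradients) flow even when the learned residual f is near zero.
def residual_block(x, W):
    f = np.maximum(0.0, W @ x)     # a toy residual branch: ReLU(Wx)
    return x + f                   # shortcut adds the input back

x = np.array([1.0, -2.0, 3.0])
W = np.zeros((3, 3))               # residual branch contributes nothing
print(residual_block(x, W))        # [ 1. -2.  3.]  the identity is preserved
```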

25 of 49

Representative CNN networks

A general architecture involves

  • Multiple layers of convolution + ReLU (nonlinearity) + pooling + striding
  • These produce a (final) feature map
    • Positions on the map correspond to positions in the image
  • The map then goes through FC layers (an MLP)
  • Usually, we keep the network only up to the feature map
    • For feature extraction
    • For downstream tasks
    • For image-to-image search


26 of 49

Training a DNN for classification

  •  


100: elephant

 

Minimize the empirical risk
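A hedged sketch of this objective: empirical risk as the average cross-entropy between the network's softmax outputs and ground-truth class labels (the logits and labels below are made up for illustration; "class 100 = elephant" is from the slide):

```python
import numpy as np

# Empirical risk for classification: average the cross-entropy loss
# between the softmax of the network's logits and the true labels
# (e.g., class 100 = "elephant" on the slide), then minimize it.
def cross_entropy(logits, label):
    z = logits - logits.max()                 # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())   # log softmax
    return -log_probs[label]

logits = np.array([[2.0, 0.5, 0.1],           # one row per training image
                   [0.2, 3.0, 0.3]])
labels = [0, 1]                                # ground-truth classes
risk = np.mean([cross_entropy(l, y) for l, y in zip(logits, labels)])
print(risk > 0)   # True: the risk shrinks toward 0 as the net fits the labels
```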

27 of 49

Four factors behind deep learning developments

  • Data

  • Neural network architecture

  • Powerful “learning” algorithms and losses

  • Computational resources

28 of 49

Access to large amounts of data


29 of 49

Flexible neural networks for modeling

  • Vision transformers [Liu et al., 2021]
  • Graph neural networks [Battaglia et al., 2018]
  • PointNet [Qi et al., 2017]
  • ConvNet [Huang et al., 2017]
  • Recurrent neural networks [Gu et al., 2024]

30 of 49

Powerful algorithms + losses to learn from data

  • Bi-level optimization [Finn et al., 2017]
  • Adversarial learning [Ganin et al., 2016]
  • Contrastive learning [He et al., 2020]
  • Diffusion (denoising) [Ho et al., 2020]
  • Autoregressive [El-Nouby et al., 2024]
  • Preference learning [Rafailov et al., 2023]

31 of 49

Computational resources

32 of 49

Today

  • Recap
  • Neural networks overview (continued)
  • Neural networks


33 of 49

Deep neural networks (DNN)

34 of 49

Re-Introduction

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

35 of 49

Perceptron

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

36 of 49

Perceptron as classifiers

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
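A minimal sketch of a perceptron acting as a binary classifier, with hand-picked (not learned) weights for illustration:

```python
import numpy as np

# A perceptron as a classifier: a weighted sum of the inputs plus a
# bias, thresholded at zero. The weights here are hand-picked for
# illustration, not learned from data.
def perceptron(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([1.0, 1.0])   # decision boundary: x1 + x2 - 1 = 0
b = -1.0
print(perceptron(np.array([2.0, 2.0]), w, b))  # 1: above the line
print(perceptron(np.array([0.0, 0.0]), w, b))  # 0: below the line
```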

37 of 49

Learning a classifier

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

38 of 49

Multi-layer perceptron (MLP)

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

39 of 49

Activations vs. Parameters

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

40 of 49

Fast activation and slow parameters

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

41 of 49

Why do we need activation?

  •  
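One concrete way to see it: without a nonlinearity, stacked linear layers collapse into a single linear map, so depth adds no expressive power (the matrices below are hand-picked for illustration):

```python
import numpy as np

# Why nonlinear activations matter: without them, two stacked linear
# layers equal one linear layer; inserting a ReLU breaks the collapse.
W1 = np.array([[1., -1.],
               [-1., 1.]])
W2 = np.array([[1., 1.]])
x = np.array([1., 0.])

two_linear = W2 @ (W1 @ x)                   # two layers, no activation
one_linear = (W2 @ W1) @ x                   # the same single linear map
print(np.allclose(two_linear, one_linear))   # True: depth collapsed

with_relu = W2 @ np.maximum(0.0, W1 @ x)     # insert a ReLU between layers
print(float(with_relu[0]))                   # 1.0, differs from the linear 0.0
```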

42 of 49

Deep Nets

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

43 of 49

Deep Nets

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

44 of 49

Deep nets are universal approximators

  • Given enough depth or width
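A tiny concrete instance of this claim: ReLU networks build piecewise-linear functions, and two hidden units already represent |x| exactly, since relu(x) + relu(-x) = |x| (a toy network for illustration):

```python
import numpy as np

# Two hidden ReLU units are enough to represent |x| exactly:
# relu(x) + relu(-x) = |x|. Wider/deeper nets compose many such
# pieces to approximate arbitrary continuous functions.
def tiny_relu_net(x):
    hidden = np.maximum(0.0, np.array([x, -x]))   # hidden layer: relu(x), relu(-x)
    return np.array([1.0, 1.0]) @ hidden          # output weights sum them

for x in (-2.0, 0.0, 3.5):
    print(tiny_relu_net(x) == abs(x))   # True for every input
```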

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

45 of 49

Deep learning: learning with neural nets

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

46 of 49

Data structure

  • Classification problem

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

47 of 49

Data structure

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

48 of 49

Layers

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

49 of 49

Reading

  • Read 12.7.3
  • Read 12.7.4
  • Read 12.8