CSE 5524: (Foundations of) Computer Vision
Course information
https://sites.google.com/view/osu-cse-5524-au25-chao
(for course information, the weekly schedule, and reading-assignment updates)
Dr. Wei-Lun (Harry) Chao (chao.209), Office: DL 587
Associate professor in CSE (PhD: USC; Postdoc: Cornell)
Zheda Mai (mai.145), CSE PhD student
A bit about me
Machine learning and its applications to
Pancreatic cancer
A bit about me
Learning with “imperfect” data
[Zhu et al., 2014]
Autonomous-driving datasets collected in different locations: KITTI (Germany), Argoverse (USA), nuScenes (USA, Singapore), Lyft (USA), Waymo (USA) [Wang et al., 2020]
Course information
Communications
Questions?
Grading and homework (tentative)
Grading (subject to slight change)
Guidelines
Final project first glance (SP25 version, to update)
Tentative schedule
Homework
Exams & final project presentation
Policy
Academic integrity
(Re-)grading
Syllabus: accessible on the course website and Carmen
Standard syllabus statements
Link: https://ugeducation.osu.edu/academics/syllabus-policies-statements/standard-syllabus-statements
Optional syllabus statements
Link: https://ugeducation.osu.edu/academics/syllabus-policies-statements/optional-syllabus-statements
Prerequisites & what to expect
Review materials
Caution!
This is a “graduate-level” course!
One of the aims is to provide students with a strong foundational background, enabling them to pursue computer-vision-centered or machine-learning-centered MS/PhD paths or to explore future opportunities in the computer vision, machine learning, and artificial intelligence industries.
This course is not simply knowledge delivery: I will leave room for you to read, think, and explore!
Questions?
Course descriptions & goals
Textbook
Foundations of Computer Vision (A. Torralba, P. Isola, and W. T. Freeman, 2024)
Suggested References
Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play (second edition), D. Foster
Computer Vision: Algorithms and Applications (second edition), R. Szeliski
eBook accessible through the OSU Library website
Other great textbooks
Deep Learning: Foundations and Concepts (C. M. Bishop and H. Bishop)
Understanding Deep Learning (S. J. D. Prince)
Dive into Deep Learning (A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola)
PDFs of the 1st and 3rd books are accessible; check their websites
Other excellent CV courses
Important for this week
How to do/learn well?
Important dates
Questions?
About using AI tools
Writing is thinking
About math
Questions?
Today
Introduction
Course overview
What is computer vision?
Human vision is capable of extracting information about the world around us using only the light that reflects off surfaces in the direction of our eyes.
Our eyes are sensors. Our brains have to translate the information collected by millions of photoreceptors in our retinas into an interpretation of the world in front of us.
Computer vision studies how to reproduce in a computer the ability to see
Antonio Torralba, Phillip Isola, and William T. Freeman, Foundations of Computer Vision, 2024.
Input: the structure of ambient light
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Output: measuring light vs. scene properties
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
The study of vision is interdisciplinary
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Visual pathways
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Questions?
What is computer vision?
Vision is the process of discovering from images what is present in the world, and where it is.
David Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, 1982.
Computer vision
[Source: Detectron2]
[Source: Graham Murdoch/Popular Science]
Computer vision: data
Image(s)
Video(s) = sequence of images
RGB image(s): three matrices, one per color channel (R, G, B)
Computer vision: data
R, G, and B channels (the slide shows the same 5 × 5 example matrix for each):
0 | 0 | 124 | 255 | 125
0 | 0 | 125 | 126 | 60
0 | 0 | 126 | 60 | 126
0 | 0 | 0 | 127 | 60
0 | 0 | 0 | 0 | 128
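To make the "three matrices" view concrete, here is a minimal sketch in plain Python (no imaging library assumed) that splits an image stored as rows of (R, G, B) pixels into three channel matrices and merges them back. The helper names `split_channels`, `merge_channels`, and `to_grayscale` are hypothetical, and the grayscale weights are the common ITU-R BT.601 luma coefficients, which are one convention among several.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Pixel = Tuple[int, int, int]
RGBImage = List[List[Pixel]]


def split_channels(image: RGBImage) -> Tuple[Matrix, Matrix, Matrix]:
    """Hypothetical helper: split an H x W grid of (R, G, B) pixels into three H x W matrices."""
    red = [[px[0] for px in row] for row in image]
    green = [[px[1] for px in row] for row in image]
    blue = [[px[2] for px in row] for row in image]
    return red, green, blue


def merge_channels(red: Matrix, green: Matrix, blue: Matrix) -> RGBImage:
    """Hypothetical helper: inverse of split_channels, zipping three matrices back into pixels."""
    return [
        [(r, g, b) for r, g, b in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(red, green, blue)
    ]


def to_grayscale(image: RGBImage) -> Matrix:
    """Weighted sum of channels using the (assumed) ITU-R BT.601 luma weights, rounded to int."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


# The 5 x 5 example matrix from the slide, used for all three channels.
channel = [
    [0, 0, 124, 255, 125],
    [0, 0, 125, 126, 60],
    [0, 0, 126, 60, 126],
    [0, 0, 0, 127, 60],
    [0, 0, 0, 0, 128],
]
image = merge_channels(channel, channel, channel)
red, green, blue = split_channels(image)
print(image[0][3])                     # (255, 255, 255)
print(to_grayscale(image) == channel)  # True
```

Because the three channels are identical in this example, every pixel is a shade of gray, and since the weights sum to 1 the grayscale conversion returns the original matrix.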
Computer vision: data
RGBD image(s): four matrices (R, G, B, and D), where each entry of the D matrix is a depth value
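Here is a minimal sketch, again in plain Python, of an RGBD image as four aligned matrices. The depth values, their units (assumed to be meters), and the helper names `stack_rgbd` and `nearest_pixel` are hypothetical. They are chosen only to show that the fourth matrix is indexed exactly like the three color channels.

```python
from typing import List, Tuple

Matrix = List[List[float]]
RGBDImage = List[List[List[float]]]


def stack_rgbd(red: Matrix, green: Matrix, blue: Matrix, depth: Matrix) -> RGBDImage:
    """Hypothetical helper: combine four aligned H x W matrices into pixels of [R, G, B, D]."""
    return [
        [[r, g, b, d] for r, g, b, d in zip(r_row, g_row, b_row, d_row)]
        for r_row, g_row, b_row, d_row in zip(red, green, blue, depth)
    ]


def nearest_pixel(rgbd: RGBDImage) -> Tuple[int, int]:
    """Hypothetical helper: (row, col) of the pixel with the smallest depth, i.e., closest to the camera."""
    candidates = [
        (px[3], i, j)
        for i, row in enumerate(rgbd)
        for j, px in enumerate(row)
    ]
    _, i, j = min(candidates)
    return i, j


# Hypothetical 2 x 2 example; depth is assumed to be in meters.
red = [[255, 0], [0, 0]]
green = [[0, 255], [0, 0]]
blue = [[0, 0], [255, 0]]
depth = [[2.5, 1.0], [4.0, 3.2]]

rgbd = stack_rgbd(red, green, blue, depth)
print(rgbd[0][1])           # [0, 255, 0, 1.0]
print(nearest_pixel(rgbd))  # (0, 1)
```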
Computer vision: data
Point cloud
A collection of 3D (or 4D) points
Each point (column) stores its x, y, and z coordinates, optionally followed by its reflectance
N points = 3-by-N or 4-by-N matrix
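Storing N points as a 3-by-N or 4-by-N matrix means each row holds one attribute and each column is one point. The sketch below, in plain Python, computes every point's distance from the sensor and keeps only nearby points. It assumes the sensor sits at the origin of the coordinate frame, and `point_ranges`, `filter_by_range`, and the sample values are hypothetical.

```python
import math
from typing import List

Cloud = List[List[float]]  # rows: x, y, z (and optionally reflectance); columns: points


def point_ranges(cloud: Cloud) -> List[float]:
    """Distance of each point (column) from the origin, where the sensor is ASSUMED to be."""
    xs, ys, zs = cloud[0], cloud[1], cloud[2]
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(xs, ys, zs)]


def filter_by_range(cloud: Cloud, max_range: float) -> Cloud:
    """Keep the columns whose range is at most max_range; works for 3-by-N or 4-by-N."""
    keep = [n for n, r in enumerate(point_ranges(cloud)) if r <= max_range]
    return [[row[n] for n in keep] for row in cloud]


# Hypothetical 4-by-3 cloud: three points with x, y, z, and reflectance.
cloud = [
    [3.0, 10.0, 0.0],  # x
    [4.0, 0.0, 1.0],   # y
    [0.0, 0.0, 0.0],   # z
    [0.8, 0.2, 0.5],   # reflectance
]
print(point_ranges(cloud))          # [5.0, 10.0, 1.0]
print(filter_by_range(cloud, 6.0))  # [[3.0, 0.0], [4.0, 1.0], [0.0, 0.0], [0.8, 0.5]]
```

Because filtering selects columns, the same code handles both the 3-row and 4-row layouts without change.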
Computer vision: data
Image aligned with point cloud
LiDAR-based vision
[Source: Graham Murdoch/Popular Science]
LiDAR (Light Detection and Ranging):
LiDAR-based vision
[Credits: Lisa Wu’s presentation]
Questions?
Three representation directions
[Figure: scene (S) and image (I), linked by three directions: 1: Recognition, 2: Reconstruction, 3: Generation; the example label is “tree”]
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Computer vision: representative tasks
Retrieval, image-to-image search
Depth estimation and 3D reconstruction
Style transfer
[Figure credit: CycleGAN, ICCV 2017]
Questions?
How can we let computers recognize objects?
A cat? A lion? A car?
Percept: see a picture
Action: tell the object class
Human design vs. machine-learning-based
[Figure: human design hand-codes the rules that output “cat”; the machine-learning approach collects data (images labeled “cat”) and “learns” the mapping]
“Coding” the rules: can you list the rules for recognizing a cat?
Underlying idea:
Humans are sometimes good at “making decisions” but not at “explaining decisions.”
Learning-based computer vision
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
What is machine learning?
This book is about learning from data.
S. Theodoridis, Machine Learning: A Bayesian and Optimization Perspective.
We choose the title “learning from data” that faithfully describes what the subject is about.
Y. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin, Learning from Data.
Machine Learning Overview
Learning from Data
Algorithm
Data
Evaluation
Goal
Example: coin classifier
[Figure: training data is fed to a machine learning algorithm, which produces a learned model that is then applied to test data]
[Figure credit: Y. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin, Learning from Data.]
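The pipeline in the figure (training data, algorithm, learned model, test data) can be sketched in plain Python with a nearest-centroid classifier. This is one simple learning algorithm chosen for illustration, not necessarily the one behind the figure. The features (coin diameter in mm and mass in g), the measurements, and the function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Features = Tuple[float, float]  # (diameter in mm, mass in g): hypothetical features
Example = Tuple[Features, str]


def fit_centroids(train: List[Example]) -> Dict[str, Features]:
    """'Learn' a model: the mean feature vector of each class in the training data."""
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0])
    counts: Dict[str, int] = defaultdict(int)
    for (diameter, mass), label in train:
        sums[label][0] += diameter
        sums[label][1] += mass
        counts[label] += 1
    return {label: (s[0] / counts[label], s[1] / counts[label]) for label, s in sums.items()}


def predict(model: Dict[str, Features], x: Features) -> str:
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    return min(model, key=lambda label: (x[0] - model[label][0]) ** 2 + (x[1] - model[label][1]) ** 2)


def accuracy(model: Dict[str, Features], test: List[Example]) -> float:
    """Evaluation: fraction of held-out test examples classified correctly."""
    correct = sum(predict(model, x) == y for x, y in test)
    return correct / len(test)


# Hypothetical, made-up measurements for illustration only.
train = [
    ((19.0, 2.5), "penny"), ((19.1, 2.6), "penny"),
    ((21.2, 5.0), "nickel"), ((21.3, 4.9), "nickel"),
    ((17.9, 2.3), "dime"), ((18.0, 2.2), "dime"),
    ((24.2, 5.7), "quarter"), ((24.3, 5.6), "quarter"),
]
test = [
    ((19.05, 2.45), "penny"), ((21.1, 5.05), "nickel"),
    ((17.95, 2.25), "dime"), ((24.25, 5.65), "quarter"),
]

model = fit_centroids(train)
print(accuracy(model, test))  # 1.0
```

The learned model here is simply one average point per coin type. Keeping the test examples separate from the training examples is what lets the accuracy number say something about unseen coins.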
What is deep learning (deep neural networks)?
[Figure: a classifier maps an image (see a picture) to a label, e.g., dog or cat (tell the object class)]
A sequence of “learnable” computations!
Example: image classification
[Gif credits: Gradient descent 3Blue1Brown series S3 E2]
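The "learnable" part of such a computation is a set of parameters adjusted by gradient descent. The minimal sketch below assumes the simplest possible case: one linear step y = w * x + b, fit by mean squared error on a hypothetical, noise-free toy dataset. It shows only the update loop. A deep network chains many such steps and obtains its gradients by backpropagation rather than by the hand-derived formulas used here.

```python
from typing import List, Tuple


def gradient_descent(xs: List[float], ys: List[float], lr: float = 0.05, steps: int = 2000) -> Tuple[float, float]:
    """Fit y ≈ w * x + b by minimizing the mean squared error with plain gradient descent."""
    w, b = 0.0, 0.0  # initial parameters
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of L(w, b) = mean((w * x + b - y) ** 2).
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step downhill, against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")  # w = 2.000, b = 1.000
```

The step size matters. For this particular dataset, raising `lr` above roughly 0.15 makes the updates overshoot and diverge instead of converging.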
The progress of deep learning
AlexNet [Krizhevsky et al., 2012]
VGG [Simonyan et al., 2015]
GoogLeNet [Szegedy et al., 2015]
ResNet [He et al., 2016]
DenseNet [Huang et al., 2017]
The progress of deep learning
Visual transformers [Liu et al., 2021]
Graph neural networks [Battaglia et al., 2018]
PointNet [Qi et al., 2017]
Neural architecture search [Zoph et al., 2017]
Questions?
Today
Introduction
Course overview
Topics
1. Introduction to computer vision
a. Introduction to the course
b. A simple vision system
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Topics
2. Image formation
a. Concepts of imaging and lenses
b. Images and 3D geometry
c. Camera modeling
d. Cameras as linear systems
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Topics
3. Foundations of image processing
a. Linear filtering and convolution
b. Fourier analysis
c. Blur filters, image derivatives, and filter banks
d. (Up/down) sampling
e. Image pyramids
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Topics
4. Foundations of learning
a. Introduction to learning
b. Gradient-based learning algorithms
c. Generalization
d. Neural networks as distribution transformers
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Topics
5. Probabilistic models of images
a. Color
b. Statistical image models
c. Textures
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Topics
6. Neural architectures for vision
a. Convolutional neural nets
b. Transformers
Visual transformers [Liu et al., 2021]
ConvNet (VGG Net) [Simonyan et al., 2015]
Topics
7. Generative image models and representation learning
a. Representation learning
b. Generative models
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Topics
8. Understanding vision with semantics and language
a. Visual recognition
b. Vision and language
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Topics
9. Challenges in learning-based vision
a. Data bias and shift
b. Robustness and generality
c. Transfer learning and adaptation
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
Topics
10. Understanding geometry
a. Stereo vision
b. Homographies
c. Depth estimation from single images
d. Feature detection and matching
e. Multi-view geometry and structure from motion
f. Radiance fields
Topics
11. Understanding motion
a. Motion estimation
b. Optical flow estimation
c. Object tracking
[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]