1 of 93

CSE 5524: (Foundations of) Computer Vision

2 of 93

Course information

Course website:

https://sites.google.com/view/osu-cse-5524-au25-chao

(for course information, weekly schedule, and reading assignment updates)

Instructor:

Dr. Wei-Lun (Harry) Chao (chao.209), Office: DL 587

Associate professor in CSE (PhD: USC; Postdoc: Cornell)

TA:

Zheda Mai (mai.145), CSE PhD student

3 of 93

A bit about me

Machine learning and its applications to

Autonomous driving
Computer vision
Natural language processing
Health care
Imageomics

Pancreatic cancer

4 of 93

A bit about me

5 of 93

Learning with “imperfect” data

Limited data and supervision
Imbalanced data
Inaccessible data
Domain shifts

[Zhu et al., 2014]

KITTI (Germany)
Argoverse (USA)
nuScenes (USA, Singapore)
Lyft (USA)
Waymo (USA)

[Wang et al., 2020]

6 of 93

Course information

Lecture time: Tuesday and Thursday, 12:45 PM - 2:05 PM

Office hours: DL 587

Tentatively, Tuesday 3 – 4 pm & Friday 9 – 10 am
No office hours the first week

TA Office hours: tentatively BE406

Tentatively, Monday 11 am - 12 pm & Wednesday 2 pm - 3 pm
No office hours the first week

7 of 93

Course information

Carmen/GitHub:

For announcements, course materials (slides), and homework submission
Caution! We won’t read the Inbox in Carmen

Piazza:

For discussion. Please register!
Link: https://piazza.com/osu/autumn2025/cse5524
Please use your name.#@osu.edu or BuckeyeMail address
Access code: osu-cse-5524-AU25-chao

Detailed syllabus (pdf):

can be found on Carmen and the course website

8 of 93

Communications

Schedule and reading will be updated on the website
Announcements will be made through Carmen
Discussions and questions must be posted in Piazza
Please only use email to contact me or the TA for urgent or personal issues. Please include the tag "[OSU-CSE-5524]" in the subject line.
More details: See website, Carmen, and the syllabus

9 of 93

Questions?

10 of 93

Grading and homework (tentative)

Grading (subject to slight change)

Quiz – 4% (linear algebra)
Homework – 50%
Midterm (October 23, in class) – 20%
Final project – 26%
The final project consists of multiple parts:

proposal sketch
proposal
project presentation (December 16, 2 – 6 pm, scheduled final exam time)
project report

Guidelines

Expect around 6 homework assignments (including problem and programming sets)

Solutions may involve derivations. Grading is based on correctness and clarity. Be concise and show your reasoning in a clear and precise way.
Homework completion and submissions are individual, but feel free to discuss. You must strictly follow the submission instructions.
NOT ALLOWED: asking or searching for solutions
No individual late days are accepted.

11 of 93

Final project first glance (SP25 version, to update)

Team forming:

2 – 3 students per team; the expectation is the same regardless of team size
MUST be a team project

Steps:

Team forming: starting in September
Project sketch: 1 page at most: what you plan to do, who your teammates are.
Project proposal: 2 – 3 pages
Project presentation: 7 – 10 minutes
Project report & code release: academic paper format (e.g., NeurIPS), LaTeX is required

12 of 93

Final project first glance (SP25 version, to update)

Pre-defined tasks:

CVPR 2025 competition: https://cvpr.thecvf.com/Conferences/2025/workshop-list

Keywords: competition & challenge
Example: https://sites.google.com/view/fgvc12/competitions

Reproducing existing algorithms
Benchmarking existing algorithms
Reproducing examples in the textbook

Self-defined “research” tasks:

Need approval
Need justification (not your lab’s work)

13 of 93

Tentative schedule

Homework

Dates: TBA

The first homework might be released next week

You will have 1 – 2 weeks to complete each homework
Homework is due at 23:59 ET

Exams & final project presentation

Midterm date(s): 10/23/2025
Final project presentation: 12/16/2025

14 of 93

Policy

Academic integrity

Plagiarism and other unacceptable violations

Zero tolerance
I MUST report incidents

Please study the related sections at the end of the syllabus (pdf) on academic integrity.
Please read OAA’s message on large language models: https://oaa.osu.edu/artificial-intelligence-and-academic-integrity

(Re-)grading

Only factual errors will be corrected.
Request: within one week of the release of your homework or exam grade
Format: TBA

15 of 93

Syllabus: accessible on the website and Carmen

Standard syllabus statements

Link: https://ugeducation.osu.edu/academics/syllabus-policies-statements/standard-syllabus-statements

Academic Misconduct
Artificial Intelligence and Academic Integrity
Religious Accommodations
Disability Statement (with Accommodations for Illness)
Intellectual Diversity
Grievances and Solving Problems
Creating an Environment Free from Harassment, Discrimination, and Sexual Misconduct

Optional syllabus statements

Link: https://ugeducation.osu.edu/academics/syllabus-policies-statements/optional-syllabus-statements

Copyright
Counseling and Consultation Services / Mental Health Statement
Content Warning Language
Military-Connected Students

16 of 93

Pre-requisites & what to expect?

Pre-requisites

Data structures and algorithms: 2331
Statistics and probability: 5522, Stat 3460, or 3470
Decent degree of mathematical sophistication
Knowledge of programming, algorithm design, and data structures

Suggested backgrounds

Linear algebra: Math 2568, 2174, 4568, or 5520H – geometry can be efficiently represented
Artificial intelligence: 3521, 5521, or 5243

Extensive math and programming-related homework

Multivariate calculus, linear algebra, and probability
Python 3
PyTorch & Hugging Face

CV algorithms are often difficult to debug

We strongly recommend that you start early, for both the homework and the final project.

17 of 93

Review materials

Please see the website: https://sites.google.com/view/osu-cse-5524-au25-chao

Please see the reading list in the spreadsheet on the website

Linear algebra review slides: https://drive.google.com/drive/folders/1QFyAcxkdl4REQJhlqixTbJYjLBqNjq3Y?usp=share_link

Quizzes on linear algebra (4 points): to be completed by 9/16/2025

18 of 93

Caution!

This is a “graduate-level” course!

19 of 93

Caution!

One of the aims is to provide students with a strong foundational background, enabling them to pursue computer-vision-centered or machine-learning-centered MS/PhD paths or explore future opportunities in the computer vision, machine learning, and artificial intelligence industries.

This is a “graduate-level” course!

20 of 93

Caution!

The course is not simply knowledge feeding, and I will leave space for you to read, think, and explore!

This is a “graduate-level” course!

21 of 93

Questions?

22 of 93

Course descriptions & goals

Course Description: Computer vision algorithms for use in human-computer interactive systems; image formation, image features, image processing, object recognition, image generation, 3D from images, and applications.

This course focuses on the foundations of computer vision, with particular emphasis on learning-based methods and 3D. To build background, the course covers the basics of image formation, camera modeling, machine learning, and neural networks. With this groundwork, the course introduces image-processing-based methods and probabilistic models of images. Then, the course explores modern neural network architectures for computer vision, including convolutional neural networks and transformers. The course then builds upon these models to develop algorithms for image feature extraction, visual recognition, image generation, and vision-and-language understanding. Moving beyond single images, the course further introduces stereo vision and multi-view vision, including structure from motion and neural radiance fields. Finally, the course introduces algorithms for motion estimation and tracking. Along with the course, representative applications of computer vision will be introduced and discussed.

23 of 93

Course descriptions & goals

Course Goals / Objectives:

Master fundamental and recent computer vision concepts and algorithms
Be competent with computer vision application design and evaluation
Gain a deep understanding of learning-based algorithms and 3D inference for computer vision
Be exposed to original research and applications in computer vision
Be familiar with the Python/PyTorch programming environment
More broadly, the aim is to provide students with a strong foundational background, enabling them to pursue computer-vision-centered or machine-learning-centered MS/PhD paths or explore future opportunities in the computer vision, machine learning, and artificial intelligence industries.

24 of 93

Textbook

Required

https://mitpress.mit.edu/9780262048972/foundations-of-computer-vision/
Purchasable on MIT Press Bookstore or Amazon
eBook accessible through the OSU library website

Foundations of Computer Vision

25 of 93

Suggested References

Generative Deep Learning: Teaching Machines To Paint, Write, Compose, and Play (second edition)

Computer Vision: Algorithms and Applications (second edition)

eBook accessible through the OSU Library website

26 of 93

Other great textbooks

Deep Learning: Foundations and Concepts

Understanding Deep Learning

Dive into Deep Learning

PDF accessible for the 1st and 3rd books – check their websites

27 of 93

Other excellent CV courses

MIT

Stanford

CS 131: http://vision.stanford.edu/teaching/cs131_fall2223/
CS 231n: http://cs231n.stanford.edu/

CMU: http://16385.courses.cs.cmu.edu/spring2024/

Brown CV: https://browncsci1430.github.io/

28 of 93

Other excellent CV courses

NYU CV: https://www.sainingxie.com/cv-fall2024/

Wisconsin-Madison CV: https://sites.google.com/view/cs639spring2023dlcv

Michigan CV

It is hard for a single computer vision course to be both comprehensive and unified

3D vision, generative vision, deep learning for vision, robotic vision, etc.
Even the basic CV courses can be very different

29 of 93

Important for this week

Register. If you are on the waitlist, you might or might not get in, depending on how many seats are open and how many students drop.
Register for the class on Piazza --- our main platform for discussion and communication
Math review/self-diagnostic: do “CSE5523” Homework #0 and check suggested materials on the website (e.g., linear algebra slides) --- extremely important to check your readiness for the course
Python: check suggested tutorials on the website
Decision: stay or drop

Office hours: start next week

30 of 93

How to do/learn well?

Lecture and lecture slides for basics

Describe basic concepts, tools
Describe algorithms and their development with intuition and rigor

Textbook reading for completeness and extension

Homework for practice, generalization, and implementation

Final project for thinking, exploration, implementation, and integration

Discussion (Piazza, office hours) for further understanding

31 of 93

Important dates

Midterm: in class (date: 10/23/2025, subject to change)

Final project presentation: in class

12/16/2025, 2 – 6 pm
Following and extending the final exam schedule, which is 2 – 3:45 pm

Online (or pre-recorded) teaching or guest lectures

For some weeks, I may be traveling, such as 10/21 and 12/2

No class

12/4: replaced by the extended final project presentation

32 of 93

Questions?

33 of 93

About using AI tools

Please check the syllabus about the university’s policy

https://www.nature.com/articles/s44222-025-00323-4.pdf

34 of 93

Writing is thinking

I “love” writing papers because it gives me a chance to rethink what I’m doing, identify logical holes, and uncover implications we miss. I hate (or love) but keep writing rebuttals because it helps sharpen my skills to convince people and debate.

35 of 93

Writing is thinking

36 of 93

About math

Groundbreaking ideas in computer vision often come from mathematical and physical concepts or insights

38 of 93

Questions?

39 of 93

Today

Introduction

What is computer vision?

Course overview

40 of 93

What is computer vision?

41 of 93

What is computer vision?

Human vision is capable of extracting information about the world around us using only the light that reflects off surfaces in the direction of our eyes.

Our eyes are sensors. Our brains have to translate the information collected by millions of photoreceptors in our retinas into an interpretation of the world in front of us.

Computer vision studies how to reproduce in a computer the ability to see

Antonio Torralba, Phillip Isola, and William T. Freeman, Foundations of Computer Vision, 2024.

42 of 93

Input: the structure of ambient light

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

43 of 93

Output: measuring light vs. measuring scene properties

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

44 of 93

The study of vision is interdisciplinary

involving many disciplines (physics, psychology, biology, neuroscience, art, and computer science)

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

45 of 93

The study of vision is interdisciplinary

Gestalt psychology: grouping rules for perceptual organization

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

46 of 93

Visual pathways

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

47 of 93

Questions?

48 of 93

What is computer vision?

Computer vision studies how to reproduce in a computer the ability to see

Antonio Torralba, Phillip Isola, and William T. Freeman, Foundations of Computer Vision, 2024.

Vision is the process of discovering from images what is present in the world, and where it is.

David Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, 1982.

49 of 93

Computer vision

[Source: Detectron2]

A computer sees the world through sensors, which generate images, videos, point clouds, etc.

[Source: Graham Murdoch/Popular Science]

50 of 93

Computer vision: data

Image(s)

Video(s) = sequence of images

RGB image(s): three matrices

51 of 93

Computer vision: data

RGB images:

What is inside each matrix?

Integers in {0, 1, …, 255}
Or real values in the interval [0, 1]

124	255	125
125	126	60
126	60	126
0	127	60
0	0	128

124	255	125
125	126	60
126	60	126
0	127	60
0	0	128

124	255	125
125	126	60
126	60	126
0	127	60
0	0	128
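
The two value ranges above describe the same intensities on different scales. A minimal sketch of converting between them, assuming 8-bit integer storage and using only the Python standard library (in the course itself, NumPy or PyTorch arrays would be the natural choice). The helper names `to_unit_interval` and `to_uint8` are hypothetical, and the channel values are copied from the example matrix on the slide.

```python
# One 5-by-3 channel, copied from the example matrix on the slide.
channel_uint8 = [
    [124, 255, 125],
    [125, 126, 60],
    [126, 60, 126],
    [0, 127, 60],
    [0, 0, 128],
]


def to_unit_interval(channel):
    """Map integer intensities in {0, ..., 255} to floats in [0, 1]."""
    return [[value / 255.0 for value in row] for row in channel]


def to_uint8(channel):
    """Map floats in [0, 1] back to integers in {0, ..., 255} (round, then clamp)."""
    return [
        [min(255, max(0, round(value * 255.0))) for value in row]
        for row in channel
    ]


channel_float = to_unit_interval(channel_uint8)

# An RGB image is three such matrices (R, G, B).
# For illustration only, the same matrix is reused for all three channels.
rgb_image = [channel_float, channel_float, channel_float]

# Converting back recovers the original integers.
roundtrip = to_uint8(channel_float)
```

Which range to use is a convention: integer storage is compact, while many learning pipelines expect floating-point inputs.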

52 of 93

Computer vision: data

Image(s)

Video(s) = sequence of images

RGBD image(s): four matrices

Entry value of the fourth matrix = depth
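
A minimal sketch of an RGBD image as four matrices, assuming a tiny 2-by-3 image with made-up values and depth measured in meters (the slide does not state the unit). The helper names `constant_channel` and `pixel` are hypothetical.

```python
HEIGHT, WIDTH = 2, 3  # hypothetical tiny image size


def constant_channel(value):
    """Build a HEIGHT-by-WIDTH matrix filled with one value."""
    return [[value for _ in range(WIDTH)] for _ in range(HEIGHT)]


# Color channels, with values in [0, 1].
red = constant_channel(0.9)
green = constant_channel(0.5)
blue = constant_channel(0.1)

# Depth channel: each entry is the distance to the surface seen at that pixel
# (assumed to be in meters for this sketch).
depth = [
    [2.0, 2.5, 3.0],
    [2.0, 2.5, 3.0],
]

# An RGBD image is the three color matrices plus the depth matrix.
rgbd_image = [red, green, blue, depth]


def pixel(image, row, col):
    """Read all channel values at one pixel location."""
    return tuple(channel[row][col] for channel in image)


r, g, b, d = pixel(rgbd_image, 1, 2)
```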

53 of 93

Computer vision: data

Point cloud

A collection of 3D (or 4D) points

x coordinate
y coordinate
z coordinate
reflectance

N points = 3-by-N or 4-by-N matrix
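
A minimal sketch of the 4-by-N layout, assuming three made-up points given as (x, y, z, reflectance) tuples. The helper names `to_matrix` and `distance_from_sensor` are hypothetical.

```python
import math

# Hypothetical points: (x, y, z, reflectance).
points = [
    (1.0, 2.0, 0.5, 0.30),
    (4.0, -1.0, 0.2, 0.80),
    (0.0, 3.0, 1.5, 0.10),
]


def to_matrix(points):
    """Return a 4-by-N matrix: one row per attribute, one column per point."""
    return [list(row) for row in zip(*points)]


cloud = to_matrix(points)
num_rows, num_points = len(cloud), len(cloud[0])

# Dropping the reflectance row gives the 3-by-N version.
xyz_only = cloud[:3]


def distance_from_sensor(cloud, index):
    """Euclidean distance of one point from the origin (the sensor)."""
    x, y, z = (cloud[row][index] for row in range(3))
    return math.sqrt(x * x + y * y + z * z)
```

Unlike an image, the columns have no grid structure: reordering the N points describes the same cloud.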

54 of 93

Computer vision: data

Image aligned with point cloud

55 of 93

LiDAR-based vision

[Source: Graham Murdoch/Popular Science]

LiDAR:

Light Detection and Ranging sensor
Produces accurate 3D point clouds of the environment, centered at the ego-car

56 of 93

LiDAR-based vision

A point cloud is formed by LiDAR responses within a short time period
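
A minimal sketch of this idea, under several assumptions that are not on the slide: each LiDAR response is recorded as (timestamp, range, azimuth, elevation), one sweep lasts 0.1 seconds, and the ego-car does not move during the sweep. The names `SWEEP_SECONDS`, `to_xyz`, and `point_cloud` are hypothetical.

```python
import math

SWEEP_SECONDS = 0.1  # assumed duration of one sweep

# Hypothetical responses: (timestamp_s, range_m, azimuth_rad, elevation_rad).
returns = [
    (0.00, 10.0, 0.0, 0.0),
    (0.05, 5.0, math.pi / 2, 0.0),
    (0.09, 2.0, 0.0, math.pi / 2),
    (0.15, 7.0, math.pi, 0.0),  # falls in the next sweep
]


def to_xyz(rng, azimuth, elevation):
    """Convert a range and two angles to Cartesian coordinates around the sensor."""
    x = rng * math.cos(elevation) * math.cos(azimuth)
    y = rng * math.cos(elevation) * math.sin(azimuth)
    z = rng * math.sin(elevation)
    return (x, y, z)


def point_cloud(returns, start_time, duration=SWEEP_SECONDS):
    """Collect the responses inside one time window into a list of 3D points."""
    return [
        to_xyz(rng, azimuth, elevation)
        for timestamp, rng, azimuth, elevation in returns
        if start_time <= timestamp < start_time + duration
    ]


first_sweep = point_cloud(returns, start_time=0.0)
```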

[Credits: Lisa Wu’s presentation]

57 of 93

Questions?

59 of 93

Three representation directions

S: scene
I: image

1: Recognition
2: Reconstruction
3: Generation

(example label in the figure: tree)

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

60 of 93

Computer vision: representative tasks

61 of 93

Computer vision: representative tasks

62 of 93

Computer vision: representative tasks

Retrieval, image-to-image search

63 of 93

Computer vision: representative tasks

Depth estimation and 3D reconstruction

64 of 93

Computer vision: representative tasks

65 of 93

Computer vision: representative tasks

Style transfer

[Figure credit: CycleGAN, ICCV 2017]

66 of 93

Questions?

68 of 93

How to let computers recognize objects?

A cat?

A lion?

A car?

Percept: see a picture

Action: tell the object class

69 of 93

Human design vs. machine-learning-based

Human design: “coding” the rules that map a picture to “cat”

Can you list the rules for recognizing a cat?

Machine-learning-based: data collection, then “learn” the mapping from a picture to “cat”

Underlying idea:

Humans are sometimes good at “making decisions” BUT not good at “explaining decisions”.

70 of 93

Learning-based computer vision

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

71 of 93

What is machine learning?

This book is about learning from data.

Sergios Theodoridis. Machine learning: a Bayesian and optimization perspective.

We choose the title “learning from data” that faithfully describes what the subject is about.

Y. Abu-Mostafa, M. Magdon-Ismail, H-T Lin. Learning from data.

74 of 93

Machine Learning Overview

What is machine learning?

Learning from Data

Algorithm

Data

Evaluation

Goal

75 of 93

Example: coin classifier

Machine learning algorithms

Training data

Learned models

Test data

[Figure credit: Y. Abu-Mostafa, M. Magdon-Ismail, H-T Lin. Learning from data.]
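
The pipeline on this slide (training data, learning algorithm, learned model, test data) can be sketched with a very simple learner. This is an illustration only: the features (diameter and mass), the data values, and the choice of a nearest-centroid classifier are all assumptions, not the method from the cited book.

```python
# Hypothetical training data: (diameter_mm, mass_g) -> coin type.
training_data = [
    ((19.0, 2.5), "penny"), ((19.1, 2.6), "penny"),
    ((21.2, 5.0), "nickel"), ((21.3, 5.1), "nickel"),
    ((24.3, 5.7), "quarter"), ((24.2, 5.6), "quarter"),
]

# Hypothetical test data, never seen during learning.
test_data = [
    ((19.05, 2.45), "penny"),
    ((21.1, 4.9), "nickel"),
    ((24.4, 5.8), "quarter"),
]


def learn(examples):
    """Learning algorithm: average the features of each class."""
    sums = {}
    for (diameter, mass), label in examples:
        total = sums.setdefault(label, [0.0, 0.0, 0])
        total[0] += diameter
        total[1] += mass
        total[2] += 1
    return {label: (d / n, m / n) for label, (d, m, n) in sums.items()}


def predict(model, features):
    """Learned model: pick the class whose average is closest to the input."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def accuracy(model, examples):
    """Evaluation: fraction of examples classified correctly."""
    correct = sum(predict(model, features) == label for features, label in examples)
    return correct / len(examples)


model = learn(training_data)
test_accuracy = accuracy(model, test_data)
```

No rule such as "a penny is smaller than 20 mm" is written by hand; the decision boundaries come from the training data.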

76 of 93

What is deep learning (deep neural networks)?

Input: image (see a picture)

Classifier

Output: label, e.g., dog or cat (tell the object class)

A sequence of “learnable” computation!
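
A minimal sketch of what "a sequence of learnable computation" can look like: two linear layers with a nonlinearity in between, followed by a softmax. The layer sizes, the parameter values, the 4-pixel "image", and the names `linear`, `relu`, `softmax`, and `classify` are all hypothetical; real networks have many more layers and parameters and are usually written in PyTorch.

```python
import math


def linear(weights, bias, inputs):
    """One learnable step: a weighted sum of the inputs plus a bias, per output."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def relu(values):
    """Fixed nonlinearity applied between learnable steps."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn scores into probabilities that sum to one."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical parameters; in practice these are learned from data.
params = {
    "w1": [[0.5, -0.2, 0.1, 0.0], [-0.3, 0.8, 0.0, 0.4], [0.2, 0.2, -0.5, 0.1]],
    "b1": [0.0, 0.1, -0.1],
    "w2": [[1.0, -1.0, 0.5], [-1.0, 1.0, 0.5]],
    "b2": [0.0, 0.0],
}
LABELS = ["cat", "dog"]


def classify(image_pixels, params):
    """Image in, label out: each line is one step of the sequence."""
    hidden = relu(linear(params["w1"], params["b1"], image_pixels))
    scores = linear(params["w2"], params["b2"], hidden)
    probabilities = softmax(scores)
    best = max(range(len(LABELS)), key=lambda i: probabilities[i])
    return LABELS[best], probabilities


image_pixels = [0.9, 0.1, 0.4, 0.7]  # a hypothetical 4-pixel "image"
label, probabilities = classify(image_pixels, params)
```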

77 of 93

Example: image classification

[Gif credits: Gradient descent 3Blue1Brown series S3 E2]

A sequence of “learnable” computation!
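
"Learnable" means the parameters are adjusted to reduce a loss on the training data, commonly by gradient descent (the subject of the credited animation). A minimal sketch with a single parameter, assuming made-up data generated by y = 2x and a mean-squared-error loss; the learning rate and step count are arbitrary choices for this example.

```python
# Hypothetical training data generated by y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def loss(w):
    """Mean squared error of the model y = w * x on the data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def gradient(w):
    """Derivative of the loss with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def gradient_descent(w, learning_rate=0.05, steps=100):
    """Repeatedly move w a small step against the gradient."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_initial = 0.0
w_learned = gradient_descent(w_initial)
```

A deep network is trained the same way in principle, with the gradient taken over all of its parameters at once.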

78 of 93

The progress of deep learning

[Simonyan et al., 2015]

[Szegedy et al., 2015]

[Huang et al., 2017]

[He et al., 2016]

[Krizhevsky et al., 2012]

79 of 93

The progress of deep learning

Visual transformers [Liu et al., 2021]

Graph neural networks [Battaglia et al., 2018]

PointNet [Qi et al., 2017]

Neural architecture search [Zoph et al., 2017]

80 of 93

Questions?

82 of 93

Topics

1. Introduction to computer vision

a. Introduction to the course

b. A simple vision system

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

83 of 93

Topics

2. Image formation

a. Concepts of imaging and lenses

b. Images and 3D geometry

c. Camera modeling

d. Cameras as linear systems

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

84 of 93

Topics

3. Foundations of image processing

a. Linear filtering and convolution

b. Fourier analysis

c. Blur filters, image derivatives, and filter banks

d. (Up/down) sampling

e. Image pyramids

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

85 of 93

Topics

4. Foundations of learning

a. Introduction to learning

b. Gradient-based learning algorithms

c. Generalization

d. Neural networks as distribution transformers

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

86 of 93

Topics

5. Probabilistic models of images

a. Color

b. Statistical image models

c. Textures

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

87 of 93

Topics

6. Neural architectures for vision

a. Convolutional neural nets

b. Transformers

ConvNet (VGG Net) [Simonyan et al., 2015]

Visual transformers [Liu et al., 2021]

88 of 93

Topics

7. Generative image models and representation learning

a. Representation learning

b. Generative models

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

89 of 93

Topics

8. Understanding vision with semantics and language

a. Visual recognition

b. Vision and language

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

90 of 93

Topics

9. Challenges in learning-based vision

a. Data bias and shift

b. Robustness and generality

c. Transfer learning and adaptation

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

91 of 93

Topics

10. Understanding geometry

a. Stereo vision

b. Homographies

c. Depth estimation from single images

d. Feature detection and matching

e. Multi-view geometry and structure from motion

f. Radiance fields

92 of 93

Topics

11. Understanding motion

a. Motion estimation

b. Optical flow estimation

c. Object tracking

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

93 of 93

TODO

See the beginning slides and the course website for suggested reading
Background review: math and programming
