1 of 93

CSE 5524: (Foundations of) Computer Vision

2 of 93

Course information

  • Course website:

https://sites.google.com/view/osu-cse-5524-au25-chao

(for course information, the weekly schedule, and reading assignment updates)

  • Instructor:

Dr. Wei-Lun (Harry) Chao (chao.209), Office: DL 587

Associate professor in CSE (PhD: USC; Postdoc: Cornell)

  • TA:

Zheda Mai (mai.145), CSE PhD student

3 of 93

A bit about me

Machine learning and its applications to

  • Autonomous driving
  • Computer vision
  • Natural language processing
  • Health care
  • Imageomics

Pancreatic cancer

4 of 93

A bit about me

5 of 93

Learning with “imperfect” data

  • Limited data and supervision
  • Imbalanced data
  • Inaccessible data
  • Domain shifts

[Zhu et al., 2014]

KITTI (Germany), Argoverse (USA), nuScenes (USA, Singapore), Lyft (USA), Waymo (USA) [Wang et al., 2020]

6 of 93

Course information

  • Lecture time: Tuesday and Thursday, 12:45 PM - 2:05 PM

  • Office hours: DL 587
    • Tentatively, Tuesday 3 – 4 pm & Friday 9 – 10 am
    • No office hours the first week

  • TA Office hours: tentatively BE406
    • Tentatively, Monday 11 am - 12 pm & Wednesday 2 pm - 3 pm
    • No office hours the first week

7 of 93

Course information

  • Carmen/GitHub:
    • For announcements, course materials (slides), and homework submission
    • Caution! We won’t read the Carmen Inbox

  • Piazza:
    • For discussion. Please register!
    • Link: https://piazza.com/osu/autumn2025/cse5524
    • Please use your name.#@osu.edu or BuckeyeMail address
    • Access code: osu-cse-5524-AU25-chao

  • Detailed syllabus (pdf):
    • can be found on Carmen and the course website

8 of 93

Communications

  • Schedule and reading will be updated on the website
  • Announcements will be made through Carmen
  • Discussions and questions must be posted in Piazza
  • Please only use email to contact me or the TA for urgent or personal issues. Please include the tag "[OSU-CSE-5524]" in the subject line.
  • More details: See website, Carmen, and the syllabus

9 of 93

Questions?

10 of 93

Grading and homework (tentative)

 Grading (subject to slight change)

  • Quiz – 4% (linear algebra)
  • Homework – 50%
  • Midterm (October 23, in class) – 20%
  • Final project – 26%
  • The final project consists of multiple parts:
    • proposal sketch
    • proposal
    • project presentation (December 16, 2 – 6 pm, scheduled final exam time)
    • project report

Guidelines

  •  Expect around 6 homework assignments (including problem and programming sets)
    • Solutions may involve derivations. Grading is based on correctness and clarity. Be concise and show your reasoning in a clear and precise way.
    • Homework completion and submissions are individual, but feel free to discuss. You must strictly follow the submission instructions.
    • NOT ALLOWED: asking or searching for solutions
    • No individual late days are accepted.

11 of 93

Final project at a glance (SP25 version, to be updated)

  • Team forming:
    • 2 – 3 students per team; expectations are the same regardless of team size
    • MUST be a team project

  • Steps:
    • Team forming: starting in September
    • Project sketch: 1 page at most, stating what you plan to do and who your teammates are
    • Project proposal: 2 – 3 pages
    • Project presentation: 7 – 10 minutes
    • Project report & code release: academic paper format (e.g., NeurIPS); LaTeX is required

12 of 93

Final project at a glance (SP25 version, to be updated)

  • Pre-defined tasks:

    • Reproducing existing algorithms
    • Benchmarking existing algorithms
    • Reproducing examples in the textbook

  • Self-defined “research” tasks:
    • Need approval
    • Need justification (not your lab’s work)

13 of 93

Tentative schedule

Homework

  • Dates: TBA
    • The first homework might be released next week
  • You will have 1 – 2 weeks to complete each homework
  • Homework is due at 23:59 ET

Exams & final project presentation

  • Midterm date(s): 10/23/2025
  • Final project presentation: 12/16/2025

14 of 93

Policy

Academic integrity

  • Plagiarism and other unacceptable violations
    • Zero tolerance
    • I MUST report incidents
  • Please study the related sections at the end of the syllabus (pdf) on academic integrity.
  • Please read OAA’s message on large language models: https://oaa.osu.edu/artificial-intelligence-and-academic-integrity

(Re-)grading

  • Only factual errors will be corrected.
  • Request: within one week of the release of your homework or exam grade
  • Format: TBA

15 of 93

Syllabus: accessible on the website and Carmen

Standard syllabus statements

Link: https://ugeducation.osu.edu/academics/syllabus-policies-statements/standard-syllabus-statements

  • Academic Misconduct
  • Artificial Intelligence and Academic Integrity
  • Religious Accommodations
  • Disability Statement (with Accommodations for Illness)
  • Intellectual Diversity
  • Grievances and Solving Problems
  • Creating an Environment Free from Harassment, Discrimination, and Sexual Misconduct

Optional syllabus statements

Link: https://ugeducation.osu.edu/academics/syllabus-policies-statements/optional-syllabus-statements

  • Copyright
  • Counseling and Consultation Services / Mental Health Statement
  • Content Warning Language
  • Military-Connected Students

16 of 93

Pre-requisites & what to expect?

  • Pre-requisites
    • Data structures and algorithms: 2331
    • Statistics and probability: 5522, Stat 3460, or 3470
    • Decent degree of mathematical sophistication
    • Knowledge of programming, algorithm design, and data structures
  • Suggested backgrounds
    • Linear algebra: Math 2568, 2174, 4568, or 5520H (geometry can be represented efficiently with linear algebra)
    • Artificial intelligence: 3521, 5521, or 5243
  • Extensive math and programming-related homework
    • Multivariate calculus, linear algebra, and probability
    • Python 3
    • PyTorch & Hugging Face
  • CV algorithms are often difficult to debug
    • We strongly recommend that you start early, for both the homework and the final project.

17 of 93

Review materials

  • Please see the reading list in the spreadsheet on the website

  • Linear algebra quizzes (4 points): to be completed by 9/16/2025

18 of 93

Caution!

This is a “graduate-level” course!

19 of 93

Caution!

One of the aims is to provide students with a strong foundational background, enabling them to pursue computer-vision-centered or machine-learning-centered MS/PhD paths or explore future opportunities in the computer vision, machine learning, and artificial intelligence industries. 

This is a “graduate-level” course!

20 of 93

Caution!

This course is not simply about feeding you knowledge; I will leave space for you to read, think, and explore!

This is a “graduate-level” course!

21 of 93

Questions?

22 of 93

Course descriptions & goals

  • Course Description:  Computer vision algorithms for use in human-computer interactive systems; image formation, image features, image processing, object recognition, image generation, 3D from images, and applications.

  • This course focuses on the foundations of computer vision, with particular emphasis on learning-based methods and 3D. To build background, the course covers the basics of image formation, camera modeling, machine learning, and neural networks. With this groundwork, the course introduces image-processing-based methods and probabilistic models of images. Then, the course explores modern neural network architectures for computer vision, including convolutional neural networks and transformers. The course then builds upon these models to develop algorithms for image feature extraction, visual recognition, image generation, and vision-and-language understanding. Moving beyond single images, the course further introduces stereo vision and multi-view vision, including structure from motion and neural radiance fields. Finally, the course introduces algorithms for motion estimation and tracking. Throughout the course, representative applications of computer vision will be introduced and discussed.

23 of 93

Course descriptions & goals

  • Course Goals / Objectives: 
    • Master fundamental and recent computer vision concepts and algorithms
    • Be competent with computer vision application design and evaluation
    • Gain a deep understanding of learning-based algorithms and 3D inference for computer vision
    • Be exposed to original research and applications in computer vision
    • Be familiar with the Python/PyTorch programming environment
    • More broadly, the aim is to provide students with a strong foundational background, enabling them to pursue computer-vision-centered or machine-learning-centered MS/PhD paths or explore future opportunities in the computer vision, machine learning, and artificial intelligence industries. 

24 of 93

Textbook

  • Required

Foundations of Computer Vision

25 of 93

Suggested References

Generative Deep Learning: Teaching Machines To Paint, Write, Compose, and Play (second edition)

Computer Vision: Algorithms and Applications (second edition)

eBook accessible through the OSU Library website

26 of 93

Other great textbooks

Deep Learning: Foundations and Concepts

Understanding Deep Learning

Dive into Deep Learning

PDF accessible for the 1st and 3rd books – check their websites

27 of 93

Other excellent CV courses

  • Brown CV: https://browncsci1430.github.io/

28 of 93

Other excellent CV courses

  • It is hard for a single computer vision course to be comprehensive and unified
    • 3D vision, generative vision, deep learning for vision, robotic vision, etc.
    • Even the basic CV courses can be very different

29 of 93

Important for this week

  • Register. If you are on the waitlist, you might or might not get in, depending on how many seats are open and how many students drop.
  • Register for the class on Piazza, our main platform for discussion and communication
  • Math review/self-diagnostic: do “CSE5523” Homework #0 and check the suggested materials on the website (e.g., linear algebra slides). This is extremely important for checking your readiness for the course
  • Python: check suggested tutorials on the website
  • Decision: stay or drop

  • Office hours: start next week

30 of 93

How to do/learn well?

  • Lecture and lecture slides for basics
    • Describe basic concepts, tools
    • Describe algorithms and their development with intuition and rigor

  • Textbook reading for completeness and extension

  • Homework for practice, generalization, and implementation

  • Final project for thinking, exploration, implementation, and integration

  • Discussion (Piazza, office hours) for further understanding

31 of 93

Important dates

  • Midterm: in class (date: 10/23/2025, subject to change)

  • Final project presentation: in class
    • 12/16/2025, 2 – 6 pm
    • This extends the scheduled final exam slot, which is 2 – 3:45 pm

  • Online (or pre-recorded) teaching or guest lectures
    • For some weeks, I may be traveling, such as 10/21 and 12/2

  • No class
    • 12/4: replaced by the extended final project presentation

32 of 93

Questions?

33 of 93

About using AI tools

  • Please check the syllabus for the university’s policy

34 of 93

Writing is thinking

  • I “love” writing papers because it gives me a chance to rethink what I’m doing, identify logical holes, and uncover implications we missed. I hate (or love) writing rebuttals, but I keep doing it because it sharpens my ability to convince people and to debate.

35 of 93

Writing is thinking

36 of 93

About math

  • Groundbreaking ideas in computer vision often come from math and physical concepts or insights

38 of 93

Questions?

39 of 93

Today

Introduction

  • What is computer vision?

Course overview

40 of 93

What is computer vision?

41 of 93

What is computer vision?

Human vision is capable of extracting information about the world around us using only the light that reflects off surfaces in the direction of our eyes.

Our eyes are sensors. Our brains have to translate the information collected by millions of photoreceptors in our retinas into an interpretation of the world in front of us.

Computer vision studies how to reproduce in a computer the ability to see

Antonio Torralba, Phillip Isola, and William T. Freeman, Foundations of Computer Vision, 2024.

42 of 93

Input: the structure of ambient light

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

43 of 93

Output: measuring light vs. scene properties

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

44 of 93

The study of vision is interdisciplinary

  • It involves many disciplines (physics, psychology, biology, neuroscience, art, and computer science)

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

45 of 93

The study of vision is interdisciplinary

  • Gestalt psychology: grouping rules for perceptual organization

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

46 of 93

Visual pathways

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

47 of 93

Questions?

48 of 93

What is computer vision?

Computer vision studies how to reproduce in a computer the ability to see

Antonio Torralba, Phillip Isola, and William T. Freeman, Foundations of Computer Vision, 2024.

Vision is the process of discovering from images what is present in the world, and where it is.

David Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, 1982.

49 of 93

Computer vision

[Source: Detectron2]

  • A computer sees the world through sensors, which generate images, videos, point clouds, etc.

[Source: Graham Murdoch/Popular Science]

50 of 93

Computer vision: data

Image(s)

Video(s) = sequence of images

RGB image(s): three matrices

51 of 93

Computer vision: data

  • RGB images:

  • What is inside each matrix?
    • Integers in {0, 1, ..., 255}
    • Or real values in the interval [0, 1]

Example 5-by-5 matrix of pixel values (the same values are shown for each of the three channels):

    0    0  124  255  125
    0    0  125  126   60
    0    0  126   60  126
    0    0    0  127   60
    0    0    0    0  128
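To make the "three matrices" picture concrete, here is a minimal sketch of an RGB image stored as one matrix per channel, converted between the two value ranges above. The pixel values and the helper names (`to_unit_interval`, `to_uint8`) are made up for illustration. Plain Python lists keep the sketch dependency-free; in practice a NumPy array or PyTorch tensor would be the usual container.

```python
# A tiny 2x2 RGB image stored as three matrices (one per color channel).
# All pixel values are invented for illustration.
red = [[255, 0], [124, 60]]
green = [[0, 255], [125, 60]]
blue = [[0, 0], [126, 60]]


def to_unit_interval(channel):
    """Map 8-bit integers in {0, ..., 255} to floats in [0, 1]."""
    return [[value / 255.0 for value in row] for row in channel]


def to_uint8(channel):
    """Map floats in [0, 1] back to integers in {0, ..., 255}."""
    return [[round(value * 255.0) for value in row] for row in channel]


# Channel-first layout: 3 x height x width.
image_uint8 = [red, green, blue]
image_float = [to_unit_interval(channel) for channel in image_uint8]

# A video is then a sequence of such images, one per frame.
video = [image_uint8, image_uint8]
```

Both ranges describe the same image; the [0, 1] form is often more convenient as input to learning algorithms, while the integer form is how images are usually stored on disk.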

52 of 93

Computer vision: data

Image(s)

Video(s) = sequence of images

RGBD image(s): four matrices

Entry value of the fourth matrix = depth

53 of 93

Computer vision: data

Point cloud

A collection of 3D (or 4D) points

Each point: x coordinate, y coordinate, z coordinate, and (optionally) reflectance

N points = 3-by-N or 4-by-N matrix
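As a rough sketch of the 4-by-N layout, the snippet below stores three made-up points (x, y, z, reflectance) as a matrix with one row per attribute and one column per point. It then computes each point's distance from the origin, which is assumed here to be the sensor position. The variable names are illustrative only.

```python
import math

# Each point is (x, y, z, reflectance); the numbers are invented.
points = [
    (1.0, 2.0, 0.5, 0.30),
    (4.0, 0.0, 1.0, 0.80),
    (0.0, 3.0, 4.0, 0.10),
]

# 4-by-N layout: one row per attribute, one column per point.
cloud = [list(row) for row in zip(*points)]
xs, ys, zs, reflectance = cloud

# Distance of each point from the origin (assumed to be the sensor).
ranges = [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(xs, ys, zs)]
```

Dropping the last row gives the 3-by-N form that holds coordinates only.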

54 of 93

Computer vision: data

Image aligned with point cloud

55 of 93

LiDAR-based vision

[Source: Graham Murdoch/Popular Science]

LiDAR:

  • Light Detection and Ranging sensor
  • Produces accurate 3D point clouds of the environment, centered at the ego car

56 of 93

LiDAR-based vision

  • A point cloud is formed by LiDAR responses within a short time period

[Credits: Lisa Wu’s presentation]

57 of 93

Questions?

58 of 93

What is computer vision?

Computer vision studies how to reproduce in a computer the ability to see

Antonio Torralba, Phillip Isola, and William T. Freeman, Foundations of Computer Vision, 2024.

Vision is the process of discovering from images what is present in the world, and where it is.

David Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, 1982.

59 of 93

Three representation directions

S: scene; I: image

  • 1: Recognition
  • 2: Reconstruction
  • 3: Generation

(Example label in the figure: “tree”)

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

60 of 93

Computer vision: representative tasks

61 of 93

Computer vision: representative tasks

62 of 93

Computer vision: representative tasks

Retrieval, image-to-image search

63 of 93

Computer vision: representative tasks

Depth estimation and 3D reconstruction

64 of 93

Computer vision: representative tasks

65 of 93

Computer vision: representative tasks

Style transfer

[Figure credit: CycleGAN, ICCV 2017]

66 of 93

Questions?

67 of 93

What is computer vision?

Computer vision studies how to reproduce in a computer the ability to see

Antonio Torralba, Phillip Isola, and William T. Freeman, Foundations of Computer Vision, 2024.

Vision is the process of discovering from images what is present in the world, and where it is.

David Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, 1982.

68 of 93

How to let computers recognize objects?

A cat? A lion? A car?

Percept: see a picture

Action: tell the object class

69 of 93

Human design vs. machine-learning-based

  • Human design: “coding” the rules
    • Can you list the rules for recognizing a cat?
  • Machine-learning-based: data collection (many images labeled “cat”), then “learn”

Underlying idea:

Humans are sometimes good at “making decisions” BUT not good at “explaining decisions”.

70 of 93

Learning-based computer vision

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

71 of 93

What is machine learning?

This book is about learning from data.

Sergios Theodoridis. Machine learning: a Bayesian and optimization perspective.

We choose the title “learning from data” that faithfully describes what the subject is about.

Y. Abu-Mostafa, M. Magdon-Ismail, H-T Lin. Learning from data.

72 of 93

Machine Learning Overview

  • What is machine learning?

Learning from Data

73 of 93

Machine Learning Overview

  • What is machine learning?

Learning from Data

Algorithm

Data

Evaluation

74 of 93

Machine Learning Overview

  • What is machine learning?

Learning from Data

Algorithm

Data

Evaluation

Goal

75 of 93

Example: coin classifier

Machine learning algorithms

Training data

Learned models

Test data

[Figure credit: Y. Abu-Mostafa, M. Magdon-Ismail, H-T Lin. Learning from data.]
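The pipeline on this slide (training data, learning algorithm, learned model, test data) can be sketched in a few lines. The example below is only an illustration, not the method from the cited book. It assumes each coin is described by two features (mass and diameter), uses rough, invented measurements, and swaps in a simple nearest-centroid rule as the learning algorithm. The function names `fit_centroids` and `predict` are hypothetical.

```python
import math

# Training data: (mass in grams, diameter in mm) -> label.
# The measurements are rough, invented values for illustration.
training_data = [
    ((2.5, 19.0), "penny"), ((2.6, 19.1), "penny"),
    ((5.0, 21.2), "nickel"), ((5.1, 21.3), "nickel"),
    ((2.2, 17.9), "dime"), ((2.3, 18.0), "dime"),
]


def fit_centroids(examples):
    """Learning algorithm: average the feature vectors of each class."""
    sums = {}
    for (mass, diameter), label in examples:
        total = sums.setdefault(label, [0.0, 0.0, 0])
        total[0] += mass
        total[1] += diameter
        total[2] += 1
    return {label: (m / n, d / n) for label, (m, d, n) in sums.items()}


def predict(model, features):
    """Learned model: return the class whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], features))


model = fit_centroids(training_data)

# Test data: held-out coins that were not used for training.
test_data = [
    ((2.55, 19.05), "penny"),
    ((5.05, 21.25), "nickel"),
    ((2.25, 17.95), "dime"),
]
accuracy = sum(predict(model, x) == y for x, y in test_data) / len(test_data)
```

The point of the split is that `accuracy` is measured on coins the algorithm never saw, which is what tells us whether the learned model generalizes.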

76 of 93

What is deep learning (deep neural networks)?

Image (see a picture) → Classifier → Label (e.g., dog or cat; tell the object class)

A sequence of “learnable” computation!
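One way to read "a sequence of learnable computation" is as a chain of simple functions whose parameters can be adjusted by training. The sketch below is an assumption-laden toy, not a specific architecture from the course: a flattened 2-by-2 grayscale image passes through a linear layer, a ReLU, a second linear layer, and a softmax. The weights are hand-picked placeholders that training would normally learn, and the "cat"/"dog" labels are arbitrary. In the course, PyTorch would provide these building blocks.

```python
import math


def linear(x, weights, bias):
    """One learnable step: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Fixed nonlinearity applied between learnable steps."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """Turn class scores into probabilities that sum to 1."""
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Learnable parameters (hand-picked placeholders; training would adjust them).
W1 = [[0.5, -0.2, 0.1, 0.0], [0.3, 0.8, -0.5, 0.2], [-0.6, 0.1, 0.4, 0.7]]
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5], [-0.5, 0.7, 0.2]]
b2 = [0.0, 0.0]


def classify(pixels):
    """Image in, class probabilities out: linear -> ReLU -> linear -> softmax."""
    hidden = relu(linear(pixels, W1, b1))
    scores = linear(hidden, W2, b2)
    return softmax(scores)


pixels = [0.0, 0.49, 1.0, 0.24]  # flattened 2x2 grayscale image in [0, 1]
probs = classify(pixels)
label = ["cat", "dog"][probs.index(max(probs))]
```

With untrained weights the predicted label is meaningless; learning consists of changing `W1`, `b1`, `W2`, and `b2` so that the output matches the labels in the training data.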

77 of 93

Example: image classification

[Gif credits: Gradient descent 3Blue1Brown series S3 E2]

A sequence of “learnable” computation!

78 of 93

The progress of deep learning

[Simonyan et al., 2015]

[Szegedy et al., 2015]

[Huang et al., 2017]

[He et al., 2016]

[Krizhevsky et al., 2012]

79 of 93

The progress of deep learning

Visual transformers [Liu et al., 2021]

Graph neural networks [Battaglia et al., 2018]

PointNet [Qi et al., 2017]

Neural architecture search [Zoph et al., 2017]

80 of 93

Questions?

81 of 93

Today

Introduction

  • What is computer vision?

Course overview

82 of 93

Topics

1. Introduction to computer vision

a. Introduction to the course

b. A simple vision system

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

83 of 93

Topics

2. Image formation

a. Concepts of imaging and lenses

b. Images and 3D geometry

c. Camera modeling

d. Cameras as linear systems

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

84 of 93

Topics

3. Foundations of image processing

a. Linear filtering and convolution

b. Fourier analysis

c. Blur filters, image derivatives, and filter banks

d. (Up/down) sampling

e. Image pyramids

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

85 of 93

Topics

4. Foundations of learning

a. Introduction to learning

b. Gradient-based learning algorithms

c. Generalization

d. Neural networks as distribution transformers

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

86 of 93

Topics

5. Probabilistic models of images

a. Color

b. Statistical image models

c. Textures

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

87 of 93

Topics

6. Neural architectures for vision

a. Convolutional neural nets

b. Transformers

ConvNet (VGG Net) [Simonyan et al., 2015]

Visual transformers [Liu et al., 2021]

88 of 93

Topics

7. Generative image models and representation learning

a. Representation learning

b. Generative models

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

89 of 93

Topics

8. Understanding vision with semantics and language

a. Visual recognition

b. Vision and language

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

90 of 93

Topics

9. Challenges in learning-based vision

a. Data bias and shift

b. Robustness and generality

c. Transfer learning and adaptation

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

91 of 93

Topics

10. Understanding geometry

a. Stereo vision

b. Homographies

c. Depth estimation from single images

d. Feature detection and matching

e. Multi-view geometry and structure from motion

f. Radiance fields

92 of 93

Topics

11. Understanding motion

a. Motion estimation

b. Optical flow estimation

c. Object tracking

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

93 of 93

TODO

  • See the beginning slides and the course website for suggested reading
  • Background review: math and programming
