1 of 53

CSE 5524: Image formation

2 of 53

Course information

  • Course website:

https://sites.google.com/view/osu-cse-5524-sp25-chao/home

  • Instructor:

Dr. Wei-Lun (Harry) Chao (chao.209), Office: DL 587

Office hours: Tu 11 am – noon; Th 9 – 10 am (DL 587)

  • TA:

Amin Karimi Monsefi (karimimonsefi.1), CSE PhD student

Office hours: M 9 – 10 am; W 10 – 11 am (BE 406)


3 of 53

Course information

  • Carmen/GitHub:
    • For announcements, posting course materials (slides), and homework submission

  • Piazza:
    • For discussion. Please register!
    • Link: to be set up by Thursday
    • Please use name.#@osu.edu
    • Access code: osu-cse-5524-SP25-chao

  • Detailed syllabus (pdf):
    • can be found on Carmen and the course website


4 of 53

Textbook

  • Required


Foundations of Computer Vision

We are talking with the OSU Library about PDF access!

5 of 53

Final project vs. Final exam

  • Final project will be “team”-based

  • We will provide some options. You may propose projects, but they need to be concrete enough.

  • There will be multiple milestones (proposal, presentation, final report)

  • The % distribution over homework, exams, and the final project is subject to change but will be finalized soon.

6 of 53

Next Thursday (1/23)

  • I will be traveling.

  • The current plan is that the TA will give a lecture about “PyTorch,” which will be very useful for the final project and potentially for homework as well.

7 of 53

Linear algebra quizzes are released

  • Please see Carmen’s announcements.

8 of 53

Today

  • Recap: a simple vision system
  • Image formation


9 of 53

Goal

  • Hand-design a vision system for 3D interpretation from images
    • Preview a set of the concepts of this semester
    • Optimism: what was planned as an MIT summer project in 1966 → computer vision tasks studied for decades

Depth estimation and 3D reconstruction

10 of 53

A simple world: the blocks world

What is inside?

  • Simple but varied set of objects
  • Flat horizontal or vertical surfaces
  • White horizontal ground plane

Image formation assumptions

  • Parallel (orthographic) projection


[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

[Figure credit: https://www.geeksforgeeks.org/parallel-othographic-oblique-projection-in-computer-graphics]

11 of 53

Our goal: recover the world coordinates of all pixels

We want to know X(x, y), Y(x, y), and Z(x, y) from the given image!

What we know:

We need some cues from images and the 3D world!


[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

12 of 53

Our goal: recover the world coordinates of all pixels

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]


13 of 53

Reconstructed 3D worlds from other views

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

14 of 53

Cue 1: edges

  • Edges: image regions with strong color/intensity changes w.r.t. location

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
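As a minimal sketch (not the course's implementation), such edges can be located by thresholding finite-difference intensity gradients; the threshold value here is an arbitrary illustrative choice:

```python
import numpy as np

def edge_map(img, thresh=0.1):
    """Mark pixels whose intensity-gradient magnitude exceeds a threshold.

    img: 2D float array of grayscale intensities; thresh is an
    illustrative choice. Returns a boolean edge mask.
    """
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]   # horizontal finite difference
    gy[:-1, :] = img[1:, :] - img[:-1, :]   # vertical finite difference
    return np.sqrt(gx**2 + gy**2) > thresh

# A tiny image with a vertical step edge between columns 1 and 2
img = np.zeros((4, 4)); img[:, 2:] = 1.0
print(edge_map(img)[0])  # the edge fires at column 1, the step boundary
```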

15 of 53

Cue 2: Surfaces & Cue 3: properties from 3D to 2D

  • Separate into foreground (figure)/background

  • Not always true, but let’s assume it holds:
    • Vertical in 3D projects to vertical in 2D; thus, vertical in 2D means vertical in 3D
    • Non-vertical in 2D means horizontal in 3D

16 of 53

Our goal: recover the world coordinates of all pixels

We want to know X(x, y), Y(x, y), and Z(x, y) from the given image!


If we know Y(x, y), we know Z(x, y)

17 of 53

Our goal: recover the world coordinates of all pixels

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]


18 of 53

Estimating Y(x, y) from the input image


19 of 53

Estimating Y(x, y) from the input image


20 of 53

Estimating Y(x, y) from the input image


21 of 53

Estimating Y(x, y) from the input image

Horizontal edges: Y won’t change along the edge, so the derivative of Y along the edge direction is 0

22 of 53

Estimating Y(x, y) from the input image

Surfaces: flat, not curved

23 of 53

Constraint propagation via “optimization”

Least-squares solution!
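The idea can be sketched on a toy 1D problem (illustrative, not the book's exact formulation): each cue contributes a linear equation on the unknown Y values, and stacking them gives a system solved in the least-squares sense:

```python
import numpy as np

# Toy 1D version of constraint propagation: 5 pixels, with Y fixed to 0
# at pixel 0 (ground) and smoothness constraints Y[i+1] - Y[i] = 0
# elsewhere, except a "step" of height 1 between pixels 2 and 3.
n = 5
rows, b = [], []

ground = np.zeros(n); ground[0] = 1.0
rows.append(ground); b.append(0.0)       # ground constraint: Y[0] = 0

for i in range(n - 1):
    r = np.zeros(n); r[i], r[i + 1] = -1.0, 1.0
    rows.append(r)
    b.append(1.0 if i == 2 else 0.0)     # height jump of 1 across the step

A = np.vstack(rows)
Y, *_ = np.linalg.lstsq(A, np.array(b), rcond=None)
print(np.round(Y, 3))  # recovered heights: 0 for pixels 0-2, 1 for 3-4
```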

24 of 53

Results

25 of 53

Today

  • Recap: a simple vision system
  • Image formation


26 of 53

Goal

  • How are images formed?
  • How can light illuminating the space be captured by a device to form a picture?

27 of 53

“Visible” light interacting with surfaces

  • Light:
    • Wave (with wavelength, frequency)
    • Light ray – specified by position, direction, and intensity, as a function of wavelength and polarization

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Power (wavelength)

Bidirectional reflectance distribution function

28 of 53

Lambertian surfaces

  • Bidirectional reflectance distribution functions (BRDFs) can be complex

  • Assumptions: Lambertian model

  • Lambertian model: the outgoing ray intensity is a function of
    • Surface orientation relative to the incoming ray directions
    • Wavelength
    • A scalar surface reflectance, a.k.a. albedo
    • Incoming light power

  • No dependency on the outgoing direction of the ray
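These dependencies can be sketched as follows (a minimal model using the standard Lambertian cosine shading rule; function and argument names are illustrative):

```python
import numpy as np

# Minimal sketch of the Lambertian model: outgoing intensity =
# albedo * incoming power * cos(angle between the surface normal and
# the light direction), clamped at zero, with no dependence on the
# viewing direction.
def lambertian(albedo, light_power, normal, light_dir):
    n = np.asarray(normal, float); n = n / np.linalg.norm(n)
    l = np.asarray(light_dir, float); l = l / np.linalg.norm(l)
    return albedo * light_power * max(np.dot(n, l), 0.0)  # clamp: no light from behind

print(lambertian(0.5, 1.0, [0, 0, 1], [0, 0, 1]))  # head-on: 0.5
print(lambertian(0.5, 1.0, [0, 0, 1],
                 [np.sin(np.pi / 3), 0, np.cos(np.pi / 3)]))  # 60 deg: ~0.25
```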

29 of 53

Specular surfaces

  • Phong reflection model: widely used, with specular components of reflection
    • Ambient: Constant
    • Diffuse: Lambertian model
    • Specular reflection
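A hedged sketch of the three Phong terms; the coefficients ka, kd, ks and the shininess exponent are illustrative choices, not course values:

```python
import numpy as np

def normalize(v):
    v = np.asarray(v, float)
    return v / np.linalg.norm(v)

# Phong reflection model: ambient (constant) + diffuse (Lambertian) +
# specular (peaked around the mirror-reflection direction).
def phong(ka, kd, ks, shininess, normal, light_dir, view_dir):
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    diffuse = max(np.dot(n, l), 0.0)
    r = 2.0 * np.dot(n, l) * n - l              # mirror reflection of l about n
    specular = max(np.dot(r, v), 0.0) ** shininess
    return ka + kd * diffuse + ks * specular

# Viewing straight down the reflection of an overhead light: full highlight
print(phong(0.1, 0.5, 0.4, 32, [0, 0, 1], [0, 0, 1], [0, 0, 1]))  # ~1.0
```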

30 of 53

Why are these models important?

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Two light sources

31 of 53

From lights to world interpretation

  • To understand our world from the lights
    • We need to “associate” the reflected light with the surface in the world.
    • We need to know which light rays come from which direction in space.

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

32 of 53

Questions?

33 of 53

Images & cameras

  • Forming an image = identifying which rays come from which directions

  • Camera: organizing rays

  • Pinhole camera:
    • Each location on the wall receives light from exactly one direction

Projection surface

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

34 of 53

Examples of pinhole cameras

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Does the distance between the projection surface and the pinhole matter?

35 of 53

The world is full of accidental cameras

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

36 of 53

Image formation by perspective projection

  • A (pinhole) camera projects 3D world coordinates to 2D positions on the projection plane, via the straight-line path each light ray takes through the pinhole

37 of 53

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Image coordinates vs. virtual camera coordinates?

38 of 53

Perspective projection equations

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
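The equations can be sketched as follows, assuming camera-centered coordinates with the point at depth Z > 0 and focal length f (symbol and function names are illustrative):

```python
# Perspective projection for a pinhole camera: a 3D point (X, Y, Z)
# with depth Z > 0 maps to image coordinates x = f*X/Z, y = f*Y/Z,
# where f is the focal length (pinhole-to-plane distance).
def perspective_project(X, Y, Z, f=1.0):
    assert Z > 0, "point must be in front of the camera"
    return f * X / Z, f * Y / Z

# Doubling the depth halves the projected size (foreshortening)
print(perspective_project(1.0, 2.0, 2.0))  # (0.5, 1.0)
print(perspective_project(1.0, 2.0, 4.0))  # (0.25, 0.5)
```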

39 of 53

Orthographic (parallel) projection equations

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

A good approximation for telephoto lenses
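In contrast to perspective projection, orthographic projection drops the depth coordinate entirely (up to a constant scale, taken as 1 in this sketch):

```python
# Orthographic (parallel) projection: x = X, y = Y for any depth Z.
def orthographic_project(X, Y, Z):
    return X, Y  # depth does not affect the projected position

print(orthographic_project(1.0, 2.0, 2.0))    # (1.0, 2.0)
print(orthographic_project(1.0, 2.0, 100.0))  # (1.0, 2.0): same, depth ignored
```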

40 of 53

Can we really have orthographic projection?

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

41 of 53

Questions?

42 of 53

What’s wrong with pinhole cameras?

Projection surface

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Images are dim …

Limited light ...

43 of 53

From pinholes to lenses

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Light needs to be concentrated/bent!

44 of 53

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

45 of 53

Lensmaker’s formula

  • When light passes from one material to another, its wavelength and speed change

  • The change at the surface causes light to bend, i.e., refraction
    • Depends on the change of speed and the surface orientation

46 of 53

Snell’s law

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
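A small sketch of Snell's law, n1·sin(θ1) = n2·sin(θ2); the refractive indices below (air ≈ 1.0, glass ≈ 1.5) are common approximate values used only for illustration:

```python
import numpy as np

# Snell's law: n1 * sin(theta1) = n2 * sin(theta2), with angles
# measured from the surface normal.
def refraction_angle(n1, n2, theta1):
    """Refracted angle in radians; raises on total internal reflection."""
    s = n1 * np.sin(theta1) / n2
    if abs(s) > 1.0:
        raise ValueError("total internal reflection: no refracted ray")
    return np.arcsin(s)

theta2 = refraction_angle(1.0, 1.5, np.deg2rad(30.0))  # air -> glass
print(np.rad2deg(theta2))  # ~19.47 deg: the ray bends toward the normal
```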

47 of 53

A lens

  • A specifically “shaped” piece of transparent material, positioned to focus light from a surface point onto a sensor

  • Ideally …

  • Need: numerical optimization!

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

48 of 53

Simplified optical system


[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

49 of 53

  • Assumptions:
    • Paraxial: the angle is small
    • Thin lens: negligible thickness

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

50 of 53

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

  • Assumptions:
    • Paraxial: the angle is small
    • Thin lens: negligible thickness

51 of 53

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

  • Assumptions:
    • Paraxial: the angle is small
    • Thin lens: negligible thickness

  • Lensmaker’s formula:
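The two formulas can be sketched together under the paraxial/thin-lens assumptions, using the convention that object distance z_obj and image distance z_img are both positive on opposite sides of the lens (sign conventions vary between texts):

```python
# Lensmaker's formula gives the focal length from the lens geometry;
# the thin-lens equation 1/z_obj + 1/z_img = 1/f then locates the image.
def focal_length(n, R1, R2):
    # Lensmaker's formula (thin lens): 1/f = (n - 1) * (1/R1 - 1/R2)
    return 1.0 / ((n - 1.0) * (1.0 / R1 - 1.0 / R2))

def image_distance(f, z_obj):
    return 1.0 / (1.0 / f - 1.0 / z_obj)

f = focal_length(1.5, 0.1, -0.1)  # symmetric biconvex glass lens
print(f)                          # ~0.1 (same length units as R1, R2)
print(image_distance(f, 0.3))     # object at 3f -> image at ~1.5f = 0.15
```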

52 of 53

General cases

  • Points off the optical axis

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

53 of 53

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]