1 of 75

CSE 5524: A Simple Vision System (Cont.) & Image Formation

2 of 75

Course information, grading, reading, policy, etc.

  • Please check slide decks 1 & 2, course website, Carmen, and syllabus.

  • Course website:

https://sites.google.com/view/osu-cse-5524-sp25-chao/home

  • Office hours start this week!
    • Dr. Chao (DL587): Tuesday 3 – 4 pm & Friday 9 – 10 am
    • Zheda Mai (BE 406): Monday 11 am - 12 pm & Wednesday 2 pm - 3 pm

  • Linear algebra quizzes: released last week, due 9/16


3 of 75

Today

  • Recap and continuation: a simple vision system
  • Image formation


4 of 75

Three representative computer vision sub-fields


[Figure: S: scene, I: image. 1: Recognition (image → label, e.g., “tree”); 2: Reconstruction (image → scene); 3: Generation (description, e.g., “tree” → image).]

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

5 of 75

A simple world: the blocks world

What is inside?

  • Simple but varied set of objects
  • Flat horizontal or vertical surfaces
  • White horizontal ground plane

Image formation assumptions

  • Parallel (orthographic) projection


[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

[Figure credit: https://www.geeksforgeeks.org/parallel-othographic-oblique-projection-in-computer-graphics]

6 of 75

Our goal: recover the world coordinates of all pixels

We want to know X(x, y), Y(x, y), and Z(x, y) from the given image!

What we know:

We need some cues from images and the 3D world!


[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

7 of 75

Our goal: recover the world coordinates of all pixels

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]


8 of 75

Reconstructed 3D worlds from other views

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

9 of 75

Reconstructed 3D worlds from other views

Depth estimation and 3D reconstruction

10 of 75

Questions?

11 of 75

Before we dive into details, let’s take a step back

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

12 of 75

Can you infer the 3D information from 2D?

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

13 of 75

Can you write down what you just said in math/code?

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

14 of 75

Cue 1: edges

  • Edges: image regions with strong color/intensity changes w.r.t. location

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
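The edge cue above can be sketched numerically: mark pixels where the gradient magnitude of the intensity is large. This is a minimal NumPy sketch; the threshold value is a hypothetical choice, not one from the slides.

```python
import numpy as np

def edge_map(img, thresh=0.1):
    """Mark pixels whose intensity changes sharply w.r.t. location.

    img: 2D float array in [0, 1]; thresh is a hypothetical cutoff.
    """
    dy, dx = np.gradient(img)       # intensity change along rows/columns
    mag = np.sqrt(dx**2 + dy**2)    # gradient magnitude
    return mag > thresh             # True where an edge is present

# A tiny image: dark left half, bright right half -> a vertical edge.
img = np.zeros((4, 4))
img[:, 2:] = 1.0
print(edge_map(img).any(axis=0))   # columns containing edge pixels
```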

15 of 75

Cue 2: Surfaces & Cue 3: properties from 3D to 2D

  • Separate into foreground (figure) / background

  • Not always true, but let’s assume it is:
    • Vertical in 3D projects to vertical in 2D; thus, vertical in 2D means vertical in 3D
    • Non-vertical in 2D means horizontal in 3D

16 of 75

Our goal: recover the world coordinates of all pixels

We want to know X(x, y), Y(x, y), and Z(x, y) from the given image!


If we know Y(x, y), we know Z(x, y)

17 of 75

Our goal: recover the world coordinates of all pixels

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

18 of 75

Questions?

19 of 75

How to represent the “3D height map” Y?

20 of 75

How to represent the “3D height map” Y?

  • Vectorization: matrix to vector

The height map Y as a matrix of values:

  0   0 124 255 125
  0   0 125 126  60
  0   0 126  60 126
  0   0   0 127  60
  0   0   0   0 128

Stack the columns into a single vector:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 124, 125, 126, 0, 0, 255, 126, 60, 127, 0, 125, 60, 126, 60, 128]ᵀ
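A minimal NumPy sketch of vectorization, using a small illustrative 5×5 matrix (the values stand in for the slide's example):

```python
import numpy as np

# A small illustrative height map Y.
Y = np.array([[  0,   0, 124, 255, 125],
              [  0,   0, 125, 126,  60],
              [  0,   0, 126,  60, 126],
              [  0,   0,   0, 127,  60],
              [  0,   0,   0,   0, 128]])

# Vectorization: stack the columns of the matrix into one long vector,
# so that linear constraints on Y become rows of a linear system.
y = Y.flatten(order="F")   # "F" = column-major (Fortran) order
print(y.shape)             # (25,)
```

Column-major stacking is a convention; row-major (`order="C"`) works equally well as long as all constraints use the same ordering.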

21 of 75

How to estimate the “3D height map” Y?

Reconstruction

  • Known equations
  • Cues from edges, surfaces, and 2D/3D relationships

22 of 75

Cues encoded by linear equations

23 of 75

Estimating Y(x, y) from the input image

24 of 75

Estimating Y(x, y) from the input image

25 of 75

Estimating Y(x, y) from the input image

26 of 75

Estimating Y(x, y) from the input image

Horizontal edges: Y does not change along the edge: ∂Y/∂t = 0 along the edge direction t

27 of 75

Estimating Y(x, y) from the input image – horizontal edges

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

28 of 75

Estimating Y(x, y) from the input image

Horizontal edges: Y does not change along the edge: ∂Y/∂t = 0 along the edge direction t

29 of 75

Estimating Y(x, y) from the input image

Surfaces: flat, not curved: the second derivatives of Y vanish (∂²Y/∂x² = ∂²Y/∂y² = ∂²Y/∂x∂y = 0)

30 of 75

Questions?

31 of 75

Information propagation via “optimization”

  • All the information:
    • 3D vertical edge: Y varies linearly along the edge (∂Y/∂y is constant)
    • 3D horizontal edge: Y is constant along the edge (∂Y/∂t = 0 along the edge direction t)
    • Flat surfaces: the second derivatives of Y vanish (∂²Y/∂x² = ∂²Y/∂y² = ∂²Y/∂x∂y = 0)
    • Contact edges & background: Y = 0

32 of 75

Information propagation

33 of 75

Information propagation

34 of 75

Information propagation via “optimization”

  • All the information:
    • 3D vertical edge: Y varies linearly along the edge (∂Y/∂y is constant)
    • 3D horizontal edge: Y is constant along the edge (∂Y/∂t = 0 along the edge direction t)
    • Flat surfaces: the second derivatives of Y vanish (∂²Y/∂x² = ∂²Y/∂y² = ∂²Y/∂x∂y = 0)
    • Contact edges & background: Y = 0
  • These constraints can be rewritten as an overdetermined system of linear equations

35 of 75

Information propagation via “optimization”

Stacking all the linear constraints yields one overdetermined system; solve it for the least-squares solution!
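A minimal sketch of the idea with NumPy: stack the constraints as rows of an overdetermined system A y = b and take the least-squares solution. The three unknown heights and four constraints below are hypothetical illustrative values, not the slide's system.

```python
import numpy as np

# Toy overdetermined system on a vectorized height map y (3 unknowns):
A = np.array([[ 1.0,  0.0, 0.0],   # contact edge: y0 = 0
              [-1.0,  1.0, 0.0],   # vertical edge: y1 - y0 = 1
              [ 0.0, -1.0, 1.0],   # vertical edge: y2 - y1 = 1
              [ 0.0,  0.0, 1.0]])  # noisy extra measurement: y2 = 2.1
b = np.array([0.0, 1.0, 1.0, 2.1])

# Least-squares solution: minimizes ||A y - b||^2.
y, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(y)  # close to [0, 1, 2]
```

Because the noisy fourth constraint conflicts slightly with the others, no exact solution exists; least squares spreads the error across all constraints.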

36 of 75

Results

37 of 75

Reconstructed 3D worlds from other views

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

38 of 75

Caution

  • The approach we developed so far does not always work.
  • We have made lots of assumptions.

  • Still, it gives some sense of the scope and challenge of computer vision.

39 of 75

Reading & keywords

  • Chapter 2

  • Image intensity
  • Edge, shadow edges
  • Surface
  • Linear system, overdetermined system of linear equations
  • Least-squares solutions
  • 3D reconstruction
  • Parallel and perspective projection

40 of 75

HW1

  • To be released tonight or Thursday night!

41 of 75

Questions?

42 of 75

What is 3D reconstruction nowadays?

43 of 75

What is 3D reconstruction nowadays?

44 of 75

What is 3D reconstruction nowadays?

45 of 75

How to let computers recognize objects?

Percept: see a picture

Action: tell the object class: a cat? a lion? a car?

46 of 75

Human design vs. machine-learning-based

Two routes to a “cat” recognizer:

  • Human design: “coding” the rules. Can you list the rules for recognizing a cat?
  • Machine learning: data collection, then “learning” from labeled examples of cats.

Underlying idea: humans are often good at “making decisions” BUT not good at “explaining decisions”.

47 of 75

Today

  • Recap and continuation: a simple vision system
  • Image formation


48 of 75

Goal

  • How are images formed?
  • How can light illuminating the space be captured by a device to form a picture?

49 of 75

“Visible” light interacting with surfaces

  • Light:
    • Wave (with wavelength, frequency)
    • Light ray – specified by position, direction, and intensity, as a function of wavelength and polarization

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

[Figure labels: power as a function of wavelength; bidirectional reflectance distribution function]

50 of 75

Lambertian surfaces

  • Bidirectional reflectance distribution functions (BRDFs) can be complex

  • Assumption: the Lambertian model

  • Lambertian model: the outgoing ray intensity is a function of
    • Incoming light power
    • Wavelength
    • Surface orientation relative to the incoming ray direction
    • A scalar surface reflectance, a.k.a. albedo

  • No dependency on the outgoing direction of the ray
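A minimal sketch of the Lambertian model (the function, its values, and the folding of wavelength dependence into a single scalar albedo are illustrative choices):

```python
import numpy as np

def lambertian(albedo, light_power, n, l):
    """Outgoing intensity under the Lambertian model.

    Depends only on the albedo, incoming power, and the angle between
    the surface normal n and light direction l -- not on the outgoing
    (viewing) direction.
    """
    n = np.asarray(n, float)
    l = np.asarray(l, float)
    n, l = n / np.linalg.norm(n), l / np.linalg.norm(l)
    return albedo * light_power * max(0.0, float(n @ l))

# Light hitting a horizontal surface head-on vs. grazing along it:
print(lambertian(0.5, 1.0, n=[0, 1, 0], l=[0, 1, 0]))  # 0.5
print(lambertian(0.5, 1.0, n=[0, 1, 0], l=[1, 0, 0]))  # 0.0
```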

51 of 75

Specular surfaces

  • Phong reflection model: widely used, with specular components of reflection
    • Ambient: Constant
    • Diffuse: Lambertian model
    • Specular reflection
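The three Phong terms can be sketched as below; all coefficients, vectors, and the shininess exponent are hypothetical illustrative values, not ones from the slides.

```python
import numpy as np

def phong(ka, kd, ks, shininess, n, l, v, light=1.0, ambient=1.0):
    """Phong reflection: ambient + diffuse (Lambertian) + specular."""
    n, l, v = (np.asarray(u, float) / np.linalg.norm(u) for u in (n, l, v))
    diff = max(0.0, float(n @ l))       # Lambertian (diffuse) term
    r = 2.0 * diff * n - l              # mirror reflection of l about n
    spec = max(0.0, float(r @ v)) ** shininess
    return ka * ambient + kd * light * diff + ks * light * spec

# Viewing along the mirror direction gives the full specular highlight:
print(phong(0.1, 0.5, 0.4, 10, n=[0, 1, 0], l=[0, 1, 0], v=[0, 1, 0]))  # 1.0
```

Unlike the Lambertian model, the specular term does depend on the outgoing (viewing) direction v.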

52 of 75

Why are these models important?

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Two light sources

53 of 75

From lights to world interpretation

  • To understand our world from the lights
    • We need to “associate” the reflected light with the surface in the world.
    • We need to know which light rays come from which direction in space.

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

54 of 75

Questions?

55 of 75

Images & cameras

  • Forming an image = identifying which rays come from which directions

  • Camera: organizing rays

  • Pinhole camera:
    • Each location on the wall receives light from only one direction

Projection surface

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

56 of 75

Examples of pinhole cameras

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Does the distance between the projection surface and the pinhole matter?

57 of 75

The world is full of accidental cameras

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

58 of 75

Image formation by perspective projection

  • A (pinhole) camera projects 3D coordinates in the world to 2D positions on the projection plane, along the straight-line path each light ray takes through the pinhole

59 of 75

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Image coordinates vs. virtual camera coordinates?

60 of 75

Perspective projection equations

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
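The equations themselves did not survive extraction; the standard pinhole relations, for a world point (X, Y, Z) in camera coordinates and focal length f (the pinhole-to-plane distance), are:

```latex
x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}
```

The division by the depth Z is what makes distant objects project smaller.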

61 of 75

Orthographic (parallel) projection equations

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

A good approximation for telephoto lenses
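For reference, the standard orthographic relations simply drop the depth, up to a constant scale s:

```latex
x = sX, \qquad y = sY
```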

62 of 75

Can we really have orthographic projection?

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

63 of 75

Questions?

64 of 75

What’s wrong with pinhole cameras?

Projection surface

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Images are dim …

Limited light ...

65 of 75

From pinholes to lenses

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

Light needs to be concentrated/bent!

66 of 75

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

67 of 75

Lensmaker’s formula

  • When light passes from one material to another, its wavelength and speed change

  • The change at the surface causes light to bend, i.e., refraction
    • The bending depends on the change in speed and on the surface orientation

68 of 75

Snell’s law

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]
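The law itself, with n₁ and n₂ the refractive indices of the two materials and θ₁, θ₂ the ray angles measured from the surface normal:

```latex
n_1 \sin\theta_1 = n_2 \sin\theta_2
```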

69 of 75

A lens

  • A specifically “shaped” piece of transparent material, positioned to focus light from a surface point onto a sensor

  • Ideally …

  • Need: numerical optimization!

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

70 of 75

Simplified optical system


[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

71 of 75

  • Assumptions:
    • Paraxial: the angle is small
    • Thin lens: negligible thickness

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

72 of 75

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

  • Assumptions:
    • Paraxial: the angle is small
    • Thin lens: negligible thickness

73 of 75

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

  • Assumptions:
    • Paraxial: the angle is small
    • Thin lens: negligible thickness

  • Lensmaker’s formula:
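The formula did not survive extraction; the standard thin-lens version for a lens in air, with refractive index n and surface radii of curvature R₁ and R₂, is:

```latex
\frac{1}{f} = (n-1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)
```

Under the same paraxial, thin-lens assumptions, the focal length f then relates object and image distances through the thin-lens imaging equation.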

74 of 75

General cases

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]

75 of 75

[Figure credit: A. Torralba, P. Isola, and W. T. Freeman, Foundations of Computer Vision.]