
Contents

Basics of Video
Digital Video
Time Varying Image Formation Models/Video Modeling
Three Dimension Motion Models
Geometric Image Formation
Photometric Image Formation
Sampling of Video Signals
Filtering Operations

Dr. Rajkumar L. Biradar

4/23/2019


Basics of Video

A video signal is essentially a sequence of time-varying images.
A still image has a spatial distribution of intensities that remains constant with time.
A time-varying image has a spatial intensity distribution that varies with time.
A video signal is treated as a series of images called frames.
The illusion of continuous motion is obtained by displaying the frames in rapid succession; the number of frames displayed per second is the frame rate.
It is a 3-D signal:

2 spatial dimensions and 1 time dimension
Continuous I(x, y, t) ⇨ discrete I(m, n, k)
(The textbook uses s in place of I.)
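
As a minimal sketch of the discrete notation, the toy example below stores a digital video as nested Python lists and reads one sample I(m, n, k). The array sizes, the fill pattern and the helper names `make_video` and `sample` are illustrative assumptions; m is taken as the column index, n as the row index and k as the frame number.

```python
def make_video(num_frames, num_rows, num_cols):
    """Build a toy video as nested lists indexed video[k][n][m].

    The intensity pattern (10*k + n + m) is arbitrary and only makes
    each sample easy to recognise.
    """
    return [
        [[(10 * k + n + m) % 256 for m in range(num_cols)]
         for n in range(num_rows)]
        for k in range(num_frames)
    ]


def sample(video, m, n, k):
    """Return I(m, n, k): column m, row n, frame k."""
    return video[k][n][m]


# Hypothetical sizes: 3 frames, each with 2 rows and 4 columns.
video = make_video(num_frames=3, num_rows=2, num_cols=4)
value = sample(video, m=3, n=1, k=2)  # 10*2 + 1 + 3
```

Storing the frame index outermost mirrors the view of video as a series of frames, each of which is an ordinary 2-D image.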


I(x, y, t) is the analog video signal, where (x, y) denote the spatial coordinates and t denotes the temporal (time) variable in the continuous-time domain.
I(x, y, t) is analog because it is continuous in both the spatial and time domains.
However, the analog video viewed on display monitors is not truly analog, since it is sampled along one spatial (vertical) coordinate and along the time axis.
In practice, so-called analog video systems, such as TVs and monitors, represent the video signal as a one-dimensional electrical signal f(t).


Analog Video Signal

The analog video signal refers to the one-dimensional electrical signal f(t), which is obtained by sampling I(x, y, t) along the vertical (y) spatial direction and along the time (t) direction. This periodic sampling is called scanning.
Scanning produces a series of time samples, each called a frame (a complete picture).
Each frame is composed of spatial samples called scan lines.


Types of Video Scanning

There are two types of scanning:

Progressive Scanning

Interlaced Scanning

Progressive Scanning:

A progressive scan traces a complete frame every ∆t seconds.
The computer industry uses progressive scanning with a frame interval of ∆t = 1/72 s for high-resolution monitors.
The optical or electron beam of an analog camera continuously scans the frame from top to bottom and then returns to the top.
The resulting video signal consists of a series of frames separated by a regular time interval ∆t, and each frame consists of a consecutive set of horizontal lines separated by a regular vertical spacing.


The intensity values captured along contiguous scan lines over consecutive frames form a 1-D analog video signal f(t), which is also called a raster scan.

With a color camera, three 1-D rasters are combined into a composite signal, which is the color raster.

(Fig. from Y. Wang, Video Processing and Communications textbook)

The frame rate must be high enough; otherwise the displayed video will appear to flicker. The human eye detects flicker if the scan rate (refresh rate) is less than 50 frames/s.
Computer monitors (72 frames/s) clearly exceed this rate.
However, in many other systems such as TV, such a high frame rate is not possible because of bandwidth limitations. Interlaced scanning is a solution to this problem.


Interlaced Scanning:

The TV industry uses 2:1 interlaced scanning.
In this scheme, each frame is scanned as two fields, called the odd field and the even field. Each field consists of half the number of lines in a frame.
The odd field consists of the odd lines and the even field consists of the even lines of each frame.
Each frame is read out in two separate scans, of the odd and even fields respectively. This allows good reproduction of movement in the scene at a relatively low field rate.
In this way flicker is effectively eliminated, provided the field rate is above the visual limit of 50 Hz.
Broadcast television in the US uses a frame rate of 30 Hz; hence the field rate is 60 Hz, which is well above 50 Hz.
In the scanning figure, the snap-back of the spot from B to C is called horizontal retrace.
The snap-back of the spot from D to E is called vertical retrace.
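
The 2:1 interlacing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a frame is modelled as a list of scan lines numbered from 1, and the helper names `split_fields` and `weave` are hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of scan lines) into its odd and even fields.

    Lines are numbered from 1, so list index 0 holds line 1 (an odd line).
    """
    odd_field = frame[0::2]   # lines 1, 3, 5, ...
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


def weave(odd_field, even_field):
    """Re-interleave the two fields into one full frame."""
    frame = []
    for i, odd_line in enumerate(odd_field):
        frame.append(odd_line)
        if i < len(even_field):
            frame.append(even_field[i])
    return frame


frame = [f"line {n}" for n in range(1, 7)]  # toy 6-line frame
odd, even = split_fields(frame)

frame_rate_hz = 30                 # US broadcast frame rate quoted in the text
field_rate_hz = 2 * frame_rate_hz  # two fields are scanned per frame
```

Each field carries half the lines, so the display is refreshed at twice the frame rate without increasing the number of lines transmitted per second.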


Analog Video Signal Format

Despite the advance of digital video technology, the most common consumer display mechanism for video still uses analog display devices such as the CRT.
Until all terrestrial and satellite broadcasts become digital, analog video formats will remain significant.
The three principal analog video signal formats are:

NTSC (National Television Systems Committee),
PAL (Phase Alternating Line) and
SECAM (Sequential Color with Memory).

All three are television video formats in which the information in each picture, captured by a CCD or CRT, is scanned from left to right to create a sequential intensity signal. The formats take advantage of the persistence of human vision by using an interlaced scanning pattern in which the odd and even lines of each picture are read out in two separate scans, of the odd and even fields respectively. This allows good reproduction of movement in the scene at the relatively low field rate of 50 fields/s for PAL and SECAM and 60 fields/s for NTSC.


Digital Video

Digital video is obtained either by sampling a raster scan f(t) or directly by using a digital video camera.
Digital cameras typically use CCD (charge-coupled device) sensors.
A digital camera samples the scene as discrete frames.
Each frame consists of the output values from a CCD array, which is discrete by nature in both the horizontal and vertical directions.
Digital video is defined by the following parameters:

the frame rate, f_s,t,
the number of lines, f_s,y, and
the number of samples per line, f_s,x.

From these parameters, we can find:

Temporal sampling interval or frame interval ∆t = 1/f_s,t

Vertical sampling interval ∆y = picture (frame) height / f_s,y

Horizontal sampling interval ∆x = picture (frame) width / f_s,x

Digital video is denoted by I(m, n, k), where the integer indices m and n are the column and row indices, and k is the frame number.


The actual spatial and temporal locations corresponding to the integer indices are x = m∆x, y = n∆y and t = k∆t.
For convenience, we use the notation I(x, y, t) to describe the video signal in general, which could be either analog or digital. We will use I(m, n, k) only when specifically addressing digital video.
Number of bits used to represent digital video:
N_b is the number of bits used to represent each pixel value.
N_b = 8 for monochrome video and N_b = 24 for color video.
The data rate R of digital video is given by

R = f_s,t · f_s,y · f_s,x · N_b bits per second (bps), often expressed in kbps or Mbps.
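
A short worked example may make the sampling intervals and the data-rate formula concrete. The format used here (30 frames/s, 480 lines, 720 samples per line, 24 bits per pixel, on a picture of arbitrary width 4 and height 3) is an assumed example, and the function names are hypothetical.

```python
def sampling_intervals(frame_rate, num_lines, samples_per_line, width, height):
    """Return (dt, dy, dx) for the given video parameters.

    dt = 1 / f_s,t,  dy = height / f_s,y,  dx = width / f_s,x
    """
    dt = 1.0 / frame_rate
    dy = height / num_lines
    dx = width / samples_per_line
    return dt, dy, dx


def data_rate_bps(frame_rate, num_lines, samples_per_line, bits_per_pixel):
    """Raw data rate R = f_s,t * f_s,y * f_s,x * N_b in bits per second."""
    return frame_rate * num_lines * samples_per_line * bits_per_pixel


# Assumed example format, not taken from the slides.
dt, dy, dx = sampling_intervals(30, 480, 720, width=4.0, height=3.0)
rate_bps = data_rate_bps(30, 480, 720, 24)
rate_mbps = rate_bps / 1e6  # about 249 Mbps of uncompressed video
```

The result shows why raw digital video is rarely stored or transmitted without compression.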


Ex: Video Cameras

Frame-by-frame capturing
CCD sensors (Charge-Coupled Devices)

2-D array of solid-state sensors
Each sensor corresponds to a pixel
Stored in a buffer and sequentially read out
Widely used.

Note: The width-to-height ratio of a video frame is called the image aspect ratio (IAR). An IAR of 4:3 is used in standard-definition TV (SDTV), ratios up to about 2.2:1 are used in wide-screen movies, and 16:9 is used in HDTV.

For digital video, the ratio of the width to the height of the rectangular area represented by one pixel is called the pixel aspect ratio (PAR). It is related to the IAR by

PAR = IAR · f_s,y / f_s,x.

For proper display of a digitized video signal, one must specify either the PAR or the IAR along with f_s,y and f_s,x.

The display device should match the PAR specified for the signal; otherwise object shapes will be distorted.
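
The relation PAR = IAR · f_s,y / f_s,x can be checked with exact fractions. The two raster sizes below (720 × 480 and 640 × 480, both shown at an IAR of 4:3) are assumed examples, and `pixel_aspect_ratio` is a hypothetical helper name.

```python
from fractions import Fraction


def pixel_aspect_ratio(iar, num_lines, samples_per_line):
    """PAR = IAR * f_s,y / f_s,x, kept as an exact fraction."""
    return iar * Fraction(num_lines, samples_per_line)


iar_sdtv = Fraction(4, 3)

# 720 samples per line, 480 lines: pixels are narrower than they are tall.
par_720 = pixel_aspect_ratio(iar_sdtv, num_lines=480, samples_per_line=720)

# 640 samples per line, 480 lines: pixels are square.
par_640 = pixel_aspect_ratio(iar_sdtv, num_lines=480, samples_per_line=640)
```

A display that assumes square pixels would show the 720-sample raster stretched horizontally, which is the shape distortion mentioned above.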


Why Digital?

“Exactness”

Exact reproduction without degradation
Accurate duplication of processing result

Convenient & powerful computer-aided processing

Can perform rather sophisticated processing through hardware or software

Easy storage and transmission

One DVD can store a three-hour movie.
Transmission of high-quality video through a network in reasonable time


Application of Digital Video


Time Varying Image Formation Models or Video Modeling

In this section, we present the simplest models for temporal variations of the spatial intensity pattern in the image plane.
For example, to describe the change between consecutive frames of a video sequence in terms of object motion, illumination changes and camera motion, we need models that describe the real world and the image formation process.
The most important models are the scene, object, camera and illumination models. These models describe the assumptions that we make about the real world.
Depending on the selected model, we are able to describe the real world with more or less detail and precision.


We represent a time-varying image by a function of three continuous variables, I(x, y, t), which is formed by projecting a time-varying 3-D spatial scene onto the 2-D image plane.
The temporal variations in the 3-D scene are usually due to movements of objects in the scene.
Thus, time-varying images reflect a projection of 3-D moving objects onto the 2-D image plane as a function of time (the function I(x, y, t) introduced in the first line of this slide).
Digital video corresponds to a spatio-temporally sampled version of this time-varying image.


Digital video formation (block diagram of the time-varying image formation model):

3-D Scene Modeling → Image Formation → ⊕ (Observation Noise added) → Spatio-Temporal Sampling

3-D scene modeling refers to modeling the 3-D motion and structure (shape) of the objects in the scene.
Image formation refers to mapping the 3-D scene onto the 2-D image plane.

Ex: Geometric and photometric image formation.

The last block obtains digital video by spatio-temporal sampling.

Note: This is what we are going to study in this unit.


We need to understand some basics:

1. Pinhole cameras

Abstract camera model - box with a small hole in it

Pinhole cameras work in practice

(Forsyth & Ponce)


2. Lens

A lens duplicates the pinhole geometry without resorting to undesirably small apertures. It:

Gathers all the light radiating from an object point towards the lens's finite aperture.
Brings the light into focus at a single distinct image point.


3. Thin lens equation

Assume an object of height $y$ at distance $u$ from the lens plane, forming an image of height $y'$ at distance $v$ on the other side of the lens.

Using similar triangles:

$y'/y = v/u$

Using similar triangles again:

$y'/y = (v - f)/f$

Equating the two ratios gives the relation between the focal length ($f$), the distance of the object from the camera ($u$), and the distance at which the object will be in focus ($v$):

$\frac{1}{u} + \frac{1}{v} = \frac{1}{f}$
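
As a small numerical check, the sketch below assumes the standard thin lens relation 1/u + 1/v = 1/f, which follows from equating the two similar-triangle ratios y'/y = v/u and y'/y = (v - f)/f. The function names and the sample distances are illustrative assumptions.

```python
def image_distance(u, f):
    """Solve 1/u + 1/v = 1/f for the in-focus image distance v."""
    if u == f:
        raise ValueError("object at the focal distance: image forms at infinity")
    return u * f / (u - f)


def magnification(u, f):
    """Ratio y'/y = v/u of image height to object height."""
    return image_distance(u, f) / u


u, f = 3.0, 2.0            # assumed object distance and focal length
v = image_distance(u, f)   # distance at which the object is in focus
m_from_vu = v / u          # first similar-triangle ratio
m_from_vf = (v - f) / f    # second similar-triangle ratio
```

Both similar-triangle ratios give the same magnification, which is exactly the consistency condition that leads to the thin lens equation.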


Geometric Image Formation

An imaging system captures a 2-D projection of a time-varying 3-D scene. This projection can be represented by a mapping from a 4-D space to a 3-D space,

$f: \mathbb{R}^4 \to \mathbb{R}^3, \quad (X_1, X_2, X_3, t) \mapsto (x_1, x_2, t)$

where $(X_1, X_2, X_3)$, the 3-D world coordinates, $(x_1, x_2)$, the 2-D image-plane coordinates, and $t$, time, are continuous variables.

There are two types of projection

1. Perspective (central) projection.

2. Orthographic (parallel) projection.


1. Perspective Projection/Pin hole Camera Model/Central Projection

It is a widely used model for approximating the projection of real-world (3-D) objects onto a 2-D plane.
It reflects 2-D image formation using an ideal pinhole camera according to the principles of geometrical optics.
All the rays from the object pass through the center of projection, which corresponds to the center of the lens. For this reason, it is also called "central projection".
Perspective projection is illustrated in the figure below for the case where the center of projection is between the object and the image plane, and the image plane (x1, x2) coincides with the (X1, X2) plane of the world coordinates.


The algebraic relation that describes the perspective transformation for the configuration shown in the figure is obtained from the similar triangles formed by drawing perpendicular lines from the object point (X1, X2, X3) and the image point (x1, x2, 0), or (x1, x2), to the X3 axis.

From the figure, we have (the negative sign reflects the inversion of the image)

$\frac{x_1}{f} = -\frac{X_1}{X_3 - f}$ and $\frac{x_2}{f} = -\frac{X_2}{X_3 - f}$

where $f$ is the focal length, i.e. the distance from the center of projection to the image plane. If we move the center of projection to coincide with the origin of the world coordinates, as shown in the next figure, a simple change in the above equations yields

$\frac{x_1}{X_1} = -\left(-\frac{f}{X_3}\right) = \frac{f}{X_3}$, i.e. $x_1 = \frac{f X_1}{X_3}$, and likewise $x_2 = \frac{f X_2}{X_3}$.


Note: The previous analysis follows Tekalp; the following analysis follows Wang.


2. Orthographic Projection

When the imaged object is very far from the camera, perspective projection can be approximated by orthographic projection, which is also known as parallel projection.
Orthographic projection is an approximation of the actual imaging process in which it is assumed that all the rays from the 3-D scene (object) to the image plane travel parallel to each other.
The relation between the world coordinates and their orthographic projection is direct and is given by
$x_1 = X_1$ and $x_2 = X_2$
or, in vector-matrix form,
$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}$
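
The two projection models can be compared with a minimal sketch. It assumes the perspective form with the center of projection at the world origin (x1 = f·X1/X3, x2 = f·X2/X3) and the orthographic form x1 = X1, x2 = X2; the function names and sample points are illustrative assumptions.

```python
def perspective_projection(X1, X2, X3, f):
    """Pinhole model with the centre of projection at the world origin.

    x1 = f * X1 / X3,  x2 = f * X2 / X3
    """
    if X3 == 0:
        raise ValueError("point lies in the plane of the centre of projection")
    return f * X1 / X3, f * X2 / X3


def orthographic_projection(X1, X2, X3):
    """Parallel projection: the depth X3 is simply dropped."""
    return X1, X2


# The same lateral position seen at two depths (assumed values).
near = perspective_projection(2.0, 1.0, 4.0, f=1.0)
far = perspective_projection(2.0, 1.0, 8.0, f=1.0)
ortho_near = orthographic_projection(2.0, 1.0, 4.0)
ortho_far = orthographic_projection(2.0, 1.0, 8.0)
```

Under perspective projection the image of the point shrinks as its depth doubles, whereas the orthographic image does not depend on depth at all.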


Geometric Image Formation Models: Conclusion


Photometric Image Formation Model

Image intensities can be modeled as proportional to the amount of light reflected by the objects in the scene. In general, the scene reflection contains two components.
Lambertian component: It has equal energy distribution in all directions. It is also called diffuse reflection. Wood and cement surfaces belong to this category.
Specular component: It is strongest in the mirror direction of the incident light. Shiny and mirror-like surfaces belong to this category. (Specularly reflected light leaves the surface only in the direction for which the angle of reflection equals the angle of incidence.)
Real-life surfaces are a mixture of Lambertian (i.e. diffusely reflecting, satisfying Lambert's law) and specular surfaces.
We concentrate only on surfaces where the specular component can be neglected.


Lambertian Reflection Model

If a Lambertian surface is illuminated by a single point source of uniform intensity (in time), then the resulting image intensity (reflected light intensity) is the same in all viewing directions and is proportional to the cosine of the angle between the surface normal N(t) and the illumination direction L (Lambert's law):

$I(x_1, x_2, t) = \rho \, \mathbf{N}(t) \cdot \mathbf{L}$

where $\rho$ denotes the surface albedo, i.e. the fraction of the light reflected by the surface (the reflection coefficient, in the range 0 to 1),
$\mathbf{N}(t)$ is the unit vector normal to the scene surface, and
$\mathbf{L}$ is the unit vector in the mean illumination direction.
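
Lambert's law can be illustrated with a short sketch that evaluates intensity = albedo × (N · L) for unit vectors N and L. The function names are hypothetical, and clamping negative values to zero (a surface facing away from the light) is an added assumption rather than part of the model stated above.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("zero-length vector has no direction")
    return tuple(c / length for c in v)


def lambertian_intensity(albedo, normal, light_dir):
    """Image intensity albedo * (N . L) for a Lambertian surface.

    Negative dot products are clamped to 0 (assumption: a surface
    facing away from the source receives no light).
    """
    n = normalize(normal)
    l = normalize(light_dir)
    cos_angle = sum(a * b for a, b in zip(n, l))
    return albedo * max(0.0, cos_angle)


albedo = 0.8  # assumed reflection coefficient
head_on = lambertian_intensity(albedo, (0, 0, 1), (0, 0, 1))
angle = math.radians(60)
oblique = lambertian_intensity(albedo, (0, 0, 1),
                               (math.sin(angle), 0, math.cos(angle)))
```

Light arriving 60 degrees off the normal produces half the head-on intensity, since cos 60° = 0.5.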


Three Dimension Motion Models

In this section, we address modeling of the relative 3-D motion between the camera and the objects in the scene.
This includes 3-D motion of the objects in the scene, such as translation and rotation, as well as 3-D motion of the camera, such as zooming and panning (rotating the camera in the horizontal or vertical direction).

According to classical kinematics, 3-D motion can be classified as

Rigid motion: The relative distances between the 3-D points of the object remain fixed as the object evolves in time. That is, the 3-D structure (shape) of the moving object can be modeled by a non-deformable surface, e.g., a planar, piecewise planar or polynomial surface.

Non-rigid motion: A deformable surface model is used in modeling the 3-D structure.


Rigid Motion Model in Cartesian Coordinates

The motion of a rigid object can be expressed in terms of motion parameters: a translation vector T and a rotation matrix R.
The 3-D translation vector $\mathbf{T} = [T_1 \; T_2 \; T_3]^T$ describes the displacement (translation) of a point from $\mathbf{X}$ to $\mathbf{X}'$ by $T_1$, $T_2$ and $T_3$ in the direction of the coordinate axes X, Y and Z respectively:

$\mathbf{X}' = \mathbf{X} + \mathbf{T}$  (1)

$\mathbf{X}$ and $\mathbf{X}'$ denote the coordinates of the point at times t and t′ with respect to the center of rotation.
If an object is translated, equation (1) holds for all object points.


Rotation Matrix:

The 3-D rotation matrix in Cartesian coordinates can be characterized either by the Euler angles of rotation about the three coordinate axes or by an axis of rotation and an angle about that axis.
The two descriptions can be shown to be equivalent under the assumption of infinitesimal rotation.
Euler angles in the three coordinates: A 3-D rotation in space can be represented by the Euler angles $\theta$, $\psi$ and $\phi$ of rotation about the X, Y and Z axes, respectively.


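The matrices that describe clockwise rotation about the individual axes are given by

$R_\theta = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}, \quad R_\psi = \begin{bmatrix} \cos\psi & 0 & \sin\psi \\ 0 & 1 & 0 \\ -\sin\psi & 0 & \cos\psi \end{bmatrix}, \quad R_\phi = \begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix}$

Assuming the rotation from frame to frame is infinitesimally small, i.e., $\theta = \Delta\theta$, $\psi = \Delta\psi$, $\phi = \Delta\phi$, and thus approximating $\cos\Delta\theta \approx 1$ and $\sin\Delta\theta \approx \Delta\theta$ (and likewise for the other angles), the above matrices simplify to

$R_\theta \approx \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -\Delta\theta \\ 0 & \Delta\theta & 1 \end{bmatrix}, \quad R_\psi \approx \begin{bmatrix} 1 & 0 & \Delta\psi \\ 0 & 1 & 0 \\ -\Delta\psi & 0 & 1 \end{bmatrix}, \quad R_\phi \approx \begin{bmatrix} 1 & -\Delta\phi & 0 \\ \Delta\phi & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

Then the composite rotation matrix, to first order in the angles, is given by

$R = R_\phi R_\theta R_\psi \approx \begin{bmatrix} 1 & -\Delta\phi & \Delta\psi \\ \Delta\phi & 1 & -\Delta\theta \\ -\Delta\psi & \Delta\theta & 1 \end{bmatrix}$

Now we can express the motion of a point on the object surface from $\mathbf{X}$ to $\mathbf{X}'$ by extending Eqn (1) to

$\mathbf{X}' = R\,\mathbf{X} + \mathbf{T}$  (2)

Equation (2) accounts for both translation and rotation.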
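
The rigid motion model X' = R X + T and the small-angle approximation of R can be checked numerically. The sketch below is an illustration under stated assumptions: the elementary rotation matrices use the common right-handed sign convention, the composition order R_z R_x R_y is one possible choice, and all function names and test values are hypothetical.

```python
import math


def rot_x(theta):
    """Rotation by theta radians about the X axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(psi):
    """Rotation by psi radians about the Y axis."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(phi):
    """Rotation by phi radians about the Z axis."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, x):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * x[k] for k in range(3)) for i in range(3)]


def small_angle_rotation(d_theta, d_psi, d_phi):
    """First-order composite rotation (cos ~ 1, sin ~ angle)."""
    return [[1.0, -d_phi, d_psi],
            [d_phi, 1.0, -d_theta],
            [-d_psi, d_theta, 1.0]]


def rigid_motion(rotation, point, translation):
    """Apply X' = R X + T."""
    return [r + t for r, t in zip(matvec(rotation, point), translation)]


# Compare the exact composite rotation with its linearised form.
d_theta, d_psi, d_phi = 1e-3, 2e-3, -1e-3
exact = matmul(rot_z(d_phi), matmul(rot_x(d_theta), rot_y(d_psi)))
approx = small_angle_rotation(d_theta, d_psi, d_phi)
max_error = max(abs(exact[i][j] - approx[i][j])
                for i in range(3) for j in range(3))

# Quarter turn about Z followed by a unit translation along each axis.
moved = rigid_motion(rot_z(math.pi / 2), [1.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

For milliradian angles the linearised matrix differs from the exact product only by second-order terms, which is why the order of the elementary rotations stops mattering in the infinitesimal case.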


Effect of camera in 3-D Motion

It is possible to incorporate the effect of zooming into the 3-D motion model if we assume that the camera has fixed parameters but the object is artificially scaled up or down.
Eqn (2) then becomes

$\mathbf{X}' = S R\,\mathbf{X} + \mathbf{T}$

where $S = \mathrm{diag}(S_1, S_2, S_3)$ is a scaling matrix.


Non-Rigid or Deformable Motion

Modeling the 3-D structure and motion of non-rigid objects is a complex task and an active research area.
In theory, according to the mechanics of deformable bodies, the model studied in the previous section can be extended to include 3-D non-rigid motion as

$\mathbf{X}' = (D + R)\,\mathbf{X} + \mathbf{T}$

where D is an arbitrary deformation matrix.


Sampling of video signal

To obtain an analog or digital video signal, the continuous time-varying image must be sampled in the spatial and temporal coordinates.
Analog video signal representation requires sampling in the vertical and temporal coordinates.
Ex: An analog video signal is a 1-D continuous function f(t), in which one of the spatial coordinates is mapped onto time by means of the scanning process.
For a digital video representation, I(x, y, t) is sampled in all three coordinates in accordance with the sampling theorem.
The spatio-temporal sampling process is depicted in the figure below, where I(m, n, k) is the digital video signal and (m, n) and k denote the discrete spatial and temporal coordinates, respectively.

(Figure: spatio-temporal sampling block.)

Sampling for Analog and Digital Video

In this section, we study the sampling structure utilized for the representation of analog and digital video.

Sampling structure for Analog Video

Analog video is obtained by sampling the time-varying image intensity distribution in the vertical (x2) and temporal (t) directions by a 2-D sampling process known as scanning.
The continuous intensity information along each horizontal line is concatenated to form the 1-D analog video signal as a function of time.
The two most commonly used vertical-temporal sampling structures are:

Orthogonal sampling structure

Hexagonal sampling structure


In these figures, each dot indicates a continuous line of video perpendicular to the plane of the slide.
The matrices V shown in these figures are called sampling matrices.
The orthogonal structure is used in the representation of progressive analog video, which in turn is used in computer monitors.
The hexagonal structure is used in the representation of 2:1 interlaced analog video, which in turn is used in TV monitors.


Sampling structure for digital video

Digital video can be obtained by sampling analog video in the horizontal direction along the scan lines, or by applying an inherently 3-D sampling structure to sample the time-varying image, as in the case of some solid-state sensors.
Examples of the most popular 3-D sampling structures are shown in the figures below, in which each circle indicates a pixel location and the number inside the circle indicates the time of sampling.


Filtering Operations in Camera and Display Device

In general, all sampled signals (e.g. digital video) require prefilters and reconstruction filters for alias-free sampling and for reconstruction of the signal from its samples.
In this section, we discuss how practical cameras and display devices perform these tasks in a crude way and how the human visual system (HVS) partially accomplishes the required interpolation task.
Here the camera aperture acts as the prefilter and the display aperture acts as the reconstruction filter.

Camera Aperture

Consider a camera that samples a continuously time-varying scene at regular intervals in the horizontal, vertical and temporal directions.
The sampling frequencies are f_s,x, f_s,y and f_s,t, respectively.
The ideal prefilter is a low-pass filter (LPF) with cutoff frequencies equal to half of the sampling frequencies.
In the following, we study the actual prefilters implemented in typical cameras, i.e. the

Temporal Aperture
Spatial Aperture.


Temporal Aperture

A video camera typically accomplishes a certain degree of prefiltering in the capturing process.
First, the intensity values read out at any frame instant are not the values sensed at that instant; rather, they are averages over a certain time interval $\Delta_e$, known as the exposure time. Therefore, the camera applies a prefilter in the temporal domain with an impulse response of the form

$h_{p,t}(t) = \begin{cases} 1/\Delta_e, & 0 < t < \Delta_e \\ 0, & \text{otherwise} \end{cases}$

The frequency response of this filter is

$H_{p,t}(f_t) = e^{-j\pi f_t \Delta_e} \, \frac{\sin(\pi f_t \Delta_e)}{\pi f_t \Delta_e}$

It reaches zero at $f_t = 1/\Delta_e$.
The temporal sampling rate is $1/\Delta_t$, and the ideal prefilter for this task is a low-pass filter with cutoff frequency at half of the sampling rate.
By choosing $\Delta_e \ge \Delta_t$, the camera can suppress the temporal aliasing components near the sampling rate. But too large a $\Delta_e$ will blur the signal.
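
The behaviour of the exposure-time prefilter can be checked numerically. The sketch below assumes the prefilter is a box average over the exposure time, whose frequency response has magnitude |sin(π f Δe) / (π f Δe)|; the function name, the 30 frames/s rate and the choice of exposure time equal to the frame interval are illustrative assumptions.

```python
import math


def box_prefilter_magnitude(f_t, exposure):
    """Magnitude response of averaging over `exposure` seconds.

    |H(f_t)| = |sin(pi * f_t * exposure) / (pi * f_t * exposure)|
    """
    x = math.pi * f_t * exposure
    if x == 0.0:
        return 1.0  # limit of sin(x)/x as x -> 0
    return abs(math.sin(x) / x)


frame_interval = 1.0 / 30.0          # assumed 30 frames/s
sampling_rate = 1.0 / frame_interval
exposure = frame_interval            # exposure time equal to the frame interval

at_dc = box_prefilter_magnitude(0.0, exposure)
at_half_rate = box_prefilter_magnitude(sampling_rate / 2, exposure)
at_rate = box_prefilter_magnitude(sampling_rate, exposure)
```

With this choice the response vanishes at the sampling rate but is still about 0.64 at half the sampling rate, so the box average is only a crude approximation of the ideal low-pass prefilter.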


Spatial Aperture

In addition to temporal integration, the camera also performs spatial integration.
The value read out at any pixel position (a position on a scan line in a tube-based camera, or a sensor in a CCD camera) is not the optical signal at that point alone but rather a weighted integration of the signals in a small window surrounding it, called the aperture.
The shape of the aperture and the weighting values constitute the camera spatial aperture function. This aperture function serves as the spatial prefilter, and its Fourier transform is known as the modulation transfer function (MTF) of the camera.
With most cameras, the spatial aperture function can be approximated by a circularly symmetric Gaussian function,

$h_{p,x,y}(x, y) = \frac{1}{2\pi\sigma_c^2} \, e^{-(x^2 + y^2)/(2\sigma_c^2)}$

The frequency response of this function is also Gaussian,

$H_{p,x,y}(f_x, f_y) = e^{-(f_x^2 + f_y^2)/(2\sigma_f^2)}$, with $\sigma_f = \frac{1}{2\pi\sigma_c}$.

The value of $\sigma_c$ (or $\sigma_f$) depends on the aperture size and shape, which are usually designed so that the frequency response is 0.5 at half of the vertical and horizontal sampling rates.
Assuming $f_{s,x} = f_{s,y} = f_{s,x,y}$, this requires $\sigma_f = \frac{f_{s,x,y}}{2\sqrt{2\ln 2}}$.

Combined Aperture

The overall camera aperture function or prefilter is the product of the temporal and spatial aperture functions,

$h_p(x, y, t) = h_{p,t}(t) \, h_{p,x,y}(x, y)$

with frequency response

$H_p(f_x, f_y, f_t) = H_{p,t}(f_t) \, H_{p,x,y}(f_x, f_y)$
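
The design rule for the spatial aperture can be verified with a few lines of Python. The sketch assumes a Gaussian MTF of the form exp(-(f_x² + f_y²) / (2σ_f²)) and the Fourier-pair relation σ_f = 1 / (2πσ_c); the function names and the sampling rate of 720 samples per picture width are illustrative assumptions.

```python
import math


def sigma_f_for_half_response(f_s):
    """Frequency-domain spread giving a response of 0.5 at f_s / 2.

    Solving exp(-(f_s/2)**2 / (2 * sigma_f**2)) = 0.5 for sigma_f.
    """
    return f_s / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_mtf(f_x, f_y, sigma_f):
    """Circularly symmetric Gaussian frequency response."""
    return math.exp(-(f_x ** 2 + f_y ** 2) / (2.0 * sigma_f ** 2))


def sigma_c_from_sigma_f(sigma_f):
    """Spatial-domain spread of the matching Gaussian aperture."""
    return 1.0 / (2.0 * math.pi * sigma_f)


f_s = 720.0  # assumed sampling rate in samples per picture width
sigma_f = sigma_f_for_half_response(f_s)
sigma_c = sigma_c_from_sigma_f(sigma_f)

response_at_half_rate = gaussian_mtf(f_s / 2, 0.0, sigma_f)
response_at_dc = gaussian_mtf(0.0, 0.0, sigma_f)
```

A response of 0.5 at half the sampling rate is a compromise: a narrower Gaussian in frequency would suppress more aliasing but would also blur fine detail.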


Display Aperture

In a CRT monitor, an electron gun sweeps an electron beam across the screen line by line.
The beam strikes the phosphor with an intensity proportional to the intensity of the video signal at the corresponding location.
To display a color image, three beams are emitted by three separate guns, striking red, green and blue phosphors with the desired intensity combination at each location.
The HVS has low-pass or band-pass characteristics; therefore the eye performs, to some degree, the required interpolation task.
