1 of 76

UNIT IV: Introduction to Digital Video Processing

Shri Bhagyavanthi Krupa

Dr. Rajkumar L. Biradar

Prof. ETE Dept, GNITS


Contents

  • Basics of Video
  • Digital Video
  • Time Varying Image Formation Models/Video Modeling
  • Three-Dimensional Motion Models
  • Geometric Image Formation
  • Photometric Image Formation
  • Sampling of Video Signals
  • Filtering Operations


4/23/2019


Basics of Video

  • A video signal is a sequence of time-varying images.
  • A still image has a spatial distribution of intensities that remains constant with time.
  • A time-varying image has a spatial intensity distribution that varies with time.
  • A video signal is treated as a series of images called frames.
  • The illusion of continuous motion is obtained by displaying the frames in rapid succession; the number of frames displayed per second is called the frame rate.
  • It is a 3-D signal:
    • 2 spatial dimensions and 1 time dimension
    • Continuous: I(x, y, t); discrete: I(m, n, k)
    • (The textbook uses s in place of I.)


  • I(x, y, t) is the analog video signal, where (x, y) denote the spatial coordinates and t denotes the time variable in the continuous-time domain.
  • I(x, y, t) is analog because it is continuous in both the spatial and the time domain.
  • However, the analog video viewed on display monitors is not truly analog, since it is sampled along one spatial (vertical) coordinate and along the time coordinate.
  • In practice, so-called analog video systems, such as TVs and monitors, represent the video signal as a one-dimensional electrical signal f(t).


Analog Video Signal

  • The analog video signal is the one-dimensional electrical signal f(t) obtained by sampling I(x, y, t) along the vertical (y) spatial direction and along the time (t) direction. This periodic sampling is called scanning.
  • Scanning produces a series of time samples, each of which is a complete picture called a frame.
  • Each frame is composed of spatial samples called scan lines.


Types of Video Scanning

There are two types of scanning:

Progressive Scanning

Interlaced Scanning

Progressive Scanning:

  • A progressive scan traces a complete frame every ∆t seconds.
  • The computer industry uses progressive scanning with a frame interval of ∆t = 1/72 s for high-resolution monitors.
  • The optical or electron beam of an analog camera continuously scans the frame from top to bottom and then returns to the top.
  • The resulting video signal consists of a series of frames separated by a regular time interval ∆t, and each frame consists of a consecutive set of horizontal lines separated by a regular vertical spacing.


  • The intensity values captured along contiguous scan lines over consecutive frames form a 1-D analog video signal f(t), also called a raster scan.

  • With a color camera, three 1-D rasters are combined into a composite signal, called a color raster.

(Fig. from Y. Wang, Video Processing and Communications textbook)

  • The frame rate must be high enough; otherwise the displayed video appears to flicker. The human eye detects flicker if the scan rate (refresh rate) is less than about 50 frames/s.
  • Computer monitors (72 frames/s) clearly exceed this rate.
  • In many other systems, such as TV, such a high rate is not possible because of bandwidth limitations. Interlaced scanning is a solution to this problem.


Interlaced Scanning:

  • The TV industry uses 2:1 interlaced scanning.
  • Each frame is scanned in two fields, called the odd field and the even field. Each field consists of half the number of lines in a frame.
  • The odd field consists of the odd lines and the even field consists of the even lines of each frame.
  • Each frame is read out in two separate scans, of the odd and even fields respectively. This allows good reproduction of movement in the scene at a relatively low field rate.
  • In this way flicker is effectively eliminated, provided the field rate is above the visual limit of about 50 Hz.
  • Broadcast television in the US uses a frame rate of 30 Hz; hence the field rate is 60 Hz, which is above 50 Hz.
  • In the scanning figure, the snap-back of the spot from B to C is called horizontal retrace.
  • The snap-back of the spot from D to E is called vertical retrace.
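
The split of a frame into odd and even fields can be sketched in a few lines of Python. This is only an illustration: it assumes a frame is stored as a list of scan lines numbered from 1 at the top, and the function names `split_fields` and `weave` are hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of scan lines) into its odd and even fields.

    Lines are numbered from 1 at the top, so the odd field holds
    lines 1, 3, 5, ... and the even field holds lines 2, 4, 6, ...
    """
    odd_field = frame[0::2]   # lines 1, 3, 5, ...
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


def weave(odd_field, even_field):
    """Interleave the two fields back into one full frame."""
    frame = []
    for i in range(max(len(odd_field), len(even_field))):
        if i < len(odd_field):
            frame.append(odd_field[i])
        if i < len(even_field):
            frame.append(even_field[i])
    return frame


frame = ["line1", "line2", "line3", "line4", "line5", "line6"]
odd, even = split_fields(frame)
print(odd)   # ['line1', 'line3', 'line5']
print(even)  # ['line2', 'line4', 'line6']

frame_rate_hz = 30                 # US broadcast frame rate quoted above
field_rate_hz = 2 * frame_rate_hz  # two fields per frame in 2:1 interlace
print(field_rate_hz)               # 60
```

Each field carries half the lines, so the display is refreshed at twice the frame rate without increasing the number of lines transmitted per second.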


Analog Video Signal Format

  • Despite the advance of digital video technology, the most common consumer display mechanism for video still uses analogue display devices such as the CRT.
  • Until all terrestrial and satellite broadcasts become digital, analogue video formats will remain significant.
  • The three principal analogue video signal formats are:
    • NTSC (National Television System Committee),
    • PAL (Phase Alternating Line) and
    • SECAM (Sequential Color with Memory).
  • All three are television video formats in which the information in each picture, captured by a CCD or tube camera, is scanned from left to right to create a sequential intensity signal. The formats take advantage of the persistence of human vision by using an interlaced scanning pattern in which the odd and even lines of each picture are read out in two separate scans, of the odd and even fields respectively. This allows good reproduction of movement in the scene at the relatively low field rates of 50 fields/s for PAL and SECAM and 60 fields/s for NTSC.


Digital Video

  • A digital video is obtained either by sampling a raster scan f(t) or directly by using a digital video camera.
  • Digital cameras typically use solid-state sensors such as CCDs (charge-coupled devices).
  • A digital camera samples the scene as discrete frames.
  • Each frame consists of the output values from a CCD array, which is discrete by nature in both the horizontal and vertical directions.
  • Digital video is defined by the following parameters:
    • the frame rate, f_{s,t}
    • the number of lines per frame, f_{s,y}
    • the number of samples per line, f_{s,x}
  • From these parameters we can find:

Temporal sampling interval (frame interval): ∆t = 1 / f_{s,t}

Vertical sampling interval: ∆y = (frame height) / f_{s,y}

Horizontal sampling interval: ∆x = (frame width) / f_{s,x}

  • Digital video is denoted by I(m, n, k), where the integer indices m and n are the column and row indices, and k is the frame number.


  • The actual spatial and temporal locations corresponding to the integer indices are x = m∆x, y = n∆y and t = k∆t.
  • For convenience, we use the notation I(x, y, t) to describe the video signal in general, whether analog or digital. We use I(m, n, k) only when specifically addressing digital video.
  • Number of bits used to represent digital video:
    • N_b is the number of bits used to represent each pixel value.
    • N_b = 8 for monochrome video and N_b = 24 for color video.
  • The data rate R of digital video is determined by

R = f_{s,t} · f_{s,y} · f_{s,x} · N_b bits per second (bps), often quoted in kbps or Mbps.
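
As a worked illustration of the sampling intervals and the data-rate formula, the sketch below plugs in assumed example values (720 samples per line, 480 lines, 30 frames/s, 24 bits per pixel, and a frame of 4 × 3 arbitrary length units). The numbers and function names are illustrative choices, not taken from the slides.

```python
def sampling_intervals(frame_width, frame_height, f_sx, f_sy, f_st):
    """Return (delta_x, delta_y, delta_t) for the given video parameters."""
    delta_x = frame_width / f_sx    # horizontal sampling interval
    delta_y = frame_height / f_sy   # vertical sampling interval
    delta_t = 1.0 / f_st            # frame interval in seconds
    return delta_x, delta_y, delta_t


def index_to_position(m, n, k, delta_x, delta_y, delta_t):
    """Map integer indices (m, n, k) to the location (x, y, t)."""
    return m * delta_x, n * delta_y, k * delta_t


def data_rate_bps(f_st, f_sy, f_sx, n_b):
    """Raw (uncompressed) data rate R = f_st * f_sy * f_sx * N_b in bits/s."""
    return f_st * f_sy * f_sx * n_b


# Assumed example parameters (not from the slides).
F_ST, F_SY, F_SX, N_B = 30, 480, 720, 24

rate = data_rate_bps(F_ST, F_SY, F_SX, N_B)
print(rate)              # 248832000
print(rate / 1e6)        # about 248.8 Mbps

dx, dy, dt = sampling_intervals(4.0, 3.0, F_SX, F_SY, F_ST)
# Centre pixel of the frame captured one second after the start.
print(index_to_position(360, 240, 30, dx, dy, dt))
```

The raw rate of roughly 249 Mbps for a modest frame size shows why uncompressed digital video is expensive to store and transmit.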


Ex: Video Cameras

  • Frame-by-frame capturing
  • CCD sensors (Charge-Coupled Devices)
    • 2-D array of solid-state sensors
    • Each sensor corresponds to a pixel
    • Stored in a buffer and sequentially read out
    • Widely used.

Note: The width-to-height ratio of a video frame is called the image aspect ratio (IAR). An IAR of 4:3 is used in standard-definition TV (SDTV), ratios up to about 2.2:1 are used in wide-screen movies, and 16:9 is used in HDTV.

For digital video, the width-to-height ratio of the rectangular area represented by one pixel is called the pixel aspect ratio (PAR). It is related to the IAR by

PAR = IAR · f_{s,y} / f_{s,x}

For proper display of a digitized video signal, one must specify either the PAR or the IAR along with f_{s,y} and f_{s,x}.

The display device must match the PAR specified for the signal; otherwise object shapes will be distorted.
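
The PAR relation can be checked with exact fractions. The frame sizes below (720 × 480 and 640 × 480, both displayed at 4:3) are assumed examples, and `pixel_aspect_ratio` is a hypothetical helper name.

```python
from fractions import Fraction


def pixel_aspect_ratio(iar, f_sy, f_sx):
    """PAR = IAR * f_sy / f_sx, computed exactly.

    iar  : image aspect ratio (width / height) as a Fraction
    f_sy : number of lines per frame
    f_sx : number of samples per line
    """
    return iar * Fraction(f_sy, f_sx)


# 720 samples per line, 480 lines, shown on a 4:3 display.
par = pixel_aspect_ratio(Fraction(4, 3), 480, 720)
print(par)     # 8/9  (pixels are slightly narrower than they are tall)

# 640 samples per line, 480 lines, shown on a 4:3 display.
square = pixel_aspect_ratio(Fraction(4, 3), 480, 640)
print(square)  # 1    (square pixels)
```

A PAR different from 1 means the pixels are not square, which is why a display that assumes square pixels distorts the picture.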


Why Digital?

  • “Exactness”
    • Exact reproduction without degradation
    • Accurate duplication of processing result
  • Convenient & powerful computer-aided processing
    • Can perform rather sophisticated processing through hardware or software
  • Easy storage and transmission
    • 1 DVD can store a three-hour movie !!!
    • Transmission of high quality video through network in reasonable time


Applications of Digital Video


Time Varying Image Formation Models or Video Modeling

  • Here we present the simplest models for the temporal variation of the spatial intensity pattern in the image plane.
  • For example, to describe the change between consecutive frames of a video sequence in terms of object motion, illumination changes and camera motion, we need models that describe the real world and the image formation process.
  • The most important models are the scene, object, camera and illumination models. These models describe the assumptions that we make about the real world.
  • Depending on the selected model, we can describe the real world with more or less detail and precision.


  • We represent a time-varying image by a function of three continuous variables, I(x, y, t), which is formed by projecting a time-varying 3-D spatial scene onto the 2-D image plane.
  • The temporal variations in the 3-D scene are usually due to movements of objects in the scene.
  • Thus, time-varying images reflect a projection of 3-D moving objects onto the 2-D image plane as a function of time (this is the function I(x, y, t) introduced in the first point).
  • Digital video corresponds to a spatio-temporally sampled version of this time-varying image.


  • The block diagram below represents the time-varying image formation model.

[Figure: Digital video formation. Blocks: 3-D Scene Modeling → Image Formation → Spatio-Temporal Sampling, with additive observation noise.]

  • 3-D scene modeling refers to modeling the 3-D motion and structure (shape) of the objects in the scene.
  • Image formation refers to mapping the 3-D scene onto the 2-D image plane.

Ex: Geometric and photometric image formation.

  • The last block obtains the digital video by spatio-temporal sampling.

Note: This is what we are going to study in this unit.


We need to understand some basics:

1. Pinhole cameras

  • Abstract camera model - box with a small hole in it
  • Pinhole cameras work in practice

(Forsyth & Ponce)


2. Lens

  • Lenses duplicate the pinhole geometry without resorting to undesirably small apertures.
    • They gather all the light radiating from an object point towards the lens’s finite aperture.
    • They bring the light into focus at a single distinct image point.


3. Thin lens equation


Assume an object of height y at distance u from the lens plane. Its image, of height y’, is formed at distance v on the other side of the lens, and f is the focal length.

[Figure: thin lens showing the object at distance u, the image at distance v, and the focal length f]

Using the similar triangles formed by the ray through the center of the lens:

y’/y = v/u

Using the similar triangles formed by the ray through the focal point:

y’/y = (v-f)/f

Equating the two ratios gives the relation between the focal length (f), the distance of the object from the camera (u), and the distance at which the object will be in focus (v):

$$\frac{1}{u} + \frac{1}{v} = \frac{1}{f}$$
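
The thin lens relation 1/u + 1/v = 1/f, which follows from equating the two similar-triangle ratios y’/y = v/u and y’/y = (v-f)/f, can be checked numerically. The sketch below uses an assumed 50 mm lens and an object 2 m away; the function names are hypothetical.

```python
def image_distance(f, u):
    """Solve the thin lens equation 1/u + 1/v = 1/f for v."""
    if u == f:
        raise ValueError("object at the focal point: image forms at infinity")
    return u * f / (u - f)


def magnification(f, u):
    """Ratio of image height to object height, y'/y = v/u."""
    return image_distance(f, u) / u


f = 50.0     # focal length in mm (assumed example)
u = 2000.0   # object distance in mm (assumed example)
v = image_distance(f, u)

print(v)                    # a little more than f, about 51.28 mm
print(magnification(f, u))  # about 0.0256

# Both similar-triangle ratios agree, as the derivation requires.
print(abs(v / u - (v - f) / f) < 1e-12)  # True
```

For distant objects v approaches f, which is why the image plane of a camera focused at infinity sits one focal length behind the lens.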


Geometric Image Formation

  • An imaging system captures a 2-D projection of a time-varying 3-D scene. This projection can be represented by a mapping from a 4-D space to a 3-D space:

$$\mathbb{R}^4 \to \mathbb{R}^3, \qquad (X_1, X_2, X_3, t) \mapsto (x_1, x_2, t)$$

where (X1, X2, X3) are the 3-D world coordinates, (x1, x2) are the 2-D image-plane coordinates, and t is time; all are continuous variables.

There are two types of projection:

1. Perspective (central) projection.

2. Orthographic (parallel) projection.


1. Perspective Projection / Pinhole Camera Model / Central Projection

  • It is a widely used model for approximating the projection of real-world (3-D) objects onto a 2-D plane.
  • It describes 2-D image formation by an ideal pinhole camera according to the principles of geometrical optics.
  • All rays from the object pass through the center of projection, which corresponds to the center of the lens. For this reason it is also called central projection.
  • Perspective projection is illustrated in the figure below for the case where the center of projection lies between the object and the image plane, and the image plane (x1, x2) coincides with the (X1, X2) plane of the world coordinates.


The algebraic relation that describes the perspective transformation for the configuration shown in the figure is obtained from the similar triangles formed by drawing perpendicular lines from the object point (X1, X2, X3) and the image point (x1, x2, 0), or simply (x1, x2), to the X3 axis. From the figure we have (the negative sign arises because the image is inverted with respect to the object):

$$\frac{x_1}{f} = -\frac{X_1}{X_3 - f}, \qquad \frac{x_2}{f} = -\frac{X_2}{X_3 - f}$$

or, equivalently,

$$x_1 = \frac{f X_1}{f - X_3}, \qquad x_2 = \frac{f X_2}{f - X_3}$$

where f is the focal length, i.e., the distance from the center of projection to the image plane. If we move the center of projection to coincide with the origin of the world coordinates, as shown in the next figure, a simple change of variables yields

$$x_1 = \frac{f X_1}{X_3}, \qquad x_2 = \frac{f X_2}{X_3}$$

  • That is, x1/X1 = -(-f/X3) = f/X3, and likewise x2/X2 = f/X3.
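
A minimal sketch of perspective projection, assuming the two standard configurations described on these slides: image plane in the (X1, X2) plane, giving x = fX/(f - X3), and center of projection at the origin, giving x = fX/X3. The function names and the sample points are illustrative assumptions.

```python
def project_plane_at_origin(X1, X2, X3, f):
    """Image plane coincides with the (X1, X2) plane: x = f*X / (f - X3)."""
    if X3 == f:
        raise ValueError("point lies in the plane of the center of projection")
    return f * X1 / (f - X3), f * X2 / (f - X3)


def project_center_at_origin(X1, X2, X3, f):
    """Center of projection at the world origin: x = f*X / X3."""
    if X3 == 0:
        raise ValueError("point lies in the plane of the center of projection")
    return f * X1 / X3, f * X2 / X3


f = 1.0
near = project_center_at_origin(2.0, 1.0, 4.0, f)
far = project_center_at_origin(2.0, 1.0, 8.0, f)
print(near)  # (0.5, 0.25)
print(far)   # (0.25, 0.125)

# With the image plane at the origin, a point in front of the center of
# projection (X3 > f) is imaged with inverted sign.
print(project_plane_at_origin(2.0, 1.0, 5.0, f))  # (-0.5, -0.25)
```

Doubling the depth X3 halves the image coordinates: distant objects appear smaller, which is the defining property of perspective projection.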


Note: The preceding analysis follows Tekalp; the following is from Wang.


2. Orthographic Projection

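  • When the object is very far from the camera compared with the depth variation of the object itself, perspective projection can be approximated by orthographic projection, which is also known as parallel projection.
  • Orthographic projection is an approximation of the actual imaging process in which all rays from the 3-D scene (object) to the image plane are assumed to travel parallel to each other.
  • Provided the image plane is parallel to the (X1, X2) plane of the world coordinates, the relation between the world coordinates and the orthographic projection is given by

$$x_1 = X_1 \quad \text{and} \quad x_2 = X_2$$

  • or, in vector-matrix form,

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}$$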
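
Orthographic projection simply drops the depth coordinate. The sketch below writes it as the product of a 2 × 3 matrix with the world point, assuming the image plane is parallel to the (X1, X2) plane so that x1 = X1 and x2 = X2; the names are hypothetical.

```python
# 2 x 3 orthographic projection matrix: keeps X1 and X2, discards X3.
P_ORTHO = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]


def orthographic_project(X):
    """Return (x1, x2) = P_ORTHO @ (X1, X2, X3)."""
    return tuple(
        sum(P_ORTHO[i][j] * X[j] for j in range(3)) for i in range(2)
    )


print(orthographic_project((2.0, 1.0, 4.0)))    # (2.0, 1.0)
print(orthographic_project((2.0, 1.0, 400.0)))  # (2.0, 1.0)
```

Unlike perspective projection, the result does not depend on X3, so depth information is lost entirely and distant objects do not shrink.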


Geometric Image Formation Models: Conclusion


Photometric Image Formation Model

  • Image intensities can be modeled as proportional to the amount of light reflected by the objects in the scene. In general, the scene reflection contains two components.
  • Lambertian component: it has an equal energy distribution in all directions and is also called diffuse reflection. Wood and cement surfaces belong to this category.
  • Specular component: it is strongest in the mirror direction of the incident light. Shiny and mirror-like surfaces belong to this category. (Specularly reflected light leaves the surface only along the direction whose angle of reflection equals the angle of incidence.)
  • Real-life surfaces are a mixture of Lambertian (i.e., diffusely reflecting, satisfying Lambert's law) and specular surfaces.
  • We concentrate only on surfaces where the specular component can be neglected.


Lambertian Reflection Model

  • If a Lambertian surface is illuminated by a single point source of uniform intensity (in time), the resulting image intensity (reflected light intensity) is the same in all viewing directions and is proportional to the cosine of the angle θ between the surface normal N(t) and the illumination direction L (Lambert's law):

$$I(x_1, x_2, t) = \rho \, N(t) \cdot L = \rho \cos\theta$$

  • where ρ denotes the surface albedo, i.e., the fraction of the light reflected by the surface (the reflection coefficient, in the range 0 to 1),
  • N(t) is the unit vector normal to the surface of the scene, and
  • L is the unit vector in the mean illumination direction.
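
Lambert's law, intensity = albedo × (N · L) for unit vectors N and L, can be sketched as follows. The clamp to zero for surfaces facing away from the light is an added assumption, as are the function names and the sample values.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambertian_intensity(albedo, normal, light_dir):
    """Intensity = albedo * cos(theta), with cos(theta) = N . L.

    Assumption: a surface facing away from the light (N . L < 0)
    receives no light, so the result is clamped at zero.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    cos_theta = sum(a * b for a, b in zip(n, l))
    return albedo * max(0.0, cos_theta)


albedo = 0.8
normal = (0.0, 0.0, 1.0)

# Light arriving along the normal: theta = 0, full brightness.
head_on = lambertian_intensity(albedo, normal, (0.0, 0.0, 1.0))

# Light arriving 60 degrees off the normal: cos(60 deg) = 0.5.
theta = math.radians(60.0)
oblique = lambertian_intensity(
    albedo, normal, (math.sin(theta), 0.0, math.cos(theta))
)

print(head_on)  # 0.8
print(oblique)  # about 0.4
```

Because the normal N(t) changes as the object rotates, the image intensity of a moving Lambertian surface varies with time even under constant illumination.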


Three-Dimensional Motion Models

  • Here we address modeling of the relative 3-D motion between the camera and the objects in the scene.
  • This includes 3-D motion of the objects in the scene, such as translation and rotation, as well as 3-D motion of the camera, such as zooming and panning (rotating the camera in the vertical or horizontal direction).

  • According to classical kinematics, 3-D motion can be classified as:

  • Rigid motion: the relative distances between a set of 3-D points remain fixed as the object evolves in time. That is, the 3-D structure (shape) of the moving object can be modeled by a non-deformable surface, e.g., a planar, piecewise planar or polynomial surface.

  • Non-rigid motion: a deformable surface model is used to model the 3-D structure.


Rigid Motion Model in Cartesian Coordinates

  • The motion of a rigid object can be expressed in terms of motion parameters: a translation vector T and a rotation matrix [R].
  • The 3-D translation vector $\mathbf{T} = [T_x, T_y, T_z]^T$ describes the displacement (translation) of a point from X to X’ by $T_x$, $T_y$ and $T_z$ in the directions of the coordinate axes X, Y and Z, respectively:

$$\mathbf{X}' = \mathbf{X} + \mathbf{T} \qquad (1)$$

  • X and X’ denote the coordinates of the point at times t and t’ with respect to the center of rotation.
  • If an object is translated, equation (1) holds for all points of the object.


Rotation Matrix:

  • The 3-D rotation matrix in Cartesian coordinates can be characterized either by the Euler angles of rotation about the three coordinate axes or by an axis of rotation and an angle about this axis.
  • The two descriptions can be shown to be equivalent under the assumption of infinitesimal rotation.
  • Euler angles in the three coordinates: a 3-D rotation in space can be represented by the Euler angles $\theta_x$, $\theta_y$ and $\theta_z$ of rotation about the X, Y and Z axes, respectively.


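The matrices that describe clockwise rotation about the individual axes are given by

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix}, \quad R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix}, \quad R_z = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

  • Assuming the rotation from frame to frame is infinitesimally small, i.e., $\theta_x = \Delta\theta_x$, $\theta_y = \Delta\theta_y$, $\theta_z = \Delta\theta_z$, and thus approximating $\cos\Delta\theta \approx 1$ and $\sin\Delta\theta \approx \Delta\theta$, the above matrices simplify to

$$R_x \approx \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -\Delta\theta_x \\ 0 & \Delta\theta_x & 1 \end{bmatrix}, \quad R_y \approx \begin{bmatrix} 1 & 0 & \Delta\theta_y \\ 0 & 1 & 0 \\ -\Delta\theta_y & 0 & 1 \end{bmatrix}, \quad R_z \approx \begin{bmatrix} 1 & -\Delta\theta_z & 0 \\ \Delta\theta_z & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$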


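  • Then the composite rotation matrix, keeping only first-order terms in the small angles $\Delta\theta_x$, $\Delta\theta_y$, $\Delta\theta_z$ about the X, Y and Z axes, is given by

$$R = R_z R_y R_x \approx \begin{bmatrix} 1 & -\Delta\theta_z & \Delta\theta_y \\ \Delta\theta_z & 1 & -\Delta\theta_x \\ -\Delta\theta_y & \Delta\theta_x & 1 \end{bmatrix}$$

  • Now we can express the motion of a point on the object surface from X to X’. Including rotation, equation (1) becomes

$$\mathbf{X}' = R\,\mathbf{X} + \mathbf{T} \qquad (2)$$

Equation (2) accounts for both translation and rotation.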
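
The rigid motion model X’ = R X + T can be sketched in plain Python. The code below assumes the standard elementary rotation matrices about the X, Y and Z axes, composes them in the order Rz Ry Rx (an assumed convention), and compares the exact rotation with the small-angle approximation. All names and sample values are illustrative.

```python
import math


def mat_mul(A, B):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(A, x):
    """Product of a 3 x 3 matrix and a 3-vector."""
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation(tx, ty, tz):
    """Composite rotation R = Rz Ry Rx (assumed composition order)."""
    return mat_mul(rot_z(tz), mat_mul(rot_y(ty), rot_x(tx)))


def small_angle_rotation(tx, ty, tz):
    """First-order approximation, using cos(a) ~ 1 and sin(a) ~ a."""
    return [[1.0, -tz, ty], [tz, 1.0, -tx], [-ty, tx, 1.0]]


def rigid_motion(R, T, X):
    """Apply X' = R X + T."""
    RX = mat_vec(R, X)
    return [RX[i] + T[i] for i in range(3)]


X = [1.0, 2.0, 3.0]
T = [0.1, 0.0, -0.2]
angles = (0.01, 0.02, -0.015)  # small frame-to-frame rotations in radians

exact = rigid_motion(rotation(*angles), T, X)
approx = rigid_motion(small_angle_rotation(*angles), T, X)
error = max(abs(a - b) for a, b in zip(exact, approx))
print(exact)
print(approx)
print(error < 1e-2)  # True: the linearized model is close for small angles
```

The approximation error grows with the square of the angles, so the linearized matrix is only appropriate when the rotation between consecutive frames is small.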


Effect of camera in 3-D Motion

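  • It is possible to incorporate the effect of zooming into the 3-D motion model if we assume that the camera has fixed parameters but the object is artificially scaled up or down.
  • With scaling included, the rigid motion equation (2), X’ = R X + T, becomes

$$\mathbf{X}' = S\,R\,\mathbf{X} + \mathbf{T}$$

  • where S is a scaling matrix, $S = \mathrm{diag}(S_1, S_2, S_3)$.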


Non-Rigid or Deformable Motion

  • Modeling the 3-D structure and motion of non-rigid objects is a complex task and an active research area.
  • In theory, according to the mechanics of deformable bodies, the rigid model X’ = R X + T studied in the previous section can be extended to include 3-D non-rigid motion as

$$\mathbf{X}' = (D + R)\,\mathbf{X} + \mathbf{T}$$

  • where D is an arbitrary deformation matrix.


Sampling of video signal

  • To obtain an analog or digital video signal, the continuous time-varying image I(x, y, t) needs to be sampled in the spatial and temporal coordinates.
  • An analog video signal representation requires sampling in the vertical and temporal coordinates.
  • Ex: an analog video signal is a 1-D continuous function f(t), where one of the spatial coordinates is mapped onto time by means of the scanning process.
  • For a digital video representation, I(x, y, t) is sampled in all three coordinates, in accordance with the sampling theorem.
  • The spatio-temporal sampling process is depicted in the figure below, where I(m, n, k) is the digital video signal and (m, n, k) denote the discrete spatial and temporal coordinates, respectively.


Sampling for Analog and Digital Video

  • In this section we study the sampling structures used to represent analog and digital video.

Sampling structure for Analog Video

  • An analog video is obtained by sampling the time-varying image intensity distribution in the vertical (x2) and temporal (t) directions by a 2-D sampling process known as scanning.
  • The continuous intensity information along each horizontal line is concatenated to form the 1-D analog video signal as a function of time.
  • The two most commonly used vertical-temporal sampling structures are:

Orthogonal sampling structure

Hexagonal sampling structure


  • In these figures, each dot indicates a continuous line of video perpendicular to the plane of the slide.
  • The matrices V shown in these figures are called the sampling matrices.
  • The orthogonal structure is used in the representation of progressive analog video, which in turn is used in computer monitors.
  • The hexagonal structure is used in the representation of 2:1 interlaced analog video, which in turn is used in TV monitors.
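
The role of a sampling matrix can be illustrated by generating lattice points (x2, t) = V n for integer vectors n. The figures with the actual matrices are not reproduced here, so the two matrices below are assumptions: a diagonal matrix for the orthogonal (progressive) structure and a commonly quoted form for 2:1 interlace, with line spacing ∆x2 and frame interval ∆t both set to 1 for readability.

```python
def lines_at_time_index(V, n2, n1_range):
    """Return (t, [x2, ...]) for the lattice points V @ (n1, n2).

    V is a 2 x 2 sampling matrix acting on the integer vector (n1, n2);
    the first output coordinate is vertical position x2, the second is time.
    """
    t = V[1][1] * n2
    x2_values = [V[0][0] * n1 + V[0][1] * n2 for n1 in n1_range]
    return t, x2_values


DX2, DT = 1.0, 1.0  # line spacing and frame interval (arbitrary units)

# Assumed sampling matrices (the slide figures are not available here).
V_ORTHOGONAL = [[DX2, 0.0], [0.0, DT]]            # progressive scan
V_INTERLACED = [[2 * DX2, DX2], [0.0, DT / 2]]    # 2:1 interlaced scan

for n2 in range(2):
    print("progressive:", lines_at_time_index(V_ORTHOGONAL, n2, range(3)))
for n2 in range(2):
    print("interlaced: ", lines_at_time_index(V_INTERLACED, n2, range(3)))
```

With the interlaced matrix, lines 0, 2, 4 are sampled at t = 0 and lines 1, 3, 5 at t = ∆t/2, which reproduces the alternating field pattern; the progressive matrix samples every line at every frame time.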


Sampling structure for digital video

  • Digital video can be obtained by sampling analog video in the horizontal direction along the scan lines, or by applying an inherently 3-D sampling structure to the time-varying image, as in the case of some solid-state sensors.
  • Examples of the most popular 3-D sampling structures are shown in the figures below, in which each circle indicates a pixel location and the number inside the circle indicates the time of sampling.


Filtering Operations in Camera and Display Device

  • In general, all sampled signals (e.g., digital video) require pre-filters and reconstruction filters for proper sampling and for reconstruction of the signal from its samples.
  • In this section we discuss how practical cameras and display devices perform these tasks in a crude way, and how the human visual system (HVS) partially accomplishes the required interpolation.
  • Here the camera aperture acts as the pre-filter and the display aperture acts as the reconstruction filter.

Camera Aperture

  • Consider a camera that samples a continuously time-varying scene at regular intervals ∆x, ∆y and ∆t in the horizontal, vertical and temporal directions, respectively.
  • The sampling frequencies are
  • f_{s,x} = 1/∆x, f_{s,y} = 1/∆y and f_{s,t} = 1/∆t.
  • The ideal pre-filter is a low-pass filter (LPF) with cut-off frequencies equal to half of the sampling frequencies.
  • In the following, we study the actual pre-filters implemented in typical cameras, i.e.,
    • Temporal Aperture
    • Spatial Aperture.


Temporal Aperture

  • A video camera typically accomplishes a certain degree of pre-filtering in the capturing process.
  • First, the intensity values read out at any frame instant are not the values sensed at that instant; rather, they are averages over a certain time interval $\Delta_e$, known as the exposure time. Therefore, the camera applies a pre-filter in the temporal domain with an impulse response of the form:

$$h_{p,t}(t) = \begin{cases} 1/\Delta_e, & 0 < t < \Delta_e \\ 0, & \text{otherwise} \end{cases}$$

  • The frequency response of this filter is:

$$H_{p,t}(f_t) = e^{-j\pi f_t \Delta_e}\, \frac{\sin(\pi f_t \Delta_e)}{\pi f_t \Delta_e}$$

  • We can see that it reaches zero at $f_t = 1/\Delta_e$.
  • Recall that the temporal sampling rate is $f_{s,t} = 1/\Delta_t$, and that the ideal pre-filter for this task is an LPF with cut-off frequency at half of the sampling rate.
  • By choosing $\Delta_e \geq \Delta_t$, the camera can suppress the temporal aliasing components near the sampling rate. But too large a $\Delta_e$ will blur the signal.
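
The effect of the exposure time on temporal pre-filtering can be sketched numerically. The code assumes that averaging over the exposure time acts as a box filter of width Δe, whose frequency response is exp(-jπ f Δe) · sin(π f Δe)/(π f Δe); the frame rate of 30 frames/s and the function name are illustrative assumptions.

```python
import cmath
import math


def temporal_aperture_response(f_t, delta_e):
    """Frequency response of a box-shaped exposure window of width delta_e."""
    x = math.pi * f_t * delta_e
    if x == 0.0:
        return complex(1.0, 0.0)
    return cmath.exp(-1j * x) * math.sin(x) / x


f_st = 30.0          # assumed temporal sampling rate in frames/s
delta_t = 1.0 / f_st

# Exposure equal to the full frame interval.
full = delta_t
print(abs(temporal_aperture_response(f_st / 2, full)))  # about 0.64
print(abs(temporal_aperture_response(f_st, full)))      # essentially 0

# Exposure equal to half the frame interval: less blur, weaker pre-filter.
half = delta_t / 2
print(abs(temporal_aperture_response(f_st, half)))      # about 0.64
```

With the exposure equal to the frame interval, the response has a zero exactly at the sampling rate, so components there are suppressed. A shorter exposure leaves them only partly attenuated, which trades aliasing against motion blur.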


Spatial Aperture

  • In addition to temporal integration, the camera also performs spatial integration.
  • The value read out at any pixel position (a position on a scan line in a tube-based camera, or a sensor in a CCD camera) is not the optical signal at that point alone but rather a weighted integration of the signals in a small window surrounding it, called the aperture.
  • The shape of the aperture and the weighting values constitute the camera spatial aperture function. This aperture function serves as the spatial pre-filter, and its Fourier transform is known as the modulation transfer function (MTF) of the camera.
  • With most cameras, the spatial aperture function can be approximated by a circularly symmetric Gaussian function,

$$h_{p,x,y}(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$

  • The frequency response of this function is also Gaussian,

$$H_{p,x,y}(f_x, f_y) = \exp\!\left(-\frac{f_x^2 + f_y^2}{2\beta^2}\right)$$

  • with $\beta = \dfrac{1}{2\pi\sigma}$.


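  • The value of the Gaussian width σ (or, equivalently, the frequency-domain width β = 1/(2πσ)) depends on the aperture size and shape, which are usually designed so that the frequency response is 0.5 at half of the vertical and horizontal sampling rates.
  • Assuming $f_{s,x} = f_{s,y} = f_{s,x,y}$ and evaluating the Gaussian response $\exp(-(f_x^2 + f_y^2)/(2\beta^2))$ at $f_x = f_y = f_{s,x,y}/2$, we see that this requires $\beta = \dfrac{f_{s,x,y}}{2\sqrt{\ln 2}}$.

Combined Aperture

  • The overall camera aperture function, or pre-filter, is the product of the temporal and spatial aperture functions:

$$h_p(x, y, t) = h_{p,t}(t)\, h_{p,x,y}(x, y)$$

  • with frequency response

$$H_p(f_x, f_y, f_t) = H_{p,t}(f_t)\, H_{p,x,y}(f_x, f_y)$$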
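
The design rule for the spatial aperture can be verified with a short sketch. It assumes a Gaussian frequency response exp(-(fx² + fy²)/(2β²)) with β = 1/(2πσ), equal horizontal and vertical sampling rates, and the requirement that the response equal 0.5 when both fx and fy are at half the sampling rate. The sampling rate of 100 samples per unit length and the function names are illustrative.

```python
import math


def beta_for_half_response(f_s):
    """Beta giving a response of 0.5 at f_x = f_y = f_s / 2."""
    return f_s / (2.0 * math.sqrt(math.log(2.0)))


def sigma_from_beta(beta):
    """Spatial-domain Gaussian width, using beta = 1 / (2 * pi * sigma)."""
    return 1.0 / (2.0 * math.pi * beta)


def gaussian_mtf(f_x, f_y, beta):
    """Gaussian frequency response of the spatial aperture."""
    return math.exp(-(f_x ** 2 + f_y ** 2) / (2.0 * beta ** 2))


f_s = 100.0  # assumed sampling rate, samples per unit length
beta = beta_for_half_response(f_s)
sigma = sigma_from_beta(beta)

print(beta)                                   # about 60.06
print(sigma * f_s)                            # aperture width in pixel units
print(gaussian_mtf(f_s / 2, f_s / 2, beta))   # about 0.5
print(gaussian_mtf(0.0, 0.0, beta))           # 1.0
```

The resulting σ is a fraction of the pixel spacing, so the aperture is a gentle low-pass filter: it attenuates frequencies near half the sampling rate without removing them completely.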


Display Aperture

  • In a CRT monitor, an electron gun sweeps an electron beam across the screen line by line.
  • The beam strikes the phosphors with an intensity proportional to the intensity of the video signal at the corresponding locations.
  • To display a color image, three beams are emitted by three separate guns, striking the red, green and blue phosphors with the desired intensity combination at each location.
  • The human visual system (HVS) has low-pass or band-pass characteristics; therefore the eye performs, to some degree, the required interpolation task.
