1 of 53

Image Warping and Alignment

AJIT RAJWADE

2 of 53

Basics

  • A digital image – a version of the visual stimulus sampled at discrete locations, with discretized values
  • Can be regarded as a function I = f(x,y) where (x,y) are spatial (integer) coordinates in typically a rectangular domain Ω = [0,W-1] x [0,H-1].
  • Each ordered pair (x,y) is called a pixel.
  • The pixel is generally square (sometimes rectangular) in shape.


3 of 53

Basics

  • Pixel dimensions (height/width of the pixel) relate to the spatial resolution of the sensor in the camera that collects light reflected from a scene.

  • Remember: the actual visual signal is analog, but digital cameras capture a discrete version of it, and also quantize the intensity values.

  • That is, they capture the visual stimulus only at specific points (x,y), usually evenly spaced in both the X and Y directions.


4 of 53

Basics

  • In a typical photographic grayscale image, the intensity values f(x,y) lie in the range from 0 to 255 (8 bit integers).

  • They are quantized versions of the continuous values corresponding to the actual light intensity that strikes a light sensor in the camera.


5 of 53

Image Alignment

  • Consider images I1 and I2 of a scene acquired through different viewpoints.


Pixels in digital correspondence (same coordinate values in the image domain Ω, not necessarily containing measurements of the same physical point)

Pixels in physical correspondence (containing measurements of the same physical point, but not necessarily the same coordinate values in the image domain Ω)

6 of 53

Image Alignment

  • I1 and I2 are said to be aligned if for every (x,y) in the domain Ω, the pixels at (x,y) in I1 and I2 are in physical correspondence.

  • If not, the images are said to be misaligned with respect to each other.

  • Or we say there is relative motion between the images.

  • Image alignment (also called registration) is the process of correcting for the relative motion between I1 and I2.


7 of 53

Motion Models

  • Let us denote the coordinates in I1 as (x1,y1) and those in I2 as (x2,y2).
  • Translation: x2 = x1 + tx, y2 = y1 + ty

  • Rotation about the point (0,0), anti-clockwise through angle θ: x2 = x1 cos θ − y1 sin θ, y2 = x1 sin θ + y1 cos θ


2D rotation matrix (an orthogonal matrix: its columns are orthonormal)

8 of 53

Motion Models

  • Rotation about point (xc,yc) anti-clockwise through angle θ


-Perform translation such that (xc,yc) coincides with the origin (0,0).

-Rotate about the new origin.

-Translate back.

-The extra 1s appended as a third coordinate are called homogeneous coordinates – they make it possible to represent translations by matrix multiplication.
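The three-step construction above can be sketched in numpy using 3 x 3 homogeneous matrices; the function name and test values are illustrative, and numpy is assumed available:

```python
import numpy as np

def rotation_about_point(theta, xc, yc):
    """3x3 homogeneous matrix: translate (xc,yc) to the origin, rotate, translate back."""
    c, s = np.cos(theta), np.sin(theta)
    T_to = np.array([[1, 0, -xc], [0, 1, -yc], [0, 0, 1]], float)   # bring centre to origin
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], float)          # anti-clockwise rotation
    T_back = np.array([[1, 0, xc], [0, 1, yc], [0, 0, 1]], float)    # move centre back
    return T_back @ R @ T_to

M = rotation_about_point(np.pi / 2, 5.0, 5.0)
# The centre of rotation is a fixed point of the composed transformation.
print(M @ np.array([5.0, 5.0, 1.0]))   # -> [5. 5. 1.]
```

Note that the matrices are applied right-to-left: the matrix nearest the coordinate vector acts first.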

9 of 53

Motion Models

  • Rotation and translation: x2 = x1 cos θ − y1 sin θ + tx, y2 = x1 sin θ + y1 cos θ + ty

  • Affine transformation (rotation, scaling and shearing, besides translation): (x2, y2) = A (x1, y1) + (tx, ty), where A is a 2 x 2 matrix


Assumption: the 2 x 2 sub-matrix A is NOT rank-deficient; otherwise it would collapse two-dimensional figures onto a line or a point

10 of 53

Motion Models

  • Scaling about the origin: x2 = sx·x1, y2 = sy·y1, where sx and sy are the scale factors in the X and Y directions


11 of 53

Motion Models

  • Shearing: for example, a horizontal shear x2 = x1 + k·y1, y2 = y1, where k is the shear factor


12 of 53

Motion Models

  • The 2D affine motion model (including translation in the X and Y directions) has 6 degrees of freedom.
  • Note: this motion model accounts for in-plane motion only (example: not an appropriate model for “3D head profile view versus head frontal view”)
  • Note: even with in-plane motion, there exist more complicated motion models, but we will stick to this one for now.


13 of 53

Motion Models

  • Composition of multiple types of motion is given by the multiplication of their corresponding matrices.

  • So, if you first scale (matrix S) and then rotate (matrix R), then the resultant transformation = RS.

  • Note: most motion compositions are not commutative (RS is not equal to SR).
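As a quick check of the non-commutativity claim, here is a small numpy sketch with an illustrative rotation and scaling:

```python
import numpy as np

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by 45 degrees
S = np.diag([2.0, 1.0])                           # scale X by 2, leave Y unchanged

# Scaling first and then rotating (R @ S) differs from
# rotating first and then scaling (S @ R).
print(np.allclose(R @ S, S @ R))   # -> False
```

Commutativity does hold in special cases, e.g. two rotations about the same point, or uniform scaling with any rotation.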


14 of 53

Motion Models

  • In actual coding, you will typically not use matrices.

  • Rather, you will implement the formulas directly.

  • So why is matrix-based motion representation useful?

  • Because it allows for a compact way to represent the composition of many different types of motion.


15 of 53

Alignment with control points


Control points: pairs of physically corresponding points – marked manually, or detected automatically using geometric properties of the image.

Number of control points k MUST be >= u/2, where u = number of unknown parameters in the motion model (each control point provides two equations – one for x and one for y). The number of control points is several times smaller than the number of image pixels.

Solve for the unknown parameters using a least-squares framework (i.e. the pseudo-inverse).

Apply the motion given by these parameters to the first image.
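The least-squares step above can be sketched in numpy. The function name `fit_affine` and the synthetic control points are assumptions for illustration; for the affine model u = 6, so k >= 3 points are needed:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of a 2D affine map dst ~ A @ src + t.
    src, dst: (k, 2) arrays of corresponding control points, k >= 3."""
    k = src.shape[0]
    # Each control point contributes two equations in the 6 unknowns (a..f):
    # x2 = a*x1 + b*y1 + c,  y2 = d*x1 + e*y1 + f
    M = np.zeros((2 * k, 6))
    M[0::2, 0:2] = src
    M[0::2, 2] = 1.0
    M[1::2, 3:5] = src
    M[1::2, 5] = 1.0
    b = dst.reshape(-1)
    p, *_ = np.linalg.lstsq(M, b, rcond=None)   # pseudo-inverse solution
    A = np.array([[p[0], p[1]], [p[3], p[4]]])
    t = np.array([p[2], p[5]])
    return A, t

# Synthetic example: recover a known rotation + translation from 4 points.
theta = 0.3
A_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
t_true = np.array([2.0, -1.0])
src = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [7.0, 3.0]])
dst = src @ A_true.T + t_true
A_est, t_est = fit_affine(src, dst)
print(np.allclose(A_est, A_true), np.allclose(t_est, t_true))   # -> True True
```

With noisy control points the system is overdetermined and `lstsq` returns the least-squares estimate rather than an exact solution.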

16 of 53

Alignment with control points

  • Not always feasible, if it requires manual intervention

  • Error-prone

  • There are methods for finding matching control points automatically (e.g. the popular SIFT technique), but we will not cover them in this course.


17 of 53

Alignment with mean squared error

  • The mean squared error (here, the mean of sum of squared differences, MSSD) between the warped version of I1 and the fixed image I2 is given by:

MSSD(T) = (1/N) Σ(x,y) [ I2(x,y) − warp_T(I1)(x,y) ]², where the sum runs over the N pixels at which both images are defined

  • Find the motion parameters as follows:


Find transformation matrix T which produces the least value of MSSD

18 of 53

Alignment with mean squared error

  • For simplicity, assume there was only rotation and translation.
  • Then we have x2 = x1 cos θ − y1 sin θ + tx and y2 = x1 sin θ + y1 cos θ + ty, so the MSSD is a function of just three parameters: θ, tx and ty.


19 of 53

Alignment with mean-squared error

  • There are many ways to do this minimization. The simplest, but least efficient, is a brute-force search.
  • Sample θ, tx and ty uniformly from a certain range (example: θ from -45 to +45 degrees, tx and ty from -30 to +30 pixels).
  • For each sampled parameter triple, apply the motion to I1 keeping I2 fixed (or to I2 keeping I1 fixed, if the matrix is invertible) and compute the MSSD.
  • Pick the parameter values corresponding to the minimum MSSD.
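A minimal sketch of the brute-force search, restricted to integer translations for brevity (numpy assumed; the synthetic images and names are illustrative, and the MSSD is computed only over the region of overlap, as discussed later):

```python
import numpy as np

def mssd(a, b):
    """Mean of sum of squared differences between two equal-sized arrays."""
    return np.mean((a - b) ** 2)

def brute_force_translation(moving, fixed, max_shift=5):
    """Search integer shifts (ty, tx); return the one with minimum MSSD over the overlap."""
    H, W = fixed.shape
    best_shift, best_err = None, np.inf
    for ty in range(-max_shift, max_shift + 1):
        for tx in range(-max_shift, max_shift + 1):
            # region where both fixed[y, x] and moving[y + ty, x + tx] exist
            y0, y1 = max(0, -ty), min(H, H - ty)
            x0, x1 = max(0, -tx), min(W, W - tx)
            if y1 <= y0 or x1 <= x0:
                continue
            err = mssd(fixed[y0:y1, x0:x1],
                       moving[y0 + ty:y1 + ty, x0 + tx:x1 + tx])
            if err < best_err:
                best_shift, best_err = (ty, tx), err
    return best_shift, best_err

rng = np.random.default_rng(0)
fixed = rng.random((40, 40))
moving = np.roll(fixed, shift=(2, 3), axis=(0, 1))   # known shift: 2 down, 3 right
print(brute_force_translation(moving, fixed))        # -> ((2, 3), 0.0)
```

Adding rotation to the search simply wraps a third loop over θ around the two translation loops.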


20 of 53

Image Warping

  • Forward warping method:
  • Apply the transformation T to every coordinate vector v = [x y] in the original image to yield new coordinate Tv.
  • Copy the intensity value from v in the original image to the new location in the warped image. Careful handling is needed if Tv is not an integer (highly likely).


21 of 53

Image Warping

  • Forward warping:

-Can leave holes in the destination image if you scale up (some destination pixels receive no value).

-Can map multiple source pixels to the same destination pixel if you scale down.
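A tiny numpy sketch of forward warping under upscaling, illustrating the holes mentioned above (the rounding scheme and NaN marker are illustrative choices):

```python
import numpy as np

def forward_warp_scale(img, s):
    """Forward-warp by scale factor s, rounding each target coordinate to an integer."""
    H, W = img.shape
    out = np.full((int(H * s), int(W * s)), np.nan)   # NaN marks unfilled holes
    for y in range(H):
        for x in range(W):
            xn, yn = int(round(x * s)), int(round(y * s))
            if 0 <= yn < out.shape[0] and 0 <= xn < out.shape[1]:
                out[yn, xn] = img[y, x]
    return out

img = np.arange(16.0).reshape(4, 4)
warped = forward_warp_scale(img, 2.0)
# Only 16 of the 64 destination pixels receive a value; the rest are holes.
print(np.isnan(warped).sum())   # -> 48
```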


22 of 53

Image warping

  • Reverse warping:
  • For every coordinate v = [x y] in the destination image, copy the intensity value from coordinate T⁻¹v in the original image.
  • If T⁻¹v is not integer-valued, perform interpolation (nearest neighbor or bilinear).
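A sketch of reverse warping with bilinear interpolation in numpy (the boundary handling below — clipping plus zeroing pixels whose pre-image falls outside the source — is one of several reasonable choices):

```python
import numpy as np

def bilinear(img, xs, ys):
    """Bilinear interpolation of img at float coordinates (xs, ys)."""
    H, W = img.shape
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    dx, dy = xs - x0, ys - y0
    return ((1 - dy) * ((1 - dx) * img[y0, x0] + dx * img[y0, x0 + 1])
            + dy * ((1 - dx) * img[y0 + 1, x0] + dx * img[y0 + 1, x0 + 1]))

def reverse_warp(img, T):
    """For every destination pixel v, sample the source at T^{-1} v."""
    H, W = img.shape
    Tinv = np.linalg.inv(T)
    ys, xs = np.mgrid[0:H, 0:W]
    v = np.stack([xs.ravel().astype(float), ys.ravel().astype(float), np.ones(H * W)])
    xsrc, ysrc, _ = Tinv @ v
    out = bilinear(img, xsrc, ysrc).reshape(H, W)
    # Zero out pixels whose pre-image lies outside the source image.
    inside = (xsrc >= 0) & (xsrc <= W - 1) & (ysrc >= 0) & (ysrc <= H - 1)
    out[~inside.reshape(H, W)] = 0.0
    return out

img = np.arange(25.0).reshape(5, 5)
print(np.allclose(reverse_warp(img, np.eye(3)), img))   # identity warp -> True

T = np.array([[1.0, 0, 1], [0, 1, 0], [0, 0, 1]])       # translate 1 pixel right
shifted = reverse_warp(img, T)
print(np.allclose(shifted[:, 1:], img[:, :-1]))          # -> True
```

Because every destination pixel is visited exactly once, reverse warping produces neither holes nor multiple writes.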


23 of 53

Interpolation


[Figure: a query point (red) lies inside a square cell whose four corner pixels A, B, C, D contain the values a1, a2, a3, a4.]

Nearest neighbor method:

  • Use value a4 (as the pixel that is nearest to the red point contains value a4)

Bilinear method:

  • Use a weighted combination of the four neighboring pixel values a1, …, a4, with higher weight given to the nearer pixels (derived on the next slides).

24 of 53

Bilinear interpolation in more detail


Let P = (x,y) be the query point, lying inside the cell with corners Q11 = (x1,y1), Q21 = (x2,y1), Q12 = (x1,y2) and Q22 = (x2,y2).

We interpolate first in the X direction:

f(x,y1) ≈ ((x2 − x)/(x2 − x1)) f(Q11) + ((x − x1)/(x2 − x1)) f(Q21)
f(x,y2) ≈ ((x2 − x)/(x2 − x1)) f(Q12) + ((x − x1)/(x2 − x1)) f(Q22)

We then interpolate in the Y direction:

f(P) ≈ ((y2 − y)/(y2 − y1)) f(x,y1) + ((y − y1)/(y2 − y1)) f(x,y2)

In both cases, notice that higher weight is given to the pixels that are closer to P.

25 of 53

Bilinear interpolation in more detail


This formula will remain unchanged if you first interpolated in the Y direction and then in the X direction. Verify this yourself.

26 of 53

Bilinear interpolation in more detail

  • Here we approximate the image intensity by a bilinear function of the form: f(x,y) = a0 + a1·x + a2·y + a3·xy

  • Here a0, a1, a2, a3 are scalar coefficients. The value of f(x,y) is known at the four corners of the pixel cell.

  • The function would be linear if the term in xy were absent (i.e. if a3 = 0).


27 of 53

Bilinear interpolation in more detail

  • How do we determine the coefficients a0, a1, a2, a3?

  • Writing the bilinear function at the four corners gives four linear equations in the four coefficients; they are obtained by inverting the resulting 4 x 4 matrix.

  • It can be shown (through tedious calculations) that the result is equivalent to the formula derived earlier.
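The equivalence can be checked numerically on the unit pixel. The corner values below are illustrative; each row of the 4 x 4 matrix is [1, x, y, xy] evaluated at a corner:

```python
import numpy as np

# Corner values of f on the unit pixel, in order (0,0), (1,0), (0,1), (1,1).
f = np.array([10.0, 20.0, 30.0, 60.0])

# Rows: [1, x, y, xy] at each corner.
M = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 1, 1]], float)
a = np.linalg.solve(M, f)   # coefficients a0, a1, a2, a3

def bilinear_fn(x, y):
    return a[0] + a[1] * x + a[2] * y + a[3] * x * y

# The area-weighted average of the corner values gives the same result.
x, y = 0.25, 0.5
weighted = ((1 - x) * (1 - y) * f[0] + x * (1 - y) * f[1]
            + (1 - x) * y * f[2] + x * y * f[3])
print(np.isclose(bilinear_fn(x, y), weighted))   # -> True
```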


28 of 53

Alignment with mean squared error

  • In the ideal case, the MSSD between two perfectly aligned images is 0. In practice it will be small but non-zero even under near-perfect alignment, due to sensor noise or slight mismatch between the pixel grids.


29 of 53

Careful: field of view issues!


Region of overlap (also called field of view) when the moving image is warped

Fixed image (also called reference image)

Note: compute MSSD only over region of overlap.

30 of 53


[Figure: overlap regions for warps of the moving image through rotations of -60, -45, -30, -15, 0, +15, +30, +45 and +60 degrees.]

Change in region of overlap (also called field of view), as the moving image is warped. The field of view in any of these figures consists of all those pixels not marked black. It represents the set of pixels (x,y) where both I1(x,y) and I2(x,y) are well defined.

31 of 53

Alignment with mean squared error

  • MSSD is called an “image similarity metric” (strictly, a dissimilarity measure: smaller values indicate better alignment).

  • MAJOR ASSUMPTION: Physically corresponding pixels have the same intensity!


32 of 53

Image alignment: Intensity changes in images

  • Images acquired by different sensors (MR and CT, camera with and without flash, etc.)
  • Changes in lighting condition


MR-T1

MR-PD

33 of 53

Image alignment: Intensity changes in images

  • If a relationship I2(x2,y2) = g(I1(x1,y1)) (for some function g) holds at physically corresponding points, and we knew g, the solution is easy: transform the intensities of I1 by g and minimize the MSSD as before.


Physically corresponding points

34 of 53

Image alignment: Intensity changes in images

  • What if the relationship has the linear form I2 = a·I1 + b at physically corresponding points, but the coefficients a and b are unknown?


Physically corresponding points

Normalized cross-correlation (NCC), also called the correlation coefficient:

NCC(I1, I2) = | Σ(x,y) (I1(x,y) − μ1)(I2(x,y) − μ2) | / sqrt( Σ(x,y) (I1(x,y) − μ1)² · Σ(x,y) (I2(x,y) − μ2)² ), where μ1, μ2 are the mean intensities of the two images

We take the absolute value to handle cases where one image has positive values and the other negative. We assume a linear relationship between the intensities of I1 and I2, but do not assume knowledge of the scalar coefficients a and b.

35 of 53

Image alignment: Intensity changes in images


Normalized cross-correlation, also called correlation-coefficient.

The NCC is the absolute value of the normalized dot product between the mean-subtracted images.
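A sketch of the NCC computation in numpy (the synthetic images are illustrative), checking that it is invariant to an unknown linear intensity map:

```python
import numpy as np

def ncc(i1, i2):
    """Absolute normalized cross-correlation between two equal-sized images."""
    a = i1 - i1.mean()
    b = i2 - i2.mean()
    return abs(np.sum(a * b)) / np.sqrt(np.sum(a * a) * np.sum(b * b))

rng = np.random.default_rng(1)
img = rng.random((32, 32))
# NCC equals 1 for any linear intensity map I2 = a*I1 + b (a != 0) ...
print(np.isclose(ncc(img, 3.0 * img + 7.0), 1.0))   # -> True
# ... but an unrelated image gives a value near 0.
print(ncc(img, rng.random((32, 32))) < 0.3)         # -> True
```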

36 of 53

Image alignment: intensity changes in images?

  • Assume there exists a functional relationship between intensities at physically corresponding locations in the two images.

  • But suppose we didn’t know it (most practical scenario) and couldn’t find it out.

  • We will use image histograms!


37 of 53

Image Histogram

  • In a typical digital image, the intensity levels lie in the range [0,L-1].
  • The (normalized) histogram of the image is a discrete function of the form p(rk) = nk/HW, where rk is the k-th intensity value, and nk is the number of pixels with that intensity. (H,W = image dimensions)
  • Sometimes, we may consider a range of intensity values for one entry of the histogram, in which case rk = [rk(min), rk(max)] represents an intensity bin, and nk is the number of pixels whose intensity falls within this bin.
  • Note p(rk) >= 0 always, and all the p(rk) values sum up to 1.
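The definition above can be sketched in numpy for 8-bit images (function name and sample image are illustrative):

```python
import numpy as np

def normalized_histogram(img, L=256):
    """p(r_k) = n_k / (H*W) for intensity levels 0..L-1 of an integer image."""
    counts = np.bincount(img.ravel(), minlength=L)
    return counts / img.size

img = np.array([[0, 1], [1, 255]], dtype=np.uint8)
p = normalized_histogram(img)
print(p[0], p[1], p[255])   # -> 0.25 0.5 0.25
print(p.sum())              # -> 1.0
```

Binned histograms (ranges of intensities per entry) can be obtained the same way with `np.histogram`.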


38 of 53


These are unnormalized histograms of two different images. That is, the entries of the histogram represent raw frequencies, without division by HW = #pixels.

39 of 53

Joint Image Histogram

  • A joint image histogram is a function of the form p(rk1,rk2) where rk1 and rk2 represent intensity bins from the two images I1 and I2 respectively.

  • p(rk1,rk2) = number of pixels (x,y) such that I1(x,y) and I2(x,y) lie in bins rk1 and rk2 respectively, divided by HW.
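A sketch using numpy's `histogram2d` (the bin count and the synthetic functional relationship are arbitrary illustrative choices):

```python
import numpy as np

def joint_histogram(i1, i2, bins=32):
    """Normalized joint histogram p(r_k1, r_k2) of two equal-sized images."""
    h, _, _ = np.histogram2d(i1.ravel(), i2.ravel(), bins=bins)
    return h / i1.size

rng = np.random.default_rng(2)
i1 = rng.random((64, 64))
i2 = i1 ** 2   # deterministic relationship -> "streamlined" joint histogram
p = joint_histogram(i1, i2)
print(np.isclose(p.sum(), 1.0))   # -> True
# The mass concentrates along a curve, occupying few of the 32*32 bins.
print((p > 0).sum() < 200)        # -> True
```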


40 of 53


Registered images: joint histogram plot looks “streamlined”

Values in I1 (from 0 to 60)

Values in I2

from 0 to 60

I1

I2

41 of 53


Misaligned images: joint histogram plot looks “dispersed”

We need a method to quantify how dispersed a joint histogram actually is.

Values in I1 (from 0 to 60)

Values in I2

from 0 to 60

42 of 53

Measure of dispersion

  • Consider a discrete random variable X with normalized histogram p(X) [also called the probability mass function].
  • The entropy of X is a measure of the uncertainty in X, given by: H(X) = − Σ(x ∈ DX) p(X = x) log p(X = x), where DX is the set of values X can take (terms with p = 0 contribute 0)

  • Note: the entropy is a function of the normalized histogram of X.
  • It is not a function of the actual values of X.
  • The entropy is always non-negative.


43 of 53

Entropy

  • The entropy is maximum if X has a discrete uniform distribution, i.e. p(X=x1) = p(X=x2) for all values x1 and x2 in DX. The maximum value is log(|DX|).

  • The entropy is minimum (zero) if the normalized histogram of X is a Kronecker delta function, i.e. p(X=x1) = 1 for some x1, and p(X=x2) = 0 for all x2 ≠ x1.


44 of 53

Joint entropy

  • The joint entropy of two random variables X and Y is: H(X,Y) = − Σ(x,y) p(x,y) log p(x,y)

  • Maximum joint entropy:

-Uniform distribution on (X,Y): value log(|DX||DY|), where DX, DY are the sets of discrete values that X and Y can take

  • Minimum joint entropy:

-Kronecker delta (i.e. a PMF with all probability concentrated on a single entry, the others being 0): value 0, i.e. no uncertainty
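Both quantities can be sketched in numpy from normalized histograms (base-2 logarithm used here, so entropies are in bits; the test distributions are illustrative):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (base 2) of a normalized histogram; 0*log(0) is treated as 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def joint_entropy(p_joint):
    """Joint entropy from a normalized joint histogram (2-D array)."""
    return entropy(p_joint.ravel())

# Uniform distribution over 8 values: maximum entropy log2(8) = 3 bits.
print(entropy(np.full(8, 1 / 8)))                  # -> 3.0
# Kronecker delta: minimum entropy, zero.
print(entropy(np.array([0.0, 1.0, 0.0])) == 0.0)   # -> True
# Uniform joint distribution over a 4 x 4 grid: log2(16) = 4 bits.
print(joint_entropy(np.full((4, 4), 1 / 16)))      # -> 4.0
```

Minimizing the joint entropy of the joint histogram over candidate warps then gives the alignment criterion described on the next slide.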


45 of 53

Joint entropy

  • Minimizing joint entropy is one method of aligning two images with different intensity profiles.


46 of 53


I1

I2: obtained by squaring the intensities of I1, and rotating I1 anticlockwise by 30 degrees.

I2 treated as moving image, I1 treated as fixed image. Joint entropy minimum occurs at -30 degrees.

47 of 53

Components of an Image Alignment Algorithm

  • Choice of a metric to optimize (joint entropy or mean squared error)

  • Choice of a motion model (only translation, translation + rotation, affine, etc.)

  • Choice of an interpolation algorithm to generate the warped image

  • Choice of an optimization algorithm (here, we just used brute force search)


48 of 53

Image Alignment: applications, related problems

  • Template matching

  • Image Panoramas


49 of 53

Template Matching

  • Look for the occurrence of a template (a smaller image) inside a larger image.
  • Example: eyes within face image


Templates

50 of 53

Template matching

  • Let T = template image (smaller) of size h x w, and J = larger image of size H x W.
  • For every pixel (x,y) in J, consider the h x w portion zxy of J whose top-left corner is (x,y).
  • Select the portion zxy with the smallest MSSD with respect to T.
  • In some variants, the larger image J may be rotated with respect to T.
  • In such cases, repeat the above procedure for every rotation of J from (say) -90 to +90 degrees.
  • That is, for every θ from -90 to +90 degrees, let Jθ be a rotated version of J. Consider all portions zxy in Jθ, and report the (x,y,θ) triple that produces the least MSSD with respect to T.
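The sliding-window search can be sketched in numpy (rotation variant omitted; the planted template and (row, column) indexing convention are illustrative):

```python
import numpy as np

def match_template(J, T):
    """Slide T over J; return the top-left (y, x) of the window with least MSSD."""
    H, W = J.shape
    h, w = T.shape
    best, best_err = None, np.inf
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            err = np.mean((J[y:y + h, x:x + w] - T) ** 2)
            if err < best_err:
                best, best_err = (y, x), err
    return best, best_err

rng = np.random.default_rng(3)
J = rng.random((30, 30))
T = J[12:17, 8:14].copy()     # plant the template at (y, x) = (12, 8)
print(match_template(J, T))   # -> ((12, 8), 0.0)
```

For intensity-robust matching, the MSSD inside the loop can be replaced by NCC (maximizing instead of minimizing).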


51 of 53

Image Panoramas


http://cs.bath.ac.uk/brown/autostitch/autostitch.html

A camera has a limited field of view; a scene may cover a much larger area than can be captured in a single image.

So one can acquire multiple images, each from a different viewpoint, and stitch them together to form a panorama or mosaic. The stitching involves a more complicated motion model (called a homography) than the ones studied in this course.

52 of 53

What we learnt..

  • Affine motion model
  • Forward and reverse image warping
  • Field of view during image alignment
  • Measures for Image alignment: sum of squared differences, normalized cross-correlation, joint entropy


53 of 53

What we didn’t learn

  • Complicated motion models: higher degree polynomials, non-rigid models (example: motion of an amoeba, motion of the heart during the cardiac cycle, facial expressions, etc.)

  • Efficient techniques for optimizing the measure for image alignment
