1 of 53

Image Warping and Alignment

AJIT RAJWADE

2 of 53

Basics

  • A digital image – a version of the visual stimulus sampled at discrete locations, with discretized values
  • Can be regarded as a function I = f(x,y) where (x,y) are spatial (integer) coordinates in typically a rectangular domain Ω = [0,W-1] x [0,H-1].
  • Each ordered pair (x,y) is called a pixel.
  • The pixel is generally square (sometimes rectangular) in shape.


3 of 53

Basics

  • Pixel dimensions (height/width of the pixel) relate to the spatial resolution of the sensor in the camera that collects light reflected from a scene.

  • Remember: the actual visual signal is analog, but digital cameras capture a discrete version of it, and also quantize the intensity values.

  • That is, they capture the visual stimulus only at specific points (x,y), usually evenly spaced in both the X and Y directions.


4 of 53

Basics

  • In a typical photographic grayscale image, the intensity values f(x,y) lie in the range from 0 to 255 (8 bit integers).

  • They are quantized versions of the continuous values corresponding to the actual light intensity that strikes a light sensor in the camera.


5 of 53

Image Alignment

  • Consider images I1 and I2 of a scene acquired through different viewpoints.


Pixels in digital correspondence (same coordinate values in the image domain Ω, not necessarily containing measurements of the same physical point)

Pixels in physical correspondence (containing measurements of the same physical point, but not necessarily the same coordinate values in the image domain Ω)

6 of 53

Image Alignment

  • I1 and I2 are said to be aligned if for every (x,y) in the domain Ω, the pixels at (x,y) in I1 and I2 are in physical correspondence.

  • If not, the images are said to be misaligned with respect to each other.

  • Or we say there is relative motion between the images.

  • Image alignment (also called registration) is the process of correcting for the relative motion between I1 and I2.


7 of 53

Motion Models

  • Let us denote the coordinates in I1 as (x1,y1) and those in I2 as (x2,y2).
  • Translation: x2 = x1 + tx, y2 = y1 + ty

  • Rotation about the point (0,0), anti-clockwise through angle θ: x2 = x1 cos θ − y1 sin θ, y2 = x1 sin θ + y1 cos θ


2D rotation matrix (an orthogonal matrix: its columns are orthonormal)

8 of 53

Motion Models

  • Rotation about point (xc,yc) anti-clockwise through angle θ


-Perform translation such that (xc,yc) coincides with the origin (0,0).

-Rotate about the new origin.

-Translate back.

-The extra 1s appended as a third coordinate are called homogeneous coordinates – they make it possible to represent translations by matrix multiplication.
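The three-step construction above can be sketched in numpy using 3 x 3 homogeneous matrices; the function name and test values are illustrative, and numpy is assumed available:

```python
import numpy as np

def rotation_about_point(theta, xc, yc):
    """3x3 homogeneous matrix: translate (xc,yc) to the origin, rotate, translate back."""
    c, s = np.cos(theta), np.sin(theta)
    T_to = np.array([[1, 0, -xc], [0, 1, -yc], [0, 0, 1]], float)   # bring centre to origin
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], float)          # anti-clockwise rotation
    T_back = np.array([[1, 0, xc], [0, 1, yc], [0, 0, 1]], float)    # move centre back
    return T_back @ R @ T_to

M = rotation_about_point(np.pi / 2, 5.0, 5.0)
# The centre of rotation is a fixed point of the composed transformation.
print(M @ np.array([5.0, 5.0, 1.0]))   # -> [5. 5. 1.]
```

Note that the matrices are applied right-to-left: the matrix nearest the coordinate vector acts first.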

9 of 53

Motion Models

  • Rotation and translation: x2 = x1 cos θ − y1 sin θ + tx, y2 = x1 sin θ + y1 cos θ + ty

  • Affine transformation (rotation, scaling and shearing, besides translation): (x2, y2) = A (x1, y1) + (tx, ty), where A is a 2 x 2 matrix


Assumption: the 2 x 2 sub-matrix A is NOT rank-deficient; otherwise it would collapse two-dimensional figures onto a line or a point

10 of 53

Motion Models

  • Scaling about the origin: x2 = sx·x1, y2 = sy·y1, where sx and sy are the scale factors in the X and Y directions


11 of 53

Motion Models

  • Shearing: for example, a horizontal shear x2 = x1 + k·y1, y2 = y1, where k is the shear factor


12 of 53

Motion Models

  • The 2D affine motion model (including translation in the X and Y directions) has 6 degrees of freedom.
  • Note: this motion model accounts for in-plane motion only (example: not an appropriate model for “3D head profile view versus head frontal view”)
  • Note: even with in-plane motion, there exist more complicated motion models, but we will stick to this one for now.


13 of 53

Motion Models

  • Composition of multiple types of motion is given by the multiplication of their corresponding matrices.

  • So, if you first scale (matrix S) and then rotate (matrix R), then the resultant transformation = RS.

  • Note: most motion compositions are not commutative (RS is not equal to SR).
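As a quick check of the non-commutativity claim, here is a small numpy sketch with an illustrative rotation and scaling:

```python
import numpy as np

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by 45 degrees
S = np.diag([2.0, 1.0])                           # scale X by 2, leave Y unchanged

# Scaling first and then rotating (R @ S) differs from
# rotating first and then scaling (S @ R).
print(np.allclose(R @ S, S @ R))   # -> False
```

Commutativity does hold in special cases, e.g. two rotations about the same point, or uniform scaling with any rotation.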


14 of 53

Motion Models

  • In actual coding, you will typically not use matrices.

  • Rather, you will implement the formulas directly.

  • So why is matrix-based motion representation useful?

  • Because it allows for a compact way to represent the composition of many different types of motion.


15 of 53

Alignment with control points


Control points: pairs of physically corresponding points – marked manually, or detected automatically using geometric properties of the image.

Number of control points k MUST be >= u/2, where u = number of unknown parameters in the motion model (each control point provides two equations – one for x and one for y). The number of control points is several times smaller than the number of image pixels.

Solve for the unknown parameters using a least-squares framework (i.e. the pseudo-inverse).

Apply the motion given by these parameters to the first image.
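The least-squares step above can be sketched in numpy. The function name `fit_affine` and the synthetic control points are assumptions for illustration; for the affine model u = 6, so k >= 3 points are needed:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of a 2D affine map dst ~ A @ src + t.
    src, dst: (k, 2) arrays of corresponding control points, k >= 3."""
    k = src.shape[0]
    # Each control point contributes two equations in the 6 unknowns (a..f):
    # x2 = a*x1 + b*y1 + c,  y2 = d*x1 + e*y1 + f
    M = np.zeros((2 * k, 6))
    M[0::2, 0:2] = src
    M[0::2, 2] = 1.0
    M[1::2, 3:5] = src
    M[1::2, 5] = 1.0
    b = dst.reshape(-1)
    p, *_ = np.linalg.lstsq(M, b, rcond=None)   # pseudo-inverse solution
    A = np.array([[p[0], p[1]], [p[3], p[4]]])
    t = np.array([p[2], p[5]])
    return A, t

# Synthetic example: recover a known rotation + translation from 4 points.
theta = 0.3
A_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
t_true = np.array([2.0, -1.0])
src = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [7.0, 3.0]])
dst = src @ A_true.T + t_true
A_est, t_est = fit_affine(src, dst)
print(np.allclose(A_est, A_true), np.allclose(t_est, t_true))   # -> True True
```

With noisy control points the system is overdetermined and `lstsq` returns the least-squares estimate rather than an exact solution.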

16 of 53

Alignment with control points

  • Not always feasible, if it requires manual intervention

  • Error-prone

  • There are methods for finding matching control points automatically (e.g. the popular SIFT technique), but we will not cover them in this course.


17 of 53

Alignment with mean squared error

  • The mean squared error (here, the mean of sum of squared differences, MSSD) between the warped version of I1 and the fixed image I2 is given by:

MSSD(T) = (1/N) Σ(x,y) [ I2(x,y) − warp_T(I1)(x,y) ]², where the sum runs over the N pixels at which both images are defined

  • Find the motion parameters as follows:


Find transformation matrix T which produces the least value of MSSD

18 of 53

Alignment with mean squared error

  • For simplicity, assume there was only rotation and translation.
  • Then we have x2 = x1 cos θ − y1 sin θ + tx and y2 = x1 sin θ + y1 cos θ + ty, so the MSSD is a function of just three parameters: θ, tx and ty.


19 of 53

Alignment with mean-squared error

  • There are many ways to do this minimization. The simplest, but least efficient, is a brute-force search.
  • Sample θ, tx and ty uniformly from a certain range (example: θ from -45 to +45 degrees, tx and ty from -30 to +30 pixels).
  • For each sampled parameter triple, apply the motion to I1 keeping I2 fixed (or to I2 keeping I1 fixed, if the matrix is invertible) and compute the MSSD.
  • Pick the parameter values corresponding to the minimum MSSD.
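A minimal sketch of the brute-force search, restricted to integer translations for brevity (numpy assumed; the synthetic images and names are illustrative, and the MSSD is computed only over the region of overlap, as discussed later):

```python
import numpy as np

def mssd(a, b):
    """Mean of sum of squared differences between two equal-sized arrays."""
    return np.mean((a - b) ** 2)

def brute_force_translation(moving, fixed, max_shift=5):
    """Search integer shifts (ty, tx); return the one with minimum MSSD over the overlap."""
    H, W = fixed.shape
    best_shift, best_err = None, np.inf
    for ty in range(-max_shift, max_shift + 1):
        for tx in range(-max_shift, max_shift + 1):
            # region where both fixed[y, x] and moving[y + ty, x + tx] exist
            y0, y1 = max(0, -ty), min(H, H - ty)
            x0, x1 = max(0, -tx), min(W, W - tx)
            if y1 <= y0 or x1 <= x0:
                continue
            err = mssd(fixed[y0:y1, x0:x1],
                       moving[y0 + ty:y1 + ty, x0 + tx:x1 + tx])
            if err < best_err:
                best_shift, best_err = (ty, tx), err
    return best_shift, best_err

rng = np.random.default_rng(0)
fixed = rng.random((40, 40))
moving = np.roll(fixed, shift=(2, 3), axis=(0, 1))   # known shift: 2 down, 3 right
print(brute_force_translation(moving, fixed))        # -> ((2, 3), 0.0)
```

Adding rotation to the search simply wraps a third loop over θ around the two translation loops.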


20 of 53

Image Warping

  • Forward warping method:
  • Apply the transformation T to every coordinate vector v = [x y] in the original image to yield new coordinate Tv.
  • Copy the intensity value from v in the original image to the new location in the warped image. Careful handling is needed if Tv is not an integer (highly likely).


21 of 53

Image Warping

  • Forward warping:

-Can leave holes in the destination image if you scale up (some destination pixels receive no value).

-Can map multiple source pixels to the same destination pixel if you scale down.
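A tiny numpy sketch of forward warping under upscaling, illustrating the holes mentioned above (the rounding scheme and NaN marker are illustrative choices):

```python
import numpy as np

def forward_warp_scale(img, s):
    """Forward-warp by scale factor s, rounding each target coordinate to an integer."""
    H, W = img.shape
    out = np.full((int(H * s), int(W * s)), np.nan)   # NaN marks unfilled holes
    for y in range(H):
        for x in range(W):
            xn, yn = int(round(x * s)), int(round(y * s))
            if 0 <= yn < out.shape[0] and 0 <= xn < out.shape[1]:
                out[yn, xn] = img[y, x]
    return out

img = np.arange(16.0).reshape(4, 4)
warped = forward_warp_scale(img, 2.0)
# Only 16 of the 64 destination pixels receive a value; the rest are holes.
print(np.isnan(warped).sum())   # -> 48
```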


22 of 53

Image warping

  • Reverse warping:
  • For every coordinate v = [x y] in the destination image, copy the intensity value from coordinate T⁻¹v in the original image.
  • If T⁻¹v is not integer-valued, perform interpolation (nearest neighbor or bilinear).
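A sketch of reverse warping with bilinear interpolation in numpy (the boundary handling below — clipping plus zeroing pixels whose pre-image falls outside the source — is one of several reasonable choices):

```python
import numpy as np

def bilinear(img, xs, ys):
    """Bilinear interpolation of img at float coordinates (xs, ys)."""
    H, W = img.shape
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, H - 2)
    dx, dy = xs - x0, ys - y0
    return ((1 - dy) * ((1 - dx) * img[y0, x0] + dx * img[y0, x0 + 1])
            + dy * ((1 - dx) * img[y0 + 1, x0] + dx * img[y0 + 1, x0 + 1]))

def reverse_warp(img, T):
    """For every destination pixel v, sample the source at T^{-1} v."""
    H, W = img.shape
    Tinv = np.linalg.inv(T)
    ys, xs = np.mgrid[0:H, 0:W]
    v = np.stack([xs.ravel().astype(float), ys.ravel().astype(float), np.ones(H * W)])
    xsrc, ysrc, _ = Tinv @ v
    out = bilinear(img, xsrc, ysrc).reshape(H, W)
    # Zero out pixels whose pre-image lies outside the source image.
    inside = (xsrc >= 0) & (xsrc <= W - 1) & (ysrc >= 0) & (ysrc <= H - 1)
    out[~inside.reshape(H, W)] = 0.0
    return out

img = np.arange(25.0).reshape(5, 5)
print(np.allclose(reverse_warp(img, np.eye(3)), img))   # identity warp -> True

T = np.array([[1.0, 0, 1], [0, 1, 0], [0, 0, 1]])       # translate 1 pixel right
shifted = reverse_warp(img, T)
print(np.allclose(shifted[:, 1:], img[:, :-1]))          # -> True
```

Because every destination pixel is visited exactly once, reverse warping produces neither holes nor multiple writes.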


23 of 53

Interpolation


[Figure: a query point (red) lies inside a square cell whose four corner pixels A, B, C, D contain the values a1, a2, a3, a4.]

Nearest neighbor method:

  • Use value a4 (as the pixel that is nearest to the red point contains value a4)

Bilinear method:

  • Use a weighted combination of the four neighboring pixel values a1, …, a4, with higher weight given to the nearer pixels (derived on the next slides).

24 of 53

Bilinear interpolation in more detail


Let P = (x,y) be the query point, lying inside the cell with corners Q11 = (x1,y1), Q21 = (x2,y1), Q12 = (x1,y2) and Q22 = (x2,y2).

We interpolate first in the X direction:

f(x,y1) ≈ ((x2 − x)/(x2 − x1)) f(Q11) + ((x − x1)/(x2 − x1)) f(Q21)
f(x,y2) ≈ ((x2 − x)/(x2 − x1)) f(Q12) + ((x − x1)/(x2 − x1)) f(Q22)

We then interpolate in the Y direction:

f(P) ≈ ((y2 − y)/(y2 − y1)) f(x,y1) + ((y − y1)/(y2 − y1)) f(x,y2)

In both cases, notice that higher weight is given to the pixels that are closer to P.

25 of 53

Bilinear interpolation in more detail


This formula will remain unchanged if you first interpolated in the Y direction and then in the X direction. Verify this yourself.

26 of 53

Bilinear interpolation in more detail

  • Here we approximate the image intensity by a bilinear function of the form: f(x,y) = a0 + a1·x + a2·y + a3·xy

  • Here a0, a1, a2, a3 are scalar coefficients. The value of f(x,y) is known at the four corners of the pixel cell.

  • The function would be linear if the term in xy were absent (i.e. if a3 = 0).


27 of 53

Bilinear interpolation in more detail

  • How do we determine the coefficients a0, a1, a2, a3?

  • Writing the bilinear function at the four corners gives four linear equations in the four coefficients; they are obtained by inverting the resulting 4 x 4 matrix.

  • It can be shown (through tedious calculations) that the result is equivalent to the formula derived earlier.
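The equivalence can be checked numerically on the unit pixel. The corner values below are illustrative; each row of the 4 x 4 matrix is [1, x, y, xy] evaluated at a corner:

```python
import numpy as np

# Corner values of f on the unit pixel, in order (0,0), (1,0), (0,1), (1,1).
f = np.array([10.0, 20.0, 30.0, 60.0])

# Rows: [1, x, y, xy] at each corner.
M = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 1, 1]], float)
a = np.linalg.solve(M, f)   # coefficients a0, a1, a2, a3

def bilinear_fn(x, y):
    return a[0] + a[1] * x + a[2] * y + a[3] * x * y

# The area-weighted average of the corner values gives the same result.
x, y = 0.25, 0.5
weighted = ((1 - x) * (1 - y) * f[0] + x * (1 - y) * f[1]
            + (1 - x) * y * f[2] + x * y * f[3])
print(np.isclose(bilinear_fn(x, y), weighted))   # -> True
```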


28 of 53

Alignment with mean squared error

  • In the ideal case, the MSSD between two perfectly aligned images is 0. In practice it will be small but non-zero even under near-perfect alignment, due to sensor noise or slight mismatch between the pixel grids.


29 of 53

Careful: field of view issues!


Region of overlap (also called field of view) when the moving image is warped

Fixed image (also called reference image)

Note: compute MSSD only over region of overlap.

30 of 53


[Figure: overlap regions for warps of the moving image through rotations of -60, -45, -30, -15, 0, +15, +30, +45 and +60 degrees.]

Change in region of overlap (also called field of view), as the moving image is warped. The field of view in any of these figures consists of all those pixels not marked black. It represents the set of pixels (x,y) where both I1(x,y) and I2(x,y) are well defined.

31 of 53

Alignment with mean squared error

  • MSSD is called an “image similarity metric” (strictly, a dissimilarity measure: smaller values indicate better alignment).

  • MAJOR ASSUMPTION: Physically corresponding pixels have the same intensity!


32 of 53

Image alignment: Intensity changes in images

  • Images acquired by different sensors (MR and CT, camera with and without flash, etc.)
  • Changes in lighting condition


MR-T1

MR-PD

33 of 53

Image alignment: Intensity changes in images

  • If a relationship I2(x2,y2) = g(I1(x1,y1)) (for some function g) holds at physically corresponding points, and we knew g, the solution is easy: transform the intensities of I1 by g and minimize the MSSD as before.


Physically corresponding points

34 of 53

Image alignment: Intensity changes in images

  • What if the relationship has the linear form I2 = a·I1 + b at physically corresponding points, but the coefficients a and b are unknown?


Physically corresponding points

Normalized cross-correlation (NCC), also called the correlation coefficient:

NCC(I1, I2) = | Σ(x,y) (I1(x,y) − μ1)(I2(x,y) − μ2) | / sqrt( Σ(x,y) (I1(x,y) − μ1)² · Σ(x,y) (I2(x,y) − μ2)² ), where μ1, μ2 are the mean intensities of the two images

We take the absolute value to handle cases where one image has positive values and the other negative. We assume a linear relationship between the intensities of I1 and I2, but do not assume knowledge of the scalar coefficients a and b.

35 of 53

Image alignment: Intensity changes in images


Normalized cross-correlation, also called correlation-coefficient.

The NCC is the absolute value of the normalized dot product between the mean-subtracted images.
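A sketch of the NCC computation in numpy (the synthetic images are illustrative), checking that it is invariant to an unknown linear intensity map:

```python
import numpy as np

def ncc(i1, i2):
    """Absolute normalized cross-correlation between two equal-sized images."""
    a = i1 - i1.mean()
    b = i2 - i2.mean()
    return abs(np.sum(a * b)) / np.sqrt(np.sum(a * a) * np.sum(b * b))

rng = np.random.default_rng(1)
img = rng.random((32, 32))
# NCC equals 1 for any linear intensity map I2 = a*I1 + b (a != 0) ...
print(np.isclose(ncc(img, 3.0 * img + 7.0), 1.0))   # -> True
# ... but an unrelated image gives a value near 0.
print(ncc(img, rng.random((32, 32))) < 0.3)         # -> True
```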

36 of 53

Image alignment: intensity changes in images?

  • Assume there exists a functional relationship between intensities at physically corresponding locations in the two images.

  • But suppose we didn’t know it (most practical scenario) and couldn’t find it out.

  • We will use image histograms!


37 of 53

Image Histogram

  • In a typical digital image, the intensity levels lie in the range [0,L-1].
  • The (normalized) histogram of the image is a discrete function of the form p(rk) = nk/HW, where rk is the k-th intensity value, and nk is the number of pixels with that intensity. (H,W = image dimensions)
  • Sometimes, we may consider a range of intensity values for one entry of the histogram, in which case rk = [rk(min), rk(max)] represents an intensity bin, and nk is the number of pixels whose intensity falls within this bin.
  • Note p(rk) >= 0 always, and all the p(rk) values sum up to 1.
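The definition above can be sketched in numpy for 8-bit images (function name and sample image are illustrative):

```python
import numpy as np

def normalized_histogram(img, L=256):
    """p(r_k) = n_k / (H*W) for intensity levels 0..L-1 of an integer image."""
    counts = np.bincount(img.ravel(), minlength=L)
    return counts / img.size

img = np.array([[0, 1], [1, 255]], dtype=np.uint8)
p = normalized_histogram(img)
print(p[0], p[1], p[255])   # -> 0.25 0.5 0.25
print(p.sum())              # -> 1.0
```

Binned histograms (ranges of intensities per entry) can be obtained the same way with `np.histogram`.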


38 of 53


These are unnormalized histograms of two different images. That is, the entries of the histogram represent raw frequencies, without division by HW = #pixels.

39 of 53

Joint Image Histogram

  • A joint image histogram is a function of the form p(rk1,rk2) where rk1 and rk2 represent intensity bins from the two images I1 and I2 respectively.

  • p(rk1,rk2) = number of pixels (x,y) such that I1(x,y) and I2(x,y) lie in bins rk1 and rk2 respectively, divided by HW.
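A sketch using numpy's `histogram2d` (the bin count and the synthetic functional relationship are arbitrary illustrative choices):

```python
import numpy as np

def joint_histogram(i1, i2, bins=32):
    """Normalized joint histogram p(r_k1, r_k2) of two equal-sized images."""
    h, _, _ = np.histogram2d(i1.ravel(), i2.ravel(), bins=bins)
    return h / i1.size

rng = np.random.default_rng(2)
i1 = rng.random((64, 64))
i2 = i1 ** 2   # deterministic relationship -> "streamlined" joint histogram
p = joint_histogram(i1, i2)
print(np.isclose(p.sum(), 1.0))   # -> True
# The mass concentrates along a curve, occupying few of the 32*32 bins.
print((p > 0).sum() < 200)        # -> True
```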


40 of 53


Registered images: joint histogram plot looks “streamlined”

Values in I1 (from 0 to 60)

Values in I2

from 0 to 60

I1

I2

41 of 53


Misaligned images: joint histogram plot looks “dispersed”

We need a method to quantify how dispersed a joint histogram actually is.

Values in I1 (from 0 to 60)

Values in I2

from 0 to 60

42 of 53

Measure of dispersion

  • Consider a discrete random variable X with normalized histogram p(X) [also called the probability mass function].
  • The entropy of X is a measure of the uncertainty in X, given by: H(X) = − Σ(x ∈ DX) p(X = x) log p(X = x), where DX is the set of values X can take (terms with p = 0 contribute 0)

  • Note: the entropy is a function of the normalized histogram of X.
  • It is not a function of the actual values of X.
  • The entropy is always non-negative.


43 of 53

Entropy

  • The entropy is maximum if X has a discrete uniform distribution, i.e. p(X=x1) = p(X=x2) for all values x1 and x2 in DX. The maximum value is log(|DX|).

  • The entropy is minimum (zero) if the normalized histogram of X is a Kronecker delta function, i.e. p(X=x1) = 1 for some x1, and p(X=x2) = 0 for all x2 ≠ x1.


44 of 53

Joint entropy

  • The joint entropy of two random variables X and Y is: H(X,Y) = − Σ(x,y) p(x,y) log p(x,y)

  • Maximum joint entropy:

-Uniform distribution on (X,Y): value log(|DX||DY|), where DX, DY are the sets of discrete values that X and Y can take

  • Minimum joint entropy:

-Kronecker delta (i.e. a PMF with all probability concentrated on a single entry, the others being 0): value 0, i.e. no uncertainty
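Both quantities can be sketched in numpy from normalized histograms (base-2 logarithm used here, so entropies are in bits; the test distributions are illustrative):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (base 2) of a normalized histogram; 0*log(0) is treated as 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def joint_entropy(p_joint):
    """Joint entropy from a normalized joint histogram (2-D array)."""
    return entropy(p_joint.ravel())

# Uniform distribution over 8 values: maximum entropy log2(8) = 3 bits.
print(entropy(np.full(8, 1 / 8)))                  # -> 3.0
# Kronecker delta: minimum entropy, zero.
print(entropy(np.array([0.0, 1.0, 0.0])) == 0.0)   # -> True
# Uniform joint distribution over a 4 x 4 grid: log2(16) = 4 bits.
print(joint_entropy(np.full((4, 4), 1 / 16)))      # -> 4.0
```

Minimizing the joint entropy of the joint histogram over candidate warps then gives the alignment criterion described on the next slide.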


45 of 53

Joint entropy

  • Minimizing joint entropy is one method of aligning two images with different intensity profiles.


46 of 53


I1

I2: obtained by squaring the intensities of I1, and rotating I1 anticlockwise by 30 degrees.

I2 treated as moving image, I1 treated as fixed image. Joint entropy minimum occurs at -30 degrees.

47 of 53

Components of an Image Alignment Algorithm

  • Choice of a metric to optimize (joint entropy or mean squared error)

  • Choice of a motion model (only translation, translation + rotation, affine, etc.)

  • Choice of an interpolation algorithm to generate the warped image

  • Choice of an optimization algorithm (here, we just used brute force search)


48 of 53

Image Alignment: applications, related problems

  • Template matching

  • Image Panoramas


49 of 53

Template Matching

  • Look for the occurrence of a template (a smaller image) inside a larger image.
  • Example: eyes within face image


Templates

50 of 53

Template matching

  • Let T = template image (smaller) of size h x w, and J = larger image of size H x W.
  • For every pixel (x,y) in J, consider the h x w portion zxy of J whose top-left corner is (x,y).
  • Select the portion zxy with the smallest MSSD with respect to T.
  • In some variants, the larger image J may be rotated with respect to T.
  • In such cases, repeat the above procedure for every rotation of J from (say) -90 to +90 degrees.
  • That is, for every θ from -90 to +90 degrees, let Jθ be a rotated version of J. Consider all portions zxy in Jθ, and report the (x,y,θ) triple that produces the least MSSD with respect to T.
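The sliding-window search can be sketched in numpy (rotation variant omitted; the planted template and (row, column) indexing convention are illustrative):

```python
import numpy as np

def match_template(J, T):
    """Slide T over J; return the top-left (y, x) of the window with least MSSD."""
    H, W = J.shape
    h, w = T.shape
    best, best_err = None, np.inf
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            err = np.mean((J[y:y + h, x:x + w] - T) ** 2)
            if err < best_err:
                best, best_err = (y, x), err
    return best, best_err

rng = np.random.default_rng(3)
J = rng.random((30, 30))
T = J[12:17, 8:14].copy()     # plant the template at (y, x) = (12, 8)
print(match_template(J, T))   # -> ((12, 8), 0.0)
```

For intensity-robust matching, the MSSD inside the loop can be replaced by NCC (maximizing instead of minimizing).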


51 of 53

Image Panoramas


http://cs.bath.ac.uk/brown/autostitch/autostitch.html

A camera has a limited field of view; a scene may cover a much larger area than can be captured in a single image.

So one can acquire multiple images, each from a different viewpoint, and stitch them together to form a panorama or mosaic. The stitching involves a more complicated motion model (called a homography) than the ones studied in this course.

52 of 53

What we learnt..

  • Affine motion model
  • Forward and reverse image warping
  • Field of view during image alignment
  • Measures for Image alignment: sum of squared differences, normalized cross-correlation, joint entropy


53 of 53

What we didn’t learn

  • Complicated motion models: higher degree polynomials, non-rigid models (example: motion of an amoeba, motion of the heart during the cardiac cycle, facial expressions, etc.)

  • Efficient techniques for optimizing the measure for image alignment
