1 of 114

Image Compression

CS 663, Ajit Rajwade

1

2 of 114

Image Compression

  • Process of converting an image file into another image file that occupies less storage space, without sacrificing its visual content as far as possible.

  • Useful for saving storage space, and transmission costs.

2

3 of 114

Types of compression

  • Lossless: the compressed image can be converted back with zero error.

  • Lossy: the compressed image cannot be converted back to the original without error. The amount of error is inversely proportional to the storage space (usually) and can be controlled by the user.

3

4 of 114

Lossless compression - examples

  • LZW method (used in Winzip)
  • Huffman encoding (part of the JPEG algorithm, although overall JPEG is lossy)
  • Run-length encoding (also part of the JPEG algorithm, although JPEG is lossy overall)

4

5 of 114

Lossy compression

  • JPEG
  • MPEG (for video)
  • MP3 (for audio)
  • Machine learning based techniques for compression of images or video (not covered in this course).

5

6 of 114

Lossy image compression

  • Compression of text files or exe files cannot afford to be lossy.
  • But some portion of image content is often not very noticeable to the human eye, especially the higher frequencies. Discarding this extraneous information leads to compression without significant loss of visual appeal.

6

7 of 114

7

Source: Article on compressive sensing by Candes and Wakin, from IEEE Signal Processing Magazine, 2008

DCT coefficients or

8 of 114

JPEG compression method

  • JPEG = Joint Photographic Experts Group
  • One of the most popular standards for compression of photographic images – widely used on the internet.
  • Widely used in digital cameras.
  • Implemented in all standard image processing software (MATLAB, OpenCV, etc.)
  • Essentially lossy (though there are some lossless variants)
  • Applicable for color as well as grayscale images.

8

9 of 114

JPEG image compression

  • User specifies a quality factor (Q) between 0 and 100 (higher Q means better quality)
  • JPEG algorithm compresses the image based on the user-provided Q.
  • Higher the Q, less will be the compression rate (but higher image quality). Lower Q will give higher compression rate (but poorer image quality).
  • JPEG can achieve 1/10 or 1/15 compression rate with little loss of quality.

9

10 of 114

JPEG image compression

  • How is the loss of quality measured?
  • As MSE between original (uncompressed) and reconstructed images:

10

11 of 114

11

Q = 100, compression rate = 1/2.6

Q = 50, compression rate = 1/15

Q = 25, compression rate = 1/23

Q = 1, compression rate = 1/144

Q = 10, compression rate = 1/46

12 of 114

Steps of the JPEG algorithm (encoder): Overview (approximate)

  1. Divide the image into non-overlapping 8 x 8 blocks and compute the discrete cosine transform (DCT) of each block. This produces a set of 64 “DCT coefficients” per block.
  2. Quantize these DCT coefficients, i.e. divide by some number and round off to nearest integer (that’s why it is lossy). Many coefficients now become 0 and need not be stored!
  3. Now run a lossless compression algorithm (typically Huffman encoding) on the entire set of integers.

We will go through each step in detail in the several slides to follow.

12

13 of 114

13

14 of 114

STEP 1: Discrete Cosine Transform (DCT)

14

15 of 114

Discrete Cosine Transform (DCT) in 1D

15

16 of 114

Discrete Cosine Transform (DCT) in 1D

16

n

u

u

n

17 of 114

DCT

  • Expresses a signal as a linear combination of cosine bases (as opposed to the complex exponentials as in the Fourier transform).
  • The coefficients of this linear combination are called DCT coefficients.
  • Is real-valued unlike the Fourier transform!
  • Discovered by Ahmed, Natarajan and Rao (1974)

17

18 of 114

18

u

n

  • DCT basis matrix is orthonormal. The dot product of any row (or column) with itself is 1. The dot product of any two different rows (or two different columns) is 0. The inverse is equal to the transpose.

  • Being orthonormal, it preserves the squared norm, i.e.

  • DCT is NOT the real part of the Fourier!

  • DCT basis matrix is NOT symmetric.

  • Columns of the DCT matrix are called the DCT basis vectors.

19 of 114

Digression: matrix view of a discrete orthonormal transform (Fourier transform used as example here)

  • Remember:

  • In matrix form, we write:

19

Fourier matrix: in any row, the value of x is fixed, the value of u ranges from 0 to M-1

20 of 114

20

21 of 114

DCT in 2D

21

The DCT matrix is this case will have size MN x MN, and it will be the Kronecker product of two DCT matrices – one of size M x M, the other of size N x N. The DCT matrix for the 2D case is also orthonormal, it is NOT symmetric and it is NOT the real part of the 2D DFT.

22 of 114

What is a 2D Fourier Matrix?

  • It is of the following form:

22

23 of 114

What is a 2D Fourier Matrix?

23

Consider a matrix A of size N1 x N2 and a matrix B of size M1 x M2. The size of their Kronecker product is given by N1 M1 x N2 M2. The Kronecker product is constructed by creating a rectangular grid of size N1 x N2 . In each cell of the grid, you place B. The copy of B in the cell at grid location (i,j) is multiplied by Aij.

24 of 114

24

This big matrix is nothing but the Kronecker product of two 2 x 2 Fourier matrices.

25 of 114

How do the DCT bases look like? (1D case)

25

26 of 114

How do the DCT bases look like? (2D-case)

26

The DCT transforms an 8×8 block of input values to a linear combination of these 64 patterns. The patterns are referred to as the two-dimensional DCT basis vectors, and the output values are referred to as transform coefficients. Here each basis vector is reshaped to form an image.

27 of 114

Again: DCT is NOT the real part of the DFT

27

Real part of DFT

DCT

28 of 114

DCT on grayscale image patches

  • The DCT coefficients of natural image patches have an amazing property.
  • It is observed that most of the signal energy is concentrated in only a small number of coefficients.
  • This is good news for compression! Store only a few coefficients, and throw away the rest.
  • The corresponding error will be small, due to the orthonormal nature of the DCT.

28

29 of 114

29

149 74 92 74 74 74 149 162

87 74 117 30 74 105 180 130

30 117 105 43 105 130 149 105

74 162 105 74 105 117 105 105

117 149 74 117 74 105 74 149

149 87 74 87 74 74 117 180

105 74 105 43 61 117 180 149

74 74 105 74 105 130 149 105

IMAGE PATCH

828.3750 -106.7827 126.4183 -8.2540 -57.3750 -0.5311 -2.1682 29.8472

-6.0004 2.5328 8.3779 -7.1377 -17.3419 -6.9695 -11.1366 22.7612

-6.5212 -56.2336 23.5930 16.3746 -5.5436 74.2016 23.1543 65.2328

17.2141 29.9058 91.3782 -19.9119 106.2541 37.4804 15.8409 -25.1828

14.1250 53.2562 -30.5477 -0.8891 30.8750 -23.2787 -9.4005 -41.8019

5.7938 -2.9468 10.0191 2.8929 -16.5056 -2.4595 -5.1284 12.7364

-3.6579 2.3417 -14.8457 -0.7304 34.6327 -10.3257 -7.3430 -5.6082

-1.7071 -9.8264 -6.4722 -1.3611 -10.5811 -4.5081 -0.4332 -20.6615

DCT COEFFICIENTS

HISTOGRAM OF DCT COEFFICIENTS

30 of 114

30

Original image

Image reconstructed after discarding all DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 10, and then computing inverse DCT

Number of DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 10 was 34,377 out of a total of 65536 (64 coefficients for each 8 x 8 patch, totally 1024 such patches). This is more than 50%. Corresponding percentage for DFT was 1%.

Image reconstructed after discarding all DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 20, and then computing inverse DCT

Number of DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 20 was 51,045 out of a total of 65536 (64 coefficients for each 8 x 8 patch, totally 1024 such patches). This is more than 78%. Corresponding percentage for DFT was 7%.

31 of 114

Why DCT? DFT and DCT comparison

31

DFT

DCT

Orthonormal

Yes

Yes

Real/complex

Complex

Real

Separable in 2D

Yes

Yes

Norm-preserving

Yes

Yes

Inverse exists

Yes

Yes

Fast implementation

Yes (fft)

Yes (uses fft)

Energy compaction for natural images

Good/Fair

Much Better

32 of 114

DCT has better energy compaction than DFT because…

32

Recall that the DFT of a sequence is equal to the Discrete Fourier Series (DFS) of a periodic extension of that sequence. In computing the DFT of a signal of length n, there is the implicit extension of several copies of the signal placed one after the other (n-point periodicity). The resultant discontinuities require several frequencies for good representation in the DFS. As against this, the discontinuities are reduced in a DCT because a reflected copy of the signal is appended to it (2n-point periodicity) before computing the DFS.

33 of 114

DCT has better energy compaction than DFT because…

33

34 of 114

DCT computational complexity

  • Naïve implementation (matrix times vector) is O(N2) for a vector of N elements.
  • You can speed this up to O(N log N) using the FFT as shown on next slide.

34

35 of 114

35

DCT of f is computed from the DFT of the sequence of double length as f.

Only the first N frequencies are picked.

Reflected version of f, appended to f.

In MATLAB, you have the commands called dct and idct (in 1D) and dct2 and idct2 (in 2D).

(with some caveats – next slide)

36 of 114

36

1

You would noticed that the constant factors sqrt(1/N) and sqrt(2/N) are missing in this expression. These factors are essential for the DCT matrix to be orthonormal, but their presence doesn’t allow for this relationship between DCT and DFT.

See code in google drive folder

37 of 114

Which is the best orthonormal basis?

  • Consider a set of M data-points (e.g. image patches in a vectorized form) represented as a linear combination of column vectors of an ortho-normal basis matrix:

  • Suppose we reconstruct each patch using only a subset of some k coefficients as follows:

37

38 of 114

Which is the best orthonormal basis?

  • For which orthonormal basis U is the following error the lowest:

38

39 of 114

Which is the best orthonormal basis?

  • The answer is the PCA basis, i.e. the set of k eigenvectors of the correlation matrix C, corresponding to the k largest eigen-values. Here is C is defined as:

39

40 of 114

PCA: separable 2D version

  • Find the correlation matrix CR of row vectors from the patches.
  • Find the correlation matrix CC of column vectors from the patches.
  • The final PCA basis is the Kronecker product of the individual bases:

40

41 of 114

But PCA is not used in JPEG, because…

  • It is image-dependent, and the basis matrix would need to be computed afresh for each image.
  • The basis matrix would need to be stored for each image.
  • It is expensive to compute – O(n3) for a vector with n elements.

The DCT is used instead!

41

42 of 114

DCT and PCA

  • DCT can be computed very fast using fft.
  • It is universal – no need to store the DCT bases explicitly.
  • DCT has very good energy compaction properties, only slightly worse than PCA.

42

43 of 114

Experiment

  • Suppose you extract M ~ 100,000 small-sized (8 x 8) patches from a set of images.
  • Compute the column-column and row-row correlation matrices.

  • Compute their eigenvectors VR and VC.
  • The eigenvectors will be very similar to the columns of the 1D-DCT matrix! (as evidenced by dot product values).
  • Now compute the Kronecker product of VR and VC and call it V. Reshape each column of V to form an image. These images will appear very similar to the DCT bases.

43

Code: See folder on google drive

44 of 114

44

0.3536 0.4904 0.4619 0.4157 0.3536 0.2778 0.1913 0.0975

0.3536 0.4157 0.1913 -0.0975 -0.3536 -0.4904 -0.4619 -0.2778

0.3536 0.2778 -0.1913 -0.4904 -0.3536 0.0975 0.4619 0.4157

0.3536 0.0975 -0.4619 -0.2778 0.3536 0.4157 -0.1913 -0.4904

0.3536 -0.0975 -0.4619 0.2778 0.3536 -0.4157 -0.1913 0.4904

0.3536 -0.2778 -0.1913 0.4904 -0.3536 -0.0975 0.4619 -0.4157

0.3536 -0.4157 0.1913 0.0975 -0.3536 0.4904 -0.4619 0.2778

0.3536 -0.4904 0.4619 -0.4157 0.3536 -0.2778 0.1913 -0.0975

DCT matrix: dctmtx command from MATLAB (see code on website)

0.3517 -0.4493 -0.4278 0.4230 0.3754 0.3247 -0.2250 -0.1245

0.3534 -0.4366 -0.2276 -0.0110 -0.3078 -0.4746 0.4732 0.2975

0.3543 -0.3101 0.1728 -0.4830 -0.3989 0.0498 -0.4299 -0.4109

0.3546 -0.1115 0.4799 -0.3005 0.3342 0.4102 0.1856 0.4761

0.3547 0.1141 0.4823 0.2944 0.3301 -0.4182 0.1745 -0.4771

0.3543 0.3104 0.1771 0.4834 -0.3977 -0.0322 -0.4308 0.4103

0.3535 0.4357 -0.2319 0.0143 -0.3009 0.4656 0.4851 -0.2975

0.3520 0.4468 -0.4328 -0.4204 0.3686 -0.3253 -0.2342 0.1261

0.3520 -0.4461 -0.4305 0.4224 0.3696 0.3247 0.2342 0.1283

0.3537 -0.4338 -0.2345 -0.0114 -0.3000 -0.4671 -0.4814 -0.3028

0.3545 -0.3086 0.1662 -0.4896 -0.4007 0.0359 0.4261 0.4102

0.3548 -0.1145 0.4763 -0.3031 0.3339 0.4198 -0.1800 -0.4713

0.3548 0.1056 0.4839 0.2926 0.3349 -0.4194 -0.1766 0.4733

0.3543 0.3043 0.1863 0.4833 -0.4028 -0.0354 0.4269 -0.4097

0.3532 0.4389 -0.2269 0.0180 -0.3008 0.4654 -0.4811 0.3037

0.3512 0.4562 -0.4300 -0.4126 0.3694 -0.3242 0.2335 -0.1319

VC: Eigenvectors of column-column correlation matrix

VR: Eigenvectors of row-row correlation matrix

1.0000 0.0007 0.0032 0.0002 0.0013 0.0001 0.0005 0.0000

0.0007 0.9970 0.0097 0.0689 0.0009 0.0322 0.0003 0.0110

0.0033 0.0106 0.9968 0.0118 0.0713 0.0004 0.0334 0.0025

0.0002 0.0718 0.0124 0.9926 0.0007 0.0927 0.0017 0.0276

0.0010 0.0001 0.0737 0.0004 0.9942 0.0008 0.0780 0.0010

0.0000 0.0261 0.0015 0.0962 0.0005 0.9934 0.0011 0.0569

0.0003 0.0007 0.0276 0.0021 0.0802 0.0010 0.9964 0.0013

0.0000 0.0076 0.0026 0.0227 0.0012 0.0596 0.0015 0.9979

1.0000 0.0002 0.0029 0.0001 0.0010 0.0000 0.0004 0.0000

0.0002 0.9965 0.0028 0.0766 0.0005 0.0314 0.0009 0.0107

0.0029 0.0025 0.9969 0.0046 0.0728 0.0017 0.0304 0.0013

0.0001 0.0795 0.0044 0.9923 0.0029 0.0916 0.0015 0.0243

0.0008 0.0003 0.0747 0.0026 0.9948 0.0061 0.0696 0.0004

0.0000 0.0246 0.0021 0.0949 0.0069 0.9940 0.0131 0.0452

0.0003 0.0004 0.0252 0.0003 0.0715 0.0137 0.9970 0.0002

0.0000 0.0076 0.0013 0.0207 0.0001 0.0476 0.0009 0.9986

Absolute value of dot products between the columns of DCT matrix and columns of VR (left) and VC (right)

45 of 114

45

DCT bases

64 columns of V – each reshaped to form an 8 x 8 image, and rescaled to fit in the 0-1 range. Notice the similarity between the DCT bases and the columns of V. Again, V is the Kronecker product of VR and VC.

46 of 114

DCT and PCA

  • DCT is very close to PCA when the patches come from what is called as a stationary first order Markov process, i.e.

46

47 of 114

DCT and PCA

  • One can show that the eigenvectors of the correlation matrix of the form seen on the previous slide are the DCT basis vectors!

  • Natural images approximate this first order Markov model, and hence DCT is almost as good as PCA for compression of a large ensemble of image patches.

  • DCT has the advantage of being a universal basis and also the DCT coefficients are more efficiently computable than PCA coefficients (because DCT computation uses FFT).

47

48 of 114

48

1.0000 0.9902 0.9795 0.9733 0.9682 0.9639 0.9604 0.9570

0.9902 1.0005 0.9908 0.9795 0.9734 0.9684 0.9643 0.9605

0.9795 0.9908 1.0010 0.9908 0.9796 0.9735 0.9689 0.9646

0.9733 0.9795 0.9908 1.0005 0.9904 0.9793 0.9735 0.9686

0.9682 0.9734 0.9796 0.9904 1.0004 0.9903 0.9794 0.9734

0.9639 0.9684 0.9735 0.9793 0.9903 1.0001 0.9903 0.9793

0.9604 0.9643 0.9689 0.9735 0.9794 0.9903 1.0004 0.9904

0.9570 0.9605 0.9646 0.9686 0.9734 0.9793 0.9904 1.0002

1.0000 0.9888 0.9770 0.9704 0.9648 0.9599 0.9554 0.9510

0.9888 1.0004 0.9891 0.9768 0.9703 0.9646 0.9596 0.9548

0.9770 0.9891 1.0004 0.9886 0.9764 0.9698 0.9640 0.9587

0.9704 0.9768 0.9886 0.9994 0.9878 0.9755 0.9687 0.9627

0.9648 0.9703 0.9764 0.9878 0.9986 0.9870 0.9746 0.9676

0.9599 0.9646 0.9698 0.9755 0.9870 0.9978 0.9861 0.9734

0.9554 0.9596 0.9640 0.9687 0.9746 0.9861 0.9967 0.9847

0.9510 0.9548 0.9587 0.9627 0.9676 0.9734 0.9847 0.9951

CR/CR(1,1) -

Notice it can be approximated by the form shown two slides before, with ρ~0.99

CC/CC(1,1) -

Notice it can be approximated by the form shown two slides before, with ρ~0.9888

More results from the previous experiment. See code:

49 of 114

Computation of DCT coefficients in JPEG

  • Before computation, the value 128 (midpoint of the range 0 to 255) is subtracted from every pixel value.
  • This changes the range of intensity values from 0 to 255, to -128 to 127.
  • This also changes the range of DCT coefficient values from 0 to 2048, to -1024 to +1024.

49

50 of 114

STEP 2: Quantization

50

51 of 114

Quantization

  • The DCT coefficients are floating point numbers and storing them in a file will produce no compression. So they need to be quantized.
  • The human eye is not sensitive to changes in the higher frequency content.
  • So we can have cruder quantization for the higher frequency coefficients and a finer one for the lower frequency coefficients.

51

52 of 114

Quantization

  • Quantization is performed by dividing the DCT coefficient matrix element-wise by a quantization matrix and rounding off to the nearest integer.
  • The quantization matrix on the next slide is for quality factor Q = 50.
  • Matrices for lower Q values are obtained by scaling the Q = 50 matrix with a constant 50/Q – which increases the values in the quantization matrix.
  • Matrices for higher Q values are obtained by scaling the Q = 50 matrix with a constant 50/Q – which decreases the values in the quantization matrix.

52

53 of 114

53

Quantization matrix for Q = 50: notice the higher values in the matrix for higher frequency coefficients

M

Most of the values in B are 0! They need not be stored! Only the non-zero values in B will be stored!

54 of 114

How was this quantization matrix picked?

  • The quantization error is given by

where .

  • The maximum possible value of the error is given by Muv/2.
  • Psychophysical studies have been performed to find threshold values of DCT coefficients, i.e. for each frequency (u,v), these studies have determined the smallest DCT coefficient value that yielded a visible signal. This threshold is called tuv.
  • We set Muv = 2tuv so that the errors remain invisible.

54

55 of 114

STEP 3: Lossless compression steps: Huffman encoding and Run length encoding

55

56 of 114

56

57 of 114

Huffman encoding

  • Input: a set of non-zero quantized DCT coefficients from all the different blocks of the image (values lying between -1024 to +1024).
  • Output: a set of encoded coefficients with length (in terms of number of bits) less than that of the original set.
  • Principles behind Huffman encoding:
  • Encode the more frequently occurring coefficients with fewer bits. Encode the rarely occurring coefficients with more bits. This will reduce the average bit-length.
  • Ensure that the encoding for no coefficient is a strict prefix of the encoding of any other coefficient (to be explained on next slide). This is called a “prefix-free code”.

57

58 of 114

Huffman encoding example

  • Consider a set of alphabets {a,e,q}. Let the frequency of an alphabet x be denoted as p(x).
  • Assume p(e) > p(a) > p(q) [actually true in the English language].
  • Consider the following code-word assignment: e – 0, a – 1, q – 01 (note: we assigned more bits for q). Now consider the encoded stream: 001. It can be interpreted as ‘eea’ or ‘eq’.
  • The reason for this ambiguity is that the code for ‘e’ is a strict prefix of the code for ‘q’.
  • For unambiguous decoding, we need prefix-free codes. Example e – 0, a – 10, q – 11 is one example of a prefix-free code.

58

59 of 114

Huffman encoding example

  • The Huffman encoding algorithm asks the following question:

Given a set of n alphabets A = {ai} with corresponding frequencies {p(ai)} (each frequency lies from 0 to 1), what prefix-free encoding yields the least average bit length? That is, which set of code-words {λ(ai)} will minimize

59

Length of the code-word λ(ai)

60 of 114

Algorithm

  1. Sort alphabets in increasing order of frequency. Create a leaf node from each alphabet. These leaf nodes will belong to a binary tree called the Huffman tree.
  2. Combine the two lowest frequency nodes s1 and s2 to create a parent node s12. s1 and s2 will be the left and right child of s12. The frequency of s12 is given by p(s12) = p(s1) + p(s2).
  3. Label the edge from s12 to s1 with a ‘0’ and the edge from s12 to s2 with a ‘1’.
  4. Delete s1 and s2 from the sorted list of alphabets and insert the node s12, i.e. root node of the tree (s12,s1,s2) in the correct place depending on the value of p(s12).
  5. Repeat steps 2 to 4 until there is only one node in the list. This will be the root node of the final Huffman tree.
  6. Traverse the tree from the root node until each leaf and collect all the binary symbols along every edge into a string. This string will form the code word for that symbol.

60

61 of 114

Example

61

1

2

3

4

1

0

1

0

0

1

0

0

1

1

1

0

1

1

1

1

0

0

0

A = 0�B = 100�C = 1010�D = 1011�R = 11

This is a prefix-free code. No leaf node is on the path to any other node.

0.4

0.2

0.1

0.1

0.2

0.4

0.2

0.1

0.1

0.2

0.4

0.2

0.1

0.1

0.2

0.4

0.2

0.1

0.1

0.2

0.2

0.4

0.6

1.00

0.4

0.2

0.2

0.4

0.6

0

0.2

62 of 114

62

1

1

1

1

0

0

0

0.4

0.2

0.1

0.1

0.2

0.2

0.4

0.6

1.00

A = 0�B = 100�C = 1010�D = 1011�R = 11

Input string: RABBCDR

Encoded bit stream:

11-0-100-100-1010-1011-11

Decoded string: RABBCDR

To perform encoding, we maintain an initially empty encoded bit stream. Read a symbol from the input, traverse the Huffman tree from root node to the leaf node for that symbol, collecting all the bit labels on the traversed path, and appending them to the encoded bit stream. Repeat this for every symbol from the input. Example: for R, we write 11.

To perform decoding, read the encoded bit stream, and traverse the Huffman tree from the root node toward a leaf node, following the path as indicated by the bit stream. For example, if you read in 11, you would travel to the leaf node R. When you reach a leaf node, append its associated symbol to the decoded output. Go back to the root node and traverse the tree as per the remaining bits from the encoded bit stream.

0

Average code length here

= 1 * p(A) + 3 * p(B) + 4 * p(C) + 4*p(D) + 2 *p(R) = 0.4 + 0.6 + 0.4 + 0.4 + 0.4 = 2.2.

63 of 114

About the algorithm

  • This is a greedy algorithm, which is guaranteed to produce the prefix-free code with minimal average length (proof beyond the scope of the course).
  • There could be multiple sets of code words with the same average bit length. Huffman encoding produces one of them, depending on the order in which the nodes were combined, and the convention for labeling the edges with a 0 or a 1.

63

64 of 114

Huffman Trees and Entropy

  • The average code length as computed by this algorithm satisfies

64

Entropy of the random variable (set of alphabets) – measure of uncertainty of the random variable, or the measure of how much a random variable surprises you, or the average number of bits required to store a random variable.

Recall: Entropy is minimum in the case where the random variable takes on only one value. It is the maximum when it can take on any one of some k different values, each with equal probability.

Note: Entropy is also the average number of bits required to encode the values of the random variable. It is not possible to code the intensity values of an image with fewer bits per pixel than its entropy.

65 of 114

Zig-zag ordering

  • The quantized DCT coefficients are arranged now in a zigzag order as follows. The zig-zag pattern leaves a bunch of consecutive zeros at the end.

65

−26

−3

0

−3

−2

−6

2

−4

1

−3

1

1

5

1

2

−1

1

−1

2

0

0

0

0

0

−1

−1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

66 of 114

Run length encoding

  • The non-zero re-ordered quantized DCT coefficients (except for the DC coefficient) are written down in the following format:

66

  • run-length (number of zeros before this coefficient),
  • size (no. of bits to store the Huffman code for the coefficient),
  • actual Huffman code of the coefficient

4 bits each

We refer to the above set as a triple. In case there are more than 15 zeros in between 2 non-zero AC coefficients, a special triple is inserted. That triple is (15,0,0). If there are a large number of trailing zeros at the end of a block, we but in an “end of block” triple given as (0,0).

67 of 114

67

−26

−3

0

−3

−2

−6

2

−4

1

−3

1

1

5

1

2

−1

1

−1

2

0

0

0

0

0

−1

−1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

(0, 2)(-3); (1, 2)(-3); (0, 2)(-2); (0, 3)(-6); (0, 2)(2); (0, 2)(-4); (0, 1)(1); (0, 2)(-3); (0, 1)(1);

(0, 1)(1); (0, 3)(5); (0, 1)(1); (0, 2)(2); (0, 1)(-1); (0, 1)(1); (0, 2)(2); (5, 1)(-1); (0, 1)(-1); (0, 0).

68 of 114

Encoding DC coefficients

  • The difference between the DC coefficient of the current and previous patch is encoded and stored.
  • These difference values are Huffman encoded using a separate table (different from the Huffman table used for AC coefficients).
  • The DC coefficient of the first patch is stored explicitly.

68

69 of 114

JPEG encoded file

  • Begins with a header that contains information such as size of file, whether color or grayscale, the table of different alphabets (i.e. DCT coefficient values and their Huffman codes) and the quantization matrix.
  • This is followed by a bit stream containing triples of the form: (run-length, the length of the Huffman code of the coefficient, and the Huffman code for the coefficient).

69

70 of 114

JPEG DECODER

70

71 of 114

JPEG decoding

  • Perform Huffman decoding and obtain the DCT coefficients (AC).
  • Multiply the AC coefficients point-wise with the entries in the quantization matrix.
  • Compute the DC coefficients for each patch using the differences between the DC coefficients of successive patches. Multiply by the appropriate entry from the quantization matrix.
  • Reconstruct the image patches of size 8 x 8 using the inverse DCT. Add 128 to the intensity values in the patch.
  • Note: During JPEG encoding, the round-off errors from the quantization step can never be recovered again. Hence JPEG is overall a lossy algorithm.

71

72 of 114

JPEG for color images

72

73 of 114

JPEG for color images

  • The RGB values are converted to the YCbCr color space using:

  • Encode the Y, Cb, and Cr channels separately, using the “grayscale” JPEG algorithm on each channel. The Cb and Cr channels (the chrominance channels) are down-sampled by a factor of 2 in X and Y direction to further save storage space.
  • The Y channel (luminance) is not down-sampled. This is because the human eye is much more sensitive to luminance than to chrominance information.

73

74 of 114

PCA on RGB values

  • Why can you not separately compress the R,G,B images – instead of converting to Y, Cb, Cr?
  • The answer lies in PCA!
  • Suppose you take N color images and extract RGB values of each pixel (3 x 1 vector at each location).
  • Now, suppose you build an eigenspace out of this – you get 3 eigenvectors, each corresponding to 3 different eigenvalues.

74

75 of 114

PCA on RGB values

  • The eigenvectors will look typically as follows:

0.5952 0.6619 0.4556

0.6037 0.0059 -0.7972

0.5303 -0.7496 0.3961

  • Exact numbers are not important, but the first eigenvector is like an average of RGB. It is called as the Luminance Channel (Y). It is similar to the intensity in the HSI space.

75

76 of 114

PCA on RGB values

  • The second eigenvector is like Y-B, and the third is like Y-G. These are called as the Chrominance Channels.
  • The Y-Cb-Cr color space is related to this PCA-based space (though there are some details in the relative weightings of RGB to get Luminance and Chrominance – denoted by Cb and Cr).
  • The values in the three channels Y, Cb and Cr are decorrelated, similar to the values projected onto the PCA-based channels.

76

77 of 114

PCA on RGB values

  • Why does PCA produce decorrelated values (i.e. why are the values of the eigencoefficients decorrelated)?

  • Recall: in PCA, we built the correlation matrix C from N original data-points {xi}, 1 ≤ iN.

  • Recall that .

  • The eigencoefficients are given as {αi}, 1 ≤ iN where xi = i where C = VΛVT.

  • The correlation matrix of the eigencoefficients is given by:

77

78 of 114

PCA on RGB values

  • The correlation matrix of the eigencoefficients is given by:

  • This is a diagonal matrix (of eigenvalues).

  • Which means that the different elements of the vector of eigencoefficients are decorrelated.

78

79 of 114

PCA on RGB values

  • Why is it important to have decorrelated values in compression?

  • Because if the variables were not decorrelated, any operations you perform on one variable have an effect on the other variables.

  • So if you compressed the R image, G image and B image separately, a change in the R values (due to quantization) unintentionally affects the G and B values (and vice versa).

  • To prevent this, you convert the color values from RGB to a decorrelated color space such as YCbCr.

79

80 of 114

PCA on RGB values

  • The luminance channel (Y) carries most information from the point of view of human perception, and the human eye is less sensitive to changes in chrominance.
  • This fact can be used to assign coarser quantization levels (i.e. fewer bits) for storing or transmitting Cb and Cr values as compared to the Y channel. This improves the compression rate.
  • The JPEG standard for color image compression uses the YCbCr format. For an image of size M x N x 3, it stores Y with full resolution (i.e. as an M x N image), and Cb and Cr with 25% resolution, i.e. as M/2 x N/2 images.

80

81 of 114

81

82 of 114

82

83 of 114

83

84 of 114

84

The variances of the three eigen-coefficient values:

8411, 159.1, 71.7

85 of 114

85

86 of 114

86

87 of 114

87

88 of 114

88

RGB and its corresponding Y, Cb, Cr channels

89 of 114

89

90 of 114

90

Down-sampling of Cb and Cr in X and Y directions by a factor of 2

No down-sampling of chrominance or luminance channels

Cb channel under different down-sampling factors

Down-sampling of Cb and Cr only in X direction by a factor of 2

91 of 114

Modes of JPEG compression

  • Sequential: encoding and decoding of patches takes place in left to right, top to bottom order.
  • Progressive: encoding and decoding in multiple scans, each one with finer quantization levels.
  • Hierarchical: encoding and decoding performed at different scales.

91

92 of 114

92

Commonly seen in web applications (e.g.: Facebook)

93 of 114

JPEG artifacts

  • Seam artifacts at patch boundaries (more prominent for lower Q values).
  • Ringing artifacts around edges.
  • Some loss of edge and textural detail.
  • Color artifacts

93

94 of 114

Video Compression

94

95 of 114

Need for video compression

  • Huge data – typical HDTV has frames of size 1920 x 1080 and 30 fps frame-rate. That is more than 1 GB per second.
  • Network channel bandwidths are limited (around 20 Mbps).
  • We need large rates of compression (around 1:80)!

95

96 of 114

Motion JPEG

  • Encode each frame using JPEG.
  • That will yield only around 1:10 compression.
  • This makes use of only spatial redundancy, no temporal redundancy.

96

97 of 114

MPEG (Motion Pictures Expert Group)

  • Heavy use of temporal redundancy!
  • Uses a process called predictive coding.
  • Consider pixel f(x,y,t) in frame t. We try to predict its value, denoted as g(x,y,t), using a linear combinations of the values from previous k frames, i.e. f(x,y,t-k),…,f(x,y,t-1).
  • We simply store the error e(x,y,t) = f(x,y,t)-g(x,y,t).

97

98 of 114

Differential coding (First order predictive coding)

  • In this method, k = 1, i.e. you encode the differences between consecutive frames, i.e. e(x,y,t) = f(x,y,t) – f(x,y,t-1).
  • For most frames, the errors are highly sparse.

98

99 of 114

99

100 of 114

Errors are not always sparse ☹

  • Exceptions: (1) camera zoom-in and zoom-out, (2) sudden changes in viewpoint or scene content, or fade-in/fade-out effects. In such cases, the errors will be large!

100

101 of 114

Motion compensation

  • For each macro-block (typically 16 x 16 or 8 x 8 in size) in the frame to be encoded, find the most similar macro-block in a reference frame (which could be the previous frame, but not necessarily so).
  • The difference between the pixel locations of the top-left corner of the two macro-blocks is called the motion vector.
  • The motion vector is considered to be constant within a macro-block.
  • Search for similar macro-blocks is restricted to a small search window around the original macro-block.
  • The search window is rectangular – broader than taller (why?).

101

102 of 114

102

103 of 114

Motion Compensation

  • For many macro-blocks, the motion vectors will be 0.
  • The search for similar macro-blocks is performed at a sub-pixel level (1/2 pixel or ¼ pixel) for more accuracy. In this case, image intensity values need to be interpolated.
  • Macro-block similarity measure is one of the following:

103

In color videos, only luminance (i.e. Y) channel is used in similarity measure

104 of 114

Motion compensation

  • Many a time, there are more than one similar block. In such cases, the spatially closest block is chosen.
  • Note: we are not concerned with the accuracy of the motion estimate. It is only a means towards the larger goal – of compression.
  • The inter-frame differences are computed after motion compensation. This hugely improves the sparsity of these differences.
  • In other words, don’t blindly compute the difference between macro-blocks at corresponding locations in frames t and t-1. Compute the difference between the current macro-block in frame t and its most similar match in frame t-1.

104

105 of 114

105

106 of 114

106

107 of 114

107

108 of 114

Motion Compensation

  • For each macro-block (typically 16 x 16 or 8 x 8 in size) in the frame to be encoded, find the most similar macro-block in a reference frame.
  • If the reference frame is the previous one or the previous reference frame (see two slides later for more details), the current frame (the one being encoded) is called the P-frame.
  • If the reference frame is a combination of the previous frame and the next one, the current frame is called the B-frame.
  • A B-frame can use two motion vectors – one for previous and one for next frame.

108

109 of 114

109

Example: why do we need B-frames?

110 of 114

No Motion Compensation here!

  • For some frames that rest on shot-boundaries (i.e. sudden changes in content), there is no advantage to performing motion compensation.
  • Such frames are called I-frames (independent frames). These usually act as reference frames for other frames to be differentially encoded.
  • I-frames can be detected by the presence of very frequently low similarity values during search for macro-blocks.

110

111 of 114

MPEG encoder

111

Note:

  • YUV is a color-space very similar to YCbCr.
  • MV = motion vector

112 of 114

MPEG encoder

  • The I-frames are encoded using JPEG.
  • The B-frames and P-frames are also encoded using JPEG, but the DCT is computed on macro-blocks from the motion-compensated residual image as follows:
  • We have already computed motion vectors w.r.t. the reference frame.
  • Compute a residual image by calculating differences between macro-blocks in the current frame and the most-similar matching macro-blocks (as given by the motion vector) from the reference image. This is called motion-compensated frame differencing.
  • Note: the motion vector for the macro-block also needs to be stored. For this, motion-vectors from several macro-blocks are collected together and Huffman-encoded. Only the non-zero motion vectors are encoded.

112

113 of 114

113

Display Order and Transmission Order (or order in which frames are compressed) may be different!

114 of 114

MPEG decoder

114

Note:

  • YUV is a color-space very similar to YCbCr.