JavaScript isn't enabled in your browser, so this file can't be opened. Enable and reload.

1 of 114

Image Compression

CS 663, Ajit Rajwade

2 of 114

Image Compression

Process of converting an image file into another image file that occupies less storage space, without sacrificing its visual content as far as possible.

Useful for saving storage space, and transmission costs.

3 of 114

Types of compression

Lossless: the compressed image can be converted back with zero error.

Lossy: the compressed image cannot be converted back to the original without error. The amount of error is inversely proportional to the storage space (usually) and can be controlled by the user.

4 of 114

Lossless compression - examples

LZW method (used in Winzip)
Huffman encoding (part of the JPEG algorithm, although overall JPEG is lossy)
Run-length encoding (also part of the JPEG algorithm, although JPEG is lossy overall)

5 of 114

Lossy compression

JPEG
MPEG (for video)
MP3 (for audio)
Machine learning based techniques for compression of images or video (not covered in this course).

6 of 114

Lossy image compression

Compression of text files or exe files cannot afford to be lossy.
But some portion of image content is often not very noticeable to the human eye, especially the higher frequencies. Discarding this extraneous information leads to compression without significant loss of visual appeal.

7 of 114

Source: Article on compressive sensing by Candes and Wakin, from IEEE Signal Processing Magazine, 2008

DCT coefficients or

8 of 114

JPEG compression method

JPEG = Joint Photographic Experts Group
One of the most popular standards for compression of photographic images – widely used on the internet.
Widely used in digital cameras.
Implemented in all standard image processing software (MATLAB, OpenCV, etc.)
Essentially lossy (though there are some lossless variants)
Applicable for color as well as grayscale images.

9 of 114

JPEG image compression

User specifies a quality factor (Q) between 0 and 100 (higher Q means better quality)
JPEG algorithm compresses the image based on the user-provided Q.
Higher the Q, less will be the compression rate (but higher image quality). Lower Q will give higher compression rate (but poorer image quality).
JPEG can achieve 1/10 or 1/15 compression rate with little loss of quality.

10 of 114

JPEG image compression

How is the loss of quality measured?
As MSE between original (uncompressed) and reconstructed images:

11 of 114

Q = 100, compression rate = 1/2.6

Q = 50, compression rate = 1/15

Q = 25, compression rate = 1/23

Q = 1, compression rate = 1/144

Q = 10, compression rate = 1/46

http://en.wikipedia.org/wiki/JPEG

12 of 114

Steps of the JPEG algorithm (encoder): Overview (approximate)

Divide the image into non-overlapping 8 x 8 blocks and compute the discrete cosine transform (DCT) of each block. This produces a set of 64 “DCT coefficients” per block.
Quantize these DCT coefficients, i.e. divide by some number and round off to nearest integer (that’s why it is lossy). Many coefficients now become 0 and need not be stored!
Now run a lossless compression algorithm (typically Huffman encoding) on the entire set of integers.

We will go through each step in detail in the several slides to follow.

13 of 114

14 of 114

STEP 1: Discrete Cosine Transform (DCT)

15 of 114

Discrete Cosine Transform (DCT) in 1D

16 of 114

Discrete Cosine Transform (DCT) in 1D

17 of 114

DCT

Expresses a signal as a linear combination of cosine bases (as opposed to the complex exponentials as in the Fourier transform).
The coefficients of this linear combination are called DCT coefficients.
Is real-valued unlike the Fourier transform!
Discovered by Ahmed, Natarajan and Rao (1974)

18 of 114

DCT basis matrix is orthonormal. The dot product of any row (or column) with itself is 1. The dot product of any two different rows (or two different columns) is 0. The inverse is equal to the transpose.

Being orthonormal, it preserves the squared norm, i.e.

DCT is NOT the real part of the Fourier!

DCT basis matrix is NOT symmetric.

Columns of the DCT matrix are called the DCT basis vectors.

19 of 114

Digression: matrix view of a discrete orthonormal transform (Fourier transform used as example here)

Remember:

In matrix form, we write:

Fourier matrix: in any row, the value of x is fixed, the value of u ranges from 0 to M-1

20 of 114

21 of 114

DCT in 2D

The DCT matrix is this case will have size MN x MN, and it will be the Kronecker product of two DCT matrices – one of size M x M, the other of size N x N. The DCT matrix for the 2D case is also orthonormal, it is NOT symmetric and it is NOT the real part of the 2D DFT.

22 of 114

What is a 2D Fourier Matrix?

It is of the following form:

23 of 114

What is a 2D Fourier Matrix?

Consider a matrix A of size N₁ x N₂ and a matrix B of size M₁ x M₂. The size of their Kronecker product is given by N₁ M₁ x N₂ M₂. The Kronecker product is constructed by creating a rectangular grid of size N₁ x N₂ . In each cell of the grid, you place B. The copy of B in the cell at grid location (i,j) is multiplied by A_ij.

24 of 114

This big matrix is nothing but the Kronecker product of two 2 x 2 Fourier matrices.

25 of 114

How do the DCT bases look like? (1D case)

26 of 114

How do the DCT bases look like? (2D-case)

The DCT transforms an 8×8 block of input values to a linear combination of these 64 patterns. The patterns are referred to as the two-dimensional DCT basis vectors, and the output values are referred to as transform coefficients. Here each basis vector is reshaped to form an image.

http://en.wikipedia.org/wiki/JPEG

27 of 114

Again: DCT is NOT the real part of the DFT

http://en.wikipedia.org/wiki/JPEG

Real part of DFT

DCT

28 of 114

DCT on grayscale image patches

The DCT coefficients of natural image patches have an amazing property.
It is observed that most of the signal energy is concentrated in only a small number of coefficients.
This is good news for compression! Store only a few coefficients, and throw away the rest.
The corresponding error will be small, due to the orthonormal nature of the DCT.

29 of 114

149 74 92 74 74 74 149 162

87 74 117 30 74 105 180 130

30 117 105 43 105 130 149 105

74 162 105 74 105 117 105 105

117 149 74 117 74 105 74 149

149 87 74 87 74 74 117 180

105 74 105 43 61 117 180 149

74 74 105 74 105 130 149 105

IMAGE PATCH

828.3750 -106.7827 126.4183 -8.2540 -57.3750 -0.5311 -2.1682 29.8472

-6.0004 2.5328 8.3779 -7.1377 -17.3419 -6.9695 -11.1366 22.7612

-6.5212 -56.2336 23.5930 16.3746 -5.5436 74.2016 23.1543 65.2328

17.2141 29.9058 91.3782 -19.9119 106.2541 37.4804 15.8409 -25.1828

14.1250 53.2562 -30.5477 -0.8891 30.8750 -23.2787 -9.4005 -41.8019

5.7938 -2.9468 10.0191 2.8929 -16.5056 -2.4595 -5.1284 12.7364

-3.6579 2.3417 -14.8457 -0.7304 34.6327 -10.3257 -7.3430 -5.6082

-1.7071 -9.8264 -6.4722 -1.3611 -10.5811 -4.5081 -0.4332 -20.6615

DCT COEFFICIENTS

HISTOGRAM OF DCT COEFFICIENTS

30 of 114

Original image

Image reconstructed after discarding all DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 10, and then computing inverse DCT

Number of DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 10 was 34,377 out of a total of 65536 (64 coefficients for each 8 x 8 patch, totally 1024 such patches). This is more than 50%. Corresponding percentage for DFT was 1%.

Image reconstructed after discarding all DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 20, and then computing inverse DCT

Number of DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 20 was 51,045 out of a total of 65536 (64 coefficients for each 8 x 8 patch, totally 1024 such patches). This is more than 78%. Corresponding percentage for DFT was 7%.

31 of 114

Why DCT? DFT and DCT comparison

	DFT	DCT
Orthonormal	Yes	Yes
Real/complex	Complex	Real
Separable in 2D	Yes	Yes
Norm-preserving	Yes	Yes
Inverse exists	Yes	Yes
Fast implementation	Yes (fft)	Yes (uses fft)
Energy compaction for natural images	Good/Fair	Much Better

32 of 114

DCT has better energy compaction than DFT because…

Recall that the DFT of a sequence is equal to the Discrete Fourier Series (DFS) of a periodic extension of that sequence. In computing the DFT of a signal of length n, there is the implicit extension of several copies of the signal placed one after the other (n-point periodicity). The resultant discontinuities require several frequencies for good representation in the DFS. As against this, the discontinuities are reduced in a DCT because a reflected copy of the signal is appended to it (2n-point periodicity) before computing the DFS.

33 of 114

DCT has better energy compaction than DFT because…

34 of 114

DCT computational complexity

Naïve implementation (matrix times vector) is O(N²) for a vector of N elements.
You can speed this up to O(N log N) using the FFT as shown on next slide.

35 of 114

DCT of f is computed from the DFT of the sequence of double length as f.

Only the first N frequencies are picked.

Reflected version of f, appended to f.

In MATLAB, you have the commands called dct and idct (in 1D) and dct2 and idct2 (in 2D).

(with some caveats – next slide)

36 of 114

You would noticed that the constant factors sqrt(1/N) and sqrt(2/N) are missing in this expression. These factors are essential for the DCT matrix to be orthonormal, but their presence doesn’t allow for this relationship between DCT and DFT.

http://en.wikipedia.org/wiki/Discrete_cosine_transform

http://www.ecsutton.ece.ufl.edu/dip/handouts/dct.pdf

See code in google drive folder

37 of 114

Which is the best orthonormal basis?

Consider a set of M data-points (e.g. image patches in a vectorized form) represented as a linear combination of column vectors of an ortho-normal basis matrix:

Suppose we reconstruct each patch using only a subset of some k coefficients as follows:

38 of 114

Which is the best orthonormal basis?

For which orthonormal basis U is the following error the lowest:

39 of 114

Which is the best orthonormal basis?

The answer is the PCA basis, i.e. the set of k eigenvectors of the correlation matrix C, corresponding to the k largest eigen-values. Here is C is defined as:

40 of 114

PCA: separable 2D version

Find the correlation matrix C_R of row vectors from the patches.
Find the correlation matrix C_C of column vectors from the patches.
The final PCA basis is the Kronecker product of the individual bases:

41 of 114

But PCA is not used in JPEG, because…

It is image-dependent, and the basis matrix would need to be computed afresh for each image.
The basis matrix would need to be stored for each image.
It is expensive to compute – O(n³) for a vector with n elements.

The DCT is used instead!

42 of 114

DCT and PCA

DCT can be computed very fast using fft.
It is universal – no need to store the DCT bases explicitly.
DCT has very good energy compaction properties, only slightly worse than PCA.

43 of 114

Experiment

Suppose you extract M ~ 100,000 small-sized (8 x 8) patches from a set of images.
Compute the column-column and row-row correlation matrices.

Compute their eigenvectors V_R and V_C.
The eigenvectors will be very similar to the columns of the 1D-DCT matrix! (as evidenced by dot product values).
Now compute the Kronecker product of V_R and V_C and call it V. Reshape each column of V to form an image. These images will appear very similar to the DCT bases.

Code: See folder on google drive

44 of 114

0.3536 0.4904 0.4619 0.4157 0.3536 0.2778 0.1913 0.0975

0.3536 0.4157 0.1913 -0.0975 -0.3536 -0.4904 -0.4619 -0.2778

0.3536 0.2778 -0.1913 -0.4904 -0.3536 0.0975 0.4619 0.4157

0.3536 0.0975 -0.4619 -0.2778 0.3536 0.4157 -0.1913 -0.4904

0.3536 -0.0975 -0.4619 0.2778 0.3536 -0.4157 -0.1913 0.4904

0.3536 -0.2778 -0.1913 0.4904 -0.3536 -0.0975 0.4619 -0.4157

0.3536 -0.4157 0.1913 0.0975 -0.3536 0.4904 -0.4619 0.2778

0.3536 -0.4904 0.4619 -0.4157 0.3536 -0.2778 0.1913 -0.0975

DCT matrix: dctmtx command from MATLAB (see code on website)

0.3517 -0.4493 -0.4278 0.4230 0.3754 0.3247 -0.2250 -0.1245

0.3534 -0.4366 -0.2276 -0.0110 -0.3078 -0.4746 0.4732 0.2975

0.3543 -0.3101 0.1728 -0.4830 -0.3989 0.0498 -0.4299 -0.4109

0.3546 -0.1115 0.4799 -0.3005 0.3342 0.4102 0.1856 0.4761

0.3547 0.1141 0.4823 0.2944 0.3301 -0.4182 0.1745 -0.4771

0.3543 0.3104 0.1771 0.4834 -0.3977 -0.0322 -0.4308 0.4103

0.3535 0.4357 -0.2319 0.0143 -0.3009 0.4656 0.4851 -0.2975

0.3520 0.4468 -0.4328 -0.4204 0.3686 -0.3253 -0.2342 0.1261

0.3520 -0.4461 -0.4305 0.4224 0.3696 0.3247 0.2342 0.1283

0.3537 -0.4338 -0.2345 -0.0114 -0.3000 -0.4671 -0.4814 -0.3028

0.3545 -0.3086 0.1662 -0.4896 -0.4007 0.0359 0.4261 0.4102

0.3548 -0.1145 0.4763 -0.3031 0.3339 0.4198 -0.1800 -0.4713

0.3548 0.1056 0.4839 0.2926 0.3349 -0.4194 -0.1766 0.4733

0.3543 0.3043 0.1863 0.4833 -0.4028 -0.0354 0.4269 -0.4097

0.3532 0.4389 -0.2269 0.0180 -0.3008 0.4654 -0.4811 0.3037

0.3512 0.4562 -0.4300 -0.4126 0.3694 -0.3242 0.2335 -0.1319

V_C: Eigenvectors of column-column correlation matrix

V_R: Eigenvectors of row-row correlation matrix

1.0000 0.0007 0.0032 0.0002 0.0013 0.0001 0.0005 0.0000

0.0007 0.9970 0.0097 0.0689 0.0009 0.0322 0.0003 0.0110

0.0033 0.0106 0.9968 0.0118 0.0713 0.0004 0.0334 0.0025

0.0002 0.0718 0.0124 0.9926 0.0007 0.0927 0.0017 0.0276

0.0010 0.0001 0.0737 0.0004 0.9942 0.0008 0.0780 0.0010

0.0000 0.0261 0.0015 0.0962 0.0005 0.9934 0.0011 0.0569

0.0003 0.0007 0.0276 0.0021 0.0802 0.0010 0.9964 0.0013

0.0000 0.0076 0.0026 0.0227 0.0012 0.0596 0.0015 0.9979

1.0000 0.0002 0.0029 0.0001 0.0010 0.0000 0.0004 0.0000

0.0002 0.9965 0.0028 0.0766 0.0005 0.0314 0.0009 0.0107

0.0029 0.0025 0.9969 0.0046 0.0728 0.0017 0.0304 0.0013

0.0001 0.0795 0.0044 0.9923 0.0029 0.0916 0.0015 0.0243

0.0008 0.0003 0.0747 0.0026 0.9948 0.0061 0.0696 0.0004

0.0000 0.0246 0.0021 0.0949 0.0069 0.9940 0.0131 0.0452

0.0003 0.0004 0.0252 0.0003 0.0715 0.0137 0.9970 0.0002

0.0000 0.0076 0.0013 0.0207 0.0001 0.0476 0.0009 0.9986

Absolute value of dot products between the columns of DCT matrix and columns of V_R (left) and V_C (right)

45 of 114

DCT bases

64 columns of V – each reshaped to form an 8 x 8 image, and rescaled to fit in the 0-1 range. Notice the similarity between the DCT bases and the columns of V. Again, V is the Kronecker product of V_R and V_C.

46 of 114

DCT and PCA

DCT is very close to PCA when the patches come from what is called as a stationary first order Markov process, i.e.

47 of 114

DCT and PCA

One can show that the eigenvectors of the correlation matrix of the form seen on the previous slide are the DCT basis vectors!

Natural images approximate this first order Markov model, and hence DCT is almost as good as PCA for compression of a large ensemble of image patches.

DCT has the advantage of being a universal basis and also the DCT coefficients are more efficiently computable than PCA coefficients (because DCT computation uses FFT).

48 of 114

1.0000 0.9902 0.9795 0.9733 0.9682 0.9639 0.9604 0.9570

0.9902 1.0005 0.9908 0.9795 0.9734 0.9684 0.9643 0.9605

0.9795 0.9908 1.0010 0.9908 0.9796 0.9735 0.9689 0.9646

0.9733 0.9795 0.9908 1.0005 0.9904 0.9793 0.9735 0.9686

0.9682 0.9734 0.9796 0.9904 1.0004 0.9903 0.9794 0.9734

0.9639 0.9684 0.9735 0.9793 0.9903 1.0001 0.9903 0.9793

0.9604 0.9643 0.9689 0.9735 0.9794 0.9903 1.0004 0.9904

0.9570 0.9605 0.9646 0.9686 0.9734 0.9793 0.9904 1.0002

1.0000 0.9888 0.9770 0.9704 0.9648 0.9599 0.9554 0.9510

0.9888 1.0004 0.9891 0.9768 0.9703 0.9646 0.9596 0.9548

0.9770 0.9891 1.0004 0.9886 0.9764 0.9698 0.9640 0.9587

0.9704 0.9768 0.9886 0.9994 0.9878 0.9755 0.9687 0.9627

0.9648 0.9703 0.9764 0.9878 0.9986 0.9870 0.9746 0.9676

0.9599 0.9646 0.9698 0.9755 0.9870 0.9978 0.9861 0.9734

0.9554 0.9596 0.9640 0.9687 0.9746 0.9861 0.9967 0.9847

0.9510 0.9548 0.9587 0.9627 0.9676 0.9734 0.9847 0.9951

CR/CR(1,1) -

Notice it can be approximated by the form shown two slides before, with ρ~0.99

CC/CC(1,1) -

Notice it can be approximated by the form shown two slides before, with ρ~0.9888

More results from the previous experiment. See code:

http://www.cse.iitb.ac.in/~ajitvr/CS663_Fall2016/DCT_PCA/

49 of 114

Computation of DCT coefficients in JPEG

Before computation, the value 128 (midpoint of the range 0 to 255) is subtracted from every pixel value.
This changes the range of intensity values from 0 to 255, to -128 to 127.
This also changes the range of DCT coefficient values from 0 to 2048, to -1024 to +1024.

50 of 114

STEP 2: Quantization

51 of 114

Quantization

The DCT coefficients are floating point numbers and storing them in a file will produce no compression. So they need to be quantized.
The human eye is not sensitive to changes in the higher frequency content.
So we can have cruder quantization for the higher frequency coefficients and a finer one for the lower frequency coefficients.

52 of 114

Quantization

Quantization is performed by dividing the DCT coefficient matrix element-wise by a quantization matrix and rounding off to the nearest integer.
The quantization matrix on the next slide is for quality factor Q = 50.
Matrices for lower Q values are obtained by scaling the Q = 50 matrix with a constant 50/Q – which increases the values in the quantization matrix.
Matrices for higher Q values are obtained by scaling the Q = 50 matrix with a constant 50/Q – which decreases the values in the quantization matrix.

53 of 114

Quantization matrix for Q = 50: notice the higher values in the matrix for higher frequency coefficients

Most of the values in B are 0! They need not be stored! Only the non-zero values in B will be stored!

54 of 114

How was this quantization matrix picked?

The quantization error is given by

where .

The maximum possible value of the error is given by M_uv/2.
Psychophysical studies have been performed to find threshold values of DCT coefficients, i.e. for each frequency (u,v), these studies have determined the smallest DCT coefficient value that yielded a visible signal. This threshold is called t_uv.
We set M_uv = 2t_uv so that the errors remain invisible.

55 of 114

STEP 3: Lossless compression steps: Huffman encoding and Run length encoding

56 of 114

57 of 114

Huffman encoding

Input: a set of non-zero quantized DCT coefficients from all the different blocks of the image (values lying between -1024 to +1024).
Output: a set of encoded coefficients with length (in terms of number of bits) less than that of the original set.
Principles behind Huffman encoding:
Encode the more frequently occurring coefficients with fewer bits. Encode the rarely occurring coefficients with more bits. This will reduce the average bit-length.
Ensure that the encoding for no coefficient is a strict prefix of the encoding of any other coefficient (to be explained on next slide). This is called a “prefix-free code”.

58 of 114

Huffman encoding example

Consider a set of alphabets {a,e,q}. Let the frequency of an alphabet x be denoted as p(x).
Assume p(e) > p(a) > p(q) [actually true in the English language].
Consider the following code-word assignment: e – 0, a – 1, q – 01 (note: we assigned more bits for q). Now consider the encoded stream: 001. It can be interpreted as ‘eea’ or ‘eq’.
The reason for this ambiguity is that the code for ‘e’ is a strict prefix of the code for ‘q’.
For unambiguous decoding, we need prefix-free codes. Example e – 0, a – 10, q – 11 is one example of a prefix-free code.

59 of 114

Huffman encoding example

The Huffman encoding algorithm asks the following question:

Given a set of n alphabets A = {a_i} with corresponding frequencies {p(a_i)} (each frequency lies from 0 to 1), what prefix-free encoding yields the least average bit length? That is, which set of code-words {λ(a_i)} will minimize

Length of the code-word λ(a_i)

60 of 114

Algorithm

Sort alphabets in increasing order of frequency. Create a leaf node from each alphabet. These leaf nodes will belong to a binary tree called the Huffman tree.
Combine the two lowest frequency nodes s₁ and s₂ to create a parent node s₁₂. s₁ and s₂ will be the left and right child of s₁₂. The frequency of s₁₂ is given by p(s₁₂) = p(s₁) + p(s₂).
Label the edge from s₁₂ to s₁ with a ‘0’ and the edge from s₁₂ to s₂ with a ‘1’.
Delete s₁ and s₂ from the sorted list of alphabets and insert the node s₁₂, i.e. root node of the tree (s₁₂,s₁,s₂) in the correct place depending on the value of p(s₁₂).
Repeat steps 2 to 4 until there is only one node in the list. This will be the root node of the final Huffman tree.
Traverse the tree from the root node until each leaf and collect all the binary symbols along every edge into a string. This string will form the code word for that symbol.

61 of 114

Example

www.cis.upenn.edu/~matuszek/cit594-2002/Slides/huffman .ppt‎

A = 0�B = 100�C = 1010�D = 1011�R = 11

This is a prefix-free code. No leaf node is on the path to any other node.

0.4

0.2

0.1

0.2

0.4

0.2

0.1

0.2

0.4

0.2

0.1

0.2

0.4

0.2

0.1

0.2

0.4

0.6

1.00

0.4

0.2

0.4

0.6

0.2

62 of 114

0.4

0.2

0.1

0.2

0.4

0.6

1.00

A = 0�B = 100�C = 1010�D = 1011�R = 11

Input string: RABBCDR

Encoded bit stream:

11-0-100-100-1010-1011-11

Decoded string: RABBCDR

To perform encoding, we maintain an initially empty encoded bit stream. Read a symbol from the input, traverse the Huffman tree from root node to the leaf node for that symbol, collecting all the bit labels on the traversed path, and appending them to the encoded bit stream. Repeat this for every symbol from the input. Example: for R, we write 11.

To perform decoding, read the encoded bit stream, and traverse the Huffman tree from the root node toward a leaf node, following the path as indicated by the bit stream. For example, if you read in 11, you would travel to the leaf node R. When you reach a leaf node, append its associated symbol to the decoded output. Go back to the root node and traverse the tree as per the remaining bits from the encoded bit stream.

Average code length here

= 1 * p(A) + 3 * p(B) + 4 * p(C) + 4*p(D) + 2 *p(R) = 0.4 + 0.6 + 0.4 + 0.4 + 0.4 = 2.2.

63 of 114

About the algorithm

This is a greedy algorithm, which is guaranteed to produce the prefix-free code with minimal average length (proof beyond the scope of the course).
There could be multiple sets of code words with the same average bit length. Huffman encoding produces one of them, depending on the order in which the nodes were combined, and the convention for labeling the edges with a 0 or a 1.

64 of 114

Huffman Trees and Entropy

The average code length as computed by this algorithm satisfies

Entropy of the random variable (set of alphabets) – measure of uncertainty of the random variable, or the measure of how much a random variable surprises you, or the average number of bits required to store a random variable.

Recall: Entropy is minimum in the case where the random variable takes on only one value. It is the maximum when it can take on any one of some k different values, each with equal probability.

Note: Entropy is also the average number of bits required to encode the values of the random variable. It is not possible to code the intensity values of an image with fewer bits per pixel than its entropy.

65 of 114

Zig-zag ordering

The quantized DCT coefficients are arranged now in a zigzag order as follows. The zig-zag pattern leaves a bunch of consecutive zeros at the end.

−26
−3	0
−3	−2	−6
2	−4	1	−3
1	1	5	1	2
−1	1	−1	2	0	0
0	0	0	−1	−1	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0
0	0	0	0
0	0	0
0	0
0

66 of 114

Run length encoding

The non-zero re-ordered quantized DCT coefficients (except for the DC coefficient) are written down in the following format:

run-length (number of zeros before this coefficient),
size (no. of bits to store the Huffman code for the coefficient),
actual Huffman code of the coefficient

4 bits each

We refer to the above set as a triple. In case there are more than 15 zeros in between 2 non-zero AC coefficients, a special triple is inserted. That triple is (15,0,0). If there are a large number of trailing zeros at the end of a block, we but in an “end of block” triple given as (0,0).

67 of 114

−26
−3	0
−3	−2	−6
2	−4	1	−3
1	1	5	1	2
−1	1	−1	2	0	0
0	0	0	−1	−1	0	0
0	0	0	0	0	0	0	0
0	0	0	0	0	0	0
0	0	0	0	0	0
0	0	0	0	0
0	0	0	0
0	0	0
0	0
0

(0, 2)(-3); (1, 2)(-3); (0, 2)(-2); (0, 3)(-6); (0, 2)(2); (0, 2)(-4); (0, 1)(1); (0, 2)(-3); (0, 1)(1);

(0, 1)(1); (0, 3)(5); (0, 1)(1); (0, 2)(2); (0, 1)(-1); (0, 1)(1); (0, 2)(2); (5, 1)(-1); (0, 1)(-1); (0, 0).

68 of 114

Encoding DC coefficients

The difference between the DC coefficient of the current and previous patch is encoded and stored.
These difference values are Huffman encoded using a separate table (different from the Huffman table used for AC coefficients).
The DC coefficient of the first patch is stored explicitly.

69 of 114

JPEG encoded file

Begins with a header that contains information such as size of file, whether color or grayscale, the table of different alphabets (i.e. DCT coefficient values and their Huffman codes) and the quantization matrix.
This is followed by a bit stream containing triples of the form: (run-length, the length of the Huffman code of the coefficient, and the Huffman code for the coefficient).

70 of 114

JPEG DECODER

71 of 114

JPEG decoding

Perform Huffman decoding and obtain the DCT coefficients (AC).
Multiply the AC coefficients point-wise with the entries in the quantization matrix.
Compute the DC coefficients for each patch using the differences between the DC coefficients of successive patches. Multiply by the appropriate entry from the quantization matrix.
Reconstruct the image patches of size 8 x 8 using the inverse DCT. Add 128 to the intensity values in the patch.
Note: During JPEG encoding, the round-off errors from the quantization step can never be recovered again. Hence JPEG is overall a lossy algorithm.

72 of 114

JPEG for color images

73 of 114

JPEG for color images

The RGB values are converted to the YCbCr color space using:

Encode the Y, Cb, and Cr channels separately, using the “grayscale” JPEG algorithm on each channel. The Cb and Cr channels (the chrominance channels) are down-sampled by a factor of 2 in X and Y direction to further save storage space.
The Y channel (luminance) is not down-sampled. This is because the human eye is much more sensitive to luminance than to chrominance information.

74 of 114

PCA on RGB values

Why can you not separately compress the R,G,B images – instead of converting to Y, Cb, Cr?
The answer lies in PCA!
Suppose you take N color images and extract RGB values of each pixel (3 x 1 vector at each location).
Now, suppose you build an eigenspace out of this – you get 3 eigenvectors, each corresponding to 3 different eigenvalues.

75 of 114

PCA on RGB values

The eigenvectors will look typically as follows:

0.5952 0.6619 0.4556

0.6037 0.0059 -0.7972

0.5303 -0.7496 0.3961

Exact numbers are not important, but the first eigenvector is like an average of RGB. It is called as the Luminance Channel (Y). It is similar to the intensity in the HSI space.

76 of 114

PCA on RGB values

The second eigenvector is like Y-B, and the third is like Y-G. These are called as the Chrominance Channels.
The Y-Cb-Cr color space is related to this PCA-based space (though there are some details in the relative weightings of RGB to get Luminance and Chrominance – denoted by Cb and Cr).
The values in the three channels Y, Cb and Cr are decorrelated, similar to the values projected onto the PCA-based channels.

77 of 114

PCA on RGB values

Why does PCA produce decorrelated values (i.e. why are the values of the eigencoefficients decorrelated)?

Recall: in PCA, we built the correlation matrix C from N original data-points {x_i}, 1 ≤ i ≤ N.

Recall that .

The eigencoefficients are given as {α_i}, 1 ≤ i ≤ N where x_i = Vα_i where C = VΛV^T.

The correlation matrix of the eigencoefficients is given by:

78 of 114

PCA on RGB values

The correlation matrix of the eigencoefficients is given by:

This is a diagonal matrix (of eigenvalues).

Which means that the different elements of the vector of eigencoefficients are decorrelated.

79 of 114

PCA on RGB values

Why is it important to have decorrelated values in compression?

Because if the variables were not decorrelated, any operations you perform on one variable have an effect on the other variables.

So if you compressed the R image, G image and B image separately, a change in the R values (due to quantization) unintentionally affects the G and B values (and vice versa).

To prevent this, you convert the color values from RGB to a decorrelated color space such as YCbCr.

80 of 114

PCA on RGB values

The luminance channel (Y) carries most information from the point of view of human perception, and the human eye is less sensitive to changes in chrominance.
This fact can be used to assign coarser quantization levels (i.e. fewer bits) for storing or transmitting Cb and Cr values as compared to the Y channel. This improves the compression rate.
The JPEG standard for color image compression uses the YCbCr format. For an image of size M x N x 3, it stores Y with full resolution (i.e. as an M x N image), and Cb and Cr with 25% resolution, i.e. as M/2 x N/2 images.

81 of 114

82 of 114

83 of 114

84 of 114

The variances of the three eigen-coefficient values:

8411, 159.1, 71.7

85 of 114

86 of 114

87 of 114

88 of 114

RGB and its corresponding Y, Cb, Cr channels

89 of 114

http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-4.pdf

90 of 114

Down-sampling of Cb and Cr in X and Y directions by a factor of 2

No down-sampling of chrominance or luminance channels

Cb channel under different down-sampling factors

Down-sampling of Cb and Cr only in X direction by a factor of 2

https://en.wikipedia.org/wiki/Chroma_subsampling#/media/File:Colorcomp.jpg

91 of 114

Modes of JPEG compression

Sequential: encoding and decoding of patches takes place in left to right, top to bottom order.
Progressive: encoding and decoding in multiple scans, each one with finer quantization levels.
Hierarchical: encoding and decoding performed at different scales.

92 of 114

http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-4.pdf

Commonly seen in web applications (e.g.: Facebook)

93 of 114

JPEG artifacts

Seam artifacts at patch boundaries (more prominent for lower Q values).
Ringing artifacts around edges.
Some loss of edge and textural detail.
Color artifacts

94 of 114

Video Compression

95 of 114

Need for video compression

Huge data – typical HDTV has frames of size 1920 x 1080 and 30 fps frame-rate. That is more than 1 GB per second.
Network channel bandwidths are limited (around 20 Mbps).
We need large rates of compression (around 1:80)!

96 of 114

Motion JPEG

Encode each frame using JPEG.
That will yield only around 1:10 compression.
This makes use of only spatial redundancy, no temporal redundancy.

97 of 114

MPEG (Motion Pictures Expert Group)

Heavy use of temporal redundancy!
Uses a process called predictive coding.
Consider pixel f(x,y,t) in frame t. We try to predict its value, denoted as g(x,y,t), using a linear combinations of the values from previous k frames, i.e. f(x,y,t-k),…,f(x,y,t-1).
We simply store the error e(x,y,t) = f(x,y,t)-g(x,y,t).

98 of 114

Differential coding (First order predictive coding)

In this method, k = 1, i.e. you encode the differences between consecutive frames, i.e. e(x,y,t) = f(x,y,t) – f(x,y,t-1).
For most frames, the errors are highly sparse.

99 of 114

100 of 114

Errors are not always sparse ☹

Exceptions: (1) camera zoom-in and zoom-out, (2) sudden changes in viewpoint or scene content, or fade-in/fade-out effects. In such cases, the errors will be large!

100

101 of 114

Motion compensation

For each macro-block (typically 16 x 16 or 8 x 8 in size) in the frame to be encoded, find the most similar macro-block in a reference frame (which could be the previous frame, but not necessarily so).
The difference between the pixel locations of the top-left corner of the two macro-blocks is called the motion vector.
The motion vector is considered to be constant within a macro-block.
Search for similar macro-blocks is restricted to a small search window around the original macro-block.
The search window is rectangular – broader than taller (why?).

101

102 of 114

102

103 of 114

Motion Compensation

For many macro-blocks, the motion vectors will be 0.
The search for similar macro-blocks is performed at a sub-pixel level (1/2 pixel or ¼ pixel) for more accuracy. In this case, image intensity values need to be interpolated.
Macro-block similarity measure is one of the following:

103

In color videos, only luminance (i.e. Y) channel is used in similarity measure

104 of 114

Motion compensation

Many a time, there are more than one similar block. In such cases, the spatially closest block is chosen.
Note: we are not concerned with the accuracy of the motion estimate. It is only a means towards the larger goal – of compression.
The inter-frame differences are computed after motion compensation. This hugely improves the sparsity of these differences.
In other words, don’t blindly compute the difference between macro-blocks at corresponding locations in frames t and t-1. Compute the difference between the current macro-block in frame t and its most similar match in frame t-1.

104

105 of 114

105

106 of 114

106

107 of 114

107

http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-5.pdf

108 of 114

Motion Compensation

For each macro-block (typically 16 x 16 or 8 x 8 in size) in the frame to be encoded, find the most similar macro-block in a reference frame.
If the reference frame is the previous one or the previous reference frame (see two slides later for more details), the current frame (the one being encoded) is called the P-frame.
If the reference frame is a combination of the previous frame and the next one, the current frame is called the B-frame.
A B-frame can use two motion vectors – one for previous and one for next frame.

108

109 of 114

109

http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-5.pdf

Example: why do we need B-frames?

110 of 114

No Motion Compensation here!

For some frames that rest on shot-boundaries (i.e. sudden changes in content), there is no advantage to performing motion compensation.
Such frames are called I-frames (independent frames). These usually act as reference frames for other frames to be differentially encoded.
I-frames can be detected by the presence of very frequently low similarity values during search for macro-blocks.

110

111 of 114

MPEG encoder

111

http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-5.pdf

Note:

YUV is a color-space very similar to YCbCr.
MV = motion vector

112 of 114

MPEG encoder

The I-frames are encoded using JPEG.
The B-frames and P-frames are also encoded using JPEG, but the DCT is computed on macro-blocks from the motion-compensated residual image as follows:
We have already computed motion vectors w.r.t. the reference frame.
Compute a residual image by calculating differences between macro-blocks in the current frame and the most-similar matching macro-blocks (as given by the motion vector) from the reference image. This is called motion-compensated frame differencing.
Note: the motion vector for the macro-block also needs to be stored. For this, motion-vectors from several macro-blocks are collected together and Huffman-encoded. Only the non-zero motion vectors are encoded.

112

113 of 114

113

http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-15-5.pdf

Display Order and Transmission Order (or order in which frames are compressed) may be different!

114 of 114

MPEG decoder

114

Note:

YUV is a color-space very similar to YCbCr.