Image Compression
CS 663, Ajit Rajwade
1
Image Compression
2
Types of compression
3
Lossless compression - examples
4
Lossy compression
5
Lossy image compression
6
7
Source: Article on compressive sensing by Candes and Wakin, from IEEE Signal Processing Magazine, 2008
JPEG compression method
8
JPEG image compression
9
JPEG image compression
10
11
Q = 100, compression rate = 1/2.6
Q = 50, compression rate = 1/15
Q = 25, compression rate = 1/23
Q = 10, compression rate = 1/46
Q = 1, compression rate = 1/144
Steps of the JPEG algorithm (encoder): Overview (approximate)
We will go through each step in detail in the slides that follow.
12
13
STEP 1: Discrete Cosine Transform (DCT)
14
Discrete Cosine Transform (DCT) in 1D
15
Discrete Cosine Transform (DCT) in 1D
16
DCT
17
18
Digression: matrix view of a discrete orthonormal transform (Fourier transform used as example here)
19
Fourier matrix: in any row, the value of x is fixed, the value of u ranges from 0 to M-1
20
DCT in 2D
21
The DCT matrix in this case will have size MN x MN, and it will be the Kronecker product of two DCT matrices – one of size M x M, the other of size N x N. The DCT matrix for the 2D case is also orthonormal; it is NOT symmetric, and it is NOT the real part of the 2D DFT matrix.
What is a 2D Fourier Matrix?
22
What is a 2D Fourier Matrix?
23
Consider a matrix A of size N1 x N2 and a matrix B of size M1 x M2. The size of their Kronecker product is N1 M1 x N2 M2. The Kronecker product is constructed by creating a rectangular grid of size N1 x N2. In each cell of the grid, you place a copy of B. The copy of B in the cell at grid location (i, j) is multiplied by A(i, j).
24
This big matrix is nothing but the Kronecker product of two 2 x 2 Fourier matrices.
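A minimal MATLAB sketch of this construction (the 2 x 2 Fourier and 4 x 4 DCT sizes are just illustrative; dctmtx is the MATLAB command used later in these slides):

```matlab
% Build the 2D transform matrix for 2 x 2 images as a Kronecker product,
% and check orthonormality of the Kronecker-product DCT matrix.
F2 = [1 1; 1 -1];                  % 1D 2-point Fourier matrix, exp(-1i*2*pi*u*n/2)
F4 = kron(F2, F2)                  % 4 x 4 matrix acting on the vectorized 2 x 2 image
D  = kron(dctmtx(4), dctmtx(4));   % 16 x 16 2D DCT matrix for 4 x 4 blocks
norm(D*D' - eye(16))               % ~0, so the 2D DCT matrix is orthonormal
```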
What do the DCT bases look like? (1D case)
25
What do the DCT bases look like? (2D case)
26
The DCT expresses an 8×8 block of input values as a linear combination of these 64 patterns. The patterns are referred to as the two-dimensional DCT basis vectors, and the output values are referred to as transform coefficients. Here each basis vector is reshaped to form an image.
Again: DCT is NOT the real part of the DFT
27
Real part of DFT
DCT
DCT on grayscale image patches
28
29
149 74 92 74 74 74 149 162
87 74 117 30 74 105 180 130
30 117 105 43 105 130 149 105
74 162 105 74 105 117 105 105
117 149 74 117 74 105 74 149
149 87 74 87 74 74 117 180
105 74 105 43 61 117 180 149
74 74 105 74 105 130 149 105
IMAGE PATCH
828.3750 -106.7827 126.4183 -8.2540 -57.3750 -0.5311 -2.1682 29.8472
-6.0004 2.5328 8.3779 -7.1377 -17.3419 -6.9695 -11.1366 22.7612
-6.5212 -56.2336 23.5930 16.3746 -5.5436 74.2016 23.1543 65.2328
17.2141 29.9058 91.3782 -19.9119 106.2541 37.4804 15.8409 -25.1828
14.1250 53.2562 -30.5477 -0.8891 30.8750 -23.2787 -9.4005 -41.8019
5.7938 -2.9468 10.0191 2.8929 -16.5056 -2.4595 -5.1284 12.7364
-3.6579 2.3417 -14.8457 -0.7304 34.6327 -10.3257 -7.3430 -5.6082
-1.7071 -9.8264 -6.4722 -1.3611 -10.5811 -4.5081 -0.4332 -20.6615
DCT COEFFICIENTS
HISTOGRAM OF DCT COEFFICIENTS
30
Original image
Image reconstructed after discarding all DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 10, and then computing inverse DCT
The number of DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 10 was 34,377 out of a total of 65,536 (64 coefficients for each 8 x 8 patch, 1024 such patches in total). This is more than 50%. The corresponding percentage for the DFT was 1%.
Image reconstructed after discarding all DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 20, and then computing inverse DCT
The number of DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 20 was 51,045 out of a total of 65,536 (64 coefficients for each 8 x 8 patch, 1024 such patches in total). This is more than 78%. The corresponding percentage for the DFT was 7%.
Why DCT? DFT and DCT comparison
31
Property | DFT | DCT
Orthonormal | Yes | Yes
Real/complex | Complex | Real
Separable in 2D | Yes | Yes
Norm-preserving | Yes | Yes
Inverse exists | Yes | Yes
Fast implementation | Yes (fft) | Yes (uses fft)
Energy compaction for natural images | Good/Fair | Much better
DCT has better energy compaction than DFT because…
32
Recall that the DFT of a sequence equals the Discrete Fourier Series (DFS) of a periodic extension of that sequence. Computing the DFT of a signal of length n implicitly places copies of the signal one after another (n-point periodicity). The resulting discontinuities at the boundaries require many frequencies for a good representation in the DFS. In contrast, the DCT appends a reflected copy of the signal before computing the DFS (2n-point periodicity), which reduces these discontinuities.
DCT has better energy compaction than DFT because…
33
DCT computational complexity
34
35
The DCT of f is computed from the DFT of a sequence of twice the length of f.
Only the first N frequencies are picked.
Reflected version of f, appended to f.
In MATLAB, you have the commands called dct and idct (in 1D) and dct2 and idct2 (in 2D).
(with some caveats – next slide)
36
You will have noticed that the constant factors sqrt(1/N) and sqrt(2/N) are missing in this expression. These factors are essential for the DCT matrix to be orthonormal, but including them breaks this simple relationship between the DCT and the DFT; they have to be re-applied separately.
See code in google drive folder
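A minimal MATLAB sketch of this DCT-from-DFT computation (f is an arbitrary example signal; the orthonormal scaling factors sqrt(1/N) and sqrt(2/N) are re-applied at the end so that the result matches MATLAB's dct):

```matlab
f = rand(8,1);  N = numel(f);                  % example signal
g = [f; flipud(f)];                            % reflected copy of f appended to f
G = fft(g);                                    % DFT of the 2N-point sequence
u = (0:N-1)';
C = real(exp(-1i*pi*u/(2*N)) .* G(1:N)) / 2;   % keep only the first N frequencies
C = [sqrt(1/N); sqrt(2/N)*ones(N-1,1)] .* C;   % re-apply the orthonormal scaling
max(abs(C - dct(f)))                           % ~1e-15: matches MATLAB's dct
```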
Which is the best orthonormal basis?
37
Which is the best orthonormal basis?
38
Which is the best orthonormal basis?
39
PCA: separable 2D version
40
But PCA is not used in JPEG, because…
The DCT is used instead!
41
DCT and PCA
42
Experiment
43
Code: See folder on google drive
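Since the code itself is not reproduced here, the following is a minimal sketch of (what is assumed to be) the experiment: accumulate row-row and column-column correlation matrices over non-overlapping 8 x 8 patches, take their eigenvectors, and compare them with the 1D DCT bases. The image name is only an example.

```matlab
im = double(imread('cameraman.tif'));
[H, W] = size(im); ps = 8;
CR = zeros(ps); CC = zeros(ps); n = 0;          % row-row and column-column correlations
for i = 1:ps:H-ps+1
    for j = 1:ps:W-ps+1
        P  = im(i:i+ps-1, j:j+ps-1);
        CR = CR + P*P';  CC = CC + P'*P;  n = n + 1;
    end
end
CR = CR/n;  CC = CC/n;
[VR, DR] = eig(CR);  [~, o] = sort(diag(DR), 'descend');  VR = VR(:, o);
[VC, DC] = eig(CC);  [~, o] = sort(diag(DC), 'descend');  VC = VC(:, o);
U = dctmtx(ps)';                                % columns of U are the 1D DCT bases
abs(U' * VR)                                    % close to identity (up to sign flips)
abs(U' * VC)
```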
44
0.3536 0.4904 0.4619 0.4157 0.3536 0.2778 0.1913 0.0975
0.3536 0.4157 0.1913 -0.0975 -0.3536 -0.4904 -0.4619 -0.2778
0.3536 0.2778 -0.1913 -0.4904 -0.3536 0.0975 0.4619 0.4157
0.3536 0.0975 -0.4619 -0.2778 0.3536 0.4157 -0.1913 -0.4904
0.3536 -0.0975 -0.4619 0.2778 0.3536 -0.4157 -0.1913 0.4904
0.3536 -0.2778 -0.1913 0.4904 -0.3536 -0.0975 0.4619 -0.4157
0.3536 -0.4157 0.1913 0.0975 -0.3536 0.4904 -0.4619 0.2778
0.3536 -0.4904 0.4619 -0.4157 0.3536 -0.2778 0.1913 -0.0975
DCT matrix: dctmtx command from MATLAB (see code on website)
0.3517 -0.4493 -0.4278 0.4230 0.3754 0.3247 -0.2250 -0.1245
0.3534 -0.4366 -0.2276 -0.0110 -0.3078 -0.4746 0.4732 0.2975
0.3543 -0.3101 0.1728 -0.4830 -0.3989 0.0498 -0.4299 -0.4109
0.3546 -0.1115 0.4799 -0.3005 0.3342 0.4102 0.1856 0.4761
0.3547 0.1141 0.4823 0.2944 0.3301 -0.4182 0.1745 -0.4771
0.3543 0.3104 0.1771 0.4834 -0.3977 -0.0322 -0.4308 0.4103
0.3535 0.4357 -0.2319 0.0143 -0.3009 0.4656 0.4851 -0.2975
0.3520 0.4468 -0.4328 -0.4204 0.3686 -0.3253 -0.2342 0.1261
0.3520 -0.4461 -0.4305 0.4224 0.3696 0.3247 0.2342 0.1283
0.3537 -0.4338 -0.2345 -0.0114 -0.3000 -0.4671 -0.4814 -0.3028
0.3545 -0.3086 0.1662 -0.4896 -0.4007 0.0359 0.4261 0.4102
0.3548 -0.1145 0.4763 -0.3031 0.3339 0.4198 -0.1800 -0.4713
0.3548 0.1056 0.4839 0.2926 0.3349 -0.4194 -0.1766 0.4733
0.3543 0.3043 0.1863 0.4833 -0.4028 -0.0354 0.4269 -0.4097
0.3532 0.4389 -0.2269 0.0180 -0.3008 0.4654 -0.4811 0.3037
0.3512 0.4562 -0.4300 -0.4126 0.3694 -0.3242 0.2335 -0.1319
VC: Eigenvectors of column-column correlation matrix
VR: Eigenvectors of row-row correlation matrix
1.0000 0.0007 0.0032 0.0002 0.0013 0.0001 0.0005 0.0000
0.0007 0.9970 0.0097 0.0689 0.0009 0.0322 0.0003 0.0110
0.0033 0.0106 0.9968 0.0118 0.0713 0.0004 0.0334 0.0025
0.0002 0.0718 0.0124 0.9926 0.0007 0.0927 0.0017 0.0276
0.0010 0.0001 0.0737 0.0004 0.9942 0.0008 0.0780 0.0010
0.0000 0.0261 0.0015 0.0962 0.0005 0.9934 0.0011 0.0569
0.0003 0.0007 0.0276 0.0021 0.0802 0.0010 0.9964 0.0013
0.0000 0.0076 0.0026 0.0227 0.0012 0.0596 0.0015 0.9979
1.0000 0.0002 0.0029 0.0001 0.0010 0.0000 0.0004 0.0000
0.0002 0.9965 0.0028 0.0766 0.0005 0.0314 0.0009 0.0107
0.0029 0.0025 0.9969 0.0046 0.0728 0.0017 0.0304 0.0013
0.0001 0.0795 0.0044 0.9923 0.0029 0.0916 0.0015 0.0243
0.0008 0.0003 0.0747 0.0026 0.9948 0.0061 0.0696 0.0004
0.0000 0.0246 0.0021 0.0949 0.0069 0.9940 0.0131 0.0452
0.0003 0.0004 0.0252 0.0003 0.0715 0.0137 0.9970 0.0002
0.0000 0.0076 0.0013 0.0207 0.0001 0.0476 0.0009 0.9986
Absolute value of dot products between the columns of DCT matrix and columns of VR (left) and VC (right)
45
DCT bases
64 columns of V – each reshaped to form an 8 x 8 image, and rescaled to fit in the 0-1 range. Notice the similarity between the DCT bases and the columns of V. Again, V is the Kronecker product of VR and VC.
DCT and PCA
46
DCT and PCA
47
48
1.0000 0.9902 0.9795 0.9733 0.9682 0.9639 0.9604 0.9570
0.9902 1.0005 0.9908 0.9795 0.9734 0.9684 0.9643 0.9605
0.9795 0.9908 1.0010 0.9908 0.9796 0.9735 0.9689 0.9646
0.9733 0.9795 0.9908 1.0005 0.9904 0.9793 0.9735 0.9686
0.9682 0.9734 0.9796 0.9904 1.0004 0.9903 0.9794 0.9734
0.9639 0.9684 0.9735 0.9793 0.9903 1.0001 0.9903 0.9793
0.9604 0.9643 0.9689 0.9735 0.9794 0.9903 1.0004 0.9904
0.9570 0.9605 0.9646 0.9686 0.9734 0.9793 0.9904 1.0002
1.0000 0.9888 0.9770 0.9704 0.9648 0.9599 0.9554 0.9510
0.9888 1.0004 0.9891 0.9768 0.9703 0.9646 0.9596 0.9548
0.9770 0.9891 1.0004 0.9886 0.9764 0.9698 0.9640 0.9587
0.9704 0.9768 0.9886 0.9994 0.9878 0.9755 0.9687 0.9627
0.9648 0.9703 0.9764 0.9878 0.9986 0.9870 0.9746 0.9676
0.9599 0.9646 0.9698 0.9755 0.9870 0.9978 0.9861 0.9734
0.9554 0.9596 0.9640 0.9687 0.9746 0.9861 0.9967 0.9847
0.9510 0.9548 0.9587 0.9627 0.9676 0.9734 0.9847 0.9951
CR/CR(1,1): notice that it can be approximated by the form shown two slides before, with ρ ≈ 0.99.
CC/CC(1,1): notice that it can be approximated by the form shown two slides before, with ρ ≈ 0.9888.
More results from the previous experiment; see the code in the google drive folder.
Computation of DCT coefficients in JPEG
49
STEP 2: Quantization
50
Quantization
51
Quantization
52
53
Quantization matrix for Q = 50: notice the higher values in the matrix for higher frequency coefficients
Here M is the quantization matrix, and B is the matrix of quantized coefficients obtained by dividing each DCT coefficient by the corresponding entry of M and rounding. Most of the values in B are 0! They need not be stored! Only the non-zero values in B will be stored!
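A minimal sketch of the quantization step on a single 8 x 8 block. The matrix M below is the standard JPEG luminance quantization table (Q = 50); the image name is just an example, and 128 is subtracted from the pixel values as in the JPEG pipeline.

```matlab
M = [16 11 10 16 24 40 51 61;
     12 12 14 19 26 58 60 55;
     14 13 16 24 40 57 69 56;
     14 17 22 29 51 87 80 62;
     18 22 37 56 68 109 103 77;
     24 35 55 64 81 104 113 92;
     49 64 78 87 103 121 120 101;
     72 92 95 98 112 100 103 99];
im = double(imread('cameraman.tif'));
block = im(1:8, 1:8) - 128;        % one level-shifted 8 x 8 block
F = dct2(block);                   % DCT coefficients of the block
B = round(F ./ M);                 % quantized coefficients; most entries become 0
block_rec = idct2(B .* M) + 128;   % de-quantize and invert the DCT at the decoder
```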
How was this quantization matrix picked?
54
STEP 3: Lossless compression steps: Huffman encoding and Run length encoding
55
56
Huffman encoding
57
Huffman encoding example
58
Huffman encoding example
Given a set of n symbols (an alphabet) A = {a_i} with corresponding frequencies {p(a_i)} (each frequency lies between 0 and 1), which prefix-free encoding yields the least average bit length? That is, which set of code-words {λ(a_i)} will minimize Σ_i p(a_i) l(a_i)?
59
Here l(a_i) denotes the length of the code-word λ(a_i).
Algorithm
60
Example
61
[Figure: step-by-step Huffman tree construction. At each of steps 1–4, the two nodes with the lowest probabilities among {A: 0.4, B: 0.2, C: 0.1, D: 0.1, R: 0.2} are merged (0.1 + 0.1 = 0.2, then 0.2 + 0.2 = 0.4, then 0.2 + 0.4 = 0.6, then 0.4 + 0.6 = 1.0), and the two branches of each merge are labeled 0 and 1.]
A = 0, B = 100, C = 1010, D = 1011, R = 11
This is a prefix-free code. No leaf node is on the path to any other node.
62
[Figure: the resulting Huffman tree, with branches labeled 0 and 1, leaf probabilities 0.4, 0.2, 0.1, 0.1, 0.2, and internal-node probabilities 0.2, 0.4, 0.6, 1.0.]
A = 0, B = 100, C = 1010, D = 1011, R = 11
Input string: RABBCDR
Encoded bit stream:
11-0-100-100-1010-1011-11
Decoded string: RABBCDR
To perform encoding, we maintain an initially empty encoded bit stream. Read a symbol from the input, traverse the Huffman tree from the root node to the leaf node for that symbol, collect all the bit labels on the traversed path, and append them to the encoded bit stream. Repeat this for every symbol of the input. Example: for R, we write 11.
To perform decoding, read the encoded bit stream, and traverse the Huffman tree from the root node toward a leaf node, following the path as indicated by the bit stream. For example, if you read in 11, you would travel to the leaf node R. When you reach a leaf node, append its associated symbol to the decoded output. Go back to the root node and traverse the tree as per the remaining bits from the encoded bit stream.
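A minimal MATLAB sketch of table-based encoding and greedy prefix decoding with the code above (greedy decoding is valid precisely because the code is prefix-free):

```matlab
code = containers.Map({'A','B','C','D','R'}, {'0','100','1010','1011','11'});
msg  = 'RABBCDR';
bits = strjoin(values(code, num2cell(msg)), '');   % encoded bit stream
% Decode: repeatedly match the shortest prefix that is a valid code-word.
inv = containers.Map(values(code), keys(code));
decoded = ''; buf = '';
for b = bits
    buf = [buf b];
    if isKey(inv, buf)
        decoded = [decoded inv(buf)]; buf = '';
    end
end
decoded    % 'RABBCDR'
```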
Average code length here
= 1 * p(A) + 3 * p(B) + 4 * p(C) + 4*p(D) + 2 *p(R) = 0.4 + 0.6 + 0.4 + 0.4 + 0.4 = 2.2.
About the algorithm
63
Huffman Trees and Entropy
64
Entropy of the random variable (the set of symbols) – a measure of the uncertainty of the random variable, of how much the random variable surprises you, or of the average number of bits required to store it.
Recall: Entropy is minimum when the random variable takes on only one value. It is maximum when the variable can take on any one of k different values, each with equal probability.
Note: Entropy is a lower bound on the average number of bits per symbol needed to encode the values of the random variable. It is not possible to code the intensity values of an image with fewer bits per pixel than its entropy.
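A quick check of this bound on the example above (code-word lengths taken from the tree):

```matlab
p   = [0.4 0.2 0.1 0.1 0.2];     % p(A), p(B), p(C), p(D), p(R)
len = [1   3   4   4   2  ];     % code-word lengths from the Huffman tree
avg_len = sum(p .* len)          % = 2.2 bits per symbol
H = -sum(p .* log2(p))           % entropy ~ 2.12 bits; avg_len can never fall below H
```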
Zig-zag ordering
65
−26 | | | | | | | |
−3 | 0 | | | | | | |
−3 | −2 | −6 | | | | | |
2 | −4 | 1 | −3 | | | | |
1 | 1 | 5 | 1 | 2 | | | |
−1 | 1 | −1 | 2 | 0 | 0 | | |
0 | 0 | 0 | −1 | −1 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | | |
0 | 0 | 0 | 0 | 0 | | | |
0 | 0 | 0 | 0 | | | | |
0 | 0 | 0 | | | | | |
0 | 0 | | | | | | |
0 | | | | | | | |
Run length encoding
66
4 bits each
We refer to the above set as a triple. If there are more than 15 zeros between two non-zero AC coefficients, a special triple (15, 0, 0) is inserted. If there is a long run of trailing zeros at the end of a block, we put in an "end of block" marker, written as (0, 0).
67
−26 | | | | | | | |
−3 | 0 | | | | | | |
−3 | −2 | −6 | | | | | |
2 | −4 | 1 | −3 | | | | |
1 | 1 | 5 | 1 | 2 | | | |
−1 | 1 | −1 | 2 | 0 | 0 | | |
0 | 0 | 0 | −1 | −1 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | | |
0 | 0 | 0 | 0 | 0 | | | |
0 | 0 | 0 | 0 | | | | |
0 | 0 | 0 | | | | | |
0 | 0 | | | | | | |
0 | | | | | | | |
(0, 2)(-3); (1, 2)(-3); (0, 2)(-2); (0, 3)(-6); (0, 2)(2); (0, 3)(-4); (0, 1)(1); (0, 2)(-3); (0, 1)(1);
(0, 1)(1); (0, 3)(5); (0, 1)(1); (0, 2)(2); (0, 1)(-1); (0, 1)(1); (0, 1)(-1); (0, 2)(2); (5, 1)(-1); (0, 1)(-1); (0, 0).
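A minimal sketch that reproduces this run-length encoding from the zig-zag-ordered coefficients of the block above (the "size" of a value is taken as the number of bits in its magnitude):

```matlab
% zz is the zig-zag-ordered quantized block; zz(1) is the DC coefficient (coded separately).
zz = [-26 -3 0 -3 -2 -6 2 -4 1 -3 1 1 5 1 2 -1 1 -1 2 0 0 0 0 0 -1 -1 zeros(1,38)];
ac = zz(2:end);
run = 0; triples = {};
for k = 1:numel(ac)
    if ac(k) == 0
        run = run + 1;
        if run == 16, triples{end+1} = [15 0 0]; run = 0; end  % special triple for a run of 16 zeros
    else
        sz = floor(log2(abs(ac(k)))) + 1;       % "size" = bits needed for the magnitude
        triples{end+1} = [run sz ac(k)];        % the (run, size)(value) triple
        run = 0;
    end
end
triples{end+1} = [0 0];                         % end-of-block marker
% triples now reproduces the listing above.
```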
Encoding DC coefficients
68
JPEG encoded file
69
JPEG DECODER
70
JPEG decoding
71
JPEG for color images
72
JPEG for color images
73
PCA on RGB values
74
PCA on RGB values
0.5952 0.6619 0.4556
0.6037 0.0059 -0.7972
0.5303 -0.7496 0.3961
75
PCA on RGB values
76
PCA on RGB values
77
PCA on RGB values
78
PCA on RGB values
79
PCA on RGB values
80
81
82
83
84
The variances of the three eigen-coefficient values:
8411, 159.1, 71.7
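A minimal sketch of how such eigenvectors and eigen-coefficient variances can be computed by PCA on the RGB values of an image (the image name is an assumed example, so the numbers will differ from those on the slide):

```matlab
rgb = double(imread('peppers.png'));
X = reshape(rgb, [], 3);                 % each row is one pixel's (R, G, B) values
mu = mean(X, 1);
C = cov(X);                              % 3 x 3 covariance of the RGB values
[V, D] = eig(C);                         % columns of V: principal directions
[vars, idx] = sort(diag(D), 'descend');  % variances of the eigen-coefficients
V = V(:, idx);
coeffs = (X - mu) * V;                   % eigen-coefficients of each pixel
vars'                                    % most of the energy lies along the first direction
```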
85
86
87
88
RGB and its corresponding Y, Cb, Cr channels
90
Cb channel under different down-sampling factors:
No down-sampling of the chrominance or luminance channels
Down-sampling of Cb and Cr only in the X direction by a factor of 2
Down-sampling of Cb and Cr in both the X and Y directions by a factor of 2
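A minimal sketch of this chroma sub-sampling, assuming the Image Processing Toolbox functions rgb2ycbcr/ycbcr2rgb and an example image name:

```matlab
rgb = imread('peppers.png');
ycc = rgb2ycbcr(rgb);
Y  = ycc(:,:,1);                 % luminance kept at full resolution
Cb = ycc(1:2:end, 1:2:end, 2);   % Cb down-sampled by 2 in both X and Y
Cr = ycc(1:2:end, 1:2:end, 3);   % Cr down-sampled by 2 in both X and Y
% At the decoder, the chroma channels are up-sampled (e.g. bilinearly)
% before converting back to RGB.
Cb_up = imresize(Cb, size(Y), 'bilinear');
Cr_up = imresize(Cr, size(Y), 'bilinear');
rgb_rec = ycbcr2rgb(cat(3, Y, Cb_up, Cr_up));
```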
Modes of JPEG compression
91
92
Commonly seen in web applications (e.g., Facebook)
JPEG artifacts
93
Video Compression
94
Need for video compression
95
Motion JPEG
96
MPEG (Motion Pictures Expert Group)
97
Differential coding (First order predictive coding)
98
99
Errors are not always sparse ☹
100
Motion compensation
101
102
Motion Compensation
103
In color videos, only the luminance (i.e., Y) channel is used in the similarity measure.
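A minimal sketch of block matching for one block (exhaustive search over a ±7-pixel window, sum of absolute differences on the luminance channel; the frame file names, block position, and sizes are assumed for illustration):

```matlab
ref = double(rgb2gray(imread('frame1.png')));   % previous (reference) frame
cur = double(rgb2gray(imread('frame2.png')));   % current frame
bs = 16; sr = 7;                                % block size and search range
i = 17; j = 33;                                 % top-left corner of one block in cur
blk = cur(i:i+bs-1, j:j+bs-1);
best = inf; mv = [0 0];
for dy = -sr:sr
    for dx = -sr:sr
        ii = i + dy; jj = j + dx;
        if ii < 1 || jj < 1 || ii+bs-1 > size(ref,1) || jj+bs-1 > size(ref,2)
            continue;                           % candidate block falls outside the frame
        end
        sad = sum(sum(abs(blk - ref(ii:ii+bs-1, jj:jj+bs-1))));
        if sad < best, best = sad; mv = [dy dx]; end
    end
end
mv    % motion vector of this block; the residual block is then DCT-coded
```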
Motion compensation
104
105
106
Motion Compensation
108
109
Example: why do we need B-frames?
No Motion Compensation here!
110
MPEG encoder
111
Note:
MPEG encoder
112
113
Display Order and Transmission Order (or order in which frames are compressed) may be different!
MPEG decoder
114
Note: