1 of 136

Introduction to digital video technology

2 of 136

Feedback

Please take notes about possible improvements.

Feedback is welcome, e.g. better naming, shortening content Y, expanding content X, etc.

Don’t let others suffer as you will.

3 of 136

Basic Terminology

4 of 136

What is an image?

[Figure: an image stored as three 2D matrices of color intensity, one per channel (R, G, B); stacked together they form a 3D array]

5 of 136

What is a picture element (pixel)?

[Figure: the R, G and B matrices of the image; a pixel is one position in the image, with one value in each of the three matrices]

6 of 136

What is bit (color) depth?

8R+8G+8B = 24 bits

* 24 bits give 2^24 = 16,777,216 different colors

[Figure: the R, G and B matrices; with 8 bits per channel, each value lies in the range 0-255]
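A minimal sketch of the arithmetic above. It assumes 3 channels with the same number of bits each. The function names are illustrative, not from any library.

```python
def color_count(bits_per_channel: int, channels: int = 3) -> int:
    """Number of distinct colors for a given bit depth per channel."""
    # 8R + 8G + 8B = 24 bits per pixel, so 2**24 combinations
    return 2 ** (bits_per_channel * channels)


def channel_range(bits_per_channel: int):
    """Smallest and largest value a single channel can hold."""
    return 0, 2 ** bits_per_channel - 1


print(color_count(8))    # 16777216
print(channel_range(8))  # (0, 255)
```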

7 of 136

Color depth

[Figure: the same image at 24 bpp, 10 bpp and 8 bpp]

8 of 136

Color depth

9 of 136

All colors from RGB


10 of 136

All colors from RGB

https://lumeniquessl.com/2012/03/01/12-in-12-for-2012-the-flicker-indicator-machine/

11 of 136

All colors from RGB

https://lightingstudio.wordpress.com/2012/03/27/week5-light-object-shadow-contrast/

12 of 136

What is resolution?

[Figure: a 4 x 4 image: width = 4 pixels, height = 4 pixels]

13 of 136

What is display aspect ratio (DAR)?

16:9 (1.7777777778), e.g. 1280/720 = 1.7777777778

4:3 (1.3333333333), e.g. 1024/768 = 1.3333333333
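The ratio can be found by dividing width and height by their greatest common divisor. This is a small sketch that assumes square pixels (PAR 1:1).

```python
from math import gcd


def display_aspect_ratio(width: int, height: int) -> str:
    """Reduce width:height to its simplest form (assumes square pixels)."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


print(display_aspect_ratio(1280, 720))  # 16:9
print(display_aspect_ratio(1024, 768))  # 4:3
```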

14 of 136

What is pixel aspect ratio (PAR)?

PAR 1:1

PAR 2:1
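When pixels are not square, the shape on screen also depends on the PAR. This sketch uses the common relation DAR = (width / height) × PAR. The 704 x 480 frame size in the last line is a hypothetical DVD-like example and does not come from the slides.

```python
from fractions import Fraction


def dar_from_par(width: int, height: int, par: Fraction) -> Fraction:
    """Display aspect ratio = (width / height) * pixel aspect ratio."""
    return Fraction(width, height) * par


print(dar_from_par(4, 4, Fraction(1, 1)))         # 1   (square pixels, square picture)
print(dar_from_par(4, 4, Fraction(2, 1)))         # 2   (each pixel twice as wide as tall)
print(dar_from_par(704, 480, Fraction(10, 11)))   # 4/3 (hypothetical frame size)
```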

15 of 136

DVD: display aspect ratio 4:3, pixel aspect ratio 10:11

Source: https://xiph.org/video/vid1.shtml

16 of 136

What is a video?

[Figure: a video adds time as a 4th dimension: a sequence of single frames shown at a framerate, e.g. 30 frames per second (FPS)]

17 of 136

Interlaced | progressive

18 of 136

What are 480p, 1080i, 1080p formats?

[number][letter]

The number is the resolution's height (in lines), and the letter is the scan type: p means progressive, i means interlaced.
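The [number][letter] convention is simple to parse. The sketch below is illustrative only and accepts just the two letters described above.

```python
import re


def parse_format(name: str) -> tuple:
    """Split a label such as '1080p' into (height, scan type)."""
    match = re.fullmatch(r"(\d+)([pi])", name)
    if match is None:
        raise ValueError(f"not a [number][letter] format: {name!r}")
    height = int(match.group(1))
    scan = "progressive" if match.group(2) == "p" else "interlaced"
    return height, scan


for label in ("480p", "1080i", "1080p"):
    print(label, parse_format(label))
```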

19 of 136

What is bitrate?

30 FPS

WIDTH * HEIGHT * BITS_PER_PIXEL * FPS

4 * 4 * 24 * 30

11,520 bits per second
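The formula translates directly to code. This sketch gives the uncompressed bitrate only, before any compression.

```python
def raw_bitrate(width: int, height: int, bits_per_pixel: float, fps: float) -> float:
    """Uncompressed bitrate in bits per second: WIDTH * HEIGHT * BITS_PER_PIXEL * FPS."""
    return width * height * bits_per_pixel * fps


print(raw_bitrate(4, 4, 24, 30))  # 11520
```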

20 of 136

Constant bitrate (CBR)?

[Figure: bitrate over time, constant at 1.2Mbps]

21 of 136

Variable bitrate (VBR)?

[Figure: bitrate over time, varying freely (axis marks: 200Kbps, 1.2Mbps, 2.4Mbps)]

22 of 136

Average bitrate (ABR)?

[Figure: bitrate over time, varying around an average of 1.2Mbps but kept between a min of 400Kbps and a max of 1.8Mbps (axis marks: 200Kbps, 2.4Mbps)]

can be seen as constrained VBR

23 of 136

Space needed to store 1h of video at 720p 30fps

* without any compression technique at all.

WIDTH * HEIGHT * BITS_PER_PIXEL * FPS

1280 * 720 * 24 * 30

663,552,000 bits per second (663.552 Mbps)

2,388,787,200,000 bits per hour (≈ 298.6 GB, or 278 GiB)
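This sketch converts a bitrate and a duration into storage. Part of the "278GB" on the slide comes from binary units (GiB, 1024^3 bytes) instead of decimal GB (1000^3 bytes), so both are printed.

```python
def storage_for(seconds: float, bits_per_second: float) -> dict:
    """Total size of a stream, in bits and in decimal / binary gigabytes."""
    bits = bits_per_second * seconds
    size_bytes = bits / 8
    return {
        "bits": bits,
        "GB": size_bytes / 1000 ** 3,   # decimal gigabytes
        "GiB": size_bytes / 1024 ** 3,  # binary gibibytes
    }


bitrate = 1280 * 720 * 24 * 30          # 663,552,000 bits per second
size = storage_for(3600, bitrate)       # one hour
print(f"{size['bits']:,} bits")         # 2,388,787,200,000 bits
print(f"{size['GB']:.1f} GB / {size['GiB']:.1f} GiB")  # 298.6 GB / 278.1 GiB
```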

24 of 136

Review

image, pixel, bit depth, resolution, display aspect ratio, pixel aspect ratio, video, frame rate, interlaced, progressive, bitrate, CBR, VBR, ABR

25 of 136

From the world to the bits

26 of 136

How are images captured? CCD sensor

27 of 136

How are images captured? CMOS sensor (APS)

Uses less power

Transmits data faster than CCD

Cheaper

Most common in cell phone cameras and web cameras

28 of 136

How are images captured?

Color filter: Bayer array

Filters the 3 primary colors

Sensor
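A minimal sketch of what the color filter does. It assumes an RGGB layout, one of several possible Bayer arrangements. Each sensor pixel keeps only one of the three colors, and demosaicing later interpolates the two missing values.

```python
def bayer_mosaic(rgb):
    """Simulate an RGGB Bayer filter: keep one color sample per pixel.

    rgb is a list of rows; each row is a list of (R, G, B) tuples.
    """
    mosaic = []
    for y, row in enumerate(rgb):
        out_row = []
        for x, (r, g, b) in enumerate(row):
            if y % 2 == 0 and x % 2 == 0:
                out_row.append(r)   # red filter
            elif y % 2 == 1 and x % 2 == 1:
                out_row.append(b)   # blue filter
            else:
                out_row.append(g)   # green filter: 2 of every 4 pixels
        mosaic.append(out_row)
    return mosaic


print(bayer_mosaic([[(10, 20, 30), (10, 20, 30)],
                    [(10, 20, 30), (10, 20, 30)]]))  # [[10, 20], [20, 30]]
```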

29 of 136

Bayer Demosaicing

30 of 136

Bayer Demosaicing

31 of 136

Bayer Demosaicing

32 of 136

Redundancy Removal

33 of 136

What can we do?

compress repetitions within the frame

exploit our vision

reduce repetitions in time

34 of 136

Exploiting our vision

35 of 136

Color models

36 of 136

Colors

37 of 136

Our eyes - an oversimplification

38 of 136

Our eyes - an oversimplification

39 of 136

We see luma better than color

40 of 136

Color space YUV (YCbCr, YPbPr)

Y (luma)

U (chroma blue)

V (chroma red)

41 of 136

42 of 136

From RGB to YCbCr

Y = 0.299R + 0.587G + 0.114B

Cb = 0.564(B - Y) | Cr = 0.713(R - Y)

From YCbCr to RGB

R = Y + 1.402Cr | B = Y + 1.772Cb | G = Y - 0.344Cb - 0.714Cr

*ITU-R BT.601-7
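A sketch of the conversion with the BT.601 coefficients listed above. It works on unscaled values and leaves out the offsets and clamping that 8-bit digital pipelines add. Because the coefficients are rounded, the round trip is only approximate.

```python
def rgb_to_ycbcr(r, g, b):
    """BT.601 coefficients as listed above (no offsets, no clamping)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * cr
    b = y + 1.772 * cb
    g = y - 0.344 * cb - 0.714 * cr
    return r, g, b


y, cb, cr = rgb_to_ycbcr(255, 0, 0)                   # pure red
print([round(v) for v in ycbcr_to_rgb(y, cb, cr)])    # [255, 0, 0]
```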

43 of 136

From RGB to YCbCr

The coefficients depend on the recommendation (standard) in use.

Standard | Y (luma) | Chroma B | Chroma R

SDTV with BT.601 (Rec. 601) | Y=0.299R+0.587G+0.114B | U=0.492(B-Y) | V=0.877(R-Y)

HDTV with BT.709 (Rec. 709) | Y=0.2126R+0.7152G+0.0722B | ... | ...

UHDTV with BT.2020 (Rec. 2020) | Y=0.2627R+0.6780G+0.0593B | ... | ...
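The luma weights from the table fit in a small lookup. This sketch shows that the same RGB color produces a different Y under each standard. For white, Y always equals the input because each set of weights sums to 1.

```python
LUMA_COEFFICIENTS = {
    "BT.601": (0.299, 0.587, 0.114),
    "BT.709": (0.2126, 0.7152, 0.0722),
    "BT.2020": (0.2627, 0.6780, 0.0593),
}


def luma(r, g, b, standard="BT.601"):
    """Weighted sum of R, G, B using the standard's luma coefficients."""
    kr, kg, kb = LUMA_COEFFICIENTS[standard]
    return kr * r + kg * g + kb * b


for name in LUMA_COEFFICIENTS:
    print(name, round(luma(0, 255, 0, name), 1))   # pure green weighs differently
```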

44 of 136

groups? ISO/IEC, ITU-R, JVT/JCT, AOM, MPEG-LA

45 of 136

recommendations? Rec. 601, Rec. 709, Rec. 2020

Recommendation | Resolutions | Frame rate | Bit depth | Chroma subsampling

BT.601 (SDTV, Rec. 601) | 525i, 625i | 50, 60 | 8 | YCbCr 4:2:2

BT.709 (HDTV, Rec. 709) | 1080p, 1080i | 50, 60, 30, 24 | 8, 10 | * YCbCr 4:2:2

BT.2020 (UHDTV, Rec. 2020) | 4320p (7680x4320), 2160p (3840x2160) | 120, 100, 60, 50, 30, 24 | 10, 12 | 4:4:4, 4:2:2, 4:2:0

46 of 136

Chroma subsampling YUV 4:4:4 4:2:2 4:2:0

Y (luma)

U (chroma blue)

V (chroma red)

47 of 136

Chroma subsampling YUV 4:4:4 4:2:2 4:2:0

Y (luma)

U

V

48 of 136

Chroma subsampling YUV 4:4:4 4:2:2 4:2:0

49 of 136

Chroma subsampling 4:2:0

50 of 136

Chroma subsampling YUV420

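[Figure: a 1280 x 720 luma (Y) plane next to its subsampled chroma planes; in 4:2:0 each chroma plane keeps half the width and half the height, i.e. 640 x 360]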
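A sketch that computes chroma plane sizes and the resulting bits per pixel for each scheme. It assumes 8 bits per sample and even frame dimensions. The 12 bits per pixel it gives for 4:2:0 is the value used in the storage calculation that follows.

```python
SUBSAMPLING = {
    # (horizontal divisor, vertical divisor) applied to each chroma plane
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
}


def plane_sizes(width, height, scheme):
    dx, dy = SUBSAMPLING[scheme]
    chroma = (width // dx, height // dy)
    return (width, height), chroma, chroma   # Y, U (Cb), V (Cr)


def bits_per_pixel(width, height, scheme, bit_depth=8):
    (yw, yh), (cw, ch), _ = plane_sizes(width, height, scheme)
    total_samples = yw * yh + 2 * cw * ch
    return total_samples * bit_depth / (width * height)


for scheme in SUBSAMPLING:
    print(scheme, plane_sizes(1280, 720, scheme)[1], bits_per_pixel(1280, 720, scheme))
# 4:4:4 (1280, 720) 24.0
# 4:2:2 (640, 720) 16.0
# 4:2:0 (640, 360) 12.0
```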

51 of 136

Chroma subsampling

52 of 136

Space needed to store 1h of video at 720p 30fps

without vs with chroma subsampling YUV420

Without (24 bits per pixel):

WIDTH * HEIGHT * BITS_PER_PIXEL * FPS

1280 * 720 * 24 * 30

663,552,000 bits per second (663.552 Mbps)

2,388,787,200,000 bits per hour (≈ 278 GiB)

With YUV420 (12 bits per pixel):

WIDTH * HEIGHT * BITS_PER_PIXEL * FPS

1280 * 720 * 12 * 30

331,776,000 bits per second (331.776 Mbps)

1,194,393,600,000 bits per hour (≈ 139 GiB)

53 of 136

Correlations in time

54 of 136

Frame types

55 of 136

Temporal redundancy (inter prediction)

[Figure: original frames F0-F4 become the encoded frames I-frame, P-frame, P-frame, P-frame, I-frame]

56 of 136

Temporal redundancy

[Figure: encoded size of the original frames F0-F4: I-frame 103Kb, then three P-frames of 2Kb each, then I-frame 103Kb]
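P-frames are small because they store what changed rather than the whole picture. This sketch shows the simplest form of that idea: a pixel-by-pixel difference against the previous frame. The 4 x 4 frames are hypothetical. Real encoders add motion compensation and then transform and quantize the residual.

```python
def frame_difference(previous, current):
    """Residual = current - previous, pixel by pixel."""
    return [[c - p for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]


def reconstruct(previous, residual):
    """Decoder side: previous frame + residual gives back the current frame."""
    return [[p + r for p, r in zip(prev_row, res_row)]
            for prev_row, res_row in zip(previous, residual)]


f0 = [[100, 100, 100, 200],
      [100, 100, 100, 200],
      [100, 100, 100, 200],
      [100, 100, 100, 200]]
f1 = [row[:] for row in f0]
f1[3][1] = 120                                   # only one pixel changed
res = frame_difference(f0, f1)
print(sum(v != 0 for row in res for v in row))   # 1
assert reconstruct(f0, res) == f1
```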

57 of 136

Temporal redundancy with motion estimation

I-frame

P-frame

F1

F0

motion estimation: the motion vector applied to the previous frame gives the predicted frame

residual (prediction error) = real nth frame - predicted frame
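A minimal sketch of motion estimation as full-search block matching with the sum of absolute differences (SAD). Real encoders use much faster search strategies. The frames, the block size and the search radius here are hypothetical. The returned vector points into the previous frame, and the residual is the current block minus the block the vector points to (all zeros in this example).

```python
def sad(current, reference, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block and a displaced reference block."""
    return sum(
        abs(current[by + y][bx + x] - reference[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def motion_search(current, reference, bx, by, n, radius):
    """Full search: try every displacement within +/- radius and keep the lowest SAD."""
    height, width = len(reference), len(reference[0])
    best_vector, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # skip candidates that would read outside the reference frame
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(current, reference, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


reference = [[0] * 8 for _ in range(8)]
current = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (2, 3):
        reference[y][x] = 200      # object in the previous frame
        current[y][x + 1] = 200    # same object, moved 1 pixel to the right

print(motion_search(current, reference, bx=3, by=2, n=2, radius=2))  # ((-1, 0), 0)
```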

58 of 136

Temporal redundancy (B frames)

[Figure: original frames F0-F4 encoded as I-frame, P-frame, P-frame, B-frame, I-frame]

59 of 136

Correlations in space

60 of 136

Lots of similarities

61 of 136

Lots of similarities

62 of 136

Spatial redundancy (intra prediction)

[Figure: a block of unknown values (???) is predicted from the already known neighboring pixels (100, 200) along the direction of the prediction; the difference between the real values (mostly 100 and 200, a few such as 120 and 210) and the prediction is mostly 0 with small values such as 20 and 10, so it is highly compressible]
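A sketch of one intra-prediction direction: vertical, where every row of the block copies the already decoded row above it. H.264 defines several directional modes, and this shows only one. The pixel values are hypothetical and loosely follow the figure.

```python
def vertical_intra_prediction(top_neighbors, size):
    """Predict every row of the block as a copy of the pixels just above it."""
    return [list(top_neighbors) for _ in range(size)]


def residual(block, prediction):
    """Difference between the real values and the prediction."""
    return [[real - pred for real, pred in zip(r_row, p_row)]
            for r_row, p_row in zip(block, prediction)]


top = [100, 100, 100, 200]             # already decoded pixels above the block
block = [[100, 100, 100, 200],
         [100, 100, 100, 200],
         [100, 100, 100, 200],
         [100, 120, 100, 210]]         # real values of the block
diff = residual(block, vertical_intra_prediction(top, 4))
for row in diff:
    print(row)
# [0, 0, 0, 0]
# [0, 0, 0, 0]
# [0, 0, 0, 0]
# [0, 20, 0, 10]
```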

63 of 136

Spatial redundancy (intra prediction) H264

64 of 136

CODEC - enCOder / DECoder

65 of 136

CODEC

“A codec is a device or computer program for encoding or decoding a digital data stream or signal.”

66 of 136

CODEC (VP9, H265) vs Container (.WEBM, .MP4)

Source: https://xiph.org/video/vid1.shtml

67 of 136

Container vs CODEC

Containers

  • OGG
  • MP4
  • WMA
  • AVI
  • MKV, WebM
  • TS
  • MOV

CODEC

  • H264 / AVC
  • H265 / HEVC
  • MPEG-4
  • VP9
  • AV1
  • Theora
  • Daala

68 of 136

69 of 136

History

70 of 136

Patents all around

“Transform Coding of Image Difference Signals”, US patent 3679821, filed April 1970 and issued July 1972

“Motion vector estimation in television images”, US 4864393

“Block transform and quantization for image and video coding”, US 6882685

“Method and apparatus for binarization and arithmetic coding of a data value”, US 6900748

71 of 136

Patents all around (joke)

72 of 136

Alliance for Open Media - AV1

VP10

Thor

Daala

73 of 136

Alliance for Open Media - AV1

  • Interoperable and open;
  • Optimized for the Internet;
  • Scalable to any modern device at any bandwidth;
  • Designed with a low computational footprint and optimized for hardware;
  • Capable of consistent, highest-quality, real-time video delivery; and
  • Flexible for both commercial and non-commercial content, including user-generated content.

74 of 136

Hybrid motion compensated CODEC

picture partitioning -> predictions -> transform -> quantization -> entropy coding

predictions and transform: redundancy removal (DCT, DWT, intra-prediction, inter-prediction, motion estimation / compensation)

quantization: entropy reduction (linear, logarithmic)

entropy coding: lossless compression (Huffman, LZW ...)

75 of 136

CODEC


76 of 136

Frame partitioning

slices

77 of 136

Fixed vs Variable block size

78 of 136

CODEC


79 of 136

Motion estimation Inter|Intra-prediction

[Figure: the frame sequence I-frame, P-frame, P-frame, B-frame, I-frame (inter prediction) next to a block of pixel values (100, 200) and the direction of the prediction (intra prediction)]

80 of 136

CODEC


81 of 136

Transform

Double: f(x) = x + x, [3] => [6]

Plus10: f(x) = x + 10, [3] => [13]

Divide2: f(x) = x / 2, [3] => [1.5]

82 of 136

Transform (DCT)
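A sketch of a 1D orthonormal DCT-II and its inverse in plain Python. It shows energy compaction: a flat row of pixels ends up entirely in the first (DC) coefficient. Codecs apply 2D integer approximations of this transform, not this exact floating-point version.

```python
import math


def dct_ii(samples):
    """Orthonormal 1D DCT-II."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


def idct_ii(coeffs):
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi / n * (i + 0.5) * k)
        samples.append(total)
    return samples


flat = [100] * 8
print([round(c, 3) for c in dct_ii(flat)])     # only the first (DC) value is non-zero
row = [100, 100, 100, 200, 100, 100, 100, 200]
print([round(v) for v in idct_ii(dct_ii(row))])  # [100, 100, 100, 200, 100, 100, 100, 200]
```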

83 of 136

https://www.iem.thm.de/telekom-labor/zinke/mk/mpeg2beg/whatisit.htm

84 of 136

Transform (DCT[123], DWT, KLT, FFT, lapped…)

85 of 136

CODEC


86 of 136

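Quantization over DCT (uniform, linear, logarithmic)

DCT coefficients:

120 40 1 0
45 3 0 0
-5 0 0 1
0 0 -2 0

Quantized with Qstep (10), dropping the fraction (45 / 10 = 4.5 -> 4):

12 4 0 0
4 0 0 0
0 0 0 0
0 0 0 0

Reconstructed (levels x Qstep):

120 40 0 0
40 0 0 0
0 0 0 0
0 0 0 0

Quantized with Qstep (10), rounding (45 / 10 = 4.5 -> 5):

12 4 0 0
5 0 0 0
0 0 0 0
0 0 0 0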
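A sketch of uniform quantization with Qstep = 10 on the coefficients from the figure. The two rounding rules (truncation toward zero and round half up) are inferred from the numbers shown, not stated on the slide. Real codecs use their own rounding offsets and scaling matrices. Once quantized, small coefficients become 0, and that loss cannot be undone.

```python
import math

coefficients = [[120, 40, 1, 0],
                [45, 3, 0, 0],
                [-5, 0, 0, 1],
                [0, 0, -2, 0]]


def quantize_truncate(block, qstep):
    # int() rounds toward zero: 45 / 10 = 4.5 -> 4
    return [[int(c / qstep) for c in row] for row in block]


def quantize_round(block, qstep):
    # round half up: 45 / 10 = 4.5 -> 5, while -5 / 10 = -0.5 -> 0
    return [[math.floor(c / qstep + 0.5) for c in row] for row in block]


def dequantize(levels, qstep):
    return [[level * qstep for level in row] for row in levels]


levels = quantize_truncate(coefficients, 10)
print(levels)                  # [[12, 4, 0, 0], [4, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(dequantize(levels, 10))  # [[120, 40, 0, 0], [40, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(quantize_round(coefficients, 10)[1][0])  # 5
```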

87 of 136

CODEC


88 of 136

Entropy coding quantized DCT

*CAVLC example

zig-zag scan (2D to 1D) -> lossless compression (frequent symbols table, -1,1, trailing zeros, ...) -> coded block: 000010001110 010111101101
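A sketch of why the zig-zag scan helps: after quantization, the non-zero values gather at the start of the 1D sequence and the zeros collect at the end. The coding step here is a simpler JPEG-style run-length scheme ((zeros before, value) pairs plus an end-of-block marker), swapped in for illustration. It is not CAVLC.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                   # s = row + col, one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                # even diagonals move up and to the right
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length(values):
    """(zeros_before, value) pairs; 'EOB' replaces all trailing zeros."""
    pairs, zeros = [], 0
    last = max((i for i, v in enumerate(values) if v != 0), default=-1)
    for v in values[:last + 1]:
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    pairs.append("EOB")
    return pairs


levels = [[12, 4, 0, 0], [4, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # quantized block
print(zigzag_scan(levels))
print(run_length(zigzag_scan(levels)))   # [(0, 12), (0, 4), (0, 4), 'EOB']
```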

89 of 136

CODEC


90 of 136

Space needed to store 1h of video at 720p 30fps

with H264 (chroma subsampling, motion estimation, intra prediction, CABAC…):

WIDTH * HEIGHT * BITS_PER_PIXEL * FPS

1280 * 720 * 0.031 * 30

857,088 bits per second (≈ 857 kbps)

3,085,516,800 bits per hour (≈ 367.8 MiB)

compared with chroma subsampling YUV420 only:

WIDTH * HEIGHT * BITS_PER_PIXEL * FPS

1280 * 720 * 12 * 30

331,776,000 bits per second (331.776 Mbps)

1,194,393,600,000 bits per hour (≈ 139 GiB)

91 of 136

CODEC and patents

picture partitioning -> predictions -> transform -> quantization -> entropy coding

predictions: US 4864393

transform and quantization: US 6882685

entropy coding: *US 6900748

*it seems to have expired by the 2000s

92 of 136

Bitstream format

93 of 136

Bitstream format

94 of 136

Hybrid motion compensated encoding

95 of 136

Hybrid motion compensated decoding

96 of 136

H264 vs H265

97 of 136

HEVC @ 2Mbps

AVC @ 4Mbps

98 of 136

AVC @ 400kbps

HEVC @ 400kbps

99 of 136

100 of 136

Video streaming

101 of 136

General Video Delivery Architecture

ingest -> encoder -> origin -> CDN (frontend, caching)

102 of 136

Content distribution

103 of 136

Progressive download

full_video.mp4, requested in byte ranges over time:

Range: bytes=0-299 -> HTTP 206

Range: bytes=300-499 -> HTTP 206

Range: bytes=500-999 -> HTTP 206
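A minimal server-side sketch of the exchange above. It answers a single closed `bytes=start-end` range (the end is inclusive) with 206 Partial Content and a `Content-Range` header. It is simplified on purpose: suffix ranges, open-ended ranges and multi-range requests, which HTTP also allows, are rejected here with 416.

```python
import re


def serve_range(data: bytes, range_header: str):
    """Answer a single 'bytes=start-end' range request (end is inclusive)."""
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", range_header)
    if match is None:
        return 416, {}, b""                        # unsupported or malformed range
    start, end = int(match.group(1)), int(match.group(2))
    if start > end or start >= len(data):
        return 416, {"Content-Range": f"bytes */{len(data)}"}, b""
    end = min(end, len(data) - 1)
    body = data[start:end + 1]
    headers = {"Content-Range": f"bytes {start}-{end}/{len(data)}"}
    return 206, headers, body


video = bytes(1000)                                # stand-in for full_video.mp4
for header in ("bytes=0-299", "bytes=300-499", "bytes=500-999"):
    status, headers, body = serve_range(video, header)
    print(status, headers["Content-Range"], len(body))
# 206 bytes 0-299/1000 300
# 206 bytes 300-499/1000 200
# 206 bytes 500-999/1000 500
```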

104 of 136

Adaptive bitrate streaming

manifest -> HTTP 200

on 2G: s480p_01.mp4 -> HTTP 200

on 2G: s480p_02.mp4 -> HTTP 200

on wifi: s1080p_03.mp4 -> HTTP 200
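A sketch of the player-side choice in adaptive streaming: for each segment, pick the highest rendition that fits into a share of the measured throughput. The rendition ladder, bitrates, throughput values and safety factor are all hypothetical. Only the segment names follow the figure. Real HLS/DASH players use more elaborate heuristics, for example ones based on buffer level.

```python
# Hypothetical rendition ladder: (name, required bandwidth in bits per second)
RENDITIONS = [
    ("s480p", 1_000_000),
    ("s720p", 2_500_000),
    ("s1080p", 5_000_000),
]


def pick_rendition(measured_bps: float, safety: float = 0.8) -> str:
    """Highest rendition whose bitrate fits in a fraction of the measured throughput."""
    budget = measured_bps * safety
    chosen = RENDITIONS[0][0]                 # always fall back to the lowest one
    for name, bitrate in RENDITIONS:
        if bitrate <= budget:
            chosen = name
    return chosen


def segment_url(rendition: str, index: int) -> str:
    return f"{rendition}_{index:02d}.mp4"


# 2G for the first two segments, then wifi
for index, throughput in enumerate([300_000, 300_000, 20_000_000], start=1):
    print(segment_url(pick_rendition(throughput), index))
# s480p_01.mp4
# s480p_02.mp4
# s1080p_03.mp4
```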

105 of 136

Adaptive bitrate streaming (HLS)

source: https://www.encoding.com/http-live-streaming-hls/

106 of 136

Content protection

107 of 136

Token + CORS + TLS (https)

API: request video=3 & cookie -> token=CAFE

CDN: video=3 -> HTTP 403

CDN: video=3&t=CAFE -> HTTP 200
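The slide shows only an opaque token (CAFE). One common way to build one, used in this sketch as an assumption, is for the API and the CDN to share a secret: the API signs the video id plus an expiry time with HMAC-SHA256, and the CDN recomputes the signature. The secret, the token layout and the function names are all hypothetical. CORS and TLS are not modeled.

```python
import hashlib
import hmac
import time

SECRET = b"shared-between-api-and-cdn"        # hypothetical shared secret


def issue_token(video_id: str, expires_at: int) -> str:
    """API side: sign the video id and an expiry timestamp."""
    message = f"{video_id}:{expires_at}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"{expires_at}.{signature}"


def cdn_check(video_id: str, token, now: int) -> int:
    """CDN side: return the HTTP status code for a request."""
    if not token:
        return 403
    try:
        expires_text, signature = token.split(".", 1)
        expires_at = int(expires_text)
    except ValueError:
        return 403
    message = f"{video_id}:{expires_at}".encode()
    expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    if now > expires_at or not hmac.compare_digest(signature, expected):
        return 403
    return 200


now = int(time.time())
token = issue_token("3", now + 300)
print(cdn_check("3", None, now))    # 403 (video=3 without a token)
print(cdn_check("3", token, now))   # 200 (video=3 with a valid token)
print(cdn_check("4", token, now))   # 403 (token issued for another video)
```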

108 of 136

DRM (Widevine, PlayReady, FairPlay)

new_video.mp4 -> encoding + DRM servers (Apple, M$, Google) -> dash_encrypted_new_video.mp4 -> CDN

109 of 136

Encoding parameters: the whys

110 of 136

CBR vs VBR

LIVE: CBR. Bandwidth is the biggest problem: VBR might cause lots of rebuffering, latency is critical, and small hiccups are acceptable.

VOD progressive download: “constrained” VBR, min 50% and max 200% of the target bitrate.

VOD adaptive streaming: “constrained” VBR, min 85-100% and max 125-150% of the target bitrate.
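A small helper that turns the percentage guidelines above into concrete bounds. The 2500 kbps target is a hypothetical example.

```python
def vbr_limits(target_kbps, min_pct, max_pct):
    """Constrained VBR: bitrate may vary between min_pct% and max_pct% of the target."""
    if not 0 < min_pct <= 100 <= max_pct:
        raise ValueError("expected 0 < min_pct <= 100 <= max_pct")
    return target_kbps * min_pct / 100, target_kbps * max_pct / 100


print(vbr_limits(2500, 50, 200))   # VOD progressive download: (1250.0, 5000.0)
print(vbr_limits(2500, 85, 150))   # VOD adaptive streaming, loosest bounds: (2125.0, 3750.0)
print(vbr_limits(2500, 100, 125))  # VOD adaptive streaming, tightest bounds: (2500.0, 3125.0)
```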

111 of 136

112 of 136

Profiles for iOS-like >= High 3.1

113 of 136

Profiles for Android-like >= High 3.1

114 of 136

Frame types and GOP

P- and B-frames are lighter, but they require searching frames backward or forward.

Set the keyframe (I-frame) interval to 2, 3, 4 or 5 seconds; otherwise, in VBR you’re just wasting resources.

Align keyframes with your chunk size: the chunk duration should be a multiple of the keyframe interval. Ex: 6 s chunks, therefore an I-frame every 1 s or 2 s.

Turn off “keyframe scene detection”.

Yes to B-frames, with a “magic number” of 3-4.
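A sketch of the keyframe arithmetic. It converts a keyframe interval into a GOP length in frames and checks that each chunk starts on a keyframe, meaning the chunk duration is a whole multiple of the interval. The helper names are illustrative.

```python
def keyframe_interval_frames(fps: float, seconds: float) -> int:
    """GOP length in frames for a given keyframe interval."""
    frames = fps * seconds
    if frames != int(frames):
        raise ValueError("interval does not land on a whole frame")
    return int(frames)


def aligned_with_chunks(chunk_seconds: float, keyframe_seconds: float) -> bool:
    """True if the chunk duration is a multiple of the keyframe interval."""
    ratio = chunk_seconds / keyframe_seconds
    return ratio == int(ratio)


print(keyframe_interval_frames(30, 2))   # 60
print(aligned_with_chunks(6, 2))         # True
print(aligned_with_chunks(6, 4))         # False
```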

115 of 136

B-Frame magic

116 of 136

Bits per pixel

What is the resolution for 2.5Mbps?

Let’s try:

Height: 393, Width: 720, Pixels: 282,960, Bitrate: 2500 kbps, FPS: 30

BitsPerPixel = Bitrate / (Pixels * FPS)

2,500,000 / (282,960 * 30) ≈ 0.2945
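The bits-per-pixel formula as code. Note that the bitrate must be in bits per second (2500 kbps = 2,500,000) and that Pixels * FPS goes together in the denominator. The second line reproduces the 0.031 figure from the H264 storage example.

```python
def bits_per_pixel(bitrate_bps: float, width: int, height: int, fps: float) -> float:
    """BitsPerPixel = Bitrate / (Pixels * FPS)."""
    return bitrate_bps / (width * height * fps)


print(round(bits_per_pixel(2_500_000, 720, 393, 30), 4))  # 0.2945
print(round(bits_per_pixel(857_088, 1280, 720, 30), 3))   # 0.031
```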

117 of 136

lossless compression - entropy encoding

CABAC: more efficient, more CPU-intensive (battery and so on); available in the Main and High profiles

CAVLC: less efficient, less CPU-intensive

118 of 136

Apple TN2224

119 of 136

Bonus: audio codec

120 of 136

Analog audio conversion

121 of 136

Sampling (8, 11, 32, 44.1, 48, 50, 88, 96, 192... kHz)

122 of 136

Bit depth (16, 24, 32, 64... bits) quantization

123 of 136

Channels 2

124 of 136

Channels 16.2

125 of 136

PCM encoder
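A sketch combining the three audio parameters above (sampling rate, bit depth, channels). It computes the uncompressed PCM bitrate and samples a sine wave, quantizing each sample to a signed integer. The 1 kHz tone and the 8 kHz rate in the second call are arbitrary examples.

```python
import math


def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM bitrate in bits per second."""
    return sample_rate_hz * bit_depth * channels


def pcm_encode_sine(freq_hz, sample_rate_hz, bit_depth, n_samples):
    """Sample a sine wave and quantize each sample to a signed integer."""
    max_level = 2 ** (bit_depth - 1) - 1      # e.g. 32767 for 16 bits
    return [
        round(max_level * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        for n in range(n_samples)
    ]


print(pcm_bitrate(44_100, 16, 2))             # 1411200 bits per second (CD audio)
print(pcm_encode_sine(1_000, 8_000, 16, 8))   # one period of a 1 kHz tone
```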

126 of 136

AAC CODEC block

127 of 136

References

128 of 136

Links

129 of 136

Links

130 of 136

Links

131 of 136

Links

132 of 136

Links

133 of 136

Links

134 of 136

Links

135 of 136

Links

136 of 136

Links