Introduction to digital video technology
Feedback
Please take notes about possible improvements.
Feedback is welcome, things like: better naming, shortening Y content, expanding X content, etc.
Don’t let others suffer as you will.
Basic Terminology
What is an image?
[Figure: an image as a 3D matrix: three 2D planes (R, G, B) holding color intensities]
What is a picture element (pixel)?
[Figure: a pixel is one position in the image matrix, with one intensity in each of the R, G and B planes]
What is bit (color) depth?
8 bits (R) + 8 bits (G) + 8 bits (B) = 24 bits per pixel
*this gives you 2^24 (16,777,216) different colors
[Figure: each R, G and B channel holds an intensity in the range 0-255]
Color depth
[Figure: the same image at 24 bpp, 10 bpp and 8 bpp]
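A minimal sketch of the bit-depth arithmetic above (the function names are illustrative): with n bits per channel each channel spans 0 to 2^n - 1, and three channels give 2^(3n) colors.

```python
def max_intensity(bits_per_channel: int) -> int:
    """Highest value one channel can store, e.g. 255 for 8 bits."""
    return 2 ** bits_per_channel - 1


def color_count(bits_per_channel: int, channels: int = 3) -> int:
    """Distinct colors a pixel can represent with `channels` channels."""
    return 2 ** (bits_per_channel * channels)


print(max_intensity(8))  # 255, so each channel ranges over 0-255
print(color_count(8))    # 16777216 colors at 24 bpp
```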
Color depth
[Figure: all colors from RGB]
Sources: https://lumeniquessl.com/2012/03/01/12-in-12-for-2012-the-flicker-indicator-machine/ and https://lightingstudio.wordpress.com/2012/03/27/week5-light-object-shadow-contrast/
What is resolution?
[Figure: a 4 x 4 image; resolution = width x height]
What is display aspect ratio (DAR)?
16:9 (1.7777777778) 4:3 (1.3333333333)
1280/720 (1.7777777778) 1024/768 (1.3333333333)
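The ratios above can be checked by reducing width/height by their greatest common divisor. A small sketch, assuming square pixels (PAR 1:1) so the stored resolution directly gives the display shape:

```python
from math import gcd


def display_aspect_ratio(width: int, height: int) -> tuple:
    """Reduce width:height to lowest terms and return it with its decimal value."""
    d = gcd(width, height)
    return width // d, height // d, width / height


print(display_aspect_ratio(1280, 720))  # (16, 9, 1.7777777777777777)
print(display_aspect_ratio(1024, 768))  # (4, 3, 1.3333333333333333)
```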
What is pixel aspect ratio (PAR)?
[Figure: PAR 1:1 vs PAR 2:1]
DVD: display aspect ratio 4:3, pixel aspect ratio 10:11
Source https://xiph.org/video/vid1.shtml
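When pixels are not square, the display aspect ratio is the storage ratio (width/height) scaled by the PAR. A sketch with exact fractions; the 704x480 picture size is an assumption, based on the common convention that only 704 of the 720 stored columns of an NTSC DVD frame belong to the 4:3 picture:

```python
from fractions import Fraction


def dar_from_storage(width: int, height: int, par: Fraction) -> Fraction:
    """DAR = (stored width / stored height) * PAR."""
    return Fraction(width, height) * par


print(dar_from_storage(704, 480, Fraction(10, 11)))  # 4/3
print(dar_from_storage(720, 480, Fraction(10, 11)))  # 15/11, slightly wider than 4:3
```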
What is a video?
A sequence of frames over time: time is the 4th dimension (4D).
Frame rate (framerate): e.g. 30 frames per second (FPS)
[Figure: a single frame; interlaced | progressive]
What are the 480p, 1080i and 1080p formats?
[number][letter]
The number is the resolution's height; the letter is p for progressive or i for interlaced.
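The [number][letter] naming is regular enough to parse mechanically. A hypothetical helper, not part of any library:

```python
import re


def parse_video_format(fmt: str) -> tuple:
    """Split '[number][letter]' into (height, scan type)."""
    match = re.fullmatch(r"(\d+)([pi])", fmt.strip().lower())
    if match is None:
        raise ValueError(f"not a [number][p|i] format: {fmt!r}")
    height = int(match.group(1))
    scan = "progressive" if match.group(2) == "p" else "interlaced"
    return height, scan


for fmt in ("480p", "1080i", "1080p"):
    print(fmt, parse_video_format(fmt))
```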
What is bitrate?
WIDTH * HEIGHT * BITS_PER_PIXEL * FPS
4 * 4 * 24 * 30 (a 4x4 image, 24 bpp, 30 FPS)
11,520 bits per second
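Constant bitrate (CBR)?
[Figure: bitrate held at 1.2Mbps over time]
Variable bitrate (VBR)?
[Figure: bitrate varying over time around 1.2Mbps, from 200Kbps up to 2.4Mbps]
Average bitrate (ABR)?
[Figure: bitrate varying over time around a 1.2Mbps average (axis from 200Kbps to 2.4Mbps), kept between a min of 400Kbps and a max of 1.8Mbps]
can be seen as constrained VBR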
Space needed to store 1h of video at 720p 30fps
* without any compression technique at all.
WIDTH * HEIGHT * BITS_PER_PIXEL * FPS
1280 * 720 * 24 * 30
663,552,000 bits per second (663.552 Mbps)
2,388,787,200,000 bits per hour (≈ 298.6 GB, i.e. 278 GiB)
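The same arithmetic as a sketch (function names are illustrative). It also shows why the hour of raw 720p can be quoted both as ~298.6 GB (10^9 bytes) and as 278 GiB (2^30 bytes):

```python
def raw_bitrate(width, height, bits_per_pixel, fps):
    """Uncompressed bitrate in bits per second: WIDTH * HEIGHT * BITS_PER_PIXEL * FPS."""
    return width * height * bits_per_pixel * fps


def storage(bits_per_second, seconds=3600):
    """Bits for a duration, plus the size in decimal GB (10^9 bytes) and binary GiB (2^30 bytes)."""
    bits = bits_per_second * seconds
    size_bytes = bits / 8
    return bits, size_bytes / 10**9, size_bytes / 2**30


print(raw_bitrate(4, 4, 24, 30))  # 11520, the 4x4 example
bits, gb, gib = storage(raw_bitrate(1280, 720, 24, 30))
print(f"{bits:,} bits = {gb:.1f} GB = {gib:.1f} GiB")
```

Passing a smaller bits-per-pixel value (12 for YUV420, or an effective value for a compressed stream) reuses the same function.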
Review
image, pixel, bit depth, resolution, display aspect ratio, pixel aspect ratio, video, frame rate, interlaced, progressive, bitrate, CBR, VBR, ABR
From the world to the bits
How are images captured? CCD sensor
How are images captured? CMOS sensor (APS)
Uses less power
Transmits data faster than CCD
Cheaper
Most commonly used in cell phone cameras and web cameras
How are images captured?
Color filter: Bayer array
Filters the 3 primary colors
Sensor
Bayer Demosaicing
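Each sensor site behind a Bayer filter records only one color, so the missing two must be rebuilt. Real demosaicing interpolates them at full resolution (bilinear, edge-aware, ...); the sketch below is the simplest variant, assuming an RGGB layout, and collapses each 2x2 quad into one RGB pixel (half resolution), averaging its two green sites:

```python
def demosaic_superpixel(raw):
    """Collapse each 2x2 RGGB quad into one RGB pixel (half resolution).

    `raw` is a list of rows with one intensity per sensor site. Assumed quad layout:
        R G
        G B
    """
    height, width = len(raw), len(raw[0])
    if height % 2 or width % 2:
        raise ValueError("expects even width and height")
    rgb = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            r = raw[y][x]
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2  # the quad has two green sites
            b = raw[y + 1][x + 1]
            row.append((r, g, b))
        rgb.append(row)
    return rgb


mosaic = [
    [200, 100, 210, 110],
    [90, 50, 100, 60],
]
print(demosaic_superpixel(mosaic))  # [[(200, 95.0, 50), (210, 105.0, 60)]]
```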
Redundancy Removal
What can we do?
compress repetitions within the frame
exploit our vision
reduce repetitions in time
Exploiting our vision
Color models
Colors
Our eyes - an oversimplification
We are better at seeing luma (brightness) than color
Color space YUV (YCbCr, YPbPr)
Y (luma)
U (chroma blue)
V (chroma red)
From RGB to YCbCr
Y = 0.299R + 0.587G + 0.114B
Cb = 0.564(B - Y) | Cr = 0.713(R - Y)
From YCbCr to RGB
R = Y + 1.402Cr | B = Y + 1.772Cb | G = Y - 0.344Cb - 0.714Cr
*ITU-R BT.601-7
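The BT.601 formulas above, written as a round trip. This sketch assumes full-range values with no offsets; 8-bit studio-range video additionally scales and offsets the results (e.g. Y in 16-235, chroma centered on 128), which is left out here:

```python
def rgb_to_ycbcr(r, g, b):
    """RGB -> Y, Cb, Cr with the BT.601 coefficients above (no offsets or range scaling)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Y, Cb, Cr -> RGB, the inverse formulas above."""
    r = y + 1.402 * cr
    g = y - 0.344 * cb - 0.714 * cr
    b = y + 1.772 * cb
    return r, g, b


pixel = (200, 100, 50)
y, cb, cr = rgb_to_ycbcr(*pixel)
print(round(y, 1), round(cb, 1), round(cr, 1))
print([round(c) for c in ycbcr_to_rgb(y, cb, cr)])  # back to [200, 100, 50]
```

The rounded coefficients make the round trip approximate, which is why the result is rounded before comparing.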
From RGB to YCbCr
The coefficients depend on the recommendation (standard) issued by the standards groups.
Standards | Y (Luma) | Chroma B | Chroma R |
SDTV with BT.601 (Rec. 601) | Y=0.299R+0.587G+0.114B | U=0.492(B-Y) | V=0.877(R-Y) |
HDTV with BT.709 (Rec. 709) | Y=0.2126R+0.7152G+0.0722B | ... | ... |
UHDTV with BT.2020 (Rec. 2020) | Y=0.2627R+0.6780G+0.0593B | ... | ... |
groups? ISO/IEC, ITU-R, JVT/JCT, AOM, MPEG-LA
recommendations? Rec. 601, Rec. 709, Rec. 2020
Recommendation | Resolutions | Frame Rate | Bit Depth | Chroma Sub |
BT.601 (SDTV) | 525i 625i | 50 60 | 8 | YCbCr 4:2:2 |
BT.709 (HDTV) | 1080p 1080i | 50 60 30 24 | 8 10 | * YCbCr 4:2:2 |
BT.2020 (UHDTV) | 4320p 2160p (7680x4320, 3840x2160) | 120 100 60 50 30 24 | 10 12 | 4:4:4 4:2:2 4:2:0 |
[Figure: Rec. 601, Rec. 709 and Rec. 2020 compared]
Chroma subsampling YUV 4:4:4 4:2:2 4:2:0
[Figure: the Y (luma), U (chroma blue) and V (chroma red) planes under each scheme]
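Chroma subsampling 4:2:0
Chroma subsampling YUV420
[Figure: a 1280x720 frame in YUV420: the Y plane keeps all 1280x720 samples, while U and V keep one sample per 2x2 block of pixels (640x360 each)]
Chroma subsampling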
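A sketch of 4:2:0 subsampling and the bit budget it implies. Averaging each 2x2 block is an assumed, simple filter; real encoders may use other filters and chroma sample positions:

```python
def subsample_420(plane):
    """Average every 2x2 block of a chroma plane (assumes even width and height)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


def bits_per_pixel_420(bit_depth=8):
    """Full-resolution Y plus U and V with one sample per 4 pixels each."""
    return bit_depth + 2 * bit_depth / 4


def chroma_plane_size(width, height):
    """Size of each of the U and V planes in 4:2:0."""
    return width // 2, height // 2


print(subsample_420([[10, 20], [30, 40]]))  # [[25.0]]
print(bits_per_pixel_420())                 # 12.0
print(chroma_plane_size(1280, 720))         # (640, 360)
```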
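Space needed to store 1h of video at 720p 30fps
with chroma subsampling YUV420
Without subsampling (24 bits per pixel):
WIDTH * HEIGHT * BITS_PER_PIXEL * FPS
1280 * 720 * 24 * 30
663,552,000 bits per second (663.552 Mbps)
2,388,787,200,000 bits per hour (≈ 298.6 GB, i.e. 278 GiB)
With YUV420 (8 bits of Y + 2 bits of U + 2 bits of V = 12 bits per pixel):
WIDTH * HEIGHT * BITS_PER_PIXEL * FPS
1280 * 720 * 12 * 30
331,776,000 bits per second (331.776 Mbps)
1,194,393,600,000 bits per hour (≈ 149.3 GB, i.e. 139 GiB)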
Correlations in time
Frame types
Temporal redundancy (inter prediction)
[Figure: original frames F0-F4 encoded as I-frame, P-frame, P-frame, P-frame, I-frame]
Temporal redundancy
[Figure: original frames F0-F4 encoded as I-frame (103Kb), P-frame (2Kb), P-frame (2Kb), P-frame (2Kb), I-frame (103Kb)]
Temporal redundancy with motion estimation
[Figure: F0 encoded as an I-frame, F1 as a P-frame]
motion estimation (motion vectors) applied to the previous frame = predicted frame
current (real) frame - predicted frame = residual (prediction error)
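A toy block-matching sketch of the two steps above, assuming an exhaustive (full) search that minimizes the sum of absolute differences (SAD); real encoders use much faster search patterns and sub-pixel vectors. The frames are hypothetical: a 2x2 bright square that moves between frames.

```python
def make_frame(square_x, square_y, size=6, value=9):
    """Hypothetical test frame: a 2x2 bright square on a black background."""
    frame = [[0] * size for _ in range(size)]
    for y in range(square_y, square_y + 2):
        for x in range(square_x, square_x + 2):
            frame[y][x] = value
    return frame


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of `ref` moved by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def motion_search(cur, ref, bx, by, n, search_range):
    """Try every vector in [-search_range, search_range]^2 whose reference block stays
    inside the frame; return (best_sad, (dx, dy))."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= by + dy and by + dy + n <= height and 0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


previous = make_frame(1, 1)  # reference frame (already decoded)
current = make_frame(3, 2)   # the square moved 2 right and 1 down
cost, (dx, dy) = motion_search(current, previous, bx=3, by=2, n=2, search_range=2)
predicted = [[previous[2 + dy + y][3 + dx + x] for x in range(2)] for y in range(2)]
residual = [[current[2 + y][3 + x] - predicted[y][x] for x in range(2)] for y in range(2)]
print((dx, dy), cost, residual)  # (-2, -1) 0 [[0, 0], [0, 0]]
```

Only the motion vector and the (here all-zero) residual would need to be coded for this block.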
Temporal redundancy (B frames)
[Figure: original frames F0-F4 encoded as I-frame, P-frame, P-frame, B-frame, I-frame; the B-frame is predicted from frames before and after it]
Correlations in space
Lots of similarities
Spatial redundancy (intra prediction)
Unknown values:
100 | 100 | 100 | 200 |
100 | ??? | ??? | ??? |
100 | ??? | ??? | ??? |
100 | ??? | ??? | ??? |
Prediction (the top row copied downward, the direction of the prediction):
100 | 100 | 100 | 200 |
100 | 100 | 100 | 200 |
100 | 100 | 100 | 200 |
100 | 100 | 100 | 200 |
Real values:
100 | 100 | 100 | 200 |
100 | 100 | 100 | 200 |
100 | 100 | 100 | 200 |
100 | 100 | 120 | 210 |
Difference (real values - prediction at the unknown positions), highly compressible:
100 | 100 | 100 | 200 |
100 | 0 | 0 | 0 |
100 | 0 | 0 | 0 |
100 | 0 | 20 | 10 |
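The example above as a sketch of vertical intra prediction on the 3x3 unknown block (function names are illustrative): the encoder sends only the residual, and the decoder rebuilds the block by adding it back to the same prediction.

```python
def predict_vertical(top_row, size):
    """Vertical intra prediction: repeat the already-known row above the block."""
    return [list(top_row) for _ in range(size)]


def residual(block, prediction):
    """Real values minus predicted values, element by element."""
    return [[r - p for r, p in zip(r_row, p_row)] for r_row, p_row in zip(block, prediction)]


def reconstruct(prediction, res):
    """Decoder side: prediction + residual gives the block back."""
    return [[p + d for p, d in zip(p_row, d_row)] for p_row, d_row in zip(prediction, res)]


top = [100, 100, 200]  # known values above the 3x3 unknown block
real = [               # real values at the unknown positions
    [100, 100, 200],
    [100, 100, 200],
    [100, 120, 210],
]
prediction = predict_vertical(top, 3)
res = residual(real, prediction)
print(res)  # [[0, 0, 0], [0, 0, 0], [0, 20, 10]]
```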
Spatial redundancy (intra prediction) H264
CODEC - enCOder / DECoder
CODEC
“A codec is a device or computer program for encoding or decoding a digital data stream or signal.”
CODEC (VP9, H265) vs Container (.WEBM,.MP4)
Source https://xiph.org/video/vid1.shtml
Container vs CODEC
Containers
CODEC
History
Patents all around
“Transform Coding of Image Difference Signals”
US patent 3679821, filed April 1970 and issued July 1972
“Motion vector estimation in television images”
US 4864393
“Block transform and quantization for image and video coding”
US 6882685
“Method and apparatus for binarization and arithmetic coding of a data value”
US 6900748
Patents all around (joke)
Alliance for Open Media - AV1
VP10
Thor
Daala
Hybrid motion compensated CODEC
Stage | Goal | Techniques |
picture partitioning | | |
predictions, transform | redundancy removal | dct, dwt, intra-prediction, inter-prediction, motion estimation / compensation |
quantization | entropy reduction | linear, logarithm |
entropy coding | lossless compression | huffman, lzw ... |
Frame partitioning
slices
Fixed vs Variable block size
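A sketch of both partitioning styles. The 16x16 fixed size matches the H264 macroblock; the variable-size quadtree rule (split while the block's values differ by more than a threshold) is a hypothetical, simplified decision, whereas real encoders choose splits by rate-distortion cost.

```python
import math


def block_grid(width, height, block=16):
    """Fixed block size: blocks per row/column and the padded frame size."""
    cols, rows = math.ceil(width / block), math.ceil(height / block)
    return cols, rows, cols * block, rows * block


def quadtree(plane, x, y, size, min_size=2, threshold=10):
    """Variable block size: split a square block into four while its values
    differ by more than `threshold` and it is larger than `min_size`."""
    values = [plane[y + j][x + i] for j in range(size) for i in range(size)]
    if size <= min_size or max(values) - min(values) <= threshold:
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for ox, oy in ((0, 0), (half, 0), (0, half), (half, half)):
        blocks += quadtree(plane, x + ox, y + oy, half, min_size, threshold)
    return blocks


print(block_grid(1280, 720))   # (80, 45, 1280, 720)
print(block_grid(1920, 1080))  # (120, 68, 1920, 1088): 1080 is padded to 1088

plane = [[0] * 8 for _ in range(8)]
plane[7][7] = 50  # a single detail in the bottom-right corner
print(quadtree(plane, 0, 0, 8))  # big blocks where flat, small ones around the detail
```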
Motion estimation Inter|Intra-prediction
[Figure: inter-prediction across frames I-frame, P-frame, P-frame, B-frame, I-frame; intra-prediction of a block following the direction of the prediction:]
100 | 100 | 100 | 200 |
100 | 100 | 100 | 200 |
100 | 100 | 100 | 200 |
100 | 100 | 100 | 200 |
Transform
Double: [3] f(x) = x + x => [6]
Plus10: [3] f(x) = x + 10 => [13]
Divide2: [3] f(x) = x / 2 => [1.5]
Transform (DCT)
https://www.iem.thm.de/telekom-labor/zinke/mk/mpeg2beg/whatisit.htm
Transform (DCT[123], DWT, KLT, FFT, lapped…)
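A sketch of a 2D orthonormal DCT-II (the floating-point textbook form; H264, for example, uses an integer approximation instead). It shows the energy compaction the transform is used for: a flat block ends up entirely in the top-left (DC) coefficient, and a block whose rows are identical only produces coefficients in the first row.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a sequence."""
    n = len(values)
    coefficients = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i, v in enumerate(values))
        coefficients.append(scale * total)
    return coefficients


def dct_2d(block):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    columns = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


flat = [[100] * 4 for _ in range(4)]
edge = [[100, 100, 100, 200] for _ in range(4)]
for name, block in (("flat", flat), ("edge", edge)):
    print(name, [[round(c, 1) for c in row] for row in dct_2d(block)])
```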
Quantization over DCT (uniform, linear, logarithm)
DCT coefficients:
120 | 40 | 1 | 0 |
45 | 3 | 0 | 0 |
-5 | 0 | 0 | 1 |
0 | 0 | -2 | 0 |
Quantized with Qstep (10), fractions truncated:
12 | 4 | 0 | 0 |
4 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
Dequantized (level * Qstep), what the decoder gets back:
120 | 40 | 0 | 0 |
40 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
Quantized with Qstep (10), rounded:
12 | 4 | 0 | 0 |
5 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
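The tables above can be reproduced with plain uniform scalar quantization. This sketch assumes "truncate toward zero" for the first table and "round half up" for the second; real encoders typically add a tuned rounding offset (dead zone) instead.

```python
import math


def quantize(block, qstep, mode="round"):
    """Uniform scalar quantization: divide every coefficient by qstep, then
    either truncate toward zero or round half up."""
    if mode == "truncate":
        return [[int(c / qstep) for c in row] for row in block]
    return [[math.floor(c / qstep + 0.5) for c in row] for row in block]


def dequantize(levels, qstep):
    """Decoder side: multiply back. The discarded precision does not come back."""
    return [[level * qstep for level in row] for row in levels]


coefficients = [
    [120, 40, 1, 0],
    [45, 3, 0, 0],
    [-5, 0, 0, 1],
    [0, 0, -2, 0],
]
truncated = quantize(coefficients, 10, mode="truncate")
rounded = quantize(coefficients, 10)
print(truncated)
print(dequantize(truncated, 10))
print(rounded)
```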
Entropy coding quantized DCT
*CAVLC example
[Figure: the quantized block goes through a zig-zag scan (2D to 1D), then is losslessly compressed with a table of frequent symbols (-1,1, trailing zeroes, ...) into a coded block, e.g. 000010001110 010111101101]
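A sketch of the zig-zag scan on the rounded block from the quantization example, followed by a simple run-length step. The (zeros_before, level) + end-of-block scheme is the JPEG/MPEG-2 style, shown here only to illustrate why the scan helps; CAVLC itself codes different elements (coefficient count, trailing ones, total zeros, runs) with its own VLC tables.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag order: walk the
    anti-diagonals, alternating direction on each one."""
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals go up and to the right
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """2D to 1D: low frequencies first, so the zeros cluster at the end."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length(values):
    """(zeros_before, level) pairs for every non-zero value, then 'EOB'."""
    pairs, zeros = [], 0
    for v in values:
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    return pairs + ["EOB"]


quantized = [
    [12, 4, 0, 0],
    [5, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
scanned = zigzag_scan(quantized)
print(scanned)              # [12, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(run_length(scanned))  # [(0, 12), (0, 4), (0, 5), 'EOB']
```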
Space needed to store 1h of video at 720p 30fps
With H264 (chroma subsampling, motion estimation, intra prediction, CABAC…):
WIDTH * HEIGHT * BITS_PER_PIXEL * FPS
1280 * 720 * 0.031 * 30
857,088 bits per second (≈ 857 kbps, i.e. 837 Kibit/s)
3,085,516,800 bits per hour (≈ 385.7 MB, i.e. 367.82 MiB)
With chroma subsampling YUV420 only:
WIDTH * HEIGHT * BITS_PER_PIXEL * FPS
1280 * 720 * 12 * 30
331,776,000 bits per second (331.776 Mbps)
1,194,393,600,000 bits per hour (≈ 149.3 GB, i.e. 139 GiB)
CODEC and patents
picture partitioning
predictions
transform
quantization
entropy coding
US 4864393
US 6882685
* US 6900748
*it seems to have expired in the 2000s
Bitstream format
Hybrid motion compensated encoding
Hybrid motion compensated decoding
H264 vs H265
[Figure: HEVC @ 2Mbps vs AVC @ 4Mbps; AVC @ 400kbps vs HEVC @ 400kbps]
Video streaming
General Video Delivery Architecture
[Figure: ingest, encoder, origin, CDN (frontend, caching)]
Content distribution
Progressive download
[Figure: the player fetches full_video.mp4 over time in byte ranges]
Range: bytes=0-299 → HTTP 206
Range: bytes=300-499 → HTTP 206
Range: bytes=500-999 → HTTP 206
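A sketch of how a player might split a download into consecutive `Range` headers (the helper name and chunk size are illustrative). HTTP byte ranges are inclusive on both ends, and a server that honors them answers with 206 Partial Content.

```python
def range_headers(total_size, chunk_size):
    """Range header values that fetch a file in consecutive chunks (inclusive byte ranges)."""
    headers = []
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


print(range_headers(1000, 300))
# ['bytes=0-299', 'bytes=300-599', 'bytes=600-899', 'bytes=900-999']
```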
Adaptive bitrate streaming
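[Figure: the player first fetches the manifest (HTTP 200), then requests segments over time, switching rendition as the network changes: s480p_01.mp4 and s480p_02.mp4 on 2G (HTTP 200), s1080p_03.mp4 on wifi (HTTP 200)]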
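A minimal sketch of the decision the player makes before each segment request. The bitrate ladder and the 0.8 safety factor are hypothetical; real players also weigh buffer level and throughput history.

```python
# Hypothetical bitrate ladder, sorted from highest to lowest: (rendition, bits per second)
LADDER = [
    ("1080p", 4_500_000),
    ("720p", 2_500_000),
    ("480p", 1_000_000),
    ("240p", 300_000),
]


def pick_rendition(measured_bps, safety=0.8, ladder=LADDER):
    """Highest rendition whose bitrate fits within a fraction of the measured throughput."""
    budget = measured_bps * safety
    for name, bitrate in ladder:
        if bitrate <= budget:
            return name
    return ladder[-1][0]  # nothing fits: fall back to the lowest rendition


print(pick_rendition(1_500_000))   # slow link: 480p
print(pick_rendition(20_000_000))  # fast link: 1080p
```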
Adaptive bitrate streaming (hls)
source: https://www.encoding.com/http-live-streaming-hls/
Content protection
Token + CORS + TLS (https)
[Figure: over time, the API returns token=CAFE for a request with video=3& cookie; the CDN answers video=3 with HTTP 403 and video=3&t=CAFE with HTTP 200]
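The slide treats the token (token=CAFE) as opaque. One common design, assumed here, is an expiring HMAC signature that the API issues and the CDN can verify with a shared secret, without calling back to the API. All names and the secret are hypothetical:

```python
import hashlib
import hmac
import time

SECRET = b"hypothetical-secret-shared-by-api-and-cdn"


def sign(video_id, expires):
    """HMAC-SHA256 over the video id and the expiry timestamp."""
    message = f"{video_id}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def issue_token(video_id, ttl_seconds=300, now=None):
    """API side (after checking the user's cookie): returns (expires, token)."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    return expires, sign(video_id, expires)


def cdn_status(video_id, expires, token, now=None):
    """CDN side: 200 for a valid, unexpired token, otherwise 403."""
    now = int(time.time()) if now is None else now
    if token is None or now > expires:
        return 403
    return 200 if hmac.compare_digest(sign(video_id, expires), token) else 403


expires, token = issue_token("3")
print(cdn_status("3", expires, None))   # no token: 403
print(cdn_status("3", expires, token))  # signed token: 200
```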
DRM (widevine, playready, fairplay)
new_video.mp4
encoding
DRM servers: Apple, M$, Google
CDN
dash_encrypted_new_video.mp4
Encoding parameters: the whys
CBR vs VBR
LIVE: CBR. The biggest problem is bandwidth: VBR might cause lots of rebuffering, latency is critical, and small hiccups are acceptable.
VOD progressive download: “Constrained” VBR - min 50% of TARGET max 200%.
VOD adaptive streaming: “Constrained” VBR - min 85-100% of TARGET max 125-150%
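The percentages above, turned into bitrate bounds for a hypothetical 2000 kbps target, plus a clamp that keeps a per-segment bitrate inside the window (a simplification of what an encoder's rate control does):

```python
def constrained_vbr(target_kbps, min_pct, max_pct):
    """(min, max) bitrate in kbps for a constrained VBR encode."""
    return target_kbps * min_pct / 100, target_kbps * max_pct / 100


def clamp(bitrate_kbps, bounds):
    """Keep a requested bitrate inside the (min, max) window."""
    low, high = bounds
    return max(low, min(bitrate_kbps, high))


target = 2000  # hypothetical target in kbps
progressive = constrained_vbr(target, 50, 200)  # VOD progressive download
adaptive = constrained_vbr(target, 85, 125)     # VOD adaptive streaming (min 85%, max 125%)
print(progressive, adaptive)
print(clamp(3500, adaptive))  # a complex scene is capped at the adaptive max
```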
Profile for iOS-like devices: >= High 3.1
Profile for Android-like devices: >= High 3.1
Frame types and GOP
P- and B-frames are lighter, but they require searching frames backward or forward.
Set the keyframe (I-frame) interval to 2, 3, 4 or 5 seconds; otherwise, in VBR, you’re just wasting resources.
Align your keyframes with your chunk size: the chunk duration should be a multiple of the keyframe interval. Ex: with 6s chunks, an I-frame every 1s or 2s.
Turn off “keyframe scene detection”.
Yes to B-frames; the “magic number” is between 3 and 4.
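A sketch of the alignment rule above (helper names are illustrative): a keyframe interval works when it divides the chunk duration, so every chunk starts on an I-frame; the GOP length in frames is then the interval times the frame rate.

```python
def aligned_keyframe_intervals(chunk_seconds, candidates=(1, 2, 3, 4, 5, 6)):
    """Keyframe intervals (s) that divide the chunk length, so each chunk starts on an I-frame."""
    return [k for k in candidates if chunk_seconds % k == 0]


def gop_size(fps, keyframe_interval_s):
    """GOP length in frames."""
    return round(fps * keyframe_interval_s)


print(aligned_keyframe_intervals(6))  # [1, 2, 3, 6]
print(gop_size(30, 2))                # 60 frames between I-frames
```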
B-Frame magic
Bits per pixel
What is the resolution for 2.5Mbps?
Let’s try:
Height: 393, Width: 720, Pixels: 282,960, Bitrate: 2500 kbps, FPS: 30
BitsPerPixel = Bitrate / (Pixels * FPS)
2,500,000 / (282,960 * 30) ≈ 0.2945
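The formula above, plus its inverse, which answers the slide's question the other way round: given a bitrate and a bits-per-pixel budget, how many pixels (and roughly which resolution) fit. The 16:9 shape and the 0.1 bpp budget in the usage line are assumptions.

```python
import math


def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average bits spent on each pixel of each frame."""
    return bitrate_bps / (width * height * fps)


def resolution_for(bitrate_bps, bpp, fps, aspect=16 / 9):
    """Approximate (width, height) that a bitrate affords at a given bits-per-pixel."""
    pixels = bitrate_bps / (bpp * fps)
    height = math.sqrt(pixels / aspect)
    return round(height * aspect), round(height)


print(round(bits_per_pixel(2_500_000, 720, 393, 30), 4))
print(resolution_for(2_500_000, 0.1, 30))  # hypothetical 0.1 bpp budget
```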
lossless compression - entropy encoding
CABAC - more efficient, more CPU-intensive (battery and so on) [Main, High profiles]
CAVLC - less efficient, less CPU-intensive
Apple TN2224
Bonus: audio codec
Analog audio conversion
Sampling (8, 11, 32, 44.1, 48, 50, 88, 96, 192... kHz)
Bit depth (16, 24, 32, 64... bits) quantization
Channels 2
Channels 16.2
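The three parameters above fix the uncompressed PCM bitrate (sample rate x bit depth x channels), before any codec such as AAC is applied. A sketch, with a hypothetical sine wave to show sampling and quantization to signed integers:

```python
import math


def pcm_bitrate(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM bitrate in bits per second."""
    return sample_rate_hz * bit_depth * channels


def pcm_samples(freq_hz, sample_rate_hz, bit_depth, count):
    """Sample a sine wave and quantize each sample to a signed bit_depth-bit integer."""
    peak = 2 ** (bit_depth - 1) - 1
    return [round(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)) for i in range(count)]


print(pcm_bitrate(44_100, 16, 2))       # 1411200 bits per second (CD-like stereo)
print(pcm_samples(1_000, 8_000, 16, 8))  # one period of a 1 kHz tone at 8 kHz
```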
PCM encoder
AAC CODEC block
References
Links