1 of 105

Making video traffic a friendlier internet neighbor

Bruce Spang - April 5, 2023

2 of 105

Wow, I love legally downloading movies!!

(Bruce as a young computer scientist, c. 2007)

3 of 105

Wow, I love legally downloading movies!!

Hey Bruce, quit hogging all the bandwidth!

4 of 105

I’m not hogging the bandwidth! Everyone knows that TCP Reno fairly splits bandwidth among all competing flows [Chiu-Jain 1989]

5 of 105

6 of 105

That’s great and all, but the internet isn’t working

7 of 105

(I can just download the video during school tomorrow!)

8 of 105

There are things one application can do to be friendlier, or less friendly, to the neighboring applications sharing the same networks

The internet is a shared resource

9 of 105

My thesis

There are things one application can do to be friendlier, or less friendly, to the neighboring applications sharing the same networks.

It has three parts; today we will discuss two:

  • Smoothing video traffic to make it friendlier
  • Running experiments in congested networks
  • Sizing router buffers (not today)

10 of 105

Sammy: smoothing video traffic to be a friendly internet neighbor

Bruce Spang, Shravya Kunamalla, Renata Teixeira, Te-Yuan Huang, Grenville Armitage, Ramesh Johari, Nick McKeown

In submission to SIGCOMM 2023

11 of 105


76% in the Americas

69% in the Asia-Pacific region

65% in Europe, the Middle East, and Africa

(Source: Sandvine Global Internet Phenomena Report, January 2023)

Video traffic is most of the internet

12 of 105


Let’s watch a video

Video by Rob Dooley

13 of 105


Video traffic is bursty

Video by Rob Dooley

14 of 105


Video traffic is bursty

First observed by Rao et al. 2011

15 of 105


Video traffic is bursty

First observed by Rao et al. 2011

16 of 105


Video is streamed at two rates

Bitrate: the rate we watch video

17 of 105


Higher quality

Bitrate: ~1Mbps

Lower quality

Bitrate: 10kbps (100x smaller)

The more bits you have, the better the video looks

18 of 105


Video is streamed at two rates

Bitrate: the rate we watch video

Throughput: the rate video is being downloaded to our computer (at a short timescale)

19 of 105

The two rates are chosen by two different algorithms

Throughput: Congestion control algorithms

  • Classic area of networking research, dates back to the 80s
  • Send as fast as possible without overloading the network

Bitrate: Adaptive bitrate (ABR) algorithms

20 of 105

If we download slower than we watch, things get worse

21 of 105


Quality of Experience (QoE) is measured by three main parts:

Play Delay

(How long it takes to start)

Rebuffers

(Interruptions)

Bitrate

(How good the video looks)

22 of 105

Adaptive bitrate (ABR) algorithms adapt bitrates to optimize QoE

[Diagram: the video is divided into chunks 1-5 (and so on) over time, each available at a high, mid, and low bitrate.]

23 of 105

ABR algorithms adapt bitrates to optimize QoE

[Diagram: the same chunk grid, with one bitrate selected for each of chunks 1-5.]

24 of 105

ABR algorithms adapt bitrates to optimize QoE

Resulting experience: [Diagram: the selected chunks 1-5 played back in sequence over time.]

25 of 105


From Akamai whitepaper: rebuffering increases negative emotions (e.g. disgust, sadness) in lab settings

Experiments [Dobrian et al. 11], [Krishnan, Sitaraman 12]:

  • Each additional second of play delay decreases viewing by 5.8%
  • Rebuffering for 1% of video duration decreases viewing by 5%

Industry experiments: improving QoE increases customer retention

Good QoE matters for users and for streaming services

26 of 105


ABR algorithms exist for when throughput is low

But what about when throughput is higher than bitrate?

27 of 105

Videos are bursty when we download faster than we watch

[Figure: throughput, bitrate, and the playback buffer over time. While throughput is higher than the bitrate, the buffer grows (on period); once the buffer is full, downloading stops (off period).]

28 of 105

Video is becoming more bursty

[Chart: throughputs (avg. US internet speeds), from ~5 Mbps to ~200 Mbps.]

29 of 105

Video is becoming more bursty

[Chart: throughputs (avg. US internet speeds), from ~5 Mbps to ~200 Mbps, vs. typical bitrates of ~10 Mbps.]

Throughputs are ~20x higher than bitrates

30 of 105


Burstiness is bad.

31 of 105

Networks are congested if we send faster than capacity

Data queues at the slowest part of the network. The result is congestion:

  1. Delay: data takes longer to get through the network
  2. Loss: queues fill up, data is discarded and resent later
  3. Less available bandwidth for neighbors

32 of 105


Congestion control works by causing a little bit of congestion

Congestion control algorithms typically only know about:

  • How fast they are sending
  • Whether they are causing congestion

Most algorithms send faster until they cause some congestion, then slow down and repeat.
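To make the probe-and-back-off pattern concrete, here is a toy additive-increase/multiplicative-decrease (AIMD) loop in the spirit of TCP Reno [Chiu-Jain 1989]; the constants and the simulated congestion signal are illustrative, not from the talk.

```python
# Toy AIMD loop: send faster until congestion appears, then slow down and repeat.
# Real algorithms infer congestion from packet loss, ECN marks, or delay;
# here it is simulated as "rate exceeds capacity".

def simulate_aimd(capacity_mbps=40.0, rounds=40):
    rate_mbps = 1.0
    history = []
    for _ in range(rounds):
        if rate_mbps > capacity_mbps:      # congestion signal
            rate_mbps /= 2.0               # multiplicative decrease
        else:
            rate_mbps += 1.0               # additive increase (per round trip)
        history.append(rate_mbps)
    return history

if __name__ == "__main__":
    print(", ".join(f"{r:.1f}" for r in simulate_aimd()))
```

The resulting sawtooth is the "little bit of congestion" the slide refers to: the sender repeatedly probes past capacity, then backs off.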

33 of 105

Burstiness leads to congestion

During on periods:

  • Higher queueing delay/packet loss for all traffic
  • Less bandwidth for neighbors

[Figure: throughput relative to capacity and bitrate, the playback buffer, and the resulting “congestion” over time; congestion appears during on periods.]

34 of 105

Smoothing video throughput below capacity reduces congestion

[Figure: three throughput patterns relative to capacity and bitrate, and the resulting “congestion”, over time.]

[Ghobadi et al 2012, Satoda et al 2012, Mansy et al 2013, Akshabi et al. 2013, Bentaleb et al. 2021, etc…]

35 of 105


The challenge: making sure video still works well

36 of 105


Smoothing below bitrate makes QoE worse

Play Delay

(How long it takes to start)

Rebuffers

(Interruptions)

Bitrate

(How good the video looks)

37 of 105

Smoothing above bitrate can make ABR algorithms worse

A simple ABR algorithm:

  1. Keep an average of chunk throughputs
  2. Pick the highest bitrate < 0.5*average throughput

[Diagram: chunks 1-5 downloaded over time.]

38 of 105

Smoothing can make ABR algorithms perform worse

A simple ABR algorithm:

  • Keep an average of chunk throughputs
  • Pick the highest bitrate < 0.5*average throughput

Say we reduce chunk throughput to 1.5x the bitrate, then we switch down: with a 10 Mbps bitrate, the measured throughput is 1.5 x 10 Mbps = 15 Mbps, so the next bitrate can be at most 0.5 x 15 Mbps = 7.5 Mbps.

[Figure: measured throughput and the next bitrate over time when chunks are delivered at 1.5x the bitrate.]
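To make the arithmetic concrete, here is a sketch of this simple throughput-rule ABR algorithm; the bitrate ladder and the 40 Mbps unpaced throughput are illustrative assumptions, not Netflix's values.

```python
# Sketch of the simple ABR rule above: track chunk throughput and pick the
# highest bitrate that is at most half of it. The ladder below is illustrative.

BITRATES_MBPS = [1.0, 3.0, 5.0, 7.5, 10.0]

def pick_bitrate(avg_throughput_mbps: float) -> float:
    candidates = [b for b in BITRATES_MBPS if b <= 0.5 * avg_throughput_mbps]
    return max(candidates) if candidates else min(BITRATES_MBPS)

# Unpaced: chunks arrive at (say) 40 Mbps, so the rule keeps the 10 Mbps bitrate.
print(pick_bitrate(40.0))        # -> 10.0

# Paced at 1.5x the 10 Mbps bitrate, measured throughput is only 15 Mbps,
# so the rule allows at most 0.5 * 15 = 7.5 Mbps and switches down.
print(pick_bitrate(1.5 * 10.0))  # -> 7.5
```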

39 of 105

Observation: video traffic can be smoother without sacrificing QoE

Video QoE is the same ✅

  • Start play delay is the same
  • No rebuffers
  • Bitrate is the same

Congestion is reduced ✅

[Figure: bursty (video traffic today) vs. smoother delivery: throughput relative to capacity and bitrate, and the playback buffer, over time.]

40 of 105


In this paper, we…

  1. Introduce a mechanism to let ABR algorithms smooth video traffic
  2. Design an algorithm called Sammy that picks bitrates and smooths video traffic
  3. Implement and experiment with our algorithm at Netflix, where it reduces chunk throughput while slightly improving video QoE.

41 of 105


How to smooth traffic: application-informed pacing

“Give me the chunk”

Without pacing: server sends data as fast as congestion control allows

“Give me the chunk, no faster than 1 Mbps”

With pacing: server sends data no faster than the requested rate

42 of 105


Application-informed pacing is based on TCP pacing

TCP Pacing: adds delay between packets to reduce bursts

  • Old idea [Hoe 1995], now widely used
  • Rate is picked by congestion control algorithms
  • Goal is to keep sending rate above capacity

Application-informed pacing:

  • Rate is picked by applications, ideally below capacity
  • Challenge is picking the rate
  • Very deployable: supported by Linux and major CDNs today.
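As a sketch of what the Linux support looks like: the per-socket pacing cap (SO_MAX_PACING_RATE, enforced by the fq qdisc or TCP's internal pacing) lets a server apply an application-chosen rate. The chunk-serving function below is hypothetical; only the socket option itself is a real Linux feature.

```python
import socket

# Linux's per-socket pacing cap, in bytes per second. CPython may not export
# the constant, so fall back to its value on most architectures (47).
SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)

def send_chunk_paced(conn: socket.socket, chunk: bytes, rate_mbps: float) -> None:
    """Send one chunk, asking the kernel to pace this socket at rate_mbps."""
    rate_bytes_per_sec = int(rate_mbps * 1e6 / 8)
    conn.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, rate_bytes_per_sec)
    conn.sendall(chunk)

# Hypothetical server-side usage, after parsing a request that says
# "give me the chunk, no faster than 1 Mbps":
#   send_chunk_paced(client_conn, chunk_bytes, rate_mbps=1.0)
```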

43 of 105


Some example algorithms:

  • Pick the highest bitrate < 0.5*throughput
  • Estimate throughput, run some simulations, pick something that gives good QoE

Approach seems to require precise throughput estimates

Typical ABR algorithms estimate available bandwidth

44 of 105


No longer measuring available bandwidth

With pacing, ABR algorithms measure pace rates

45 of 105

Our idea: ABR algorithms often don’t need precise throughput estimates

Say the bitrate options are 0.5 Mbps and 1 Mbps. Do you need to know the exact throughput of the network?

  • If throughput is > 100 Mbps, a 1-second chunk downloads in < 10 ms

Probably not: just pick 1 Mbps.

46 of 105


Decision problem: Is the throughput high enough to pick a bitrate, or not?

ABR algorithms implicitly solve a decision problem

47 of 105

ABR algorithms implicitly solve a decision problem

Decision problem: Is the throughput high enough to pick a bitrate, or not?

  • Estimation version: Pick the highest bitrate ≤ 0.5*average throughput
  • Decision version: Is the average throughput ≥ 2*bitrate?

Consequence: to make the same decision, we should pace above this threshold
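In code, the equivalence looks like this (a sketch; the 2x threshold follows from the 0.5x rule above, and the function names are ours):

```python
# The estimation and decision versions of the simple ABR rule agree:
# a bitrate is selectable exactly when average throughput >= 2x that bitrate.

BITRATES_MBPS = [1.0, 3.0, 5.0, 7.5, 10.0]

def estimation_version(avg_throughput_mbps: float) -> float:
    candidates = [b for b in BITRATES_MBPS if b <= 0.5 * avg_throughput_mbps]
    return max(candidates) if candidates else min(BITRATES_MBPS)

def decision_version(avg_throughput_mbps: float, bitrate_mbps: float) -> bool:
    return avg_throughput_mbps >= 2 * bitrate_mbps

# Consequence for pacing: to leave the decision for a target bitrate unchanged,
# the pace rate only has to stay above the decision threshold.
def min_pace_rate_mbps(bitrate_mbps: float, threshold: float = 2.0) -> float:
    return threshold * bitrate_mbps

assert decision_version(20.0, estimation_version(20.0))  # 20 Mbps supports 10 Mbps
print(min_pace_rate_mbps(10.0))                           # -> 20.0
```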

48 of 105


Sammy overview

49 of 105


Initial phase:

  • Balance initial quality and start play delay
  • Order of a few seconds

Two different algorithms for different QoE goals

Playing phase:

  • Balance quality and rebuffers
  • Order of minutes to hours

50 of 105


Do not pace:

  • At most a few seconds (out of an hour session), not a large part of traffic

Need to make initial bitrate selection:

  • Use historical throughput estimate for first bitrate selection
  • Do not use estimates from playing phases, which are lower because of pacing

Initial phase

51 of 105

Playing phase

Pick a high enough pace rate for the ABR algorithm to pick the highest quality.

Pace higher when the buffer is empty, lower when the buffer is full.

[Figure: pace rate as a function of buffer size, between ~3.5x bitrate and ~2.5x bitrate, staying above the minimum throughput required by the ABR algorithm.]
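A minimal sketch of this playing-phase rule, assuming a linear interpolation between the two endpoints in the figure (Sammy's exact shape and constants are in the paper, not reproduced here):

```python
# Pace faster when the playback buffer is empty and slower as it fills, while
# staying above the minimum throughput the ABR algorithm needs to keep picking
# the highest quality. The linear interpolation between ~3.5x bitrate (empty
# buffer) and ~2.5x bitrate (full buffer) is an assumption for illustration.

def pace_rate_mbps(bitrate_mbps: float, buffer_s: float, max_buffer_s: float,
                   empty_mult: float = 3.5, full_mult: float = 2.5) -> float:
    fill = min(max(buffer_s / max_buffer_s, 0.0), 1.0)   # 0 = empty, 1 = full
    return (empty_mult + (full_mult - empty_mult) * fill) * bitrate_mbps

print(pace_rate_mbps(10.0, buffer_s=0.0, max_buffer_s=120.0))    # 35.0 Mbps
print(pace_rate_mbps(10.0, buffer_s=120.0, max_buffer_s=120.0))  # 25.0 Mbps
```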

52 of 105


Lab experiments

53 of 105


  • Run a standard Netflix session (control), a session with Sammy, compare results
  • 40 Mbps link, 3BDP queue size, 10ms RTT, TCP Reno

Lab setup
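For reference, the 3 BDP queue in this setup works out to roughly 150 KB (a quick calculation, not from the slides):

```python
# Bandwidth-delay product (BDP) of the 40 Mbps / 10 ms RTT lab link,
# and the corresponding 3xBDP queue size.
link_mbps = 40.0
rtt_s = 0.010

bdp_bytes = link_mbps * 1e6 / 8 * rtt_s   # 50,000 bytes (~50 KB)
queue_bytes = 3 * bdp_bytes               # 150,000 bytes (~150 KB)
print(f"BDP = {bdp_bytes:.0f} bytes, 3xBDP queue = {queue_bytes:.0f} bytes")
```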

54 of 105

Sammy reduces throughput and delay in the lab

[Figure: lab throughput and delay results, with neighboring traffic in each panel.]

55 of 105

Sammy is friendlier to neighboring traffic in the lab

  • Improves UDP delay
  • Improves TCP throughput
  • Improves HTTP response time
  • Improves video play delay

56 of 105


  • Start a Netflix video
  • Switch to Zoom, watch a video streamed on Zoom
  • Network queue is very, very large (~10s)
  • Without Sammy we get occasional bursts of 1s delay
    • Snail is going to move jerkily
  • Sammy avoids large queue increases
    • Snail is going to move smoothly

Lab experiment with Zoom

57 of 105


58 of 105


Production experiments

59 of 105


Ran experiments with Netflix traffic

Netflix is the largest application by volume (about 10% of internet traffic)

  • Lots of data
  • If we can make it friendlier, this is a good thing

60 of 105

We ran A/B tests to measure Sammy’s performance

  1. Randomly assign traffic to treatment/control (users, sessions, servers, etc…)
  2. Collect data
  3. Compare outcomes

[Diagram: a control group with good QoE and a treatment group running the new algorithm (Sammy) with better QoE: “Sammy improves performance!”]
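Mechanically, an A/B test like this is a deterministic random split plus a comparison of outcome means; a generic sketch, not Netflix's actual tooling:

```python
import hashlib
from statistics import mean

def assign(unit_id: str, treatment_fraction: float = 0.5,
           salt: str = "sammy-experiment") -> str:
    """Deterministically bucket a unit (user/session/server id) into an arm."""
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_fraction else "control"

def treatment_effect(outcomes: dict) -> float:
    """Difference in mean outcome (e.g., play delay) between the two arms.

    outcomes maps unit id -> measured outcome for that unit."""
    arms = {"treatment": [], "control": []}
    for unit_id, value in outcomes.items():
        arms[assign(unit_id)].append(value)
    return mean(arms["treatment"]) - mean(arms["control"])
```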

61 of 105


Used large experiments to tune parameters

  • Allocated a small fraction of Netflix traffic (< 1%)
  • Ran for about two weeks
  • Tried ~100 combinations of parameters
  • Total of roughly 11,500 years of video watched across all experiments

62 of 105

Sammy reduces congestion-related metrics

  Metric              Results
  Chunk Throughput    62% lower
  Retransmissions     50% lower
  Round-trip times    19% lower

63 of 105

Sammy slightly improves video QoE metrics

  Metric                        Results
  Video quality (VMAF)          0.03% higher
  Initial video quality (VMAF)  0.09% higher
  Play delay                    1% lower
  Rebuffers                     Did not change

64 of 105


Future work

Measure the impact of Sammy on other internet traffic

  • Look at impact on non-Netflix traffic (e.g. Zoom)
  • But how can we do this?

65 of 105

Unbiased Experiments in Congested Networks

Bruce Spang, Veronica Hannan, Shravya Kunamalla, Te-Yuan Huang, Nick McKeown, Ramesh Johari

Presented at Internet Measurement Conference 2021

Awarded the 2022 IRTF Applied Networking Research Prize

66 of 105

What is an A/B test?

  • Randomly assign traffic to treatment/control (users, sessions, servers, etc…)
  • Collect data
  • Compare outcomes

[Diagram: a control group with good performance and a treatment group running the new algorithm with better performance: “Algorithm improves performance!”]

67 of 105


We use A/B tests to see if an algorithm works in practice

68 of 105

A/B tests are used to generalize

We make decisions about deploying algorithms based on small A/B tests:

“This algorithm improves performance by 10%”

This assumes that the outcome of one unit does not depend on other units; when it does, that is called interference.

69 of 105


If the treatment algorithm uses more bandwidth or increases queueing delay, this impacts control traffic sharing the same network

Interference exists in congested networks

Wow, this new algorithm works great!!

Hey Bruce, quit hogging all the bandwidth!

Treatment algorithm

70 of 105


Interference can make A/B tests extremely misleading

We ran an experiment which demonstrates this.

71 of 105


In response to COVID-19, streaming services reduced their internet traffic by 25% by capping bitrates.

By reducing traffic by 25% the hope was to reduce internet congestion.

Treatment: capping bitrate to reduce traffic

72 of 105

When capping bitrates in this A/B test:

  • Round Trip Time got 5-15% worse
  • Chunk Throughput got 5% lower
  • Retransmits got 10% worse
  • Bitrate got 35% worse
  • Rebuffers and play delay did not change

Does sending less data make congestion worse??

73 of 105

A/B test results do not reveal what happens when we cap traffic

What could A/B tests look like with bitrate capping?

Originally (all control traffic): the network is congested.

Capping causes:

  • Less bandwidth used
  • Less congestion

With capping (all capped traffic): the network is not congested.

One possibility: bitrate capping reduces congestion, and the capped and control traffic in the A/B test share an uncongested network. A/B test results: capped uses less bandwidth; the level of congestion is the same (no congestion).

Another possibility: control traffic increases and the network stays congested. A/B test results: capped uses less bandwidth; the level of congestion is the same (some congestion).

🤔

74 of 105


Found two reliably congested networks

  • Similar traffic, servers
  • Connected to the same internet provider (congested peering link)
  • No pre-experiment difference

Run an A/B test on each network and compare:

  • Network 1: 95% capped, 5% uncapped
  • Network 2: 5% capped, 95% uncapped

Comparing A/B tests with a pair of congested links
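One way to see what the design buys: the usual A/B estimate compares capped vs. uncapped sessions within a single network (and suffers interference), while this design also lets us compare the mostly-capped network against the mostly-uncapped one. A sketch of the two contrasts, with hypothetical field names; the paper's actual analysis is more careful:

```python
from statistics import mean

# Each session is a dict like {"network": 1, "capped": True, "rtt_ms": 43.0}.
# Network 1 is ~95% capped, network 2 is ~95% uncapped.

def ab_contrast(sessions, network):
    """Within-network A/B comparison: capped vs. uncapped sessions sharing
    the same congested link, so the two arms can interfere with each other."""
    capped = [s["rtt_ms"] for s in sessions if s["network"] == network and s["capped"]]
    uncapped = [s["rtt_ms"] for s in sessions if s["network"] == network and not s["capped"]]
    return mean(capped) - mean(uncapped)

def paired_network_contrast(sessions):
    """Across-network comparison: the mostly-capped link vs. the mostly-uncapped
    one, reflecting what happens when (almost) all traffic is capped."""
    net1 = [s["rtt_ms"] for s in sessions if s["network"] == 1]
    net2 = [s["rtt_ms"] for s in sessions if s["network"] == 2]
    return mean(net1) - mean(net2)
```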

75 of 105

A/B test results are misleading

  Metric            Our Experiment   A/B Test
  Round Trip Time   25% better       5-15% worse
  Chunk Throughput  12% better       5% worse
  Play Delay        10% better       Did not change

and more in the paper…

76 of 105


This is concerning.

A/B tests are biased when run in congested networks

77 of 105


A cautionary tale

In 2016, Google released the congestion control algorithm BBR to much fanfare

  • “Congestion-based congestion control”
  • “Sidestepping impossibility results”

Advertised less congestion, higher throughput

Validated with A/B tests:

  • Throughput is much higher than Cubic (2-25x higher, 133x higher in one setting)
  • Reduces median RTT by 53%-80%

78 of 105


BBR was not fair [Hock et al. 2017]

When competing with 16 Cubic flows (the standard at the time), a single BBR flow got 40% of the network bandwidth

79 of 105


This is concerning.

We can run experiments that remove bias

A/B tests are biased when run in congested networks

80 of 105

Experiment designs that would have avoided interference:

Event study: switch to treatment and compare before/after

  Thu. (Control), Fri. (Control), Sat. (Treatment), Sun. (Treatment)

Switchback: switch back and forth between treatment/control

  Thu. (Control), Fri. (Treatment), Sat. (Control), Sun. (Treatment)

See paper for more information
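A minimal sketch of a day-level switchback like the one above: alternate arms by day and compare mean outcomes across treatment and control days (ignoring carry-over effects, which a real analysis would have to handle):

```python
from statistics import mean

def switchback_schedule(days):
    """Alternate control and treatment day by day, as in the example above."""
    return {day: ("control" if i % 2 == 0 else "treatment")
            for i, day in enumerate(days)}

def switchback_estimate(daily_outcome, schedule):
    """Difference in mean daily outcome between treatment and control days."""
    treated = [v for day, v in daily_outcome.items() if schedule[day] == "treatment"]
    control = [v for day, v in daily_outcome.items() if schedule[day] == "control"]
    return mean(treated) - mean(control)

schedule = switchback_schedule(["Thu", "Fri", "Sat", "Sun"])
# {'Thu': 'control', 'Fri': 'treatment', 'Sat': 'control', 'Sun': 'treatment'}
```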

81 of 105


Lots more to be done!

Lots of possible future work on experiments:

  • Designing experiments
  • Testing new/old algorithms

We could look at how Sammy impacts other traffic on the internet:

  • Run an event study at Netflix, look at the impact on Zoom, YouTube, etc…

82 of 105

Conclusions

83 of 105

My thesis

There are things one application can do to be friendlier, or less friendly, to the neighboring applications sharing the same networks.

It has three parts; today we discussed two:

  • Smoothing video traffic to make it friendlier
  • Running experiments in congested networks
  • Sizing router buffers (not today)

84 of 105


We’ve seen two ways of reducing congestion

  1. Smoothing video traffic
  2. Capping video bitrates

These are pretty unusual for congestion control research…

85 of 105


What are we trying to do here?

Send data over the internet.

86 of 105


What is the congestion control problem?

At the core of the internet is a thorny challenge: �“how the internet’s resources are best allocated to all the competing interests trying to use it”

87 of 105


The best allocation maximizes throughput

“E.g., you could have been sharing the path with someone else and converged to a window that gives you each half the available bandwidth. If she shuts down, 50% of the bandwidth will be wasted unless your window size is increased” [Jacobson, Karels 1988]

“Assume that the utility U(x) is an increasing, strictly concave and continuously differentiable function of x” [Kelly 1997]

“Clearly, we want as much throughput and as little delay as possible” [Peterson, Brakmo, Davie 2022]

88 of 105


Congestion control algorithms compete for scarce bandwidth in a zero-sum game

A congestion control algorithm (like BBR) is unfair if it uses more bandwidth than other algorithms.

Being unfair is bad: using more throughput means I am harming my competitors

Congestion control algorithms are competing

89 of 105


What are we trying to do here?

Send data over the internet.

90 of 105


Send data over the internet.

What are we trying to do here?

Watch video (75% of internet traffic), buy things (6%), play games (6%), talk to friends (5%), etc…

91 of 105


For video, “good performance” means good QoE.

Our work starts from video traffic

Play Delay

(How long it takes to start)

Rebuffers

(Interruptions)

Bitrate

(How good the video looks)

92 of 105


Our work breaks typical congestion control assumptions

Lower throughput: video does not need maximal throughput

Not zero sum: when video gives throughput to neighbors, QoE is not worse.

Shifting the mindset from competition to friendliness

93 of 105


Being a friendly neighbor is good for everyone

Non-video neighbors get better with friendly video traffic:

  • Lab experiments with generic UDP/TCP
  • Web browsing
  • Zoom

94 of 105


  1. Video QoE doesn’t get worse (there is no cost to friendliness)
  2. Neighboring video sessions from the same service get better

Streaming services are incentivized to be friendly

95 of 105


We can make other applications friendlier: Zoom, web browsing, etc…

Make video traffic friendlier

  • In slower networks, mobile networks, etc…
  • Design one algorithm for video traffic that does congestion control, picks bitrates, and is friendly

Let’s make internet traffic friendlier!

The future is friendlier

96 of 105

Acknowledgements

97 of 105


Committee

Peter

Keith

Renata

Ramesh

Nick

98 of 105


Collaborators at Netflix

99 of 105


Collaborators at Stanford

100 of 105


Friends

101 of 105


Friends, cont.

102 of 105


Friends, cont., cont.

103 of 105


Family

104 of 105


Up next

Closed session

  • Gates 498 (around the corner)
  • 1-2 hours

Reception

  • Fujitsu (here)
  • Food + drinks
  • Starting somewhere between 5-6pm, depending on closed session

105 of 105


We can make other applications friendlier: Zoom, web browsing, etc…

Make video traffic friendlier

  • In slower networks, mobile networks, etc…
  • Design one algorithm for video traffic that does congestion control, picks bitrates, and is friendly

Let’s make internet traffic friendlier!

The future is friendlier

Questions?