1 of 25

CS 176C

Advanced Topics in Internet Computing

Arpit Gupta

04/23/20

Computer Science

2 of 25

Learning Objectives

Voice over IP

Understand the characteristics and basic requirements of sending voice data over IP
Learn about different strategies to determine the playout delays at the receiver
Understand various challenges Skype (and similar apps) address for real-world operations

Real-time Transport protocols

Learn about RTP (header formats, QoS support, etc.)
Learn about RTCP (synchronization, bandwidth scaling etc.)

3 of 25

Voice over IP

4 of 25

Voice-over-IP (VoIP)

VoIP end-to-end delay requirement: needed to maintain “conversational” aspect

higher delays noticeable, impair interactivity
< 150 msec: good
> 400 msec: bad
What factors contribute to end-to-end delay?

Processing time (analog-to-digital conversion)
Delay at routers (network devices)
Transmission delay
Propagation delay

session initialization: how does callee advertise IP address, port number, encoding algorithms?

5 of 25

VoIP characteristics

speaker’s audio: alternating talk spurts, silent periods.

64 kbps during talk spurt
pkts generated only during talk spurts
20 msec chunks at 8 Kbytes/sec: 160 bytes of data

application-layer header added to each chunk
chunk+header encapsulated into UDP or TCP segment
application sends segment into socket every 20 msec during talkspurt
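
The chunk size above follows directly from the bit rate and the chunk duration. A minimal sketch of that arithmetic (the function names are illustrative, not from the slides):

```python
def chunk_bytes(bit_rate_bps: int, chunk_ms: int) -> int:
    """Bytes of encoded audio produced during one chunk interval."""
    return bit_rate_bps * chunk_ms // (8 * 1000)


def packets_per_second(chunk_ms: int) -> float:
    """Packets generated per second while the speaker is in a talk spurt."""
    return 1000 / chunk_ms


# 64 kbps audio cut into 20 msec chunks, as on the slide
print(chunk_bytes(64_000, 20), "bytes per chunk")
print(packets_per_second(20), "packets per second during a talk spurt")
```

The application-layer and UDP/IP headers add to the 160 bytes, so the on-the-wire rate is higher than 64 kbps.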

6 of 25

VoIP: packet loss, delay

network loss: IP datagram lost due to network congestion (router buffer overflow)
delay loss: IP datagram arrives too late for playout at receiver

delays: processing, queueing in network; end-system (sender, receiver) delays
typical maximum tolerable delay: 400 ms

loss tolerance: depending on voice encoding, loss concealment, packet loss rates between 1% and 20% can be tolerated

Why do we not use TCP for VoIP traffic?

Loss could be eliminated by sending the packets over TCP (which provides for reliable data transfer) rather than over UDP. However, retransmission mechanisms are often considered unacceptable for conversational real-time audio applications such as VoIP, because they increase end-to-end delay [Bolot 1996]. Furthermore, due to TCP congestion control, packet loss may result in a reduction of the TCP sender’s transmission rate to a rate that is lower than the receiver’s drain rate, possibly leading to buffer starvation. This can have a severe impact on voice intelligibility at the receiver. For these reasons, most existing VoIP applications run over UDP by default. [Baset 2006] reports that UDP is used by Skype unless a user is behind a NAT or firewall that blocks UDP segments (in which case TCP is used).

7 of 25

Packet Jitter

What is packet jitter?

Deviation from periodicity

Example

Sender sends chunk every 20 ms
Can jitter at receiver be greater than 20 ms? How?
Can it be smaller than 20 ms? How?

Handling jitter

Do we need to remove jitter at receiver?
How can we remove jitter?

Add Sequence number + Timestamps
Delay playout at receiver. How?

8 of 25

Fixed playout delay

receiver attempts to play out each chunk exactly q msecs after chunk was generated.

chunk has time stamp t: play out chunk at t+q
chunk arrives after t+q: data arrives too late for playout: data “lost”

How to choose q:

large q: less packet loss
small q: better interactive experience
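
The trade-off in choosing q can be seen with a small sketch. The delay values below are made up for illustration; a chunk counts as "lost" when its one-way delay exceeds q, i.e. it arrives after t+q.

```python
def late_loss_fraction(delays_ms, q_ms):
    """Fraction of chunks that arrive after their playout time t + q."""
    late = sum(1 for d in delays_ms if d > q_ms)
    return late / len(delays_ms)


# Hypothetical one-way delays (msec) for ten chunks
delays = [30, 45, 38, 120, 52, 41, 95, 36, 60, 44]

for q in (50, 100, 150):
    print(f"q = {q} msec -> late loss fraction {late_loss_fraction(delays, q)}")
```

Raising q lowers the late-loss fraction but adds the same amount of delay to every chunk, which hurts interactivity.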

9 of 25

VoIP: fixed playout delay

sender generates packets every 20 msec during talk spurt.
first packet received at time r
first playout schedule: begins at p (q = p-t)
second playout schedule: begins at p’ (q = p’ - t)

10 of 25

Adaptive playout delay (1)

goal: low playout delay, low late loss rate
approach: adaptive playout delay adjustment:

estimate network delay, adjust playout delay at beginning of each talk spurt
silent periods compressed and elongated
chunks still played out every 20 msec during talk spurt

adaptively estimate packet delay: (EWMA - exponentially weighted moving average, recall TCP RTT estimate):

d_i = (1 − α) d_{i−1} + α (r_i − t_i)

where:
d_i: delay estimate after ith packet
α: small constant, e.g. 0.1
r_i: time received
t_i: time sent (timestamp)
r_i − t_i: measured delay of ith packet
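
A minimal sketch of the EWMA delay estimate. Seeding the estimate with the first measured delay is an assumption made here; the slides do not say how d_0 is initialized.

```python
def ewma_delay(samples, alpha=0.1):
    """Return the delay estimates d_i for a list of (t_i, r_i) pairs.

    t_i is the sender timestamp and r_i the receive time, in msec.
    """
    estimates = []
    d = None
    for t_i, r_i in samples:
        measured = r_i - t_i
        if d is None:
            d = float(measured)  # assumption: seed with the first sample
        else:
            d = (1 - alpha) * d + alpha * measured
        estimates.append(d)
    return estimates


# Hypothetical samples: measured delays of 40, 40, 60, 40 msec
samples = [(0, 40), (20, 60), (40, 100), (60, 100)]
print([round(d, 2) for d in ewma_delay(samples)])
```

With a small α, a single delay spike moves the estimate only slightly, which is the same smoothing behavior as the TCP RTT estimator.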

11 of 25

Adaptive playout delay (2)

also useful to estimate average deviation of delay, v_i:

v_i = (1 − β) v_{i−1} + β |r_i − t_i − d_i|

estimates d_i, v_i calculated for every received packet, but used only at start of talk spurt

for first packet in talk spurt, playout time is:

playout-time_i = t_i + d_i + K v_i

remaining packets in talkspurt are played out periodically
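
The two estimates and the playout-time rule can be combined in one small class. This is a sketch, not a reference implementation: the class name, the default values of β and K, and the initialization of d and v from the first packet are all assumptions.

```python
class PlayoutEstimator:
    """Tracks d_i and v_i and computes the playout time for a talk spurt."""

    def __init__(self, alpha=0.1, beta=0.1, k=4):
        self.alpha = alpha
        self.beta = beta
        self.k = k        # K: safety margin in units of delay deviation
        self.d = None     # delay estimate d_i
        self.v = 0.0      # delay deviation estimate v_i

    def update(self, t_i, r_i):
        """Update d_i and v_i; called for every received packet."""
        measured = r_i - t_i
        if self.d is None:
            self.d = float(measured)  # assumption: seed with first sample
            return
        self.d = (1 - self.alpha) * self.d + self.alpha * measured
        self.v = (1 - self.beta) * self.v + self.beta * abs(measured - self.d)

    def playout_time(self, t_i):
        """Playout time for the first packet of a talk spurt."""
        return t_i + self.d + self.k * self.v


est = PlayoutEstimator(alpha=0.5, beta=0.5, k=4)
est.update(0, 40)    # measured delay 40 msec
est.update(20, 80)   # measured delay 60 msec
print(est.playout_time(100))
```

A larger K pushes the playout time further out, trading extra delay for fewer late packets.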

12 of 25

Adaptive playout delay (3)

Q: How does receiver determine whether packet is first in a talkspurt?

if no loss, receiver looks at successive timestamps

difference of successive stamps (IATs) > 20 msec -->talk spurt begins.

with loss possible, receiver must look at both time stamps and sequence numbers

difference of successive stamps > 20 msec and sequence numbers without gaps --> talk spurt begins.
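
The rule above can be sketched as a small predicate. It assumes timestamps in msec, 20 msec chunks, and no sequence-number wraparound; treating the very first packet as the start of a talk spurt is also an assumption.

```python
def is_talkspurt_start(prev, cur, chunk_ms=20):
    """Decide whether `cur` is the first packet of a talk spurt.

    prev, cur: (sequence_number, timestamp_ms) of consecutively received
    packets; prev is None for the first packet received.
    """
    if prev is None:
        return True
    prev_seq, prev_ts = prev
    seq, ts = cur
    no_gap = seq == prev_seq + 1
    # A timestamp jump with a sequence gap may just be packet loss, so it
    # is not treated as a new talk spurt.
    return no_gap and (ts - prev_ts) > chunk_ms


print(is_talkspurt_start((1, 0), (2, 20)))      # same talk spurt
print(is_talkspurt_start((2, 20), (3, 200)))    # silence, then new spurt
print(is_talkspurt_start((3, 200), (5, 240)))   # gap caused by a lost packet
```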

13 of 25

Real-time Transport Protocol

14 of 25

Real-Time Transport Protocol (RTP)

RTP specifies packet structure for packets carrying audio, video data
RFC 3550
RTP packet provides

payload type identification
packet sequence numbering
time stamping

RTP runs in end systems
RTP packets encapsulated in UDP segments
interoperability: if two VoIP applications run RTP, they may be able to work together

Multimedia Networking

9-14

15 of 25

RTP runs on top of UDP

RTP libraries provide transport-layer interface that extends UDP:

port numbers, IP addresses
payload type identification
packet sequence numbering
time-stamping

16 of 25

RTP example

example: sending 64 kbps PCM-encoded voice over RTP

application collects encoded data in chunks, e.g., every 20 msec = 160 bytes in a chunk
audio chunk + RTP header form RTP packet, which is encapsulated in UDP segment

RTP header indicates type of audio encoding in each packet

sender can change encoding during conference

RTP header also contains sequence numbers, timestamps

17 of 25

RTP and QoS

RTP does not provide any mechanism to ensure timely data delivery or other QoS guarantees
RTP encapsulation only seen at end systems (not by intermediate routers)

routers provide best-effort service, making no special effort to ensure that RTP packets arrive at destination in timely manner

18 of 25

RTP header

payload type (7 bits): indicates type of encoding currently being used. If sender changes encoding during call, sender informs receiver via payload type field

Payload type 0: PCM mu-law, 64 kbps

Payload type 3: GSM, 13 kbps

Payload type 7: LPC, 2.4 kbps

Payload type 26: Motion JPEG

Payload type 31: H.261

Payload type 33: MPEG2 video

sequence # (16 bits): increment by one for each RTP packet sent

detect packet loss, restore packet sequence

[Figure: RTP header fields: payload type, sequence number, time stamp, synchronization source ID (SSRC), miscellaneous fields]

19 of 25

RTP header

timestamp field (32 bits long): sampling instant of first byte in this RTP data packet

for audio, timestamp clock increments by one for each sampling period (e.g., each 125 µsec for 8 kHz sampling clock)
if application generates chunks of 160 encoded samples, timestamp increases by 160 for each RTP packet when source is active. Timestamp clock continues to increase at constant rate when source is inactive.

SSRC field (32 bits long): identifies source of RTP stream. Each stream in RTP session has distinct SSRC
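
A minimal sketch of building and parsing the 12-byte fixed RTP header with Python's `struct` module. The bit layout (version, padding, extension, CSRC count, marker) follows RFC 3550 rather than the slides, and the function names and the SSRC value are made up for illustration.

```python
import struct

# Fixed RTP header in network byte order:
# 1 byte (V/P/X/CC), 1 byte (M/PT), 16-bit seq, 32-bit timestamp, 32-bit SSRC
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    """Build a fixed RTP header (version 2, no padding/extension/CSRCs)."""
    byte0 = 2 << 6
    byte1 = ((marker & 0x1) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    """Extract the fields discussed on the slides from a raw packet."""
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(data)
    return {
        "version": byte0 >> 6,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Three consecutive PCM mu-law packets (payload type 0), 160 samples each:
# sequence number grows by 1, timestamp by 160.
for n in range(3):
    hdr = pack_rtp_header(payload_type=0, seq=n, timestamp=160 * n,
                          ssrc=0x1234ABCD)
    print(len(hdr), parse_rtp_header(hdr))
```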

In the Real-time Transport Protocol (RTP), the timestamp field in the RTP header is a 32-bit value that plays a crucial role in synchronizing audio or video streams. This field represents the sampling instant of the first byte of the RTP data packet. It is typically represented in units of the sampling clock frequency, which is often based on the media's sampling rate (e.g., for audio, it could be 8 kHz, 16 kHz, 44.1 kHz, etc.).

The timestamp allows receivers to reconstruct the timing of media packets relative to each other and synchronize their playback. By comparing the timestamp of received packets, receivers can determine the relative timing of when packets were sent and adjust their playback accordingly to maintain smooth audio or video playback.

The timestamp field is essential for maintaining synchronization in real-time communication applications, such as VoIP (Voice over Internet Protocol) and video conferencing, where maintaining timing accuracy is crucial for a seamless user experience.

20 of 25

RTP Control Protocol (RTCP)

works in conjunction with RTP
each participant in RTP session periodically sends RTCP control packets to all other participants

each RTCP packet contains sender and/or receiver reports

report statistics useful to application: # packets sent, # packets lost, interarrival jitter

feedback used to control performance

sender may modify its transmissions based on feedback
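
The interarrival jitter carried in receiver reports is a running estimate. The sketch below uses the smoothing rule from RFC 3550 (gain 1/16), which is not stated on the slides; the function name and sample values are illustrative.

```python
def interarrival_jitter(packets):
    """Running jitter estimate over (rtp_timestamp, arrival_time) pairs.

    Both values must be in the same units (RTP timestamp ticks).
    """
    jitter = 0.0
    prev_transit = None
    for rtp_ts, arrival in packets:
        transit = arrival - rtp_ts
        if prev_transit is not None:
            # Difference in relative transit time between two packets
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16
        prev_transit = transit
    return jitter


# Hypothetical packets: the third one is delayed by 16 extra ticks
packets = [(0, 100), (160, 260), (320, 436)]
print(interarrival_jitter(packets))
```

A sender seeing this value grow across successive receiver reports might, for example, lower its sending rate or switch to a lower-bit-rate encoding.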

The RTP Control Protocol (RTCP), also called the Real-Time Control Protocol, works alongside the Real-time Transport Protocol (RTP) in real-time multimedia applications such as voice and video conferencing over IP networks. While RTP carries the audio and video data, RTCP carries feedback and control information among the participants in the session. Here's how RTCP functions in the context of RTP:

1. **Feedback Mechanism**: RTCP provides a feedback mechanism for participants in a multimedia session. This feedback includes information about packet loss, jitter, round-trip time (RTT), and other quality-related metrics. Receivers periodically send RTCP packets to the sender(s) to convey this information.

2. **Quality of Service (QoS) Monitoring**: RTCP enables monitoring of the quality of service experienced by participants in the session. By analyzing RTCP reports, participants can assess the network conditions and the quality of received media streams.

3. **Synchronization**: RTCP aids in synchronizing related media streams. Sender reports pair an RTP timestamp with the corresponding wall-clock time, which lets receivers line up the playout of audio and video streams that come from the same participant.

4. **Congestion Control**: RTCP can assist in congestion control by providing feedback on network congestion levels. Participants can adjust their transmission rates based on RTCP reports to alleviate network congestion and optimize bandwidth utilization.

5. **Session Control**: RTCP conveys only minimal session control information, such as session membership (via source description packets) and departures (via BYE packets). Session initialization, including the exchange of addresses, port numbers, and encodings, is handled by a separate signaling protocol such as SIP.

6. **Participant Identification**: RTCP allows participants to identify each other within a session. Each source in an RTP session picks a synchronization source (SSRC) identifier at random, so that it is unique within the session with high probability. RTCP source description packets map that SSRC to a user or host name.

Overall, RTCP complements RTP by providing feedback and control information. Neither protocol guarantees delivery or timing; together they give applications the information needed to detect loss, measure jitter, synchronize streams, and adapt their sending behavior.

21 of 25

RTCP: multiple multicast senders

each RTP session: typically a single multicast address; all RTP/RTCP packets belonging to session use multicast address
RTP, RTCP packets distinguished from each other via distinct port numbers
to limit traffic, each participant reduces RTCP traffic as number of conference participants increases

[Figure: sender and receivers exchanging RTP and RTCP packets]

22 of 25

RTCP: packet types

receiver report packets:

fraction of packets lost, last sequence number, average interarrival jitter

sender report packets:

SSRC of RTP stream, current time, number of packets sent, number of bytes sent

source description packets:

e-mail address of sender, sender's name, SSRC of associated RTP stream
provide mapping between the SSRC and the user/host name

23 of 25

RTCP: stream synchronization

RTCP can synchronize different media streams within an RTP session
e.g., videoconferencing app: each sender generates one RTP stream for video, one for audio.
timestamps in RTP packets tied to the video, audio sampling clocks

not tied to wall-clock time

each RTCP sender-report packet contains (for most recently generated packet in associated RTP stream):

timestamp of RTP packet
wall-clock time for when packet was created

receivers use this association to synchronize playout of audio, video
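
A minimal sketch of how a receiver might use the (RTP timestamp, wall-clock time) pair from a sender report to place packets from different streams on a common timeline. The function name, the clock rates, and all numeric values are assumptions for illustration; timestamp wraparound is ignored.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_wallclock_s, clock_rate_hz):
    """Map an RTP timestamp to sender wall-clock time (seconds).

    sr_rtp_ts and sr_wallclock_s come from the stream's latest sender
    report; clock_rate_hz is the stream's sampling clock rate.
    """
    return sr_wallclock_s + (rtp_ts - sr_rtp_ts) / clock_rate_hz


# Hypothetical audio stream: 8 kHz clock, sender report says
# RTP timestamp 16000 corresponds to wall-clock time 100.0 s
audio_time = rtp_to_wallclock(24000, 16000, 100.0, 8000)

# Hypothetical video stream: 90 kHz clock, sender report says
# RTP timestamp 900000 corresponds to wall-clock time 100.5 s
video_time = rtp_to_wallclock(945000, 900000, 100.5, 90000)

# Equal wall-clock times mean the two packets should be played together
print(audio_time, video_time)
```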

24 of 25

RTCP: bandwidth scaling

RTCP attempts to limit its traffic to 5% of session bandwidth

example: one sender, sending video at 2 Mbps

RTCP attempts to limit RTCP traffic to 100 kbps
RTCP gives 75% of rate to receivers; remaining 25% to sender

75 kbps is equally shared among receivers:

with R receivers, each receiver gets to send RTCP traffic at 75/R kbps.

sender gets to send RTCP traffic at 25 kbps.
participant determines RTCP packet transmission period by calculating avg RTCP packet size (across entire session) and dividing by allocated rate
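
The bandwidth-scaling arithmetic above can be sketched as follows. The function names, the number of receivers, and the 100-byte average RTCP packet size are assumptions chosen for illustration.

```python
def rtcp_rates(session_bw_kbps, num_receivers,
               rtcp_fraction=0.05, receiver_share=0.75):
    """Return (per-receiver RTCP rate, sender RTCP rate) in kbps."""
    rtcp_bw = session_bw_kbps * rtcp_fraction
    per_receiver = rtcp_bw * receiver_share / num_receivers
    sender = rtcp_bw * (1 - receiver_share)
    return per_receiver, sender


def rtcp_period_s(avg_packet_bytes, rate_kbps):
    """Seconds between RTCP packets: average packet size / allocated rate."""
    return avg_packet_bytes * 8 / (rate_kbps * 1000)


# One sender at 2 Mbps and 10 receivers
per_receiver, sender = rtcp_rates(2000, num_receivers=10)
print(per_receiver, sender)

# Assumed average RTCP packet size of 100 bytes
print(round(rtcp_period_s(100, per_receiver), 3), "s between receiver reports")
```

Because the 75 kbps receiver share is fixed, doubling the number of receivers halves each receiver's rate and doubles its reporting period.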

25 of 25

Summary

Voice over IP

Characteristics, requirements
How to set playout delay

Real-time Transport Protocol

Why do we need RTP/RTCP?
Packet header fields

Next Class

Network support for multimedia applications

Extra Notes: WebRTC (Web Real-Time Communication) is an open-source project that enables real-time communication directly within web browsers and other applications. It supports peer-to-peer audio, video, and data exchange without third-party plugins or software installations. The notes below cover its purpose, how it works, its key components, and its history:

### Purpose and Usage:

WebRTC serves a variety of purposes across different industries and applications:

1. **Real-Time Communication**: Its primary use case is enabling real-time communication between users, such as voice and video calls, video conferencing, and live streaming.

2. **Collaborative Applications**: WebRTC powers collaborative applications like online gaming, file sharing, screen sharing, and remote desktop access.

3. **Customer Service and Support**: It facilitates real-time customer service interactions, including live chat, co-browsing, and screen sharing for remote assistance.

4. **IoT and Embedded Systems**: WebRTC is also used in Internet of Things (IoT) applications for real-time monitoring, control, and communication between devices.

### How WebRTC Works:

WebRTC is based on a set of open standards and APIs (Application Programming Interfaces) that enable real-time communication directly between web browsers or other devices. Here's how it typically works:

1. **Media Capture**: WebRTC allows web applications to access audio and video input devices (such as microphones and cameras) using the getUserMedia API. This enables users to capture audio and video from their devices.

2. **Peer Connection Establishment**: To establish a connection, peers first exchange signaling messages. WebRTC does not define a signaling protocol; applications typically carry these messages over a channel such as WebSocket or HTTP. The messages contain session descriptions in SDP (Session Description Protocol) format, including details about media codecs, network addresses, and ICE candidates.

3. **NAT Traversal and Connectivity Establishment**: WebRTC uses Interactive Connectivity Establishment (ICE) together with STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT) to traverse firewalls and NATs. ICE tries to establish a direct peer-to-peer connection and falls back to a TURN relay when no direct path works.

4. **Media Streaming with RTP and RTCP**: Once a connection is established, WebRTC uses RTP (Real-time Transport Protocol), in its secured form SRTP, to carry audio and video data between peers. RTP provides packetization, sequence numbering, and timestamping of media streams, while RTCP (RTP Control Protocol) is used for quality monitoring and control.

### Key Components of WebRTC:

- **getUserMedia API**: Allows web applications to access audio and video input devices.

- **RTCPeerConnection API**: Facilitates the establishment of peer-to-peer connections and enables the transmission of audio, video, and data streams.

- **RTCDataChannel API**: Provides a peer-to-peer communication channel for transmitting arbitrary data between peers.

- **Signaling Mechanism**: Enables peers to exchange signaling messages for session establishment and management.

- **ICE, STUN, and TURN**: Techniques for NAT traversal and connectivity establishment.

### Relationship with RTP, RTCP, and UDP:

WebRTC builds upon several underlying protocols and technologies, including RTP, RTCP, and UDP:

- **RTP**: WebRTC uses RTP for transmitting real-time audio and video data between peers. RTP provides mechanisms for packetization, transport, and delivery of media streams.

- **RTCP**: RTCP complements RTP by providing feedback and control mechanisms for monitoring and managing the quality of communication sessions.

- **UDP**: WebRTC typically uses UDP (User Datagram Protocol) to carry RTP and RTCP packets. UDP adds no retransmission or congestion-control delay, which suits real-time media where a late packet is as useless as a lost one.

### Historical Context:

- **Origins**: WebRTC originated from Google's acquisition of the company Global IP Solutions (GIPS) in 2010, which had developed voice and video processing technologies. Google open-sourced the project and began integrating it into web browsers.

- **Standardization**: WebRTC underwent standardization efforts within the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C) to ensure interoperability and widespread adoption.

- **Adoption**: WebRTC gained significant traction due to its potential for enabling real-time communication directly within web browsers, leading to its integration into major browsers such as Chrome, Firefox, Edge, and Safari.

- **Evolving Ecosystem**: Over the years, WebRTC has evolved with improvements in security, performance, and functionality. It has become a foundational technology for a wide range of real-time communication applications on the web and beyond.

In summary, WebRTC is a versatile technology that enables real-time communication directly within web browsers and other web applications. It leverages standards like RTP, RTCP, and UDP to transmit audio and video data, incorporates NAT traversal techniques for connectivity establishment, and has undergone significant development and adoption since its inception, becoming a key enabler of real-time communication on the web.