1 of 31

TCP: overview (RFCs 793, 1122, 2018, 5681, 7323)

Transport Layer: 3-1

  • cumulative ACKs
  • pipelining:
    • TCP congestion and flow control set window size
  • connection-oriented:
    • handshaking (exchange of control messages) initializes sender, receiver state before data exchange
  • flow controlled:
    • sender will not overwhelm receiver
  • point-to-point:
    • one sender, one receiver
  • reliable, in-order byte stream:
    • no “message boundaries”
  • full duplex data:
    • bi-directional data flow in same connection
    • MSS: maximum segment size

2 of 31

TCP Send and Receive buffers

Transport Layer: 3-2

3 of 31

Dividing file data into TCP segments

Transport Layer: 3-3

4 of 31

TCP segment structure

Transport Layer: 3-4

TCP segment header (each row is 32 bits wide):

  • source port #, dest port #
  • sequence number
    • segment seq #: counting bytes of data into bytestream (not segments!)
  • acknowledgement number
    • ACK: seq # of next expected byte; A bit: this is an ACK
  • head len: length (of TCP header)
  • not used
  • flag bits: C, E, U, A, P, R, S, F
    • C, E: congestion notification
    • RST, SYN, FIN: connection management
  • receive window
    • flow control: # bytes receiver willing to accept
  • checksum: Internet checksum
  • Urg data pointer
  • options (variable length): TCP options
  • application data (variable length): data sent by application into TCP socket
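
To make the header layout concrete, here is a minimal sketch that unpacks the fixed 20-byte part of a TCP header with Python's `struct` module. The function name `parse_tcp_header` and the sample segment are hypothetical, and the sketch assumes the standard bit layout (header length in the top 4 bits of the 13th byte pair, flag bits C E U A P R S F in the low 8 bits); options are skipped rather than decoded.

```python
import struct

def parse_tcp_header(segment):
    """Parse the fixed 20-byte part of a TCP header (options are skipped)."""
    (src_port, dst_port, seq, ack,
     offset_flags, rwnd, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4      # head len is counted in 32-bit words
    flag_bits = offset_flags & 0xFF            # low 8 bits: C E U A P R S F
    names = ["F", "S", "R", "P", "A", "U", "E", "C"]   # bit 0 (FIN) upward
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": {n for i, n in enumerate(names) if flag_bits & (1 << i)},
        "rwnd": rwnd, "checksum": checksum, "urg_ptr": urg_ptr,
        "payload": segment[header_len:],
    }

# Hypothetical segment: ports 80 -> 51000, Seq=1000, ACK=43, A bit set,
# rwnd=4096, checksum left at 0 (not verified here), 2 bytes of data.
example = struct.pack("!HHIIHHHH", 80, 51000, 1000, 43, (5 << 12) | 0x10, 4096, 0, 0) + b"hi"
hdr = parse_tcp_header(example)
```

Here `hdr["header_len"]` is 20 because the head len field holds 5 (five 32-bit words), which is the value for a header without options.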

5 of 31

TCP sequence numbers, ACKs

Transport Layer: 3-5

Sequence numbers:

    • byte stream “number” of first byte in segment’s data

Figure: outgoing segment from sender and outgoing segment from receiver (fields shown: source port #, dest port #, sequence number, acknowledgement number, A bit, rwnd, checksum, urg pointer).

Figure: sender sequence number space with window size N, divided into: sent and ACKed; sent, not-yet ACKed (“in-flight”); usable but not yet sent; not usable.

Acknowledgements:

    • seq # of next byte expected from other side
    • cumulative ACK

Q: how receiver handles out-of-order segments

    • A: TCP spec doesn’t say; it is up to the implementor
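
As an illustration of cumulative ACKs, the sketch below computes the ACK number a receiver would send after a series of arrivals. It assumes one possible implementation choice, namely that out-of-order segments are buffered rather than discarded; the function name `next_expected` is hypothetical, and overlapping segments and sequence number wraparound are ignored.

```python
def next_expected(rcv_base, segments):
    """Return the cumulative ACK number after receiving (seq, length) segments.

    Assumption: out-of-order segments are buffered, so filling a gap
    advances the ACK past every contiguous buffered segment.
    """
    buffered = {}                      # seq # of first byte -> segment length
    ack = rcv_base                     # seq # of next byte expected
    for seq, length in segments:
        buffered[seq] = length
        while ack in buffered:         # advance over contiguous bytes
            ack += buffered.pop(ack)
    return ack

in_order = next_expected(92, [(92, 8)])                 # ACK moves to 100
gap = next_expected(92, [(100, 20)])                    # byte 92 still missing
gap_filled = next_expected(92, [(100, 20), (92, 8)])    # gap filled, ACK jumps
```

With buffering, the late arrival of the 8-byte segment at 92 lets the receiver acknowledge everything up to byte 120 in one cumulative ACK.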

6 of 31

TCP sequence numbers, ACKs

Transport Layer: 3-6

simple telnet scenario (Host A, Host B):

  • User at Host A types ‘C’
    • A → B: Seq=42, ACK=79, data = ‘C’
  • Host B ACKs receipt of ‘C’, echoes back ‘C’
    • B → A: Seq=79, ACK=43, data = ‘C’
  • Host A ACKs receipt of echoed ‘C’
    • A → B: Seq=43, ACK=80
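
The telnet exchange can be replayed with a small model in which each side tracks the next byte it will send and the next byte it expects. This is a sketch only: the `Endpoint` class and its field names are hypothetical, and loss, reordering, and windows are left out.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str
    next_seq: int        # seq # of the next byte this side will send
    next_expected: int   # seq # of the next byte expected from the peer

    def send(self, data=b""):
        seg = {"from": self.name, "seq": self.next_seq,
               "ack": self.next_expected, "data": data}
        self.next_seq += len(data)          # sequence numbers count bytes
        return seg

    def receive(self, seg):
        if seg["seq"] == self.next_expected:
            self.next_expected += len(seg["data"])

a = Endpoint("A", next_seq=42, next_expected=79)
b = Endpoint("B", next_seq=79, next_expected=42)

s1 = a.send(b"C"); b.receive(s1)   # user types 'C'
s2 = b.send(b"C"); a.receive(s2)   # B ACKs 'C' and echoes it back
s3 = a.send();     b.receive(s3)   # A ACKs the echoed 'C', no data
```

The pure ACK `s3` carries Seq=43 but does not advance A's sequence number, because it contains no data bytes.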

7 of 31

TCP round trip time, timeout

Transport Layer: 3-7

Q: how to set TCP timeout value?

  • longer than RTT, but RTT varies!
  • too short: premature timeout, unnecessary retransmissions
  • too long: slow reaction to segment loss

Q: how to estimate RTT?

  • SampleRTT: measured time from segment transmission until ACK receipt
    • ignore retransmissions
  • SampleRTT will vary, want estimated RTT “smoother”
    • average several recent measurements, not just current SampleRTT

8 of 31

TCP round trip time, timeout

Transport Layer: 3-8

EstimatedRTT = (1- α)*EstimatedRTT + α*SampleRTT

  • exponential weighted moving average (EWMA)
  • influence of past sample decreases exponentially fast
  • typical value: α = 0.125
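
A short sketch of the EWMA update, using the typical α = 0.125 from above. The function name and the sample values (in milliseconds) are made up for illustration.

```python
ALPHA = 0.125   # typical value from the slide

def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=ALPHA):
    """One EWMA step: EstimatedRTT = (1 - alpha)*EstimatedRTT + alpha*SampleRTT."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt

est = 100.0                               # hypothetical starting estimate (ms)
for sample in [120.0, 80.0, 100.0]:       # hypothetical SampleRTT values (ms)
    est = update_estimated_rtt(est, sample)
```

Each new sample moves the estimate by only one eighth of the difference, which is why the EstimatedRTT curve is much smoother than the SampleRTT curve.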

Figure: RTT (milliseconds) versus time (seconds) for gaia.cs.umass.edu to fantasia.eurecom.fr, plotting SampleRTT and EstimatedRTT.

9 of 31

TCP round trip time, timeout

Transport Layer: 3-9

  • timeout interval: EstimatedRTT plus “safety margin”
    • large variation in EstimatedRTT: want a larger safety margin

TimeoutInterval = EstimatedRTT + 4*DevRTT

(EstimatedRTT is the estimated RTT; 4*DevRTT is the “safety margin”)

  • DevRTT: EWMA of SampleRTT deviation from EstimatedRTT:

DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|

(typically, β = 0.25)

* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
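
The formulas for EstimatedRTT, DevRTT, and TimeoutInterval can be combined into a small estimator. This is a sketch under stated assumptions: the class name `RttEstimator` is hypothetical, DevRTT is initialized to half the first sample and updated before EstimatedRTT (the convention used in RFC 6298, which the slides do not specify), and times are in milliseconds.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption: initial value as in RFC 6298

    def update(self, sample_rtt):
        # Assumption: deviation is measured against the previous estimate,
        # so DevRTT is updated first.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt   # estimate plus safety margin

est = RttEstimator(first_sample=100.0)   # hypothetical first SampleRTT (ms)
est.update(120.0)                        # hypothetical second SampleRTT (ms)
timeout = est.timeout_interval()
```

A burst of widely varying samples raises DevRTT and therefore the timeout, while a run of steady samples lets the safety margin shrink again.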

10 of 31

TCP Sender (simplified)

Transport Layer: 3-10

event: data received from application

  • create segment with seq #
  • seq # is byte-stream number of first data byte in segment
  • start timer if not already running
    • think of timer as for oldest unACKed segment
    • expiration interval: TimeoutInterval

event: timeout

  • retransmit segment that caused timeout
  • restart timer

event: ACK received

  • if ACK acknowledges previously unACKed segments
    • update what is known to be ACKed
    • start timer if there are still unACKed segments
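
The three sender events can be sketched as methods on a small class. This is an illustrative model, not a real TCP implementation: the name `SimpleTcpSender` is hypothetical, the timer is a boolean flag rather than a real clock, transmissions are only logged, and flow control, congestion control, and sequence number wraparound are ignored.

```python
class SimpleTcpSender:
    """Simplified sender: one timer, cumulative ACKs, no window limits."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq   # seq # of oldest unACKed byte
        self.next_seq = initial_seq    # seq # for the next new data byte
        self.unacked = {}              # seq # -> payload awaiting ACK
        self.timer_running = False
        self.sent = []                 # log of (seq, payload) passed to the network

    def on_app_data(self, payload):
        seq = self.next_seq            # byte-stream number of first data byte
        self.unacked[seq] = payload
        self.sent.append((seq, payload))
        self.next_seq += len(payload)
        if not self.timer_running:     # start timer if not already running
            self.timer_running = True

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)    # oldest unACKed segment caused the timeout
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True  # restart timer

    def on_ack(self, ack_num):
        if ack_num > self.send_base:   # ACKs previously unACKed data
            self.send_base = ack_num
            done = [s for s, p in self.unacked.items() if s + len(p) <= ack_num]
            for seq in done:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)   # keep timer only if data is outstanding

sender = SimpleTcpSender(initial_seq=92)
sender.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_timeout()              # resends only the oldest segment, Seq=92
sender.on_ack(120)               # cumulative ACK covers both segments
```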

11 of 31

TCP Receiver: ACK generation [RFC 5681]

Transport Layer: 3-11

Event at receiver, followed by TCP receiver action:

  • arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
    • delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
  • arrival of in-order segment with expected seq #; one other segment has ACK pending
    • immediately send single cumulative ACK, ACKing both in-order segments
  • arrival of out-of-order segment with higher-than-expected seq #; gap detected
    • immediately send duplicate ACK, indicating seq # of next expected byte
  • arrival of segment that partially or completely fills gap
    • immediately send ACK, provided that segment starts at lower end of gap
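
The four receiver cases can be written as a single decision function. This is a sketch: the function name, parameters, and returned strings are hypothetical, and the final branch (a segment below the expected seq #) is not one of the four listed cases; re-ACKing it is an assumption.

```python
def ack_action(seg_seq, expected_seq, ack_pending, gap_exists):
    """Pick the receiver's ACK behavior for an arriving segment.

    seg_seq      -- seq # of the first byte in the arriving segment
    expected_seq -- seq # of the next in-order byte the receiver expects
    ack_pending  -- True if an earlier in-order segment has a delayed ACK waiting
    gap_exists   -- True if out-of-order data is buffered above expected_seq
    """
    if seg_seq > expected_seq:
        return "send duplicate ACK now"        # out of order: gap detected
    if seg_seq == expected_seq:
        if gap_exists:
            return "send ACK now"              # segment starts at lower end of gap
        if ack_pending:
            return "send cumulative ACK now"   # ACKs both in-order segments
        return "delay ACK up to 500 ms"
    return "send duplicate ACK now"            # assumption: old duplicate, re-ACK

first = ack_action(100, 100, ack_pending=False, gap_exists=False)
second = ack_action(120, 120, ack_pending=True, gap_exists=False)
ooo = ack_action(140, 120, ack_pending=False, gap_exists=False)
fill = ack_action(120, 120, ack_pending=False, gap_exists=True)
```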

12 of 31

TCP: retransmission scenarios

Transport Layer: 3-12

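lost ACK scenario (Host A, Host B):

  • A sends Seq=92, 8 bytes of data
  • B replies ACK=100, which is lost (X)
  • A times out and resends Seq=92, 8 bytes of data
  • B replies ACK=100

premature timeout (Host A, Host B):

  • SendBase=92: A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
  • B replies ACK=100, then ACK=120
  • A times out before the ACKs arrive and resends Seq=92, 8 bytes of data
  • ACK=100 arrives at A: SendBase=100
  • ACK=120 arrives at A: SendBase=120
  • B receives the resent segment and sends cumulative ACK for 120 (ACK=120); SendBase stays at 120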

13 of 31

TCP: retransmission scenarios

Transport Layer: 3-13

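cumulative ACK covers for earlier lost ACK (Host A, Host B):

  • A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
  • B replies ACK=100, which is lost (X), then ACK=120
  • ACK=120 reaches A, which continues with Seq=120, 15 bytes of data, without retransmitting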

14 of 31

TCP fast retransmit

Transport Layer: 3-14

TCP fast retransmit: if sender receives 3 additional ACKs for same data (“triple duplicate ACKs”), resend unACKed segment with smallest seq #

    • likely that unACKed segment lost, so don’t wait for timeout

Figure (Host A, Host B): A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). B returns ACK=100 four times, and A resends Seq=100, 20 bytes of data before the timeout expires.

Receipt of three duplicate ACKs indicates 3 segments received after a missing segment – lost segment is likely. So retransmit!
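
The triple-duplicate-ACK rule amounts to a counter that resets whenever a new ACK arrives. The sketch below is a minimal illustration; the class name `FastRetransmitDetector` is hypothetical and the timer-based path is left out.

```python
class FastRetransmitDetector:
    def __init__(self, send_base):
        self.last_ack = send_base   # highest ACK number seen so far
        self.dup_count = 0          # duplicates of last_ack seen so far

    def on_ack(self, ack_num):
        """Return True when the segment starting at ack_num should be resent."""
        if ack_num > self.last_ack:          # new data ACKed: reset the counter
            self.last_ack = ack_num
            self.dup_count = 0
            return False
        if ack_num == self.last_ack:         # duplicate ACK for same data
            self.dup_count += 1
            return self.dup_count == 3       # fire once, on the third duplicate
        return False                         # stale ACK: ignore

detector = FastRetransmitDetector(send_base=92)
results = [detector.on_ack(100) for _ in range(4)]   # original ACK + 3 duplicates
```

The first ACK=100 is an ordinary ACK for the 8 bytes at Seq=92; only the third additional copy triggers the resend of the segment at Seq=100.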

15 of 31

Chapter 3: roadmap

  • Transport-layer services
  • Multiplexing and demultiplexing
  • Connectionless transport: UDP
  • Principles of reliable data transfer
  • Connection-oriented transport: TCP
    • segment structure
    • reliable data transfer
    • flow control
    • connection management
  • Principles of congestion control
  • TCP congestion control

Transport Layer: 3-15

16 of 31

TCP flow control

Transport Layer: 3-16

Q: What happens if network layer delivers data faster than application layer removes data from socket buffers?

Figure: receiver protocol stack (application process, TCP socket receiver buffers, TCP code, IP code). The network layer delivers IP datagram payloads from the sender into the TCP socket buffers, and the application removes data from the TCP socket buffers.

18 of 31

TCP flow control

Transport Layer: 3-18

Figure: receiver protocol stack, with data arriving from sender and the application removing data from TCP socket buffers.

receive window (TCP header field): flow control: # bytes receiver willing to accept

19 of 31

TCP flow control

Transport Layer: 3-19

flow control: receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast

20 of 31

TCP flow control

Transport Layer: 3-20

  • TCP receiver “advertises” free buffer space in rwnd field in TCP header
    • RcvBuffer size set via socket options (typical default is 4096 bytes)
    • many operating systems autoadjust RcvBuffer
  • sender limits amount of unACKed (“in-flight”) data to received rwnd
  • guarantees receive buffer will not overflow

Figure: TCP receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); buffered data is passed to the application process.
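
A rough sketch of the flow control bookkeeping follows. The slides only say that rwnd is the free buffer space; the variable names `last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, and `last_byte_acked` and both helper functions are assumptions introduced here for illustration. The last lines show the real `SO_RCVBUF` socket option; the operating system may adjust the requested value.

```python
import socket

def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space: RcvBuffer minus data received but not yet read."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if nbytes more would keep in-flight data within the received rwnd."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rwnd

rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
ok = sender_may_send(last_byte_sent=5000, last_byte_acked=4000, rwnd=rwnd, nbytes=1000)
too_much = sender_may_send(last_byte_sent=5000, last_byte_acked=4000, rwnd=rwnd, nbytes=1200)

# Requesting a receive buffer size via socket options; the OS may round,
# double, or clamp the value, so read it back rather than assuming it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)
actual_rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
```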

21 of 31

TCP flow control

Transport Layer: 3-21

TCP segment format: the receive window field carries the advertised rwnd (flow control: # bytes receiver willing to accept)

22 of 31

TCP connection management

Transport Layer: 3-22

before exchanging data, sender/receiver “handshake”:

  • agree to establish connection (each knowing the other willing to establish connection)
  • agree on connection parameters (e.g., starting seq #s)

Figure: client and server, each sitting between application and network, each holding:

  • connection state: ESTAB
  • connection variables:
    • seq # client-to-server, server-to-client
    • rcvBuffer size at server, client

Socket clientSocket = new Socket("hostname", portNumber);

Socket connectionSocket = welcomeSocket.accept();

23 of 31

Agreeing to establish a connection

Transport Layer: 3-23

Q: will 2-way handshake always work in network?

  • variable delays
  • retransmitted messages (e.g. req_conn(x)) due to message loss
  • message reordering
  • can’t “see” other side

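2-way handshake:

  • human analogy: “Let’s talk”, answered by “OK”; both sides are then ESTAB
  • protocol: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB on receiving acc_conn(x)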

24 of 31

2-way handshake scenarios

Transport Layer: 3-24

  • client: choose x, send req_conn(x)
  • server: ESTAB, send acc_conn(x)
  • client: ESTAB on receiving acc_conn(x), send data(x+1)
  • server: accept data(x+1), send ACK(x+1)
  • connection x completes

No problem!

25 of 31

2-way handshake scenarios

Transport Layer: 3-25

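  • client: choose x, send req_conn(x)
  • server: ESTAB, send acc_conn(x)
  • client: retransmit req_conn(x) before acc_conn(x) arrives, then ESTAB on receiving acc_conn(x)
  • connection x completes: client terminates, server forgets x
  • the retransmitted req_conn(x) now arrives at the server
  • server: ESTAB again, send acc_conn(x)

Problem: half open connection! (no client)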

26 of 31

2-way handshake scenarios

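  • client: choose x, send req_conn(x)
  • server: ESTAB, send acc_conn(x)
  • client: ESTAB, send data(x+1)
  • client: retransmit req_conn(x), retransmit data(x+1)
  • server: accept data(x+1)
  • connection x completes: client terminates, server forgets x
  • the retransmitted req_conn(x) now arrives at the server: server ESTAB again
  • the retransmitted data(x+1) now arrives at the server: accept data(x+1)

Problem: dup data accepted!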
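
The duplicate-data failure can be reproduced with a toy model of the server side of a 2-way handshake. Everything here is hypothetical (the class `TwoWayServer`, its methods, the value 7 for x); the point is only that a server which trusts any req_conn(x) cannot tell a delayed retransmission from a fresh request.

```python
class TwoWayServer:
    """Toy server for the 2-way handshake: any req_conn(x) is accepted."""

    def __init__(self):
        self.connections = set()    # connection ids currently ESTAB
        self.delivered = []         # data passed up to the application

    def on_req_conn(self, x):
        self.connections.add(x)     # ESTAB immediately
        return ("acc_conn", x)

    def on_data(self, x, payload):
        if x in self.connections:
            self.delivered.append(payload)   # accept data(x+1)

    def forget(self, x):
        self.connections.discard(x)          # connection x completes

server = TwoWayServer()
server.on_req_conn(7)
server.on_data(7, "hello")
server.forget(7)                  # client terminates, server forgets x

# Delayed retransmissions of the same two messages arrive afterwards.
server.on_req_conn(7)
server.on_data(7, "hello")
duplicates = server.delivered     # the application sees the data twice
```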

27 of 31

TCP 3-way handshake

Transport Layer: 3-27

Client state: LISTEN, then SYNSENT, then ESTAB
Server state: LISTEN, then SYN RCVD, then ESTAB

  • client: choose init seq num, x; send TCP SYN msg
    • SYNbit=1, Seq=x
  • server: choose init seq num, y; send TCP SYNACK msg, acking SYN
    • SYNbit=1, Seq=y; ACKbit=1, ACKnum=x+1
  • client: received SYNACK(x) indicates server is live; send ACK for SYNACK; this segment may contain client-to-server data
    • ACKbit=1, ACKnum=y+1
  • server: received ACK(y) indicates client is live

from socket import *

# server
serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
serverSocket.listen(1)
connectionSocket, addr = serverSocket.accept()

# client
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((serverName, serverPort))
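
The socket calls above can be run end to end on one machine. The sketch below assumes the loopback address 127.0.0.1 and lets the OS pick a free port; the helper `run_server` and the `received` list are hypothetical names. The SYN, SYNACK, and ACK segments themselves are exchanged by the operating system inside `connect()` and before `accept()` returns, so they are not visible in the code.

```python
import socket
import threading

def run_server(server_socket, received):
    conn, _addr = server_socket.accept()     # returns a connection whose handshake is done
    with conn:
        received.append(conn.recv(1024))

server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(("127.0.0.1", 0))         # port 0: let the OS pick a free port
server_socket.listen(1)                      # server is now in LISTEN
port = server_socket.getsockname()[1]

received = []
worker = threading.Thread(target=run_server, args=(server_socket, received))
worker.start()

client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(("127.0.0.1", port))   # OS performs the 3-way handshake here
client_socket.sendall(b"hello")
client_socket.close()

worker.join()
server_socket.close()
```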

28 of 31

A human 3-way handshake protocol

Transport Layer: 3-28

1. On belay?

2. Belay on.

3. Climbing.

29 of 31

TCP three-way handshake: segment exchange

Transport Layer: 3-29

30 of 31

Closing a TCP connection

Transport Layer: 3-30

  • client, server each close their side of connection
    • send TCP segment with FIN bit = 1
  • respond to received FIN with ACK
    • on receiving FIN, ACK can be combined with own FIN
  • simultaneous FIN exchanges can be handled
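
Each side closing its own direction can be observed with `shutdown(SHUT_WR)`, which ends the caller's sending direction while leaving the receiving direction open. The sketch below assumes loopback networking; the helper `serve_once` and the message contents are hypothetical. An empty `recv()` result is how an application learns that the peer has finished sending.

```python
import socket
import threading

def serve_once(listener, log):
    conn, _addr = listener.accept()
    with conn:
        data = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:                  # empty read: peer has closed its side
                break
            data += chunk
        conn.sendall(b"got " + data)       # server can still send after client's FIN
    log.append(data)                       # leaving the with-block closes the server side

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

log = []
worker = threading.Thread(target=serve_once, args=(listener, log))
worker.start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"bye")
client.shutdown(socket.SHUT_WR)            # client closes its side of the connection
reply = b""
while True:
    chunk = client.recv(1024)
    if not chunk:                          # server has closed its side too
        break
    reply += chunk
client.close()

worker.join()
listener.close()
```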

31 of 31

Closing a TCP connection (timeline figure)

Transport Layer: 3-31