1 of 25

Window of Uncertainty

Ahmad Byagowi

2 of 25

Motivation

  • Distributed applications may need to know the time sync error bound in order to adjust their performance accordingly.
  • Linearizability as a requirement
  • Parallel to serial pipelines
  • One way latency
  • Precision Time Based Scheduling and Routing
  • Realtime Indication of Precision Time Sync System Performance
  • Precision Time Sync [components] Fault Detection

3 of 25

Overview

  • PTP delivers time to the end nodes from the Time Server [aka GM]
  • End nodes need to estimate/predict the error bound
  • Time Server provides the end nodes with the error bound of its service
  • The error bound can be separated into two parts: precision and accuracy
  • Precision is based on the offset variance perceived by the end node
  • Accuracy is based on the agreement with other peers

4 of 25

Quick Reminder on Precision vs. Accuracy

[Figure: four panels compared against a reference: not accurate and not precise; accurate but not precise; precise but not accurate; accurate and precise]

5 of 25

Formulation

  • Clock synchronization requires processes (p_1, p_2, p_3, …, p_n) to bring their clocks as close together as possible by communicating with each other.
  • For a process p_i, the adjusted clock AC_i(t) is a function of the hardware clock HC_i(t) and a variable adj_i
  • The synchronization process in p_i adjusts the value of adj_i and thus changes the value of AC_i(t)
  • The error bound γ is defined by |AC_i(t) - AC_j(t)| ≤ γ for any i and j, representing processes p_i and p_j
  • For every p_i participating in clock synchronization, γ is at least ε(1 - 1/n), where ε is the uncertainty in the message delay [Lundelius and Lynch 84]
  • Assuming symmetry for the error bound γ, we can write γ = (2(ε/2) + (n-2)ε)/n
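
As a quick numeric check of the bound above, the following sketch evaluates ε(1 - 1/n) and the symmetric form side by side. The function names and the choice of nanoseconds are illustrative assumptions, not part of the original slides.

```python
def sync_error_lower_bound(epsilon: float, n: int) -> float:
    """Lower bound eps * (1 - 1/n) on the achievable clock skew.

    epsilon: uncertainty in the message delay (any time unit).
    n: number of processes taking part in synchronization.
    """
    if n < 2:
        raise ValueError("need at least two processes")
    return epsilon * (1.0 - 1.0 / n)


def symmetric_form(epsilon: float, n: int) -> float:
    """The same bound written as (2*(eps/2) + (n-2)*eps) / n."""
    if n < 2:
        raise ValueError("need at least two processes")
    return (2 * (epsilon / 2) + (n - 2) * epsilon) / n


# Example: 100 ns of delay uncertainty shared by 4 processes.
bound = sync_error_lower_bound(100.0, 4)
```

Both forms agree for any n ≥ 2, and the bound approaches ε as n grows.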

6 of 25

The Challenge in Sync Over the Network

  • Clocks in the network are synchronized through the noisy process of timing packets.

D_ij = difference between the physical clocks of p_i and p_j, as estimated by p_j

Δ_rx = true difference between a process p_x and the Time Server (or reference)

Show |AC_i(t) - AC_j(t)| ≤ ε(1 - 1/n)

|AC_i(t) - AC_j(t)| = |(HC_i(t) + adj_i) - (HC_j(t) + adj_j)|

= (1/n) |Σ ((Δ_ri - Δ_rj) - (D_ri - D_rj))| ≤ (1/n) Σ |(Δ_ri - Δ_rj) - (D_ri - D_rj)|

≤ (1/n) (2(ε/2) + (n-2)ε) = ε(1 - 1/n)

7 of 25

Estimation of Precision

[Figure: four panels compared against a reference: not accurate and not precise; accurate but not precise; precise but not accurate; accurate and precise]

8 of 25

Precision

  • Precision is based on the sync variance perceived by the end node
  • The end node synchronizes across the network fabric using a servo
  • Past offset changes are used to estimate the precision
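
One way to turn past offsets into a precision estimate is a fixed-size window of recent servo offsets and its standard deviation. This is a minimal sketch under assumptions: the class name, the window size, and the use of the population standard deviation are illustrative choices, not something the slides specify.

```python
from collections import deque
from statistics import pstdev


class OffsetWindow:
    """Keeps the most recent offsets (ns) reported by the servo."""

    def __init__(self, size: int = 64):
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=size)

    def add(self, offset_ns: float) -> None:
        self.samples.append(offset_ns)

    def precision_ns(self) -> float:
        """Spread of the recent offsets; infinite until two samples exist."""
        if len(self.samples) < 2:
            return float("inf")
        return pstdev(self.samples)


window = OffsetWindow(size=4)
for offset in (1000.0, 10.0, -10.0, 10.0, -10.0):  # the first sample is evicted
    window.add(offset)
```

A larger window smooths the estimate but reacts more slowly when the network or the oscillator changes behavior.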

9 of 25

Schematic of Time Sync across the Network

[Diagram: Open Time Server (a.k.a. GM) with its NIC, a Network Fabric of seven TCs, and an End node with NIC, Servo, Oscillator, and App]

10 of 25

Law of total variance

[Diagram: Open Time Server (a.k.a. GM) with its NIC, a Network Fabric of seven TCs, and an End node with NIC, Servo, Oscillator, and App]
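
For reference, this is the law of total variance in its general form. Reading Y as the offset seen by the end node and X as the state of an upstream stage (Time Server, network fabric, or servo) is an assumption made here for illustration; the slide itself does not state the mapping.

```latex
\operatorname{Var}(Y) = \mathbb{E}\big[\operatorname{Var}(Y \mid X)\big] + \operatorname{Var}\big(\mathbb{E}[Y \mid X]\big)
```

The first term is the average variance within each condition and the second is the variance between the conditional means.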

11 of 25

Open Time Server Error Bound

[Diagram: Open Time Server (a.k.a. GM) with NIC, GNSS, and MAC]

12 of 25

Experiment

[Diagram: Front End NIC, Network (TCs), End Node NIC]

Ideally!

STD[E2E] ≃ 90 ns

13 of 25

Estimation of Variance and Stationarity

14 of 25

Running Variance
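
A running variance can be maintained one sample at a time without storing the history. The sketch below uses Welford's online algorithm, which is a common choice and an assumption here; the slides do not say which method is used.

```python
class RunningVariance:
    """Welford's online algorithm for the mean and sample variance."""

    def __init__(self) -> None:
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation before and after the mean update.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 in the denominator); 0.0 below two samples."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


rv = RunningVariance()
for offset_ns in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    rv.update(offset_ns)
```

Unlike a fixed window, this form weights all past samples equally, so it suits a stationary signal; a windowed or decayed variant would be needed to follow changing conditions.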

15 of 25

Estimation of error bound (WOU)

16 of 25

Temperature Aware Error Bound

  • Monitor the frequency adjustment against the current oscillator temperature
  • Monitor temperature changes
  • Estimate upcoming frequency changes based on temperature changes
  • Adjust the ratio between relying on the existing model and relying on observation
    • The model in this case is the holdover
    • The observation in this case is a single clock sync episode
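
The model-versus-observation trade-off above can be sketched as a weighted blend. Everything below is hypothetical: the linear temperature sensitivity, the function names, and the units (ppb, degrees Celsius) are assumptions for illustration, and real oscillators usually have a nonlinear temperature response.

```python
def predict_frequency_ppb(current_ppb: float,
                          temp_delta_c: float,
                          sensitivity_ppb_per_c: float) -> float:
    """Holdover-style prediction: current frequency adjustment plus a
    temperature-driven change (assumed linear for this sketch)."""
    return current_ppb + sensitivity_ppb_per_c * temp_delta_c


def blend_ppb(model_ppb: float,
              observed_ppb: float,
              observation_weight: float) -> float:
    """Mix the model prediction with a single sync observation.

    observation_weight = 0 trusts the model only; 1 trusts the observation only.
    """
    if not 0.0 <= observation_weight <= 1.0:
        raise ValueError("observation_weight must be within [0, 1]")
    return (observation_weight * observed_ppb
            + (1.0 - observation_weight) * model_ppb)


# Example: +2 C of drift at an assumed 5 ppb/C, then one sync episode reads 120 ppb.
model = predict_frequency_ppb(100.0, temp_delta_c=2.0, sensitivity_ppb_per_c=5.0)
estimate = blend_ppb(model, observed_ppb=120.0, observation_weight=0.25)
```

How the weight should move with temperature is left open here, since the slides only say that the ratio is adjusted.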

17 of 25

Estimation of Accuracy

[Figure: four panels compared against a reference: not accurate and not precise; accurate but not precise; precise but not accurate; accurate and precise]

18 of 25

Estimation of Accuracy

  • Based on a linearizability test
  • Estimation of statistical specificity
    • How reliably the test singles out a machine that is not linearizable
  • This method is optimized for a TC-only implementation
    • Can be extended to implementations that include BCs
  • Each node seeks feedback from a random peer
    • Decentralized
  • A scrolling history of linearizability tests with random peers determines the current estimate of accuracy
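
The scrolling history described above might look like the sketch below. The test itself is an assumption: it treats each reading as an interval [earliest, latest] and flags a violation when an event known to happen first appears strictly later than one that followed it. The names `Interval`, `is_linearizable`, `AccuracyHistory`, and `pick_peer` are hypothetical.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Interval:
    """A clock reading widened by its window of uncertainty (ns)."""
    earliest_ns: int
    latest_ns: int


def is_linearizable(before: Interval, after: Interval) -> bool:
    """True if a reading taken first could precede a reading taken later."""
    return before.earliest_ns <= after.latest_ns


def pick_peer(peers: Sequence[str], rng: random.Random) -> str:
    """Each node asks one randomly chosen peer; no coordinator is involved."""
    return rng.choice(peers)


class AccuracyHistory:
    """Scrolling record of the most recent test outcomes."""

    def __init__(self, size: int = 100):
        self.results = deque(maxlen=size)

    def record(self, passed: bool) -> None:
        self.results.append(passed)

    def pass_ratio(self) -> Optional[float]:
        if not self.results:
            return None
        return sum(self.results) / len(self.results)


history = AccuracyHistory(size=4)
history.record(is_linearizable(Interval(100, 200), Interval(150, 250)))
history.record(is_linearizable(Interval(100, 200), Interval(180, 280)))
history.record(is_linearizable(Interval(300, 400), Interval(100, 200)))  # violation
history.record(is_linearizable(Interval(100, 200), Interval(210, 310)))
```

A falling pass ratio suggests that either this node or the peers it sampled disagree with the rest of the network by more than their stated error bounds.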

19 of 25

Estimation of Accuracy

[Diagram: Time Server, a network of 15 TCs, and 6 OCs]

20 of 25

Rogue Transparent Clock

  • Not applying the correction field (CF)
  • Issues with the local oscillator (LO)
  • Every node downstream of the rogue TC is affected
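
To illustrate why a TC that skips the correction field matters, the sketch below computes the standard PTP end-to-end offset from the four timestamps, with and without the residence-time corrections. The timestamps, residence times, and function name are made-up illustrative values, not measurements from the slides.

```python
def offset_ns(t1: int, t2: int, t3: int, t4: int,
              cf_sync: int = 0, cf_delay_req: int = 0) -> float:
    """Offset of the end node from the Time Server (ns).

    t1/t2: Sync sent by the server and received by the end node.
    t3/t4: Delay_Req sent by the end node and received by the server.
    cf_*: residence time accumulated by TCs in the correction field.
    """
    server_to_node = (t2 - t1) - cf_sync
    node_to_server = (t4 - t3) - cf_delay_req
    return (server_to_node - node_to_server) / 2


# Perfectly synchronized clocks, 500 ns of link delay in each direction.
# The TC holds the Sync for 2000 ns and the Delay_Req for 300 ns.
t1, t2 = 0, 500 + 2000
t3, t4 = 10_000, 10_000 + 500 + 300

healthy = offset_ns(t1, t2, t3, t4, cf_sync=2000, cf_delay_req=300)
rogue = offset_ns(t1, t2, t3, t4)  # the TC never wrote the correction field
```

With the correction applied the computed offset is 0 ns; without it, the asymmetric residence times show up as an 850 ns offset that the servo would then steer toward.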

[Diagram: Time Server, a network of 15 TCs, and 6 OCs]

21 of 25

Accuracy Scenarios

22 of 25

Accuracy Scenarios

23 of 25

Accuracy Scenarios

24 of 25

Conclusion

  • Determining the time sync error bound is an estimation process
  • The current state is determined from a scrolling window over past observations
  • Sensor information such as temperature monitoring can improve the estimation
  • Precision and accuracy should be identified and calculated separately

25 of 25

Questions?