1 of 27

Precision Frequency Measurement (PFM)

Julian St. James (Meta)

Ahmad Byagowi (Meta)

Connect. Collaborate. Accelerate.

2 of 27

What is a clock?

  • Two properties
    1. Periodic System
      • Frequency: the rate at which it runs
      • Example: Speedometer
    2. Measures Time
      • Time: a single number representing a specific instant
      • Example: Odometer
  • Examples of adjusting clocks
    • Car clock: compare it against your cell phone clock, then adjust the time, not the frequency
    • Music: changing the beats per minute adjusts the frequency, not the time


3 of 27

How to transfer Frequency

  • Two ways to transfer a frequency
  • 1. One-way: send the frequency directly
    • Example: Metronome, all players in an orchestra listen to the same metronome
  • 2. Common-view
    • Two devices measure the same reference, but can't measure each other
    • They exchange their measurements with each other

Source: https://www.nict.go.jp/publication/shuppan/kihou-journal/journal-vol50no1.2/0401.pdf


4 of 27

One-way transfer

  • Follower gets the frequency directly from the source
    • Follower knows what frequency to expect and treats the source as 100% accurate
    • Common example: PLL or DPLL
      • Provided with a 10MHz reference
      • Follows that 10MHz
      • Generates output clocks by multiplying and dividing the reference
  • Downside
    • The source has to connect directly to every user
    • Many connections, so it does not scale
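
As a rough illustration of the multiply-and-divide step, the sketch below derives the exact ratio a PLL would need to turn the 10MHz reference into a 156.25MHz output, and shows that a follower inherits the reference's fractional error unchanged. The 156.25MHz target, the 2 ppm reference error, and the function name `pll_ratio` are assumptions chosen for illustration, not values from this slide.

```python
from fractions import Fraction

def pll_ratio(f_ref_hz: int, f_out_hz: Fraction) -> Fraction:
    """Exact ratio M/N such that f_out = f_ref * M / N (hypothetical helper)."""
    return Fraction(f_out_hz) / f_ref_hz

REF_HZ = 10_000_000               # 10MHz reference, as on the slide
ETH_HZ = Fraction(156_250_000)    # 156.25MHz output, assumed for illustration

ratio = pll_ratio(REF_HZ, ETH_HZ)
print(f"multiply by {ratio.numerator}, divide by {ratio.denominator}")

# One-way transfer trusts the source completely: an error on the
# reference appears on the output with the same fractional size.
ref_error_ppm = Fraction(2)       # assumed reference error
actual_ref_hz = REF_HZ * (1 + ref_error_ppm / 10**6)
actual_out_hz = actual_ref_hz * ratio
out_error_ppm = (actual_out_hz / ETH_HZ - 1) * 10**6
print(f"output error: {float(out_error_ppm)} ppm")
```

Exact fractions are used here so the ratio and the propagated error are free of floating-point rounding.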

Source: https://www.nict.go.jp/publication/shuppan/kihou-journal/journal-vol50no1.2/0401.pdf


5 of 27

Common-view Transfer

  • Frequency source and follower device both watch a common reference
  • Follower device exchanges measurement data with the source to compare against the reference
  • Example:
    • GPS, common view of all satellites
  • Example operation:
    • Source A measures its frequency as 5.5X the speed of Reference R
    • Follower measures its frequency as 5.3X the speed of Reference R
    • Follower asks Source A what it measured, and then tries to match Source A's measurement
  • Upside: Relies only on data and one common clock source
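
The example operation above can be sketched numerically. This is a minimal model, assuming ideal (noise-free) measurements; the absolute frequencies in Hz and the function names are hypothetical and chosen only so that the measured ratios come out as 5.5X and 5.3X.

```python
def measure_ratio(device_hz: float, reference_hz: float) -> float:
    """What a device reports: its own frequency in units of the reference."""
    return device_hz / reference_hz

def follower_correction(source_ratio: float, follower_ratio: float) -> float:
    """Factor by which the follower scales its frequency to match the source."""
    return source_ratio / follower_ratio

SOURCE_HZ = 5.5e6      # assumed, gives 5.5X against a 1MHz reference
FOLLOWER_HZ = 5.3e6    # assumed, gives 5.3X against a 1MHz reference

corrections = []
for reference_hz in (1.0e6, 1.00002e6):   # reference on-frequency, then 20 ppm fast
    a = measure_ratio(SOURCE_HZ, reference_hz)
    f = measure_ratio(FOLLOWER_HZ, reference_hz)
    corrections.append(follower_correction(a, f))

# The reference cancels out of the correction, so its own accuracy does not matter.
corrected_hz = FOLLOWER_HZ * corrections[0]
print(corrections, corrected_hz)
```

Because both devices divide by the same reference, the correction depends only on the two reported ratios, which is why only data needs to travel between source and follower.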

Source: https://www.nict.go.jp/publication/shuppan/kihou-journal/journal-vol50no1.2/0401.pdf


6 of 27

The goal

  • Make it easy to build frequency locked systems using as little custom equipment as possible
  • Make it easy to scale up (add more capacity) and scale out (add more units)
  • TL;DR: Use the 100MHz PCIe clock as the common-view reference clock


7 of 27

Modern DPLLs

  • Powerful clocking chips with multiple clocking inputs and outputs
  • Can generate multiple output frequencies from known input frequencies
  • Key features for PFM:
    • Hitless switchover
      • When switching between inputs, output phase is slowly adjusted to prevent glitching
    • Digitally Controlled Oscillator (DCO)
      • Output frequency is set by software, which writes the frequency error measured versus an input


8 of 27

Server Clocking

  • Servers have CPUs, peripherals, and PCIe slots
  • Every PCIe slot gets a 100MHz reference clock from the motherboard
    • That 100MHz reference clock is almost always shared between the CPU and the PCIe devices from one source, commonly called the Common Clock architecture


9 of 27

Typical PCIe endpoint architecture

 

OPEN POSSIBILITIES

  • Most PCIe endpoint designs, such as PCIe NICs, have a local reference and a PLL that generates the Ethernet clock, for example 156.25MHz
  • The 100MHz clock from the baseboard drives only the PCIe core, so the PCIe interface and the Ethernet core are asynchronous to each other

 

[Diagram: PCIe Card containing Ethernet Chipset, Local Oscillator, and PLL (Ethernet Clocks), connected to the Host CPU by PCIe + 100MHz]


10 of 27

Precision Frequency Measurement

 


  • Add DPLL circuitry to the PCIe endpoint clock tree, and tie the common PCIe 100MHz to the DPLL
    • The 100MHz acts as a common frequency reference for all cards inside the chassis
    • All clocks to the Ethernet chipset come from the same source, making it synchronous

[Diagram: PCIe Card containing Ethernet Chipset, Local Oscillator, and DPLL, connected to the Host CPU by PCIe; labeled signals: 100M, 156M]


11 of 27

Frequency between two cards

 

  • Card 1 is the source, Card 2 is the follower
  • Card 1 measures Local Oscillator 1 versus the 100MHz with DPLL1. Software on the host CPU reads this value
  • Host CPU software gives this measurement to DPLL2, and DPLL2 adjusts its output clocks to match the Card 1 measurement
  • The process repeats continuously to keep DPLL2 tracking DPLL1
  • PCIe Card 2 clocks are now tracking the frequency of PCIe Card 1
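
The steps above can be sketched as a small simulation. This is an idealized model under stated assumptions, not the actual DPLL interface: offsets are expressed in parts per billion and treated as additive (a small-offset approximation), measurements are noise-free, and the follower is assumed to measure its own oscillator against the same 100MHz. The `Card` class, its methods, and all numeric values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Card:
    lo_offset_ppb: float          # local oscillator error versus nominal
    dco_adjust_ppb: float = 0.0   # steering written by host software

    def measure_vs_ref_ppb(self, ref_offset_ppb: float) -> float:
        """DPLL measurement: local oscillator versus the shared 100MHz."""
        return self.lo_offset_ppb - ref_offset_ppb

    def output_offset_ppb(self) -> float:
        """Error of the generated output clock after steering."""
        return self.lo_offset_ppb + self.dco_adjust_ppb

def host_step(source: Card, follower: Card, ref_offset_ppb: float) -> None:
    """One pass of the host software: read both measurements, steer the follower."""
    m1 = source.measure_vs_ref_ppb(ref_offset_ppb)
    m2 = follower.measure_vs_ref_ppb(ref_offset_ppb)
    follower.dco_adjust_ppb = m1 - m2   # the shared reference cancels here

card1 = Card(lo_offset_ppb=120.0)     # source
card2 = Card(lo_offset_ppb=-850.0)    # follower

residuals_ppb = []
for step in range(5):
    ref_offset_ppb = 3000.0 + 50.0 * step   # the 100MHz itself wanders
    card2.lo_offset_ppb += 10.0             # follower oscillator drifts
    host_step(card1, card2, ref_offset_ppb)
    residuals_ppb.append(card2.output_offset_ppb() - card1.output_offset_ppb())

print(residuals_ppb)
```

In this model the follower's output matches the source on every pass even though the 100MHz reference is thousands of ppb off and moving, which is the property that lets an uncalibrated motherboard clock serve as the common view.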

[Diagram: PCIe Card 1 (Ethernet Chipset 1, Local Oscillator 1, DPLL 1) and PCIe Card 2 (Ethernet Chipset 2, Local Oscillator 2, DPLL 2), both connected to the Host CPU by PCIe; labeled signals: 100M, 156M, Data]


12 of 27

Frequency between systems today

  • 1. Sync-E
    • Clock frequency is recovered from an Ethernet link
    • Ethernet devices recover the clock frequency and provide it to a DPLL
    • The DPLL generates the same clock for another Ethernet device to propagate
  • 2. White Rabbit
    • Builds on Sync-E to add high precision and time
  • Both architectures require direct connections between the source and every node that requires frequency


13 of 27

Scale-up

 

  • What about more than one chassis?
    • Use the Sync-E capability to recover the clock in Ethernet Chipset 2, send it to DPLL2, and measure the relationship between the Sync-E clock and the 100MHz in Server 2
    • Communicate this frequency relationship to the other PCIe endpoints in Server 2 over PCIe
    • This creates a Frequency Boundary Clock: it follows a frequency, but also provides one
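
A short sketch of how the frequency relationship could be carried across several chassis, each with its own unrelated 100MHz. It assumes ideal measurements and additive offsets in parts per billion; the function names and the per-server reference offsets are hypothetical.

```python
def publish_measurement_ppb(incoming_ppb: float, local_ref_ppb: float) -> float:
    """Boundary card: recovered Sync-E clock measured against the local 100MHz."""
    return incoming_ppb - local_ref_ppb

def reproduce_ppb(measurement_ppb: float, local_ref_ppb: float) -> float:
    """Follower card in the same chassis: frequency it ends up generating."""
    return measurement_ppb + local_ref_ppb

source_ppb = 120.0                            # frequency offset of the original source
server_ref_ppb = [3000.0, -4500.0, 800.0]     # one independent 100MHz per server

carried_ppb = source_ppb
for ref_ppb in server_ref_ppb:
    measurement = publish_measurement_ppb(carried_ppb, ref_ppb)
    # The follower's output is sent onward as Sync-E to the next server.
    carried_ppb = reproduce_ppb(measurement, ref_ppb)

print(carried_ppb)
```

Each server's 100MHz only has to be common within that server; between servers the frequency travels on the Sync-E link, so the references never need to agree with one another.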

[Diagram: Server 1 with PCIe Card 1 (Ethernet Chipset 1, Local Oscillator 1, DPLL 1) linked over Ethernet (Sync-E) to Server 2 with PCIe Card 2 (Ethernet Chipset 2, Local Oscillator 2, DPLL 2); labeled signals: 100M, 156M, PCIe, SyncE Clock]


14 of 27

How about time?

  • PTM (Precision Time Measurement) is the PCIe counterpart of PTP: it enables precision time over PCIe
    • The system needs PTM capability, and the PCIe endpoints must support PTM
    • Best case, PTM can sync two endpoints to within approximately 15-30ns over PCIe
  • A 1PPS (Pulse Per Second) signal on the PCIe endpoint with the DPLL, as either input or output, is needed as the time measurement that establishes exact times
  • With time accurate to 15-30ns and a high-stability frequency lock, two devices can average the 15-30ns signal down to achieve < 1ns time error
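
The averaging argument can be illustrated with a small simulation. The sketch assumes each PTM reading carries an independent, zero-mean error uniformly distributed within ±30ns, and that the frequency lock keeps the true offset constant while samples accumulate; neither the error distribution nor the 4.2ns offset comes from the slides, and the function names are hypothetical.

```python
import math
import random

PTM_ERROR_BOUND_NS = 30.0   # assumed worst-case PTM error per reading

def samples_needed(target_ns: float, bound_ns: float = PTM_ERROR_BOUND_NS) -> int:
    """Readings needed for the standard error of the mean to reach target_ns."""
    variance_ns2 = (2.0 * bound_ns) ** 2 / 12.0   # variance of uniform(-b, +b)
    return math.ceil(variance_ns2 / target_ns ** 2)

def averaged_offset_ns(true_offset_ns: float, n_samples: int, rng: random.Random) -> float:
    """Mean of noisy PTM readings of a constant offset."""
    total = 0.0
    for _ in range(n_samples):
        total += true_offset_ns + rng.uniform(-PTM_ERROR_BOUND_NS, PTM_ERROR_BOUND_NS)
    return total / n_samples

rng = random.Random(0)
true_offset_ns = 4.2        # hypothetical offset between the two cards
estimate_ns = averaged_offset_ns(true_offset_ns, 10_000, rng)
residual_ns = estimate_ns - true_offset_ns

print(samples_needed(1.0))
print(abs(residual_ns) < 1.0)
```

Averaging only removes the random part of the error. It works here because the frequency lock stops the offset from drifting during the averaging window; any fixed bias in the PTM path would remain and would need separate calibration.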


15 of 27

PFM + PTM

 

  • PTM synchronizes the PPS between the two cards to within 15-30ns, and provides that PPS to the DPLL on each card
  • PFM lets DPLL2 generate clocks based on the frequency of Local Oscillator 1 as measured by DPLL1
  • DPLL2 can average the PPS error over time and generate PPS signals with < 1ns error between the two cards

[Diagram: PCIe Card 1 (Ethernet Chipset 1, Local Oscillator 1, DPLL 1) and PCIe Card 2 (Ethernet Chipset 2, Local Oscillator 2, DPLL 2), both connected to the Host CPU by PCIe + PTM, each with a PPS Output or Input; labeled signals: 100M, 156M, Data]


16 of 27

What can I do with this?

  • With PFM, commercial servers from any vendor can source and distribute frequency to any PCIe card installed
  • With PTM + PFM, commercial servers from any vendor can source and distribute frequency and time to any PCIe card installed


17 of 27

Application 1: 5G O-RAN


18 of 27

5G O-RAN Timing requirements

 


  • 5G O-RAN requires tight timing throughout the chain
    • Multiple network hops
    • Devices need to operate as Class-C boundary clocks, with time error less than 10ns, and meet frequency stability requirements


19 of 27

Current O-RAN architecture

 


  • To meet these timing requirements, O-RAN DU-style servers are typically built with NICs integrated onto the baseboard, and with clocking circuits such as DPLLs, GPS, and an OCXO built into the motherboard


20 of 27

Time Card

https://engineering.fb.com/2016/02/18/core-data/netnorad-troubleshooting-networks-via-end-to-end-probing/

[Diagram: Time Card with Antenna, GNSS Receiver, High Stability Oscillator, and Clock Processing FPGA; labeled signals: PPS, ToD, Discipline, PCIe]


21 of 27

Example DU as GM with Time Card

 

  • Time Card acts as the frequency source
    • Locked to GPS
    • Atomic clock provides a stable frequency
  • PCIe NIC acts as the frequency follower
  • Multiple PCIe NICs can be installed, depending on the interfaces needed
  • Since all clocks on the PCIe NIC can now follow the Time Card frequency, the NIC's Ethernet clocking is traceable to a GPS-disciplined atomic clock

[Diagram: Time Card (Antenna, GNSS Receiver, High Stability Oscillator, Clock Processing FPGA + DPLL; signals PPS, ToD, Discipline, 10MHz) and PCIe Endpoint (Ethernet NIC, Local Oscillator, DPLL, PPS Output or Input), both connected to the Host CPU by PCIe + PTM; labeled signals: 100M, 156M, Data; Sync-E out clock traceable to 10MHz]


22 of 27

What can I do with this?

  • With PTM + PFM, commercial servers from any vendor can source and distribute frequency and time to any PCIe card installed, creating 5G-compliant devices


23 of 27

Application 2: Distributed AI


24 of 27

Distributed AI Application

 

  • This architecture also applies to non-Ethernet devices, like GPUs
    • Even without PTM, this will frequency-sync GPUs within a system
    • With PTM, time and frequency can be distributed across an AI cluster, from the front-end network, to the CPU, to the GPUs, and to the back-end network

[Diagram: PCIe Endpoint (Ethernet NIC, Local Oscillator, DPLL, PPS Output or Input) and GPU PCIe Card (multiple GPUs, Local Oscillator, DPLL, PPS Output or Input), both connected to the Host CPU by PCIe + PTM; labeled signals: 100M, Data]


25 of 27

What can I do with this?

  • With PFM, commercial servers from any vendor can create frequency-synchronized AI clusters
  • With PTM + PFM, commercial servers from any vendor can create frequency and time synchronized AI clusters


26 of 27

Initial Prototype

  • Initial architecture is centered on a Renesas DPLL and a TI ARM microprocessor


27 of 27

Thank You
