1 of 23

Corundum status updates

Alex Forencich

4/3/2024

2 of 23

Agenda

  • Announcements
  • Status updates

3 of 23

Announcements

  • Next dev meeting: May 1 at 9 AM and 9 PM PDT
  • Next switch meeting: April 10 at 9 AM PDT

  • Corundum developer meetings:
    • 1st Wednesday of each month, 9 AM PDT
    • Dev meetings: May 1, June 5, July 3, Aug 7, Sept 4, Oct 2, Nov 6, Dec 4
  • Switch meetings:
    • 2nd and 4th Wednesday of each month, 9 AM PDT
  • Google calendar linked on wiki page

4 of 23

Status update summary

  • Potential license change – CERN-OHL-W
  • Potential language change – SystemVerilog
  • White Rabbit (WR) roadmap

5 of 23

Potential license change to CERN-OHL-W

  • CERN OHL v2 licenses released in 2020
    • 3 variants: Strongly Reciprocal (S, GPL equiv.), Weakly Reciprocal (W, LGPL equiv.), Permissive (P)
  • Verilog libraries date back to 2014 or so
    • Used MIT license as GPL/LGPL are not appropriate for hardware
  • Considering switching everything to CERN-OHL-W
    • LGPL equivalent, not viral, only need to release modifications
    • Both Corundum and libraries
    • May leave select files under permissive license (app template, etc.)
  • Potential pitfalls?

6 of 23

Potential language change to SystemVerilog

  • SystemVerilog brings interfaces and other features that can significantly improve the project
    • Not just Corundum itself, but also libraries
  • SV support in open source tools has been improving
  • Should nail down common interfaces early (AXI, AXI stream)
  • Potential pitfalls?
  • What about legacy device support (4/5/6 series, etc.)?
    • Drop it completely? Vivado synth -> ISE? sv2v?
  • Stick with git subtrees and makefiles?

7 of 23

White Rabbit Roadmap

  • Integrate WR functionality directly into Corundum core logic
    • Likely will need to rework some of the PTP TD, MAC, and PCS logic
    • Also need some sort of timing I/O subsystem
  • Should be able to “easily” add support for quite a few boards
    • Corundum currently supports ~30 boards spanning multiple board vendors and device families

8 of 23

WR device support

  • Serdes, PHY, and MAC configuration is specific to device family
  • WR requires deterministic latency and precision timestamping
    • Mitigate latency variance in serdes and gearbox/PCS/MAC/EMIB
    • Hard MAC timestamping must be correct (CMAC, E/F tiles, etc.)
  • AMD/Xilinx GTX/GTH/GTY should work well
    • Used by current WR switch and other WR hardware
  • Other hardware will require characterization

9 of 23

WR device support

  • Hard logic (CMAC, E-tile, F-tile, etc.)
    • Requires case-by-case analysis and characterization
  • Soft MAC + GTH/GTY
    • Should be doable for 10G/25G, adding 1G may be more complex
  • Soft MAC + H/L-tile PMA
    • Enough PLLs for concurrent 1G/10G/25G
    • Need to investigate latency, etc.
  • Soft MAC + E-tile/F-tile PMA
    • Need to investigate

10 of 23

1G on GTH/GTY

  • GTH/GTY has 1 CPLL per channel + 2 QPLL per quad
  • Need one QPLL for 10G and one QPLL for 25G
  • CPLL only usable for 1.25 Gbps when ref clock is 156.25 MHz
  • Possible to oversample 1.25 Gbps at 10.3125 Gbps?
    • Oversampling ratio of 8.25 (33/4)
    • Unclear how CDR will handle this
    • CDR hold and barrel shift
    • May be possible to override CDR logic and adjust RX PI via DRP
    • Will need to evaluate latency, etc.
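
The 8.25 ratio can be checked with exact arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the floor-based mapping of bits to samples is an idealized model that ignores jitter and CDR behavior. It shows that a non-integer ratio of 33/4 means data bits span 8, 8, 8, then 9 samples, repeating every 33 samples.

```python
from fractions import Fraction

# Line rates in Gbps, taken from the slide.
RATE_1G = Fraction(125, 100)        # 1.25 Gbps
RATE_10G = Fraction(103125, 10000)  # 10.3125 Gbps

def oversampling_ratio(sample_rate, data_rate):
    """Exact number of samples per data bit."""
    return Fraction(sample_rate) / Fraction(data_rate)

def bit_boundaries(ratio, n_bits):
    """Index of the first sample of each of n_bits data bits (idealized, no jitter)."""
    return [(k * ratio.numerator) // ratio.denominator for k in range(n_bits)]

ratio = oversampling_ratio(RATE_10G, RATE_1G)

# Five boundaries delimit four bits, which is one full period of the pattern.
starts = bit_boundaries(ratio, 5)
samples_per_bit = [b - a for a, b in zip(starts, starts[1:])]

print(ratio, samples_per_bit)
```

Any receive logic built on this would have to tolerate the uneven 8/9 split, which is one reason the CDR behavior listed above needs evaluation.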

11 of 23

WR board support

  • FPGA is part of the picture, board-level clocking is the rest
  • White Rabbit requires a tunable Ethernet reference clock and a “helper” clock with a small frequency offset for DDMTD (digital dual-mixer time difference) phase measurement
    • Original WR hardware uses two VCOs + DACs
  • Helper clock can potentially be generated by (ab)using internal PLLs
  • Ethernet reference clock can be provided by VCO, DCO, or Fractional-N PLL
    • DCOs and Frac-N PLLs are actually rather common (Si570, Si5341, etc.)

12 of 23

WR board support

  • Corundum currently supports ~30 different FPGA boards
  • Board clocking configurations fall into 3 general categories

  • Supported boards: ADM-PCIE-9V3, K35-S, K3P-S, K3P-Q, fb2CG@KU15P, fb4CGg3, SUME, 250-SoC, XUPP3R, XUSP3S, 520N-MX, IA-420F, S10MX DK, S10DX DK, Agilex F DK, Agilex I DK, DE10-Agilex, Alveo U45N, Alveo U50, Alveo U55C, Alveo U55N, Alveo U200, Alveo U250, Alveo U280, KR260, VCU108, VCU118, VCU1525, ZCU102, ZCU106

13 of 23

WR board support

  • Boards with insufficiently tunable oscillator
    • Fixed osc, integer-N PLL, etc.

14 of 23

WR board support

  • Boards with tunable oscillator behind BMC
    • May need to modify BMC firmware to support tuning

15 of 23

WR board support

  • Boards with directly-connected tunable oscillator
    • Clocking network ready for White Rabbit

16 of 23

Boards without tunable PLLs

  • Internal FPGA features can possibly be (ab)used to tune the transmit frequency
    • MLE did some exploration on this front, looking at MMCMs and QPLLs
    • UltraScale/UltraScale+ (US/US+): can tune some combination of QPLL and TX phase interpolator (PI)
    • Intel devices: need to explore

17 of 23

Multiple PCS/PMA clocks

  • 1G: 125 MHz
  • 10G: 156.25 MHz (async), 161.1328125 MHz (sync) (64-bit)
  • 25G: 390.625 MHz (async), 402.83203125 MHz (sync) (64-bit)
  • 100G: 322.265625 MHz (sync)
  • What to use as a common reference clock?
  • How to use DDMTD with different frequencies?
  • How to transfer time reference across clock domains with ps precision?
  • How to handle PCIe clock (unrelated freq)?
  • Should core clock be derived from Eth ref clock?
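
The frequencies listed above can be reproduced from the serial line rates. The sketch below is one reading, not something stated on the slide: it assumes the "sync" clocks are line rate / 64, the "async" clocks are line rate / 66, 1G is 1.25 Gbps / 10, and 100G is a 25.78125 Gbps lane / 80. All names are hypothetical. It also prints each frequency as an exact ratio to 156.25 MHz, purely to show that the clocks are rationally related, not to suggest an answer to the common reference question.

```python
from fractions import Fraction

# Serial line rates in bits per second.
LINE_1G = 1_250_000_000
LINE_10G = 10_312_500_000
LINE_25G = 25_781_250_000  # also the per-lane rate assumed here for 100G

def clock_mhz(line_rate_bps, bits_per_cycle):
    """Parallel clock in MHz when bits_per_cycle line bits are consumed per cycle."""
    return Fraction(line_rate_bps, bits_per_cycle * 10**6)

# Divisors are assumptions chosen to match the figures on the slide.
clocks = {
    "1G": clock_mhz(LINE_1G, 10),
    "10G async": clock_mhz(LINE_10G, 66),
    "10G sync": clock_mhz(LINE_10G, 64),
    "25G async": clock_mhz(LINE_25G, 66),
    "25G sync": clock_mhz(LINE_25G, 64),
    "100G sync": clock_mhz(LINE_25G, 80),
}

base = clocks["10G async"]  # 156.25 MHz, used only as a comparison point
ratios = {name: freq / base for name, freq in clocks.items()}

for name, freq in clocks.items():
    print(f"{name:10s} {float(freq):.8f} MHz  = {ratios[name]} x 156.25 MHz")
```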

18 of 23

Other clocking considerations

  • Need “helper” clock with small offset
    • Possibly can use phase adjustment capabilities of internal PLLs to synthesize this
    • PLLs in both US/US+ and Stratix 10 appear to be able to do this
    • Need to characterize DDMTD performance due to jitter, etc.
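
To give a feel for what "small offset" means, the sketch below works through the usual idealized DDMTD relations, assuming a helper clock at f_ref * N / (N + 1). The function name and the example values (125 MHz, N = 2^14) are illustrative assumptions, not figures from the slides, and the model ignores jitter and metastability.

```python
from fractions import Fraction

def ddmtd_params(f_ref_hz, n):
    """Idealized DDMTD figures for a helper clock at f_ref * n / (n + 1)."""
    f_ref = Fraction(f_ref_hz)
    f_helper = f_ref * n / (n + 1)
    f_beat = f_ref - f_helper            # beat frequency, equals f_ref / (n + 1)
    stretch = f_ref / f_beat             # time magnification, equals n + 1
    # One helper period in the beat domain, referred back to the input clocks.
    resolution_s = (1 / f_helper) / stretch
    return {
        "f_helper_hz": f_helper,
        "f_beat_hz": f_beat,
        "stretch": stretch,
        "resolution_s": resolution_s,
    }

# Hypothetical example values.
p = ddmtd_params(125_000_000, 2**14)
print(f"helper offset: {float(p['f_beat_hz']):.1f} Hz")
print(f"resolution:    {float(p['resolution_s']) * 1e12:.3f} ps")
```

A larger N gives finer resolution but a lower beat frequency, so fewer phase measurements per second; characterizing jitter would show where the practical limit sits.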

19 of 23

Calibration

  • WR is highly sensitive to place and route
    • Calibration is required after each build for maximum accuracy
  • Possible to re-use calibrated timestamping logic?
    • Partial reconfiguration? Could also be useful for faster PCIe link initialization (similar to Tandem PROM)

20 of 23

Linux DPLL subsystem

  • New DPLL subsystem to support SyncE, etc.
    • Potentially useful for WR
  • Added in kernel 6.7
    • Latest LTS is 6.6
    • Possibly too new to be a good target

21 of 23

Corundum + WR Status

  • Working on high-level architecture
    • Need to handle multiple PCS clock frequencies
    • Likely need to significantly rework PTP CDC logic and MAC+PCS logic
    • Likely will need board-level management core to control clocking
  • OCP Time Appliances Project White Rabbit NIC
    • Goal is to build a relatively low-cost open-source White Rabbit NIC
    • Custom PCIe form-factor carrier board for Xilinx Kria K26 SoM
    • Renesas/IDT 8A34002 PLL
    • “Stock” Corundum operating on initial hardware
    • Eventual goal is to support WR + PCIe PTM (Precision Time Measurement)

22 of 23

PTP Time Distribution Subsystem

  • Packet timestamping requires PTP time reference
    • Timestamping logic located near serdes and uses separate clock domains
    • Time from single PHC must be distributed across device to leaf clocks
  • Serial protocol to distribute time from PHC
    • Single wire to reduce congestion
    • Protocol supports use of pipeline registers to cover long distances
  • Time-of-day (ToD) timestamp derived from relative timestamp
    • Reduce logic resources by using truncated 32.16 fixed-point relative timestamps (32-bit ns, 16-bit fractional ns)
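
One way to picture deriving ToD from a truncated relative timestamp is to keep a reference pair (relative time, ToD) and extend each relative timestamp by its wrapped difference from that reference. The sketch below is a software model under stated assumptions, not the rel2tod implementation: the function names are hypothetical, and the 96-bit layout (48-bit seconds, 32-bit ns, 16-bit fractional ns) is assumed.

```python
REL_BITS = 48                        # 32-bit ns + 16-bit fractional ns
FNS_PER_S = 1_000_000_000 << 16      # one second in units of 2^-16 ns

def rel_to_tod(rel_ts, ref_rel, ref_tod_s, ref_tod_fns):
    """Extend a wrapped 48-bit relative timestamp to (seconds, ns in 2^-16 units).

    ref_rel and (ref_tod_s, ref_tod_fns) describe the same instant; rel_ts must
    lie within half the wrap period of ref_rel for the result to be unambiguous.
    """
    mask = (1 << REL_BITS) - 1
    delta = (rel_ts - ref_rel) & mask
    if delta >= 1 << (REL_BITS - 1):     # interpret as signed: timestamp is earlier
        delta -= 1 << REL_BITS
    carry_s, fns = divmod(ref_tod_fns + delta, FNS_PER_S)
    return ref_tod_s + carry_s, fns

def pack_tod96(sec, fns):
    """Pack into the assumed 96-bit layout: [95:48] seconds, [47:16] ns, [15:0] fractional ns."""
    return (sec << 48) | fns

# Reference taken just before both the 48-bit wrap and a seconds rollover.
ref_rel = 0xFFFF_FFFF_0000
ref_s, ref_fns = 100, 999_999_999 << 16

later = (ref_rel + (2 << 16)) & ((1 << REL_BITS) - 1)    # 2 ns after the reference
earlier = ref_rel - (1 << 16)                            # 1 ns before the reference

tod_later = rel_to_tod(later, ref_rel, ref_s, ref_fns)
tod_earlier = rel_to_tod(earlier, ref_rel, ref_s, ref_fns)
print(tod_later, tod_earlier, hex(pack_tod96(*tod_later)))
```

The point of the model is that only the 48-bit value needs to travel with each packet; the wider ToD value can be rebuilt wherever a recent reference pair is available.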

23 of 23

PTP Time Distribution Subsystem

[Block diagram. Blocks: ptp_td_phc, ptp_td_leaf (x2), rel2tod, Ethernet MAC, TX data FIFO, RX FIFO, TX TS FIFO, Ctrl, Pipeline FFs; regions SLR0 and SLR1. Signals: PTP ref clock (e.g. Eth TX ref clk), PTP TD data, PPS, AXI stream, TX clk, TX PTP time, RX clk, RX PTP time. Bus width labels: 1, 48, 96.]