1 of 13

Corundum status updates

Alex Forencich

1/29/2024

2 of 13

Agenda

  • Announcements
  • Status updates

3 of 13

Announcements

  • Survey results:
    • 10 responses
    • Consensus seems to be that most of the options are good, except alternating 9 AM/9 PM
    • 8 votes for Monday, 7 votes for Wednesday
  • Meeting plan:
    • 1st Wednesday of each month, 9 AM and 9 PM
    • Dev meetings: Feb 7, Mar 6, April 3, May 1, June 5, July 3, Aug 7, Sept 4, Oct 2, Nov 6, Dec 4

4 of 13

Status update summary

  • MAC/PHY optimizations
  • Updated scheduler and internal flow control
  • Reworked TDMA BER module, added userspace tool
  • Additional app section passthrough
  • Testbed status

5 of 13

MAC/PHY optimizations

  • Reworked termination control character detection
    • 3+1 bits instead of 8 bits, reduces fan-in
  • Reworking framing error detection
  • Adding some additional tests (underrun, tuser assert)
    • Found a bug in the Gigabit MAC related to underrun

6 of 13

Updated scheduler and internal flow control

  • Multiple ports and multiple priorities
    • Need an internal queue per priority per port
    • Use linked lists for efficient storage
    • Multiple entries per list element to hide pipeline delays
  • Internal flow control
    • Quota for in-flight transmit operations, per-port and per-priority
    • Prevent head-of-line blocking
  • Status: in progress
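
The queue storage described above (linked lists with multiple entries per list element) amounts to an unrolled linked list: each block holds several entries, so one pointer fetch yields multiple entries and hides the pipeline read latency. A minimal Python model of the idea; the names and the block size are illustrative assumptions, not the Corundum RTL:

```python
ENTRIES_PER_BLOCK = 4  # assumed; sized to cover the pipeline delay

class Block:
    """One linked-list element holding several queue entries."""
    def __init__(self):
        self.entries = []
        self.next = None

class Queue:
    """Per-port, per-priority internal queue as an unrolled linked list."""
    def __init__(self):
        self.head = self.tail = None

    def push(self, entry):
        # Append to the tail block, allocating a new block when full
        if self.tail is None or len(self.tail.entries) == ENTRIES_PER_BLOCK:
            blk = Block()
            if self.tail:
                self.tail.next = blk
            else:
                self.head = blk
            self.tail = blk
        self.tail.entries.append(entry)

    def pop(self):
        # Dequeue from the head block, freeing it when drained
        if self.head is None:
            return None
        entry = self.head.entries.pop(0)
        if not self.head.entries:
            self.head = self.head.next
            if self.head is None:
                self.tail = None
        return entry
```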

7 of 13

New scheduler

  • Multiple levels of scheduling
    • Round robin across ports
    • Priority across TCs on each port
    • Round robin on scheduled queues on each TC
    • Queues can be scheduled on multiple ports, but only one TC per port
  • Need internal flow control to manage buffer space
    • Operations can only start if there is at least 1 MTU of available buffer
    • One scheduler “channel” per TC, per port
    • Flow control configured and tracked per channel
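
The three scheduling levels above can be modeled as nested arbiters: round robin across ports, strict priority across TCs on a port, round robin across the scheduled queues on a TC. A minimal Python sketch of the decision logic (class and method names are illustrative assumptions, not the hardware scheduler):

```python
class TC:
    """Traffic class: round robin over its scheduled queues."""
    def __init__(self, queues):
        self.queues = list(queues)  # queues are lists of pending packets
        self.idx = 0

    def pick(self):
        for _ in range(len(self.queues)):
            q = self.queues[self.idx]
            self.idx = (self.idx + 1) % len(self.queues)
            if q:  # queue has packets pending
                return q
        return None

class Port:
    """Port: strict priority across TCs (lower index = higher priority)."""
    def __init__(self, tcs):
        self.tcs = tcs

    def pick(self):
        for tc in self.tcs:  # priority order
            q = tc.pick()
            if q is not None:
                return q
        return None

class Scheduler:
    """Top level: round robin across ports."""
    def __init__(self, ports):
        self.ports = ports
        self.idx = 0

    def pick(self):
        for _ in range(len(self.ports)):
            i = self.idx
            self.idx = (self.idx + 1) % len(self.ports)
            q = self.ports[i].pick()
            if q is not None:
                return i, q  # (port index, selected queue)
        return None
```

In the real design each (port, TC) pair would also check its flow-control channel before a queue is eligible; that check is omitted here for brevity.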

8 of 13

Internal flow control

  • Need to enforce byte limit and possibly packet limit
  • Don’t know the size of a packet until it’s being sent
    • Have to initially assume worst-case size (MTU)
    • But, always reserving an MTU-sized block is inefficient
  • High-level idea: it works like a credit card
    • Have a credit limit of the buffer size
    • Place a “hold” for an MTU-sized block
    • Once we know how big the packet is, release the hold and charge the actual amount
    • Pay it off when the operation completes
    • (Not particularly accurate – no late fees, no interest, no miles, …..)

9 of 13

Current plan: split credit generation

  • FC credits, 1 credit = 1 packet
  • Credits generated for each MTU in the buffer
    • Track buf_sz (bytes currently held) against buf_lim (buffer limit)
    • If buf_sz + MTU <= buf_lim: buf_sz += MTU, generate 1 FC credit
    • On TX start (packet size now known): buf_sz -= MTU - pkt_sz
    • On TX complete: buf_sz -= pkt_sz
    • On failure: recycle FC credit (already paid 1 MTU)
  • Advantage: simple, can set buf size and MTU size in bytes directly, can change buf size at any time
  • Disadvantage: haven’t found an obvious problem yet…
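
The accounting rules above can be sketched in a few lines of Python. Names like FlowControl and dispatch are hypothetical, for illustration only; the scheme is the "credit card" model from the previous slide:

```python
class FlowControl:
    """Split credit generation: 1 FC credit = 1 packet, holds tracked in bytes."""
    def __init__(self, buf_lim, mtu):
        self.buf_lim = buf_lim  # buffer size in bytes (the "credit limit")
        self.mtu = mtu
        self.buf_sz = 0         # bytes currently charged against the buffer
        self.credits = 0        # FC credits available

    def generate_credits(self):
        # Generate one FC credit for each MTU that still fits in the buffer
        while self.buf_sz + self.mtu <= self.buf_lim:
            self.buf_sz += self.mtu
            self.credits += 1

    def dispatch(self):
        # Consume a credit to start a transmit op (places an MTU-sized hold)
        assert self.credits > 0
        self.credits -= 1

    def tx_start(self, pkt_sz):
        # Packet size now known: shrink the hold to the actual amount
        self.buf_sz -= self.mtu - pkt_sz

    def tx_complete(self, pkt_sz):
        # Pay off the charge when the operation completes
        self.buf_sz -= pkt_sz

    def tx_fail(self):
        # Recycle the FC credit; its MTU-sized hold is already accounted for
        self.credits += 1
```

For example, with buf_lim = 4096 and MTU = 1500, two credits are generated up front; after a 600-byte packet completes, the freed space yields a fresh credit on the next generation pass.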

10 of 13

TDMA BER module

  • Time-resolved BER measurement
    • Intended for characterizing layer 1 switches
  • Test setup:
    • Cycle switch in periodic schedule derived from PTP time
    • Place PHYs in PRBS31 mode
    • Collect bit errors, bin based on PTP time
  • TDMA BER measurement module + userspace tool
    • Old module has been in the repo for a long time
    • Cleaned up and optimized module, added control utility
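
The binning step can be illustrated with a small Python sketch: error counts are accumulated into bins indexed by the PTP timestamp's offset within the periodic schedule. Function and parameter names are assumptions for illustration, not the module's actual interface:

```python
def bin_index(ptp_time_ns, schedule_start_ns, schedule_period_ns, num_bins):
    """Map a PTP timestamp to a bin within the schedule period."""
    offset = (ptp_time_ns - schedule_start_ns) % schedule_period_ns
    return (offset * num_bins) // schedule_period_ns

def accumulate(samples, schedule_start_ns, schedule_period_ns, num_bins):
    """Accumulate (ptp_time_ns, bit_errors, bits_counted) samples into bins.

    Returns per-bin BER, or None for bins where no bits were counted.
    """
    errs = [0] * num_bins
    bits = [0] * num_bins
    for t, e, n in samples:
        i = bin_index(t, schedule_start_ns, schedule_period_ns, num_bins)
        errs[i] += e
        bits[i] += n
    return [e / b if b else None for e, b in zip(errs, bits)]
```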

11 of 13

mqnic-bert

  • Specify schedule start time, schedule period, timeslot period, timeslot active period
    • Measure BER in one shot across active timeslot period
    • Split each timeslot into slices, can measure 32 slices concurrently
  • Time-resolved BER data can be plotted as a heat map
  • Characterize switch, synchronization, transceivers, etc.
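
Splitting the active timeslot period into equal measurement slices might look like the following sketch (names assumed; this is not the tool's actual logic):

```python
def slice_windows(timeslot_start_ns, active_period_ns, num_slices=32):
    """Split a timeslot's active period into equal measurement slices.

    Returns a list of (start_ns, end_ns) windows, one per slice; up to
    32 slices can be measured concurrently per the slide above.
    """
    w = active_period_ns // num_slices
    return [(timeslot_start_ns + i * w, timeslot_start_ns + (i + 1) * w)
            for i in range(num_slices)]
```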

12 of 13

App section passthrough

  • Some applications require additional signals and interfaces to be passed through to the application section
  • MLE put together a pull request to add some macro-magic for this
    • Need to add testbenches

13 of 13

Potential shared development testbed

  • Hardware:
    • Several host machines
    • Various NICs and PCIe-form-factor FPGA boards
    • 2x HTG-9200 boards (9x QSFP28)
    • ONT-603 100G network tester, possibly other test equipment
    • Arista 7060CX 32 port 100G packet switch
    • 1x 32x32 + 2x 16x16 Polatis optical switches as scriptable patch panel
  • Software:
    • Less clear at the moment
    • Current idea: diskless hosts, users can set up their own images and boot them on the hosts (tools shared via NFS)