
Linux Clusters Institute: Current Storage Infrastructure & Trends

J.D. Maloney | Lead HPC Storage Engineer

Storage Enabling Technologies Group (SET)

National Center for Supercomputing Applications (NCSA)

malone12@illinois.edu

University of Oklahoma, May 13th – 17th 2024


Brief Bio

  • Lead HPC Storage Engineer @ NCSA for 8 years
  • Taught/created the Intermediate course for 4 years

NCSA Publicly Announced Compute Resources

  • Delta (ACCESS allocated)
  • HOLL-I (Cerebras CS-2)
  • HAL (AI-focused cluster)
  • Nightingale (HIPAA)
  • Illinois Campus Cluster (Illinois HPC cluster)
  • Radiant (OpenStack platform)

NCSA’s Storage Environment (at present)

  • High Performance FS: 65+ PB usable (~4 PB is flash)
  • Archive Capacity: 60+ PB usable
  • 45+ PB Unique Data


Baseline

  • Look over the slides from the Intro talks at the beginner workshop in February 2024
  • Have a grasp on:
    • Drive classes and their characteristics (HDD, SSD, NVMe, etc.)
    • Storage connectivity options (IB, Ethernet, SAS, Fibre Channel, etc.)
    • RAID types and their overhead
    • The difference between vendor-marketed capacity and what shows up in a ‘df’ (see the worked example after this list)
    • The definitions of bandwidth and latency
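
As a refresher on that last point, here is a minimal sketch (hypothetical drive count and RAID layout) of why a system marketed in decimal terabytes shows up smaller in ‘df’, which reports binary units and only sees what is left after RAID overhead:

```python
# Hypothetical example: 10 x 20 TB (decimal) drives in an 8+2 RAID 6 array.
marketed_tb = 20           # vendor capacity, where 1 TB = 10**12 bytes
drives = 10
data_drives = 8            # 8+2 RAID 6: two drives' worth of parity

raw_bytes = drives * marketed_tb * 10**12
usable_bytes = raw_bytes * data_drives / drives    # capacity left after parity

# Tools like df report binary units: 1 TiB = 2**40 bytes
usable_tib = usable_bytes / 2**40

print(f"Marketed raw capacity: {drives * marketed_tb} TB")
print(f"Usable after RAID 6:   {usable_bytes / 10**12:.0f} TB = {usable_tib:.1f} TiB")
# -> 200 TB marketed becomes 160 TB usable, which df reports as ~145.5 TiB
#    (file system metadata overhead shaves off a bit more in practice)
```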



Topic Coverage

  • There is a lot of technology related to storage out there
    • We could go on for hours
  • Going to touch on some highlights, and talk about how certain recent technology advancements impact HPC storage
  • Storage is (in my opinion) undergoing a period of rapid growth and improvement across many fronts
    • Both hardware and software; advancement or change in one impacts the other



Latest Storage Related Hardware Technologies



Underpinning Tech - Networks

    • HPC storage is (typically) served over the network, so this piece is important
    • Current relevant networks and their speeds (rough conversion to GB/s below):
      • NDR Infiniband – 400Gbps (next gen is XDR – 800Gbps)
      • Ethernet – 400Gbps (next gen is 800Gbps)
      • Slingshot – 200Gbps
    • Other up-and-coming
      • Cornelis Networks (formerly Intel Omni-Path)
      • Rockport
      • Ultra Ethernet
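
For a rough sense of what those line rates mean for storage, a quick conversion sketch (ignoring protocol overhead, which will pull real numbers down somewhat):

```python
# Rough conversion of network line rate to storage bandwidth per port.
# Real-world throughput will be lower due to protocol and file system overhead.
links_gbps = {
    "NDR InfiniBand": 400,
    "XDR InfiniBand": 800,
    "400GbE Ethernet": 400,
    "Slingshot": 200,
}

for name, gbps in links_gbps.items():
    gb_per_sec = gbps / 8              # 8 bits per byte
    print(f"{name:<16}: ~{gb_per_sec:.0f} GB/s per port")
```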



Underpinning Tech - Networks

    • Networks aren’t just speeds and feeds either; protocol enhancements can be important for storage
      • RoCE (RDMA over Converged Ethernet) lowers latency relative to traditional TCP/IP Ethernet, improving performance for storage
      • GPUDirect (Magnum IO) allows storage traffic to traverse from the host NIC straight to GPU memory, bypassing the CPU and its memory (see the sketch below)
      • Congestion control and QoS enhancements allow for better storage traffic routing on networks
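
As one concrete illustration of the GPUDirect idea, the sketch below uses NVIDIA's kvikio Python bindings (my choice for the example; the slide doesn't prescribe a particular API) to read a file straight into GPU memory. On GDS-capable hardware the transfer skips the host-memory bounce buffer; elsewhere kvikio falls back to a compatibility path:

```python
# Minimal GPUDirect Storage sketch using the RAPIDS kvikio bindings,
# assumed to be installed alongside CuPy; the file path below is hypothetical.
import cupy
import kvikio

gpu_buf = cupy.empty(100_000_000, dtype="u1")   # ~100 MB buffer in GPU memory

f = kvikio.CuFile("/scratch/dataset.bin", "r")  # hypothetical file on a parallel FS
nbytes = f.read(gpu_buf)                        # data lands in GPU memory; with GDS,
f.close()                                       # no staging through host memory

print(f"Read {nbytes} bytes into GPU memory")
```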



Underpinning Tech - Networks

Plots of Ethernet performance with TCP vs. RoCE


Image Credit: Alibaba Tech


Underpinning Tech - Networks

GPUDirect (Magnum IO) illustration of the simplified I/O path for getting data to and from GPUs faster



Underpinning Tech - PCIe

    • There have been some (relatively) recent jumps in PCIe technology and performance
    • Not long ago (early 2019) we were on PCIe Gen3 (~16 GB/s per x16 slot)
      • And had been since 2011
    • In the span of about two years we jumped to Gen4 (~32 GB/s per x16 slot) and now to Gen5 (~64 GB/s per x16 slot); see the quick derivation below
      • Sept. 2019: AMD announces Rome (has PCIe Gen 4)
      • April 2021: Intel announces Ice Lake (has PCIe Gen 4)
      • Q4 2022 – Q1 2023: Intel and AMD ship PCIe Gen 5 server platforms (Intel was already on Gen 5 for desktop)
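
The per-x16-slot numbers above fall straight out of the link arithmetic; a quick sketch of where ~16/32/64 GB/s comes from:

```python
# PCIe bandwidth per direction across an x16 slot,
# from transfer rate (GT/s per lane) and line-encoding efficiency.
generations = {
    "Gen3": (8,  128 / 130),   # 8 GT/s, 128b/130b encoding
    "Gen4": (16, 128 / 130),
    "Gen5": (32, 128 / 130),
}
lanes = 16

for gen, (gts, efficiency) in generations.items():
    gb_per_sec = gts * efficiency * lanes / 8   # bits -> bytes
    print(f"PCIe {gen} x16: ~{gb_per_sec:.1f} GB/s each direction")
# -> roughly 15.8, 31.5, and 63.0 GB/s, i.e. the ~16/32/64 GB/s figures above
```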



Underpinning Tech - PCIe

    • PCIe Gen 6 is already ratified and Gen 7 is in the works
    • Why is this suddenly moving so fast?
      • CXL (Compute Express Link) is here and maturing; it rides on and relies upon PCIe, so the heavy development effort around CXL keeps pushing PCIe forward
      • As network speeds increase we need faster PCIe slots for the NICs that connect servers to these networks
      • Bandwidth is becoming more important than ever before as parallelism continues to expand



Underpinning Tech - PCIe


Image Credit: nextplatform.com


Underpinning Tech - SAS

    • While it's an aging, more legacy protocol, SAS is still prevalent in a few spaces, namely connecting HDD JBODs to controllers
    • SAS 24G is now out and vendors are starting to release products based on the standard
    • Mainly at the aggregation level so far, less at the drive level
      • Though I'd expect SAS SSDs to pick up the new standard relatively quickly; HDDs not so much, since there isn't a need (rough math below)
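
To see why 24G matters at the aggregation level but not for an individual HDD, a back-of-the-envelope sketch (assuming roughly 2.4 GB/s of usable bandwidth per 24G SAS lane and ~270 MB/s per streaming HDD; both are approximations):

```python
# Rough sketch: how many streaming HDDs it takes to saturate a SAS wide port.
lane_gb_s = 2.4           # approximate usable bandwidth per 24G SAS lane
lanes_per_port = 4        # typical wide port between a JBOD and its HBA
hdd_mb_s = 270            # approximate sequential throughput of a modern HDD

port_gb_s = lane_gb_s * lanes_per_port
drives_to_saturate = port_gb_s * 1000 / hdd_mb_s
print(f"~{port_gb_s:.1f} GB/s per 4-lane port; "
      f"~{drives_to_saturate:.0f} streaming HDDs will saturate it")
# A single HDD is nowhere near one lane's bandwidth, but a dense JBOD behind
# one wide port can be, which is why 24G shows up at the aggregation level first.
```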




Rotating Media (HDD) Technology

    • Heat-Assisted Magnetic Recording (HAMR)
      • Heats the platter with a laser to temporarily change its magnetic properties, allowing higher-density recording
    • Dual Actuator
      • Drives with two independent actuator arms, which can roughly double the throughput and IOPS of an HDD
    • Shingled Magnetic Recording (SMR)
      • Tracks on the HDD are partially overlaid, like shingles on a roof; increases density at the cost of performance
    • Ethernet-Connected Drives
      • Drives connect via Ethernet instead of SATA/SAS; an onboard SoC runs a small OS that puts the drive directly on the network

Image Credit: storagenewsletter.com


Flash Media Technology

    • Faster PCIe connections, and the controllers to go with them
      • More power, though, and more heat too
    • New interfaces and form factors, more options than before
      • E1.L and E1.S, E3.S, U.3, M.2, etc.
    • More bits per cell – QLC (quad-level) is prevalent and PLC (penta-level) is coming (see the sketch below)
      • Drives down $/GB but lowers endurance even further
    • Optane (3D XPoint) used to be on this list, but it has been discontinued by Intel/Micron
      • The gap is being filled by SLC and MLC flash that has improved in durability
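
To make the bits-per-cell trade-off concrete, a small sketch of how the number of voltage states grows with each level (endurance figures vary widely by vendor, so they're left out):

```python
# Bits per cell vs. the number of voltage states the controller must distinguish.
# More states per cell means cheaper $/GB but tighter margins and lower endurance.
cell_types = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4, "PLC": 5}

for name, bits in cell_types.items():
    states = 2 ** bits
    print(f"{name}: {bits} bit(s)/cell -> {states} voltage states per cell")
```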



Tape Technology – LTO-9

    • Mass availability began in 2021
    • Up to 18 TB of data per tape, uncompressed
    • Great for low-power archive storage systems
    • For tape to be cost effective, you need to reach a certain amount of data before the economies of scale make sense (rough break-even sketch below)
    • Use cases need to match the media's capability
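
A rough break-even sketch with purely illustrative, made-up prices, just to show why tape only pays off past a certain capacity: the library chassis, drives, and robotics are a fixed up-front cost, while the cartridges themselves are cheap per TB:

```python
# Hypothetical numbers for illustration only; real pricing varies widely.
library_and_drives = 150_000   # fixed cost: library chassis, LTO-9 drives, server ($)
tape_cost_per_tb = 8           # marginal cost of LTO-9 cartridges ($/TB)
disk_cost_per_tb = 25          # a comparison archive tier built on HDD ($/TB)

# Break-even where: library_and_drives + tape_cost_per_tb * tb == disk_cost_per_tb * tb
break_even_tb = library_and_drives / (disk_cost_per_tb - tape_cost_per_tb)
print(f"Tape becomes cheaper than the HDD tier past ~{break_even_tb / 1000:.1f} PB")
# With these made-up numbers, ~8.8 PB; below that, the fixed costs dominate.
```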


Image Credit: lto.org


System Technology – Burst Buffer

    • Falling out of favor to a degree
      • Improvements happening elsewhere are mitigating the need
      • Still useful, mostly as in-file-system implementations
    • A layer of flash above the file system that absorbs high I/O bursts, draining the data to disk during lower-demand periods (sizing sketch below)
      • Can also re-align I/O to make it faster to spool out to disk
    • Either in node (DataWarp), outside of node (IME), or in file system (Hot Pools in Lustre, Spectrum Scale has equivalent)
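
A quick sizing sketch for the "absorb now, drain later" idea, using purely illustrative rates:

```python
# How much flash a burst buffer needs to absorb a write burst that the
# disk tier can't take at full speed. All rates here are illustrative.
burst_rate_gb_s = 400     # aggregate checkpoint write rate from the compute nodes
drain_rate_gb_s = 80      # sustained rate the HDD tier behind it can absorb
burst_seconds = 120       # length of the checkpoint burst

backlog_gb = (burst_rate_gb_s - drain_rate_gb_s) * burst_seconds
drain_seconds = backlog_gb / drain_rate_gb_s
print(f"Flash needed: ~{backlog_gb / 1000:.1f} TB; "
      f"it drains in ~{drain_seconds / 60:.0f} min of quieter time")
# -> ~38.4 TB of flash for this burst, draining in ~8 minutes once the burst ends
```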



Future Trends in HPC Storage Hardware



More flash in more places

    • Flash is getting more and more cost effective as time progresses
      • Especially when looking at TCO and some other coming enhancements (toy TCO comparison below)
      • However, as of 2024 there are still areas where HDD storage is relevant and the cost leader for active storage
    • Flash coupled with tape for archive is an area that is gaining decent traction
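
A toy TCO comparison with hypothetical prices and power figures, just to show the shape of the argument; power alone rarely closes the gap, but combined with density, rack space, and performance it keeps shifting the picture toward flash:

```python
# Hypothetical 5-year cost per usable TB; every number here is illustrative only.
years = 5
kwh_cost = 0.12                       # $/kWh, including cooling overhead
tiers = {
    #              ($/TB purchase, watts per usable TB)
    "HDD tier":    (25, 0.8),
    "Flash tier":  (70, 0.15),
}

for name, (capital, watts_per_tb) in tiers.items():
    energy = watts_per_tb * 24 * 365 * years / 1000 * kwh_cost   # $ over 5 years
    print(f"{name}: ${capital + energy:.2f}/TB over {years} years "
          f"(${capital} capital + ${energy:.2f} power)")
```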



Higher Density Deployments

    • Both HDD and SSD densities are increasing, leading to more storage in less space
      • Fewer kW per TB as well
    • The trend itself is obvious and has been going on forever, but it has notable implications
      • Handling fault tolerance in denser deployments can be more challenging at small scale (rebuild-time sketch below)
      • More capacity behind fewer machines drives the need for faster switches for storage to connect to, at a pace that may outstrip compute's needs
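
To put the fault-tolerance point in perspective, a naive rebuild-time estimate for a hypothetical dense drive (real rebuild rates depend heavily on the RAID implementation and on competing I/O):

```python
# Naive full-drive rebuild time estimate for a dense HDD.
drive_tb = 30             # hypothetical high-density HDD
rebuild_mb_s = 200        # sustained rebuild rate while serving other I/O

rebuild_hours = drive_tb * 1_000_000 / rebuild_mb_s / 3600
print(f"Rebuilding a {drive_tb} TB drive at {rebuild_mb_s} MB/s "
      f"takes ~{rebuild_hours:.0f} hours")
# -> ~42 hours of reduced redundancy per failure, which is why dense deployments
#    lean on declustered or erasure-coded layouts that rebuild in parallel.
```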



New File Systems

Going to touch on file systems in the next talk but want to mention a few things:

    • Hardware advancements that we’ve discussed enable new file system features/designs
    • The changes mentioned in the prior slides are pretty major; storage is at a critical point, and new software is being (and has been) developed to take advantage of these new hardware capabilities



Acknowledgements

  • Members of the SET group at NCSA for slide review
  • Members of the LCI Steering Committee for slide review



Questions
