NCAR HPC Users Group (NHUG)
January 2023
January 3, 2023
Hosted by
NCAR’s Computational & Information Systems Lab
High Performance Computing Division
Consulting Services Group
This material is based upon work supported by the National Center for Atmospheric Research, which is a major facility sponsored by the National Science Foundation under Cooperative Agreement No. 1852977.
Agenda
NCAR HPC Users Group - Agenda
NHUG Communication Channels
NCAR HPC Users Group
https://bit.ly/3Bc04Wh
Upcoming Events
NCAR’s HPC resources will be unavailable to users January 17-20 while CISL staff perform network reconfiguration for the Derecho supercomputer and urgent maintenance on Cheyenne’s cooling infrastructure. Progress will be communicated through Notifier emails during the week.
CISL’s high-performance network must be reconfigured to accommodate the Derecho installation. The network maintenance will make all HPC resources (Cheyenne, Casper, Campaign Storage, GLADE, Quasar, Stratus, and JupyterHub) temporarily unavailable at the beginning of the maintenance window. The network reconfiguration is expected to take the first day of the outage, after which most services will be restored; the Cheyenne compute nodes will remain offline.
The remainder of the outage will focus on replacing the working fluid in the cooling infrastructure inside the Cheyenne compute node racks. This work is expected to have minimal user-level impact (software changes are being intentionally minimized), and CISL staff will verify resource functionality before releasing the systems back to the user community.
https://arc.ucar.edu/articles/362
Upcoming Events
Planning is underway for a joint CISL/ESDS-sponsored Dask workshop covering basic and advanced usage, troubleshooting, and best practices.
NCAR, NOAA, NVIDIA, and OpenACC.org are hosting an Open Hackathon: an online-only introduction on February 21, followed by a hybrid event February 28 through March 2 at the NCAR Mesa Lab in Boulder.
Details: https://arc.ucar.edu/articles/359
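Ahead of the workshop, a minimal sketch of the chunked, deferred execution model Dask teaches (assuming the dask package is installed; array sizes here are purely illustrative):

```python
# A first taste of Dask: build a lazy task graph, then compute in parallel.
import dask.array as da

# A 4,000 x 4,000 random array, split into 1,000 x 1,000 chunks.
x = da.random.random((4_000, 4_000), chunks=(1_000, 1_000))

# Nothing runs yet -- mean() just extends the task graph...
lazy_mean = x.mean()

# ...and .compute() executes it across the chunks.
result = lazy_mean.compute()
print(f"mean = {result:.3f}")  # close to 0.5 for uniform random data
```

Topics like troubleshooting schedulers and choosing chunk sizes are exactly what the workshop is expected to cover.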
NWSC-3 Project Status
Derecho CPU Rack Installation - December 2022
Derecho - Production & Deployment Schedule
Delivery/Task Item                     | Ship Date/Start Date
---------------------------------------|---------------------
Four Compute Rack Hardware             | 11/18/2022
Remaining Production System Hardware   | 12/16/2022
Production System - Factory Trial      | 12/20/2022
Production System Installation         | 01/03/2023
Production System - Acceptance Testing | Jan & Feb 2023
Solution Acceptance                    | 04/27/2023
ASD Project                            | 05/01/2023
Open Derecho for Production            | 07/01/2023
Virtual Consulting: New for 2023
Many thanks to ESDS for sharing their virtual office hours experience & infrastructure
GPU Accounting
Casper gpgpu 2022 usage
SAM: GPU Tables
Casper V100 Usage: % of All GPUs (2022)
Casper V100 Usage: % of Each Node’s GPUs (2022)
Questions, Comments, Feedback?? Thank You!!
NCAR HPC Users Group - Wrap Up
Backup
Casper V100 Usage: % of Each GPU (2022)
NCAR’s High-Performance Computing, Data, & Analysis Resources: 2023
HPC Systems
Cheyenne (SGI/HPE, in production since 2017)
- 4,032 nodes, 145,152 cores, 313 TB total memory, 4.79 PFlop/s
- #21 supercomputer in the world at debut, #109 presently
Derecho (Cray/HPE, production mid-2023)
- 2,570 nodes, 323,712 CPU cores, 680 TB total memory, 3.5X performance vs Cheyenne
- 328 NVIDIA A100 GPUs providing 20% of overall performance, 19.87 PFlop/s (projected)
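As a back-of-envelope check of the split these figures imply, roughly 20% of the projected 19.87 PFlop/s comes from the 328 A100 GPUs (numbers taken from this slide; the variable names are illustrative):

```python
# Split Derecho's projected peak between GPU and CPU partitions.
total_pflops = 19.87   # projected system peak (slide figure)
gpu_fraction = 0.20    # ~20% of performance from 328 A100 GPUs

gpu_pflops = total_pflops * gpu_fraction   # GPU contribution
cpu_pflops = total_pflops - gpu_pflops     # remainder from CPU nodes

print(f"GPU: {gpu_pflops:.2f} PFlop/s, CPU: {cpu_pflops:.2f} PFlop/s")
# prints "GPU: 3.97 PFlop/s, CPU: 15.90 PFlop/s"
```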
Data Analysis & Visualization

Casper: heterogeneous system for data analysis & viz.
- 75 High-Throughput Computing nodes
- 9 visualization nodes with accelerated graphics
- 10 dense GPU nodes for AI/ML and code development
- 4 nodes for Research Data processing
- 2 large-memory nodes with 1.5 TB each

CISL develops specialized visualization software & services for Earth Science applications:
- https://geocat.ucar.edu
- http://projectpythia.org

High Performance Storage

GLADE & Campaign Storage
- 132 PB long-term, online storage
- 17,464 hard drives
- 56 servers

Derecho ‘scratch’ Storage
- 60 PB short-term storage
- Principally supports HPC jobs
- 5,088 hard drives
- 24 servers

Stratus Object Storage
- 5 PB object storage system
- 588 hard drives
- 6 servers

Quasar Tape Library
- 35 PB long-term archival storage
- 22 IBM TS1160 tape drives
- 1,774 20 TB tape cartridges
- 216 hard drives
- 2 PB disk cache
- 5 data mover servers
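An informal tally of the storage tiers listed above, using the capacities as stated on these slides (disk cache excluded to avoid double counting):

```python
# Per-tier capacities in petabytes, from the slide figures.
storage_pb = {
    "GLADE & Campaign Storage": 132,  # long-term online
    "Derecho scratch": 60,            # short-term HPC
    "Stratus object storage": 5,
    "Quasar tape archive": 35,        # archival tape
}

total_pb = sum(storage_pb.values())
print(f"total managed capacity: {total_pb} PB")
# prints "total managed capacity: 232 PB"
```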