NERSC Overview
Julia Tutorial at SC25
July 22nd 2025
Johannes Blaschke
Data Science Engagement Group
NERSC
1
NERSC: Mission HPC for DOE Office of Science Research
The DOE Office of Science is the largest funder of physical science research in the U.S.
Program areas: Computing, Biological and Environmental Research, Basic Energy Sciences, Fusion Energy / Plasma Physics, High Energy Physics, Nuclear Physics
2
NERSC Directly Supports Office of Science Priorities
2023 Allocation Breakdown (millions of hours):
Distributed by DOE Office of Science program managers
Competitive awards run by the DOE Advanced Scientific Computing Research office
Strategic awards from NERSC
3
NERSC Turns 50 This Year!
CDC 6600 at LLNL 1974
4
Success Is Depth and Breadth of Scientific Impact
An Accelerating Universe
Saul Perlmutter, Berkeley Lab
Perlmutter's Nobel-winning team is believed to have been the first to use supercomputers to analyze and validate observational data in cosmology, contributing to the discovery of the accelerating expansion of the universe.
Oscillating Neutrinos
Sudbury Neutrino Observatory (SNO)
Data from SNO was transferred to NERSC and analyzed in what became known as the “West Coast Analysis” leading to the discovery of neutrino oscillations and a Nobel Prize.
New Approach to Water Desalination
Jeff Grossman, MIT
One of Smithsonian Magazine’s Top 5 Most Surprising Milestones of 2012 was the computationally driven discovery of an approach to desalination of water that is more efficient and less expensive than existing systems.
NERSC has been acknowledged in 8,790 refereed scientific publications, including in high-profile journals, since 2020
5
We Accelerate Scientific Discovery for Thousands of Office of Science Users with 3 Advanced Capability Thrusts
Large-scale applications for simulation, modeling and data analysis
Complex experimental and AI-driven workflows
Time-sensitive and interactive computing
The NERSC workload is diverse with growing emphasis on integrated research workflows
6
Responding to the DOE Mission
NERSC-8: Cori (2016): manycore CPU architectures; simulation & modeling
NERSC-9: Perlmutter (2020): CPU and GPU nodes; expanded simulation, learning & data (simulation & modeling, experiment data, AI)
NERSC-10 (2026): accelerating end-to-end workflows; simulation & modeling, experiment data analysis, AI training & inference, and HPC running seamlessly in IRI
NERSC-11 (2030+): beyond Moore; quantum computing and pervasive AI
7
NERSC Systems Ecosystem
Perlmutter:
1,792 GPU-accelerated nodes: 4 NVIDIA A100 GPUs + 1 AMD "Milan" CPU each; 448 TB (CPU) + 320 TB (GPU) memory
3,072 CPU-only nodes: 2 AMD "Milan" CPUs each; 1,536 TB CPU memory
HPE Slingshot 11 interconnect: 4 NICs per GPU node, 1 NIC per CPU node
#25 (#6), 93.8 PF peak

Storage:
35 PB all-flash scratch, >5 TB/s
Community File System, 130 PB, 1.6 TB/s
/home, 450 TB
HPSS tape archive, ~300 PB

Connectivity and data movement:
DTNs and gateways, edge services, Ethernet LAN
External links: 2 x 400 Gb/s and 2 x 100 Gb/s
Off-platform storage at experimental facilities, ASCR facilities, home institutions, cloud, and edge
Additional data paths shown in the ecosystem diagram: 100 GB/s, 50 GB/s, 5 GB/s

Cross-cutting: science-friendly security, production monitoring, power efficiency
8
NERSC Ecosystem in 2027
New for the NERSC-10 era:
NERSC-10 system
Quality of Service Storage System (QSS)
Platform Storage System (PSS)
Workflow Environment and Management Environment, container services

Storage:
35 PB all-flash scratch, >5 TB/s
Community File System, 240 PB, 1.6 TB/s
/home, 450 TB
HPSS tape archive, >1 EB

Perlmutter:
1,792 GPU-accelerated nodes: 4 NVIDIA A100 GPUs + 1 AMD "Milan" CPU each; 448 TB (CPU) + 320 TB (GPU) memory
3,072 CPU-only nodes: 2 AMD "Milan" CPUs each; 1,536 TB CPU memory
HPE Slingshot 11 ethernet-compatible interconnect: 4 NICs per GPU node, 1 NIC per CPU node
#25 (#6), 93.8 PF peak

Connectivity and data movement:
DTNs and gateways, Ethernet LAN
External links: 2 x 400 Gb/s and 2 x 100 Gb/s
Off-platform storage at experimental facilities, ASCR facilities, home institutions, cloud, and edge
Additional data paths shown in the diagram: >800 GB/s, >10 GB/s, 200 GB/s, and 3.25 TB/s (26 Tbps)

Cross-cutting: science-friendly security, production monitoring, power efficiency
9
Perlmutter system configuration
AMD "Milan" CPU node: 2x CPUs, >256 GiB DDR4, 1x 200G "Slingshot" NIC
NVIDIA "Ampere" GPU node: 4x GPUs + 1x CPU, 40 GiB HBM per GPU + 256 GiB DDR, 4x 200G "Slingshot" NICs
Blades: 2x GPU nodes or 4x CPU nodes
Compute racks: 64 blades each
Perlmutter system: GPU racks + CPU racks, ~6 MW
Centers of Excellence: network, storage, application readiness, system SW
10
Perlmutter Node Configuration
Partition    | Nodes | CPU            | RAM    | GPU                    | NIC
GPU          | 1,536 | 1x AMD "Milan" | 256 GB | 4x NVIDIA A100 (40 GB) | 4
GPU          |   256 | 1x AMD "Milan" | 256 GB | 4x NVIDIA A100 (80 GB) | 4
CPU          | 3,072 | 2x AMD "Milan" | 512 GB | –                      | 1
Login        |    40 |                | 512 GB | 4x NVIDIA A100 (40 GB) | –
Large Memory |     4 |                | 1 TB   | 4x NVIDIA A100 (40 GB) | –
GPU Node Architecture:
11
Simplified NERSC File Systems
Tiers, from highest performance to highest capacity: memory (burst buffer), scratch, community, HPSS, plus global common and global home.

Burst buffer: in-memory (/tmp RamFS), capacity equal to the node's total RAM
Scratch: 35 PB flash, Lustre, >5 TB/s, temporary (subject to purge)
Community: 157 PB HDD, Spectrum Scale (GPFS), 150 GB/s, permanent
HPSS: 150 PB tape archive, kept forever
Global common (software): 20 TB SSD, Spectrum Scale, permanent, for faster compiling / source code
12
Global File Systems
Global Home
Community File System (CFS)
13
Local File Systems
Scratch
Burst Buffer
14
There are many different ways to access NERSC. To use our resources, you need to either:
15
Connecting with SSH
16
Connecting with SSH
After successfully logging in, you will be greeted by the terms of use and the command-line prompt:
From here you can interact with Perlmutter…
17
Submitting Jobs
18
Submitting Jobs
Jobs are submitted from a login node with sbatch or salloc.
Login node: where you log in and submit jobs (sbatch, salloc).
Head compute node: runs the batch script (or interactive session) and uses srun to launch tasks onto the other compute nodes allocated to the job.
*figure courtesy Helen (2020 NERSC Training)
19
My First “Hello World” Job Script
To run via batch queue
% sbatch my_batch_script.sh
To run via interactive batch
% salloc -N 2 -q interactive -C gpu -t 10:00
<wait for the session prompt; you land on a compute node>
% srun -n 64 ./helloWorld
The batch script (my_batch_script.sh) requests the debug queue, 2 nodes, and 10 minutes of walltime on a given partition (e.g. -C gpu), uses the $SCRATCH file system, and runs 64 processes in parallel.
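For a Julia-first version of the same exercise, ./helloWorld could be a small MPI.jl program. A minimal sketch (the file name hello.jl is hypothetical; it assumes MPI.jl is installed and configured for the system MPI, and would be launched as: srun -n 64 julia hello.jl):

# hello.jl – minimal MPI "hello world" in Julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)      # this process's rank
nranks = MPI.Comm_size(comm)    # total number of ranks (64 in the srun example above)
println("Hello from rank $rank of $nranks")
MPI.Finalize()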
20
Accessing NERSC
and submitting your first job
Accessing Perlmutter and Using the Julia Module
22
Compute Node Reservations
23
Make Sure to Clone The Tutorial Repo
https://github.com/JuliaParallel/julia-hpc-tutorial-juliacon25
Pro tip: $HOME is not the best file system for running jobs at scale. For large-scale jobs, add
export JULIA_DEPOT_PATH=$SCRATCH/depot
to your scripts for optimal performance (remember to do this in both the config script and the job script) … or use a container!
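As a quick follow-up check (a sketch, not part of the slides), you can confirm from inside Julia that the depot really points at $SCRATCH; this assumes the SCRATCH environment variable is set, as it is on Perlmutter:

# Run in a Julia session started after exporting JULIA_DEPOT_PATH
@show DEPOT_PATH                                        # first entry is where packages get installed
@assert startswith(first(DEPOT_PATH), ENV["SCRATCH"])   # fails if the depot is not on $SCRATCH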
24
Logging into Jupyter
25
Logging into Jupyter
For now, let’s use the “Shared GPU Node” – or the “Login Node”
26
Logging into Jupyter
Later we’ll be using the “Exclusive GPU Node” or reservations (using the “Configurable Job”)
27
Getting a Terminal in Jupyter
28
Getting a Terminal in Jupyter
29
Setup
30
Pro Tip: Projects handle dependencies
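A minimal sketch of what this looks like in practice (the project directory name and package list are only illustrative):

using Pkg
Pkg.activate("julia-hpc-tutorial")   # hypothetical project directory; creates/uses its Project.toml and Manifest.toml
Pkg.add(["MPI", "CUDA"])             # dependencies are recorded in the project, not the global environment
Pkg.instantiate()                    # install exactly the versions pinned in the Manifest
Pkg.status()                         # list what the project depends on

Equivalently, start Julia with julia --project=<dir> (or set JULIA_PROJECT) so batch jobs use the same environment.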
31
Pro Tip: Sanity Check
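One possible sanity check before launching large jobs (a sketch assuming CUDA.jl is in the active project):

using InteractiveUtils, CUDA

versioninfo()              # Julia version, OS, CPU details
@show Threads.nthreads()   # threads available to this session
@show CUDA.functional()    # true if the GPU driver/runtime is usable
if CUDA.functional()
    @show CUDA.device()    # which GPU this process sees
end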
32
Using Reservations in Tutorials
33
Jupyter options: leave the defaults, except:
Account: trn013
Reservation: juliacon_1
Time: 180
34
Jupyter on Bridges2
35
Building HPC Julia Workflows
Building a HPC Workflow in Julia
WF node
High-speed network
Compute 1
Compute 2
Compute 3
Compute 4
41
Building a HPC Workflow in Julia
WF node
High-speed network
Compute 1
Compute 2
Compute 3
Compute 4
User SW on each compute node
User WF on the WF node
42
Building a HPC Workflow in Julia
WF node
High-speed network
Compute 1
Compute 2
Compute 3
Compute 4
Jupyter / Pluto on the WF node
Distributed.jl / MPI.jl over the high-speed network
CUDA.jl, vendor SW, and user SW on each compute node
User WF on the WF node
Dagger.jl / ImplicitGlobalGrid.jl / ParallelStencil.jl
43
Building a HPC Workflow in Julia
WF node
High-speed network
Compute 1
Compute 2
Compute 3
Compute 4
Jupyter / Pluto on the WF node (possibly also a login node or head node) provides the interactivity; the user WF runs here.
Distributed.jl / MPI.jl provide the low-level communications between the WF node and the compute nodes over the high-speed network.
Each compute node runs the user SW on top of CUDA.jl and the vendor SW.
Dagger.jl / ImplicitGlobalGrid.jl / ParallelStencil.jl provide high-level abstractions; JACC.jl and KernelAbstractions.jl provide portability.
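As a concrete illustration of this stack (a sketch, not from the slides), the snippet below drives workers from the WF node with Distributed.jl and uses CUDA.jl on each worker; the worker count and the local addprocs launch are assumptions, since on Perlmutter workers would typically be started through Slurm:

# Run from the workflow node (e.g. a Jupyter session)
using Distributed

addprocs(4)                  # assumption: 4 workers, nominally one per compute node

@everywhere using CUDA       # low-level GPU access on every worker

# High-level pattern: farm independent GPU tasks out to the workers
results = pmap(1:4) do i
    A = CUDA.rand(Float32, 1024, 1024)   # array allocated on the worker's GPU
    Float64(sum(A .* A))                 # small GPU reduction, returned to the WF node as a scalar
end
@show results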
44