1 of 44

NERSC Overview

Julia Tutorial at SC25

July 22nd 2025

Johannes Blaschke

Data Science Engagement Group

NERSC


2 of 44

NERSC: Mission HPC for DOE Office of Science Research

The DOE Office of Science is the largest funder of physical science research in the U.S. Its program offices include:

  • Computing
  • Basic Energy Sciences
  • Biological and Environmental Research
  • Fusion Energy, Plasma Physics
  • High Energy Physics
  • Nuclear Physics

3 of 44

NERSC Directly Supports Office of Science Priorities

  • The distribution of time to Office of Science Programs is set by DOE
  • Percentages change infrequently
  • Roughly follows program budgets

2023 Allocation Breakdown (hours, millions):

  • Distributed by DOE Office of Science program managers
  • Competitive awards run by the DOE Advanced Scientific Computing Research office
  • Strategic awards from NERSC

4 of 44

NERSC Turns 50 This Year!

CDC 6600 at LLNL, 1974

5 of 44

Success Is Depth and Breadth of Scientific Impact

An Accelerating Universe
Saul Perlmutter, Berkeley Lab

Perlmutter’s Nobel winning team is believed to have been the first to use supercomputers to analyze and validate observational data in cosmology, contributing to the discovery of the accelerating expansion of the universe.

Oscillating Neutrinos
Sudbury Neutrino Observatory (SNO)

Data from SNO was transferred to NERSC and analyzed in what became known as the “West Coast Analysis” leading to the discovery of neutrino oscillations and a Nobel Prize.

New Approach to Water Desalination
Jeff Grossman, MIT

One of Smithsonian Magazine’s Top 5 Most Surprising Milestones of 2012 was the computationally driven discovery of an approach to desalination of water that is more efficient and less expensive than existing systems.

NERSC has been acknowledged in 8,790 refereed scientific publications since 2020, including in high-profile journals:

  • Nature [47]
  • Nature Family of Journals [463]
  • Proc. of the National Academy of Sciences [222]
  • Science [36]
  • Monthly Notices of the Royal Astron. Society [397]
  • Physical Review* [4,170]
  • Astrophysical Journal [792]
  • Physics of Plasmas [685]


6 of 44

We Accelerate Scientific Discovery for Thousands of Office of Science Users with 3 Advanced Capability Thrusts

  • Large-scale applications for simulation, modeling, and data analysis
  • Complex experimental and AI-driven workflows
  • Time-sensitive and interactive computing

The NERSC workload is diverse with growing emphasis on integrated research workflows


7 of 44

Responding to the DOE Mission

  • NERSC-8, Cori (2016): manycore CPU architectures
  • NERSC-9, Perlmutter (2020): CPU and GPU nodes; expanded simulation, learning & data
  • NERSC-10 (2026): accelerating end-to-end workflows
  • NERSC-11 (2030+): beyond Moore

[Figure: the workload evolves from simulation & modeling, through experiment data plus AI training & inference, to workflows running seamlessly in the Integrated Research Infrastructure (IRI), with quantum computing and pervasive AI on the horizon.]

8 of 44

NERSC Systems Ecosystem

[Figure: the NERSC systems ecosystem]

  • Perlmutter: 1,792 GPU-accelerated nodes (4 NVIDIA A100 GPUs + 1 AMD “Milan” CPU; 448 TB CPU + 320 TB GPU memory) and 3,072 CPU-only nodes (2 AMD “Milan” CPUs; 1,536 TB CPU memory); HPE Slingshot 11 interconnect with 4 NICs per GPU node and 1 NIC per CPU node; #25 (#6), 93.8 PF peak
  • Storage: 35 PB all-flash scratch (>5 TB/s), Community File System 130 PB (1.6 TB/s), /home 450 TB, HPSS tape archive ~300 PB, plus off-platform storage via DTNs and gateways
  • External connectivity: 2 x 400 Gb/s and 2 x 100 Gb/s links to experimental facilities, ASCR facilities, home institutions, cloud, and the edge; Ethernet LAN; edge services
  • Science-friendly security, production monitoring, power efficiency
  • The figure also lists link bandwidths of 100 GB/s, 50 GB/s, and 5 GB/s

9 of 44

NERSC Ecosystem in 2027

[Figure: planned NERSC ecosystem in 2027, with NERSC-10 joining Perlmutter]

  • NERSC-10, with a Quality of Service Storage System (QSS), a Platform Storage System (PSS), a workflow environment / management environment, and container services
  • Perlmutter: 1,792 GPU-accelerated nodes (4 NVIDIA A100 GPUs + 1 AMD “Milan” CPU; 448 TB CPU + 320 TB GPU memory) and 3,072 CPU-only nodes (2 AMD “Milan” CPUs; 1,536 TB CPU memory); HPE Slingshot 11 ethernet-compatible interconnect with 4 NICs per GPU node and 1 NIC per CPU node; #25 (#6), 93.8 PF peak
  • Storage: 35 PB all-flash scratch (>5 TB/s), Community File System 240 PB (1.6 TB/s), /home 450 TB, HPSS tape archive >1 EB, plus off-platform storage via DTNs and gateways
  • External connectivity: 2 x 400 Gb/s and 2 x 100 Gb/s links to experimental facilities, ASCR facilities, home institutions, cloud, and the edge; Ethernet LAN
  • Science-friendly security, production monitoring, power efficiency
  • The figure also lists link bandwidths of >800 GB/s, >10 GB/s, 200 GB/s, and 3.25 TB/s (26 Tbps)

10 of 44

Perlmutter system configuration

AMD "Milan" CPU Node

2x CPUs

> 256 GiB DDR4

1x 200G "Slingshot" NIC

NVIDIA "Ampere" GPU Nodes

4x GPU + 1x CPU

40 GiB HBM + 256 GiB DDR

4x 200G "Slingshot" NICs

Compute racks

64 blades

Blades

2x GPU nodes or 4x CPU nodes

Centers of Excellence

Network

Storage

App. Readiness

System SW

Perlmutter system

GPU racks

CPU racks

~6 MW

10

11 of 44

Perlmutter Node Configuration

  • Each Milan CPU has 64 cores at 2.45 GHz, each with two hardware threads
  • Access is managed via Slurm partitions
  • Login nodes are GPU-enabled

Partition      Nodes   CPUs/node        RAM      GPUs/node                NICs/node
GPU            1,536   1x AMD “Milan”   256 GB   4x NVIDIA A100 (40 GB)   4
GPU            256     1x AMD “Milan”   256 GB   4x NVIDIA A100 (80 GB)   4
CPU            3,072   2x AMD “Milan”   512 GB   (none)                   1
Login          40                       512 GB   4x NVIDIA A100 (40 GB)
Large Memory   4                        1 TB     4x NVIDIA A100 (40 GB)

[Figure: GPU node architecture]

12 of 44

Simplified NERSC File Systems

[Figure: storage hierarchy, trading performance against capacity]

  • Memory / Burst Buffer: in-memory file system (/tmp, RamFS), x GB where x = total node RAM; temporary
  • Scratch: 35 PB all-flash Lustre, >5 TB/s; temporary (subject to purge)
  • Global Common: 20 TB SSD, Spectrum Scale; permanent; software installs, faster compiling, source code
  • Global Home: permanent
  • Community: 157 PB HDD, Spectrum Scale (GPFS), 150 GB/s; permanent
  • HPSS: 150 PB tape archive; kept forever

13 of 44

Global File Systems

Global Home

  • Permanent, relatively small storage
  • Mounted on all platforms
  • NOT tuned to perform well for parallel jobs
  • Quota cannot be changed
  • Snapshot backups (7-day history)
  • Perfect for storing data such as source code, shell scripts
  • Addressed using $HOME

Community File System (CFS)

  • Permanent, larger storage
  • Mounted on all platforms
  • Medium performance for parallel jobs
  • Quota can be changed
  • Snapshot backups (7-day history)
  • Perfect for sharing data within research group
  • Addressed using $CFS


14 of 44

Local File Systems

Scratch

  • Large, temporary storage
  • Optimized for read/write operations, NOT storage
  • Not backed up
  • Purge policy (8 weeks)
  • Perfect for staging data and performing computations
  • Addressed using $SCRATCH

Burst Buffer

  • Temporary storage
  • High-performance in-memory file system
  • Perfect for getting good performance in I/O-constrained codes
  • Not shared between nodes
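
As a quick orientation, these file systems are addressed through predefined environment variables; a small sketch (the project directory and file name are hypothetical):

  % echo $HOME $CFS $SCRATCH                 # the variables are set for you on NERSC systems
  % cp $CFS/myproject/input.h5 $SCRATCH/     # hypothetical: stage input data onto scratch before a job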


15 of 44

There are many different ways to access NERSC. To use our resources, you can:

  1. Log into the login nodes and interact with Slurm (covered here)
  2. Use services (e.g. Jupyter) that expose web interfaces (covered later)
  3. Interact via our REST API, called the “Superfacility API”
     (see: https://docs.nersc.gov/services/sfapi/)
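
As a small illustration of option 3, the Superfacility API can be queried over plain HTTPS; the exact endpoint below is an assumption, so check the linked docs:

  % curl https://api.nersc.gov/api/v1.2/status/perlmutter    # assumed endpoint: returns the system's status as JSON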


16 of 44

Connecting with SSH

  • To access Perlmutter, connect with SSH to perlmutter.nersc.gov (or saul.nersc.gov)

  • To be able to open GUI applications on the login nodes, use: ssh -Y
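
For example, from your local machine (username stands in for your NERSC or training account):

  % ssh username@perlmutter.nersc.gov
  % ssh -Y username@perlmutter.nersc.gov    # -Y forwards X11 so GUI applications display locally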


17 of 44

Connecting with SSH

After successfully logging in, you will be greeted by the terms of use and the command-line prompt:

From here you can interact with Perlmutter…


18 of 44

Submitting Jobs

  • Jobs can be submitted to the queueing system through sbatch or salloc:
    • sbatch <my_job_script>
    • salloc <options>

  • Both methods describe the resources a job needs and for how long

  • E.g., a request for one CPU node for 5 minutes:
    • salloc -N 1 -C cpu -q debug -t 5 -A <project>
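
Once the allocation is granted, you land in a shell on a compute node and can launch parallel work with srun; for example (hostname here is just a stand-in for a real application):

  % salloc -N 1 -C cpu -q debug -t 5 -A <project>
  % srun -n 4 hostname    # runs 4 tasks on the allocated node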


19 of 44

Submitting Jobs

[Figure: a login node submits a job via sbatch or salloc; the head compute node then uses srun to launch parallel tasks on all compute nodes allocated to the job]

Login node:

  • Submit batch jobs via sbatch or salloc

Head compute node:

  • Runs the commands in the batch script
  • Issues the job launcher “srun” to start parallel jobs on all compute nodes (including itself)

*figure courtesy Helen (2020 NERSC Training)


20 of 44

My First “Hello World” Job Script

To run via batch queue

% sbatch my_batch_script.sh

To run via interactive batch

% salloc -N 2 -q interactive -C gpu -t 10:00
<wait for the session prompt; you land on a compute node>

% srun -n 64 ./helloWorld

The batch script requests the debug queue, 2 nodes, and a 10 min “walltime”; it runs on the haswell partition, uses the SCRATCH file system, and runs 64 processes in parallel. A minimal sketch of such a script is shown below.
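
A sketch of my_batch_script.sh consistent with those callouts (the haswell constraint comes from the original figure; on Perlmutter you would use -C cpu or -C gpu instead):

  #!/bin/bash
  #SBATCH -q debug          # debug queue
  #SBATCH -N 2              # 2 nodes
  #SBATCH -t 10:00          # 10 min “walltime”
  #SBATCH -C haswell        # run on the haswell partition (on Perlmutter: -C cpu or -C gpu)

  cd $SCRATCH               # use the SCRATCH file system
  srun -n 64 ./helloWorld   # run 64 processes in parallel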

21 of 44

Accessing NERSC

and submitting your first Job

22 of 44

Access to Perlmutter and Use Julia Module

  • NERSC users have been added to the trn013 project
  • Non-users were sent instructions for getting a training account
    • Accounts are valid through July 29
  • Log in to Perlmutter: ssh username@perlmutter.nersc.gov
  • Julia modules:
    • % module load julia
  • Running Jobs examples:
    • https://docs.nersc.gov/jobs/
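
Putting those steps together from your local machine (julia --version simply confirms which Julia the module provides):

  % ssh username@perlmutter.nersc.gov
  % module load julia
  % julia --version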


23 of 44

Compute Node Reservations

  • GPU node reservation:
    • To use 1 GPU only (sample flags for sbatch or salloc):
      • -A trn013 --reservation=juliacon_1 -C gpu -N 1 -c 32 -G 1 -t 30:00 -q shared
    • To use multiple nodes (sample flags for sbatch or salloc):
      • -A trn013 --reservation=juliacon_1 -C gpu -N 2 -t 30:00 -q regular
  • Outside of the reservation, use:
    • To use 1 GPU only (sample flags for sbatch or salloc):
      • -A <project> -C gpu -N 1 -c 32 -G 1 -t 30:00 -q shared
    • To use multiple nodes (sample flags for sbatch or salloc):
      • -A <project> -C gpu -N 2 -t 30:00 -q regular (or -q interactive for salloc)
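
For example, a complete interactive request inside the tutorial reservation, and a batch submission outside of it (my_batch_script.sh is a placeholder):

  % salloc -A trn013 --reservation=juliacon_1 -C gpu -N 1 -c 32 -G 1 -t 30:00 -q shared
  % sbatch -A <project> -C gpu -N 2 -t 30:00 -q regular my_batch_script.sh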


24 of 44

Make Sure to Clone The Tutorial Repo

https://github.com/JuliaParallel/julia-hpc-tutorial-juliacon25

Pro tip: $HOME is not the best file system for running jobs at scale. For large-scale jobs, add

export JULIA_DEPOT_PATH=$SCRATCH/depot

to your scripts for optimal performance (remember to do this in both the configuration script and the job script) … or use a container!
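
For reference, a possible setup from a Perlmutter login shell (cloning into $SCRATCH is just one reasonable choice):

  % export JULIA_DEPOT_PATH=$SCRATCH/depot     # keep the Julia depot off $HOME for large-scale jobs
  % cd $SCRATCH
  % git clone https://github.com/JuliaParallel/julia-hpc-tutorial-juliacon25
  % module load julia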

25 of 44

Logging into Jupyter

  • Go to https://jupyter.nersc.gov
  • Sign in using your training account credentials
  • Select your preferred Jupyter instance:


26 of 44

Logging into Jupyter

  • Go to https://jupyter.nersc.gov
  • Sign in using your training account credentials
  • Select your preferred Jupyter instance:

For now, let’s use the “Shared GPU Node” – or the “Login Node”


27 of 44

Logging into Jupyter

  • Go to https://jupyter.nersc.gov
  • Sign in using your training account credentials
  • Select your preferred Jupyter instance:

Later we’ll be using the “Exclusive GPU Node” or reservations (using the “Configurable Job”)


28 of 44

Getting a Terminal in Jupyter

  • Jupyter should take a minute to start:


29 of 44

Getting a Terminal in Jupyter

  • If you don’t see a terminal, select “+” followed by “Terminal”


30 of 44

Setup

  • Once started, you should see a terminal


31 of 44

Pro Tip: Projects handle dependencies

  • Run: import Pkg; Pkg.activate("path/to/the/repo"); Pkg.instantiate() to ensure all dependencies are installed (see the example below):
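
The same step as a one-liner from a Jupyter terminal or login shell (path/to/the/repo is a placeholder for wherever you cloned the tutorial repository):

  % julia -e 'import Pkg; Pkg.activate("path/to/the/repo"); Pkg.instantiate()'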


32 of 44

Pro Tip: Sanity Check

  • The versioninfo() function can be used to check whether the backends are configured correctly, e.g. for MPI.jl and CUDA.jl (see the sketch below):
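
A minimal sketch of such a check, assuming the tutorial project provides MPI.jl and CUDA.jl (both packages export their own versioninfo):

  % julia --project=path/to/the/repo -e 'using MPI, CUDA; MPI.versioninfo(); CUDA.versioninfo()'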


33 of 44

Using Reservations in Tutorials

  • Go to https://jupyter.nersc.gov and select “Configurable GPU” in the “Perlmutter” row


34 of 44

Jupyter Options:

Leave defaults, except:

  • Account: trn013
  • Reservation: juliacon_1
  • Time: 180

35 of 44

Jupyter on Bridges2


36 of 44


37 of 44


38 of 44


39 of 44

Building HPC Julia Workflows

40 of 44

41 of 44

Building an HPC Workflow in Julia

[Figure: a workflow (WF) node connected over the high-speed network to compute nodes 1-4]

42 of 44

Building an HPC Workflow in Julia

[Figure: the same layout; each compute node now carries user software (User SW), and the WF node carries the user workflow (User WF)]

43 of 44

Building an HPC Workflow in Julia

[Figure: the WF node runs the user workflow interactively in Jupyter / Pluto; Distributed.jl / MPI.jl provide communication over the high-speed network; each compute node stacks user software on CUDA.jl on top of the vendor software; Dagger.jl / ImplicitGlobalGrid.jl / ParallelStencil.jl provide higher-level abstractions across nodes]

44 of 44

Building an HPC Workflow in Julia

[Figure: the same stack as the previous slide]

  • Interactivity: Jupyter / Pluto on the WF node (possibly also a login node, or the head node of the job)
  • High-level abstraction: Dagger.jl / ImplicitGlobalGrid.jl / ParallelStencil.jl
  • Low-level communications: Distributed.jl / MPI.jl over the high-speed network
  • On each compute node: user software on CUDA.jl on top of the vendor software; JACC.jl and KernelAbstractions.jl provide portability
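
To make the low-level communications layer concrete, a minimal sketch of running Julia across an allocation with MPI.jl (run inside an salloc/sbatch allocation; assumes MPI.jl is installed in the active project and configured for the system MPI):

  % srun -n 4 julia --project -e 'using MPI; MPI.Init(); println("rank $(MPI.Comm_rank(MPI.COMM_WORLD)) on $(gethostname())")'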