1 of 12

Update: Using large-sample machine learning emulators for continental-scale land/hydrology model calibration and regionalization

Andy Wood, Guoqiang Tang, Mozhgan Farahani, Sean Swenson, Naoki Mizukami

Climate and Global Dynamics, National Center for Atmospheric Research

April 17, 2025

2 of 12

Traditional model calibration is often based on single sites, i.e., individual watersheds


Individual basin calibration:

Each basin is trained separately

Large-sample basin calibration: All basins are trained together.

This approach has been less effective than individual basin calibration for complex process-based (PB) models

For parameter regionalization,

(1) hydrologic models are trained on gauged basins; then,

(2) parameters are transferred to ungauged regions for application.

Note: This is similar in concept to the land model (LM) community practice of calibrating at flux towers (e.g., NEON, PLUMBER) before applying the parameters globally

671 watersheds from CAMELS

3 of 12

Workflow of the emulator-based optimization


  • 627 headwater basins from CAMELS
  • EM-Earth and ERA5-Land provide forcing
  • 200 model simulations from Latin Hypercube Sampling (LHS) of parameters
  • Sensitivity analysis using Pyviscous supports parameter selection
  • Parameters are selected for each basin cluster (15 clusters in total)
  • Emulator: random forest (RF) for the LSE; RF + Gaussian process regression (GPR) for the SSE
  • Optimization: genetic algorithm (GA) for single-objective; NSGA-II for multi-objective
  • Iterative process: all samples from the current and previous iterations are used to train a new emulator (see the sketch after this list)
  • Iteration counts and the number of trials in each iteration vary by application
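Below is a minimal Python sketch of this loop, shown only to make the workflow concrete. The toy run_model function, the use of scipy's differential evolution as a stand-in for the GA/NSGA-II step, and all sample sizes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the iterative emulator-based calibration loop (assumptions noted above).
import numpy as np
from scipy.stats import qmc
from scipy.optimize import differential_evolution
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_params, n_init, n_iters, n_new = 5, 200, 3, 20
bounds = [(0.0, 1.0)] * n_params            # normalized parameter ranges

def run_model(theta):
    """Placeholder for a CTSM/SUMMA run plus KGE' evaluation (toy function)."""
    return -np.sum((theta - 0.3) ** 2)       # higher is better

# Iteration 0: Latin Hypercube Sampling of the selected parameters
sampler = qmc.LatinHypercube(d=n_params, seed=0)
X = sampler.random(n=n_init)
y = np.array([run_model(x) for x in X])

for it in range(1, n_iters + 1):
    # Train the emulator on ALL samples from the current and previous iterations
    emulator = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    # Optimize the emulated score (stand-in for the GA / NSGA-II step)
    res = differential_evolution(lambda x: -emulator.predict(x[None, :])[0],
                                 bounds, seed=it, maxiter=50, polish=False)

    # Propose new candidates near the emulator optimum and run the real model on them
    X_new = np.clip(res.x + 0.05 * rng.standard_normal((n_new, n_params)), 0, 1)
    y_new = np.array([run_model(x) for x in X_new])
    X, y = np.vstack([X, X_new]), np.concatenate([y, y_new])
    print(f"iteration {it}: best score so far = {y.max():.4f}")
```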

4 of 12

LSE calibration performance for CTSM


Large-scale hydrology researchers often summarize performance using CDFs of results across all sites.

In this case, we use a common informal likelihood metric, KGE' (the modified Kling-Gupta Efficiency score)
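For reference, a minimal sketch of KGE' (Kling et al., 2012) as it is commonly computed; the slides do not prescribe this exact implementation.

```python
# Modified Kling-Gupta Efficiency (KGE'): 1 is perfect, lower is worse.
import numpy as np

def kge_prime(sim, obs):
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]                               # correlation
    beta = sim.mean() / obs.mean()                                # bias ratio
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())   # variability (CV) ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
```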

[Figure: CDFs of calibration performance across all basins, annotated to show the direction of better results]

WRR Preprint: Guoqiang Tang, Andrew W. Wood, Sean Swenson, 2024. On AI-based large-sample emulators for land/hydrology model calibration and regionalization. ESS Open Archive (DOI pending).

5 of 12

Application to a simpler process-based hydrologic model gives better results


We also applied the LSE-based calibration approach to the Structure for Unifying Multiple Modeling Alternatives (SUMMA) model over the same CAMELS basins.

paper in review

6 of 12

CTSM parameter optimization


Iterative sequential optimization and refinement of the emulator not only constrains but also shifts a priori parameters to new, better values

Example of outcomes from calibrated CTSM

7 of 12

Parameter spatial transfer


The LSE relationship between geo-attributes, parameters, and performance gives a basis for regionalization
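A hedged sketch of the idea, not the authors' method: if the LSE maps (basin attributes, parameters) to predicted performance, parameters for an ungauged basin can be chosen by fixing its attributes and searching for the parameter set with the best predicted score. The data, dimensions, and toy score relationship below are illustrative.

```python
# Illustrative regionalization via a large-sample emulator (toy data only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

n_basins, n_attrs, n_params = 400, 8, 5
rng = np.random.default_rng(1)

# Training data from "gauged" basins: attributes, tried parameter sets, and scores
attrs = rng.random((n_basins, n_attrs))
params = rng.random((n_basins, n_params))
scores = -np.sum((params - attrs[:, :n_params]) ** 2, axis=1)   # toy relationship

lse = RandomForestRegressor(n_estimators=300, random_state=0)
lse.fit(np.hstack([attrs, params]), scores)

# Transfer: pick parameters for an "ungauged" basin from its attributes alone
ungauged_attrs = rng.random(n_attrs)
candidates = rng.random((5000, n_params))                        # candidate parameter sets
X_query = np.hstack([np.tile(ungauged_attrs, (len(candidates), 1)), candidates])
best = candidates[np.argmax(lse.predict(X_query))]
print("transferred parameter set:", np.round(best, 3))
```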

Example: SUMMA US-wide model for the USACE and Reclamation water security study

We’re currently working on improving the emulator design to enhance parameter transfer performance

8 of 12

Code repos and datasets update


3+ code repositories for calibration, still private until papers accepted (hopefully soon)

  • Conceptual model LSE
  • SUMMA LSE
  • CTSM LSE

Data resources (currently being staged)

  • The entire CAMELS CTSM setup (forcings, inputs, geo-attributes, streamflow)
  • The parameter results (~1000 sets per basin) and associated performance metrics
      • This is essentially a hydrology-tailored PPE (perturbed parameter ensemble)
  • The EM-Earth + ERA5-Land CTSM forcing dataset (all of the NLDAS domain at 0.1 degrees)
  • The SUMMA CAMELS setup …

Contacts related to calibration effort

  • Guoqiang Tang, Wuhan University
  • Andy Wood, NCAR TSS
  • Mozhgan Farahani, NCAR TSS
  • Naoki Mizukami, NCAR RAL

Contacts also related to modeling effort

  • Sean Swenson

9 of 12

Extra slides


10 of 12

HPC workload

A big challenge: how to run multiple one-grid CTSM cases on one node?

Although the solutions look simple, they were arrived at only after extensive experiments and discussions with many people

Cheyenne (NCAR’s old HPC)

- GNU parallel + "MPI_DSM_DISTRIBUTE=0"

Derecho (NCAR’s new HPC)

- GNU parallel + cpu-bind

- GNU parallel + mpi-serial

Both approaches were successful after extensive tests. mpi-serial is simpler and better, but if it is not supported in the future, the cpu-bind approach would still work.


11 of 12

HPC workload: MO-ASMO job submission on Derecho

Target basins (TNB = 627)

Create cases for each basin (compiling, forcing, namelists, etc.)

Iteration-0: generate initial parameter sets for each basin (number P0, e.g., 400)

Decide the number of basins that will be run on one node (number BB0, e.g., 10)

Get the batch count for iteration-0 (i.e., B0 = int(TNB/BB0) + 1 = int(627/10) + 1 = 63)

Create the submission structure (one iterN folder per iteration, one batchN folder per 1-node job):

run_model/
  iter0/
    batch0/
      batch_0.txt
      idlecpu_0, idlecpu_1, ..., idlecpu_127
      submission.sh
    batch1/
    ...
    batchX/
  iter1/
  ...
  iterX/

batch_0.txt contains P0 x BB0 (e.g., 4000) lines. Each line runs the CTSM simulation, evaluation, and archiving for one basin and one parameter set.
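A minimal sketch of building such a task list; the driver script name run_one_case.py and the command format are hypothetical, not the actual repository scripts.

```python
# Build batch_0.txt: P0 parameter sets x BB0 basins = one command line per (basin, set) pair.
P0, BB0 = 400, 10
basins = [f"basin_{i:03d}" for i in range(BB0)]          # the BB0 basins in this batch

with open("batch_0.txt", "w") as f:
    for basin in basins:
        for p in range(P0):
            # Hypothetical driver: runs CTSM, evaluates, and archives one case
            f.write(f"python run_one_case.py --basin {basin} --paramset {p}\n")
# GNU parallel then executes these 4000 lines, 128 at a time:  parallel -j 128 < batch_0.txt
```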

idlecpu_0 is just an empty flag file. If a CTSM simulation occupies CPU 0, idlecpu_0 is renamed to busycpu_0 so that other CTSM simulations will not use CPU 0. Once the simulation is done, busycpu_0 is renamed back to idlecpu_0.

The binding itself is done by inserting <arg name="cpu_bind">--cpu-bind list:n1:n2</arg> into env_mach_specific.xml for each cloned case.
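A hedged Python sketch of the idle/busy flag-file mechanism described above (not the authors' scripts): a task claims a free CPU by renaming idlecpu_N to busycpu_N, runs its case, and renames the flag back when done. The taskset call is only a stand-in for the cpu_bind setting in env_mach_specific.xml, and the paths are illustrative.

```python
import os, glob, time, subprocess

FLAG_DIR = "."   # directory holding the idlecpu_* / busycpu_* flag files (assumption)

# For demonstration only, create a few flag files (in practice they exist per batch folder)
for i in range(4):
    open(os.path.join(FLAG_DIR, f"idlecpu_{i}"), "a").close()

def claim_cpu():
    """Claim a free CPU; os.rename is atomic, so two tasks cannot grab the same flag."""
    while True:
        for flag in glob.glob(os.path.join(FLAG_DIR, "idlecpu_*")):
            cpu = flag.rsplit("_", 1)[1]
            try:
                os.rename(flag, os.path.join(FLAG_DIR, f"busycpu_{cpu}"))
                return cpu
            except OSError:
                continue      # another task claimed this CPU first; keep looking
        time.sleep(5)         # all CPUs busy; wait and retry

def release_cpu(cpu):
    os.rename(os.path.join(FLAG_DIR, f"busycpu_{cpu}"),
              os.path.join(FLAG_DIR, f"idlecpu_{cpu}"))

cpu = claim_cpu()
try:
    # Stand-in for one CTSM case run pinned to the claimed CPU (hypothetical command)
    subprocess.run(["taskset", "-c", cpu, "echo", f"running one CTSM case on CPU {cpu}"],
                   check=True)
finally:
    release_cpu(cpu)
```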

Each batch requires one 128-core node; the batch job runs: parallel -j 128 < batch_0.txt


Submit to Derecho as multiple 1-node jobs (B0 jobs, e.g., 63)

Archive: save the simulations and evaluation results

Iteration-X: build an emulator and generate new parameter sets for CTSM simulation; decide the batch count; create the submission structure; repeat until the stop criteria are met.

12 of 12

HPC workload: Job submission on Derecho

Before submission: 4000 tasks (Task-0 through Task-3999) are queued for the 128 CPUs of the node, with one empty flag file per CPU (idlecpu_0 through idlecpu_127).

Start moment: the first 128 tasks start, one per CPU, and every flag file flips to busycpu_0 through busycpu_127 (128 tasks running on 128 CPUs).

Moment-2: when tasks finish (e.g., three tasks complete and release CPUs 7, 43, and 111), their flags revert to idlecpu_7, idlecpu_43, and idlecpu_111; the next three tasks (Task-128, Task-129, Task-130) claim those CPUs and flip the flags back to busycpu_7, busycpu_43, and busycpu_111.

Repeating Moment-2 means there are always 128 CTSM cases running on 128 CPUs, except at the very end.