Update: Using large-sample machine learning emulators for continental-scale land/hydrology model calibration and regionalization
Andy Wood, Guoqiang Tang, Mozhgan Farahani, Sean Swenson, Naoki Mizukami
Climate and Global Dynamics, National Center for Atmospheric Research
April 17, 2025
Traditional model calibration is often based on single sites, i.e., individual watersheds
Individual basin calibration: each basin is trained separately.
Large-sample basin calibration: all basins are trained together.
The large-sample approach has been less effective than individual basin calibration for complex process-based (PB) models
For parameter regionalization, (1) hydrologic models are trained on gauged basins; then (2) parameters are transferred to ungauged regions for application.
Note: This is similar in concept to the LM community practice of calibrating at flux towers (e.g., NEON, PLUMBER) before applying the parameters globally
671 watersheds from CAMELS
Workflow of the emulator-based optimization
LSE calibration performance for CTSM
Large-scale hydrology researchers often summarize performance by plotting CDFs of a skill metric across all sites.
In this case, we use a common informal likelihood metric, KGE’ (the modified Kling-Gupta Efficiency score)
Figure: CDFs of KGE’ across all basins; curves shifted toward higher KGE’ indicate better results
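For reference, a minimal NumPy sketch of KGE’ as typically defined (Kling et al. 2012); the function below is illustrative, not necessarily the exact implementation used in this workflow:

import numpy as np

def kge_prime(sim, obs):
    # Modified Kling-Gupta Efficiency:
    #   KGE' = 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (gamma - 1)^2)
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]                               # linear correlation
    beta = sim.mean() / obs.mean()                                # bias ratio
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())   # variability (CV) ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

A CDF summary like the one shown here is then built by sorting the per-basin KGE’ values and plotting them against their empirical quantiles.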
WRR Preprint: Guoqiang Tang, Andrew W. Wood, Sean Swenson, 2024. On AI-based large-sample emulators for land/hydrology model calibration and regionalization. ESS Open Archive (DOI pending).
Application to a simpler process-based hydrologic model gives better results
We also applied the LSE-based calibration approach to the Structure for Unifying Multiple Modeling Alternatives (SUMMA) model over the same CAMELS basins (paper in review).
CTSM parameter optimization
Iterative sequential optimization and refinement of the emulator not only constrains but also shifts a priori parameters to new, better values
Example of outcomes from calibrated CTSM
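As an illustration of the iterative optimization described above, a minimal single-objective sketch of emulator-assisted calibration; the real MO-ASMO workflow is multi-objective, and the emulator choice, sample sizes, and the run_model_and_score hook are illustrative assumptions rather than the actual implementation:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def run_model_and_score(params):
    # Hypothetical hook: run the land/hydrology model with this parameter set
    # and return a skill score such as KGE' (not the actual CTSM driver)
    raise NotImplementedError

rng = np.random.default_rng(0)
n_params, n_init, n_iters, n_cand, n_select = 8, 100, 5, 10000, 20

# Iteration 0: space-filling sample of the normalized parameter space
X = rng.uniform(0.0, 1.0, size=(n_init, n_params))
y = np.array([run_model_and_score(p) for p in X])

for it in range(1, n_iters + 1):
    # Refit the emulator on all (parameter set, skill) pairs collected so far
    emulator = RandomForestRegressor(n_estimators=500, random_state=it).fit(X, y)

    # Score a large pool of candidate parameter sets with the cheap emulator
    cand = rng.uniform(0.0, 1.0, size=(n_cand, n_params))
    best = cand[np.argsort(emulator.predict(cand))[-n_select:]]

    # Run the expensive model only on the most promising candidates, then
    # add the new results to the training archive and iterate
    X = np.vstack([X, best])
    y = np.concatenate([y, np.array([run_model_and_score(p) for p in best])])

Each refit narrows the search around well-performing regions of parameter space, which is how the iterations both constrain and shift the a priori parameter values.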
Parameter spatial transfer
The LSE relationship between geo-attributes, parameters, and performance gives a basis for regionalization
Example: SUMMA US-wide model for the USACE and Reclamation water security study
We’re currently working on improving the emulator design to enhance parameter transfer performance
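One simple way to exploit the geo-attribute/parameter relationship noted above, sketched with an off-the-shelf regressor (the estimator choice and file names are illustrative assumptions, not necessarily the emulator design used here): fit a mapping from basin attributes to calibrated parameters on gauged basins, then predict parameters for ungauged basins from their attributes alone.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical inputs:
#   attrs_gauged    (n_gauged, n_attributes)   geo-attributes of calibrated donor basins
#   params_gauged   (n_gauged, n_parameters)   parameters from the LSE calibration
#   attrs_ungauged  (n_ungauged, n_attributes) attributes of target (ungauged) basins
attrs_gauged = np.load("attrs_gauged.npy")
params_gauged = np.load("params_gauged.npy")
attrs_ungauged = np.load("attrs_ungauged.npy")

# Fit an attribute -> parameter mapping on gauged basins, then transfer it
# to ungauged basins by predicting from their attributes
regionalizer = RandomForestRegressor(n_estimators=500, random_state=0)
regionalizer.fit(attrs_gauged, params_gauged)
params_ungauged = regionalizer.predict(attrs_ungauged)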
Code repos and datasets update
3+ code repositories for calibration, still private until papers accepted (hopefully soon)
Conceptual model LSE
SUMMA LSE
CTSM LSE
Data resources (currently being staged)
Contacts related to calibration effort
Contacts also related to modeling effort
Extra slides
HPC workload
A big challenge: how to run multiple one-grid CTSM cases on one node?
Unfortunately, although the solutions seem simple, they were only arrived at after extensive experiments and discussions with many people
Cheyenne (NCAR’s old HPC)
- GNU parallel + “MPI_DSM_DISTRIBUTE=0”
Derecho (NCAR’s new HPC)
- GNU parallel + cpu-bind
- GNU parallel + mpi-serial
Both approaches were successful after extensive tests. mpi-serial is simpler and better, but if it is not supported in the future, the cpu-bind approach would still work.
HPC workload: MO-ASMO job submission on Derecho
Target basins (total basin number TBN = 627)
Create cases for each basin (compiling, forcing, namelist, etc.)
Iteration-0: generate initial parameter sets for each basin (number P0, e.g., 400)
Decide the number of basins that will be run on one node (number BB0, e.g., 10)
Get the batch number for iteration-0 (i.e., B0 = int(TBN/BB0) + 1 = 63)
Create submission structure (folders and files):
run_model/
  iter0/
    batch0/
      batch_0.txt
      idlecpu_0, idlecpu_1, …, idlecpu_127
      submission.sh
    batch1/
    …
    batchX/
  iter1/
  …
  iterX/
batch_0.txt contains P0*BB0 (e.g., 4000) lines. Each line runs the CTSM simulation, evaluation, and archiving for one basin and one parameter set
idlecpu_0 is just an empty file. If a CTSM simulation occupies CPU 0, idlecpu_0 is changed to busycpu_0 so other CTSM simulations won’t use CPU 0. Once the simulation is done, busycpu_0 is changed back to idlecpu_0.
The CPU binding is done by inserting <arg name="cpu_bind"> --cpu-bind list:n1:n2</arg> into env_mach_specific.xml for each cloned case
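That edit can be scripted per case; a minimal sketch, assuming the mpirun <arguments> block found in CIME-generated env_mach_specific.xml files and a hypothetical list of cloned case directories:

from pathlib import Path

def add_cpu_bind(case_dir, n1, n2):
    # Insert the cpu_bind argument into this cloned case's env_mach_specific.xml,
    # just before the closing tag of the mpirun <arguments> block (assumed present)
    xml_path = Path(case_dir) / "env_mach_specific.xml"
    arg_line = f'      <arg name="cpu_bind"> --cpu-bind list:{n1}:{n2}</arg>\n'
    text = xml_path.read_text()
    xml_path.write_text(text.replace("</arguments>", arg_line + "    </arguments>", 1))

# Hypothetical usage: bind each cloned case to its assigned CPU indices on the node
# for case_dir, (n1, n2) in zip(case_dirs, cpu_index_pairs):
#     add_cpu_bind(case_dir, n1, n2)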
Requires one 128-core node
Run: parallel -j 128 < batch_0.txt
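A schematic of how a batch file and its launch might be generated; the per-run driver script name and basin IDs below are hypothetical, while the line count (P0*BB0) and the parallel -j 128 launch follow the description above:

import subprocess
from itertools import product

TBN, BB0, P0 = 627, 10, 400            # total basins, basins per node, parameter sets per basin
B0 = int(TBN / BB0) + 1                # number of 1-node batch jobs for iteration 0 (= 63)

basins_in_batch0 = [f"basin_{i:04d}" for i in range(BB0)]   # hypothetical basin IDs

# batch_0.txt: P0 * BB0 (= 4000) lines; each line runs CTSM, evaluation, and
# archiving for one basin and one parameter set (run_one_case.sh is a hypothetical driver)
with open("batch_0.txt", "w") as f:
    for basin, iparam in product(basins_in_batch0, range(P0)):
        f.write(f"./run_one_case.sh {basin} iter0 param{iparam:03d}\n")

# Launch on one 128-core node: GNU parallel keeps 128 tasks running at a time
subprocess.run("parallel -j 128 < batch_0.txt", shell=True, check=True)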
Submit to Derecho: multiple 1-node jobs (B0, e.g., 63)
Archive: save simulations and evaluation results
Iteration-X: build an emulator and generate new parameter sets for CTSM simulation
Decide batch number
Create submission structure
Stop Criteria
HPC workload: Job submission on Derecho
Before submission: all 128 CPUs are idle (idlecpu_0, idlecpu_1, idlecpu_2, …, idlecpu_127) and 4000 tasks are queued (Task-0, Task-1, …, Task-3999).
Start moment: the first 128 tasks (Task-0 through Task-127) start, one per CPU, and every marker flips to busy (busycpu_0 through busycpu_127): 128 tasks on 128 CPUs.
Moment-2: three tasks finish, releasing 3 CPUs (idlecpu_7, idlecpu_43, idlecpu_111). The next three tasks (Task-128, Task-129, Task-130) are assigned to those CPUs, which become busycpu_7, busycpu_43, and busycpu_111.
Repeating Moment-2 means there are always 128 CTSM cases running on 128 CPUs, except at the very end.
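A minimal sketch of the idlecpu/busycpu marker mechanism described above, from the point of view of a single task; the helper names, the atomic-rename claim, and the taskset binding are illustrative assumptions rather than the workflow’s exact scripts:

import os
import subprocess
import time

N_CPUS = 128

def claim_free_cpu(workdir="."):
    # Claim the first free CPU by renaming idlecpu_N -> busycpu_N; the rename only
    # succeeds if idlecpu_N still exists, so no two tasks can grab the same CPU
    while True:
        for cpu in range(N_CPUS):
            try:
                os.rename(os.path.join(workdir, f"idlecpu_{cpu}"),
                          os.path.join(workdir, f"busycpu_{cpu}"))
                return cpu
            except FileNotFoundError:
                continue
        time.sleep(1)        # all 128 CPUs busy; wait for one to be released

def release_cpu(cpu, workdir="."):
    # Mark the CPU as idle again so the next queued task can use it
    os.rename(os.path.join(workdir, f"busycpu_{cpu}"),
              os.path.join(workdir, f"idlecpu_{cpu}"))

def run_one_task(cmd):
    cpu = claim_free_cpu()
    try:
        # Bind this CTSM run to the claimed CPU (taskset shown for illustration;
        # the actual cases use the cpu_bind setting written into env_mach_specific.xml)
        subprocess.run(f"taskset -c {cpu} {cmd}", shell=True, check=True)
    finally:
        release_cpu(cpu)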