1 of 35

Dynamical Tests For Deep-Learning Weather Prediction Models

Gregory J. Hakim

University of Washington

G. J. Hakim, DCMIP 2025 Webinar

3 June 2025

DCMIP 2025

Hakim & Masanam (2024)

2 of 35

github.com/modons/DL-weather-dynamics

Experiments are CPU friendly!

3 of 35

Motivation

  • Rise of deep-learning models for global NWP
    • 2022-present: forecast skill matches or exceeds the best in the world (ECMWF IFS)
    • many new opportunities (e.g. very large ensembles; autograd)
  • Q: Have these models encoded physics?
  • Q: Can these models be used for basic science? Data assimilation?

Take-away: a set of physics-based tests would be a helpful part of deep-learning model training, evaluation, and inter-comparison

4 of 35

Physical Tests for Deep Learning Models

Previous evaluations mostly report forecast RMSE, aiming for “We beat IFS”

Here we conduct two sets of experiments aimed at physics:

Test #1: Space-time initial value problems

  • signal propagation
  • Pangu-Weather model

Test #2: global large-ensemble data assimilation

  • covariance evolution and response to observation perturbations
  • Pangu-Weather: 1K members, 4M obs

5 of 35

Comparisons not in our experiments

We do not make direct comparisons to solutions from physics models

  • initialization is harder in physics models
    • there are a lot of “MIPs,” but few initial-value MIPs

Boundary conditions inevitably differ between physics and ML models

    • orography; land-ocean mask; surface fluxes (ERA5 or model)

A physics test suite should help distinguish ML models

    • models with similar forecast RMSE may differ on physical tests

6 of 35

Physics from localized disturbances

  • “Green’s function” approach
    • localized disturbance in space & time
  • signal propagation determined by physics
    • linear limit: max/min group velocity
    • nonlinear: coherence (solitons & modons)
  • efficient method to test all scales at once
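
As one concrete illustration of the linear limit (a textbook case chosen here, not taken from the slides), consider quasi-geostrophic Rossby waves on a β-plane with uniform zonal flow U and deformation radius L_d. The symbols U, β, L_d, k, and l are assumptions introduced for this sketch. The dispersion relation and zonal group velocity are:

```latex
\omega = Uk - \frac{\beta k}{k^2 + l^2 + L_d^{-2}}, \qquad
c_{gx} = \frac{\partial \omega}{\partial k}
       = U + \frac{\beta\,(k^2 - l^2 - L_d^{-2})}{(k^2 + l^2 + L_d^{-2})^2}.

% Extremes occur for l = 0: the minimum at k = 0, the maximum at k^2 = 3 L_d^{-2}
U - \beta L_d^2 \;\le\; c_{gx} \;\le\; U + \tfrac{1}{8}\,\beta L_d^2 .
```

In this idealized setting the zonal edges of a localized disturbance move no faster than these two speeds, which is the sense in which the physics bounds signal propagation.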

Hakim (2003)

7 of 35

Experiment Design

Given a neural network, N, that autoregressively maps inputs to outputs

Bar fields are time-averages (“mean state”)

  • renders the mean state time independent
  • provides space-time control for localized inputs
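
The slide's equations did not survive extraction. One formulation consistent with the bullets above, offered as an assumption rather than a transcription, subtracts the one-step drift of the mean state, N(x̄) − x̄, from every model step. The mean state then becomes a fixed point, and a localized anomaly evolves about a time-independent background. A toy sketch, in which `toy_model`, the grid size, and the disturbance amplitude are all hypothetical stand-ins:

```python
import numpy as np

def toy_model(x):
    """Hypothetical stand-in for the network N: one autoregressive step."""
    return 0.9 * x + 0.1 * np.roll(x, 1) + 0.05 * np.sin(x)

def make_steady_step(model, x_bar):
    """Return a step function for which x_bar is a fixed point.

    Assumption: a constant forcing equal to the mean state's one-step
    drift, model(x_bar) - x_bar, is subtracted from every step.
    """
    drift = model(x_bar) - x_bar

    def step(x):
        return model(x) - drift

    return step

def run_anomaly(step, x_bar, x_prime0, n_steps):
    """Integrate x_bar + x' forward and return the anomaly at each step."""
    x = x_bar + x_prime0
    anomalies = [x_prime0.copy()]
    for _ in range(n_steps):
        x = step(x)
        anomalies.append(x - x_bar)
    return np.array(anomalies)

x_bar = np.linspace(0.0, 1.0, 16)       # made-up "mean state"
step = make_steady_step(toy_model, x_bar)

x_prime0 = np.zeros(16)
x_prime0[4] = 0.01                       # localized initial disturbance
anoms = run_anomaly(step, x_bar, x_prime0, n_steps=5)
```

With a zero initial anomaly the state stays at `x_bar`, so anything that appears in `anoms` is a response to the imposed disturbance.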

8 of 35

Experiments

  • Tropical heating
    • localized, steady heating (external to the model)
  • Baroclinic development
    • localized disturbance at upstream end of Pacific storm track
  • Geostrophic adjustment
    • impulsive local change to height field at one level
  • Atlantic hurricane development
    • seed climatological main development region with weak disturbances

9 of 35

Tropical Heating

  • classical experiment: tropical response; extratropical teleconnections
  • heating function spatial structure
    • Latitude: cos(6φ) within 15° of the Equator
    • Longitude: Gaspari & Cohn (1999) function (approximately Gaussian, with compact support), L = 10,000 km
    • Vertical: uniform up to 200 hPa
    • Heating rate: 0.1 K/day
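
The heating structure listed above can be sketched as a separable function of level, latitude, and longitude. Several details are assumptions made for illustration and are not stated on the slide: the center longitude `lon0_deg`, reading L as the distance at which the Gaspari & Cohn function reaches zero (so its half-width parameter is L/2), measuring longitudinal distance along the Equator, and the example grid and pressure levels.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def gaspari_cohn(dist, c):
    """Gaspari & Cohn (1999) fifth-order compactly supported function.

    Equals 1 at dist = 0 and is exactly 0 for dist >= 2 * c.
    """
    z = np.abs(np.asarray(dist, dtype=float)) / c
    inner = -0.25 * z**5 + 0.5 * z**4 + 0.625 * z**3 - (5.0 / 3.0) * z**2 + 1.0
    zo = np.clip(z, 1.0, 2.0)  # keeps the 1/z term finite outside its branch
    outer = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3 + (5.0 / 3.0) * zo**2
             - 5.0 * zo + 4.0 - 2.0 / (3.0 * zo))
    return np.where(z <= 1.0, inner, np.where(z < 2.0, outer, 0.0))

def heating_rate(lat_deg, lon_deg, levels_hpa, lon0_deg=180.0,
                 support_km=10000.0, amplitude=0.1):
    """Heating rate (K/day) on a (level, lat, lon) grid."""
    lat = np.asarray(lat_deg, dtype=float)
    lon = np.asarray(lon_deg, dtype=float)
    lev = np.asarray(levels_hpa, dtype=float)
    # Latitude: cos(6*phi), which falls to zero at |phi| = 15 degrees.
    lat_part = np.where(np.abs(lat) <= 15.0, np.cos(6.0 * np.deg2rad(lat)), 0.0)
    # Longitude: shortest separation from the center, as distance along the Equator.
    dlon = (lon - lon0_deg + 180.0) % 360.0 - 180.0
    dist_km = EARTH_RADIUS_KM * np.deg2rad(np.abs(dlon))
    lon_part = gaspari_cohn(dist_km, c=support_km / 2.0)
    # Vertical: uniform from the surface up to and including 200 hPa.
    lev_part = (lev >= 200.0).astype(float)
    return (amplitude * lev_part[:, None, None]
            * lat_part[None, :, None] * lon_part[None, None, :])

lat = np.linspace(-90.0, 90.0, 181)      # illustrative 1-degree grid
lon = np.arange(0.0, 360.0, 1.0)
levels = np.array([1000.0, 925.0, 850.0, 700.0, 600.0, 500.0, 400.0,
                   300.0, 250.0, 200.0, 150.0, 100.0, 50.0])
Q = heating_rate(lat, lon, levels)
```

Because the heating is external to the model, a field like `Q` would be added to the temperature at each step, scaled by the step length; how that coupling is done is not specified here.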

10 of 35

Tropical Heating: Extratropical Teleconnections

Ting and Sardeshmukh (1993; fig. 7)

11 of 35

Matsuno (1966)-Gill (1980) Model

  • Steady heating & dissipation
  • Equatorial-trapped Kelvin wave
  • Off-equator Rossby wave

12 of 35

Pangu-Weather Solutions

13 of 35

Heating experiment (500 hPa Z)

t = 5 days; cint = 0.3 m

14 of 35

Heating experiment (500 hPa Z)

t = 10 days; cint = 2 m

15 of 35

Heating experiment (500 hPa Z)

t = 20 days; cint = 20 m

16 of 35

Heating experiment (850 hPa)

17 of 35

Baroclinic Wave Simulations

  • Simmons & Hoskins (1975, 1978)
    • baroclinic wave simulations on the sphere
  • Rotunno et al. (1994)
    • analytical jet & initial condition
  • Polvani et al. (2004)
    • convergence in resolution
  • Jablonowski & Williamson (2007)
    • standardized test case across dry “dy-cores”

Jablonowski & Williamson (2007)

18 of 35

Baroclinic Wave Packet Theory

Hakim (2003)

  • downstream development on the tropopause (Simmons & Hoskins 1978)
  • upstream development at the surface
  • shorter waves at the edges; longer waves near the peak (most unstable)

19 of 35

Baroclinic Development Experiment

  • Localized precursors to Pacific extratropical cyclone development
  • December-January-February mean state
    • steady state
  • Initial condition
    • regress all fields on ERA5 500 hPa height at 40N, 150E
    • horizontally localized to zero at r=2000 km
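
The regression-and-localization recipe above can be sketched on synthetic data. Everything specific in this example is an assumption: random numbers stand in for ERA5 500 hPa height, the grid is a coarse 5° grid, and a cosine taper is used for the localization because the slide does not say which function brings the anomaly to zero at r = 2000 km. Other variables would be regressed on the same reference series in the same way.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat, lon, lat0, lon0):
    """Great-circle distance (km) from (lat0, lon0), via the haversine formula."""
    lat, lon, lat0, lon0 = map(np.deg2rad, (lat, lon, lat0, lon0))
    h = (np.sin((lat - lat0) / 2.0) ** 2
         + np.cos(lat) * np.cos(lat0) * np.sin((lon - lon0) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))

def regress_on_reference(fields, ref):
    """Least-squares slope of each grid point's anomaly on the reference series.

    fields: (time, lat, lon) samples; ref: (time,) reference series.
    """
    f_anom = fields - fields.mean(axis=0)
    r_anom = ref - ref.mean()
    return np.tensordot(r_anom, f_anom, axes=(0, 0)) / np.sum(r_anom**2)

def cosine_taper(dist_km, radius_km):
    """Weight of 1 at the center, falling to 0 at and beyond radius_km (assumed shape)."""
    return 0.5 * (1.0 + np.cos(np.pi * np.clip(dist_km / radius_km, 0.0, 1.0)))

rng = np.random.default_rng(0)
lat = np.arange(-90.0, 91.0, 5.0)
lon = np.arange(0.0, 360.0, 5.0)
lat0, lon0 = 40.0, 150.0                          # reference point from the slide
j0 = int(np.argmin(np.abs(lat - lat0)))
i0 = int(np.argmin(np.abs(lon - lon0)))

# Synthetic stand-in for 500 hPa height samples, shaped (time, lat, lon).
z500 = rng.standard_normal((200, lat.size, lon.size))
ref = z500[:, j0, i0]

slope = regress_on_reference(z500, ref)           # change per unit change in ref
dist = great_circle_km(lat[:, None], lon[None, :], lat0, lon0)
ic = slope * cosine_taper(dist, radius_km=2000.0) # localized initial anomaly
```

The resulting `ic` has unit amplitude at the reference point; multiplying it by a chosen height anomaly sets the strength of the initial disturbance.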

20 of 35

Pangu-Weather Solutions

21 of 35

22 of 35

23 of 35

Further idealization: DJF zonal-mean basic state

Will this break the model? (nothing like this in training)

24 of 35

25 of 35

Simulated Wave Packet Structure

Composite average observations (300 hPa)

Hakim (2003)

Pangu-Weather

  • similar structure to obs
  • smaller amplitude (initial condition?)

26 of 35

Geostrophic Adjustment

  • Evolution of pressure and wind field to ~geostrophic/gradient balance
    • gravity wave radiation leaves slower Rossby waves
  • Typically illustrated in shallow water “dam break”
    • Not easy to configure in a full physics model; large initial tendencies
  • Initial condition same as cyclone case, except just 500 hPa Z
    • every other field has zero anomaly (no wind, no temperature, etc.)
    • this setup is really geostrophic & hydrostatic adjustment
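
A minimal sketch of how such an initial condition could be assembled from a full anomaly state: keep the height anomaly at one level and zero every other field. The dictionary layout, variable names, and the three example levels are hypothetical choices for this illustration, and random numbers stand in for the regressed anomalies.

```python
import numpy as np

def keep_single_level(anomaly, var="z", level_hpa=500.0):
    """Return a copy of `anomaly` with every field zeroed except `var` at one level.

    anomaly: dict holding "levels" (1-D array, hPa) plus variables shaped
    (level, lat, lon). This layout is hypothetical, chosen for the sketch.
    """
    levels = np.asarray(anomaly["levels"], dtype=float)
    out = {"levels": levels.copy()}
    for name, field in anomaly.items():
        if name != "levels":
            out[name] = np.zeros_like(field)
    k = int(np.flatnonzero(levels == level_hpa)[0])  # index of the retained level
    out[var][k] = anomaly[var][k]
    return out

rng = np.random.default_rng(1)
levels = np.array([850.0, 500.0, 250.0])
full = {"levels": levels}
for name in ("z", "t", "u", "v"):
    full[name] = rng.standard_normal((levels.size, 4, 8))  # synthetic anomalies

height_only = keep_single_level(full, var="z", level_hpa=500.0)
```

Starting from `height_only`, the wind and temperature fields have to develop from rest, which is what makes the run an adjustment problem.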

27 of 35

Simulated geostrophic/hydrostatic adjustment

Lelong & Sundermeyer (2005)

  • Boussinesq equations
  • periodic boundary conditions
  • initial localized density anomaly
  • initial velocity at rest
  • T: inertial period

[Figure panels: density, V (x-y plane); potential vorticity, V (x-z plane)]

28 of 35

Pangu-Weather Solutions

29 of 35

Geostrophic adjustment on the equator

This looks good:

  • Convergence, filling
  • Turning right in NH
  • Turning left in SH
  • Shorter timescales off equator

This doesn’t look good:

  • response too localized; no long waves
  • jumps every 3 h; largest at 24 h

30 of 35

Hurricane Development

  • Localized precursors to Atlantic hurricane development
  • July-August-September mean state
    • steady state
  • regress all fields on mean-sea-level pressure at 15N, 40W
    • localized to zero at r=1000 km
    • multiplicative scaling to explore finite-amplitude response
    • one experiment that sets specific humidity to zero
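
The amplitude sweep and the dry experiment can be sketched as follows. The state layout, the variable names `mslp` and `q`, the numerical values, and the scale factors are all made up for illustration. Zeroing the full humidity field, rather than only its anomaly, is an assumed reading of the last bullet.

```python
import numpy as np

def scaled_initial_states(mean_state, anomaly, scales, dry=False, humidity_key="q"):
    """Build one initial state per amplitude factor: mean + scale * anomaly.

    If `dry` is True, the full humidity field is set to zero (an assumed
    reading of "sets specific humidity to zero").
    """
    states = []
    for scale in scales:
        state = {name: mean_state[name] + scale * anomaly[name] for name in mean_state}
        if dry:
            state[humidity_key] = np.zeros_like(state[humidity_key])
        states.append(state)
    return states

shape = (3, 4)                                    # toy (lat, lon) grid
mean_state = {"mslp": np.full(shape, 1015.0),     # hPa, made-up value
              "q": np.full(shape, 0.015)}         # kg/kg, made-up value
anomaly = {"mslp": np.full(shape, -1.0),          # 1 hPa low per unit scale
           "q": np.full(shape, 0.001)}

moist_runs = scaled_initial_states(mean_state, anomaly, scales=[1.0, 2.0, 5.0])
dry_run = scaled_initial_states(mean_state, anomaly, scales=[5.0], dry=True)[0]
```

Each entry of `moist_runs` would seed one forecast, so the response can be compared across initial amplitudes.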

31 of 35

Pangu-Weather Solutions

32 of 35

Storm tracks

  • Stronger initial storms track NW
  • “β gyres”
    • not well resolved!
    • but implicitly captures this

33 of 35

Storm Intensity in MSLP Anomaly

Need initial low <~5 hPa in MSLP

  • finite amplitude important
  • reduces time to development (Nolan et al. 2007)

Remove water vapor: storm quickly dissipates

34 of 35

Summary & Conclusions

  • Physical tests for ML model development & comparison
    • response to local disturbances provides an efficient set of tests
    • only a starting point; many other tests can be added
  • Initial-value problems show promising results
    • a surprising range of “physics” appears encoded despite no explicit constraints
    • direct comparisons with physics models needed, but harder
  • What’s needed?
    • a standardized set of tests for ML model development; publish along with RMSE
    • a model hierarchy, ideally configurable rather than built from differently trained models

35 of 35

Thank You!
