1 of 19

Slides: https://docs.google.com/presentation/d/1bbvmgWN18cH5klP-73P-_XE2swTlgTIv7J-euZ3OGdc/edit?usp=sharing

Recording: https://drive.google.com/file/d/1kYWtFgnuGEh2Q2MAG6CJf4jpYmsup7eA/view?usp=sharing

Contact: https://carlboettiger.info

2 of 19

Cyberinfrastructure for Ecological Forecasting

Carl Boettiger

3 of 19

Traditional Workflow

4 of 19

Forecasting Workflow

5 of 19

Infrastructure Without Duct Tape

6 of 19

Laptop to Cloud

MinIO

Docker

7 of 19

Raw data (NEON)

Driver data (NOAA forecasts, etc.)

Prediction Targets

Forecasts

Scores

9 of 19

drivers

  • NOAA Forecasts: 4x daily, 30-day forecast
  • Downscaled to 80+ NEON sites
  • Approaching 1 TB of data
  • MinIO downloads
  • cron schedule
  • neon4cast R package
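A minimal Python sketch of the bookkeeping behind the scheduled driver downloads. It assumes the four daily cycles start at 00, 06, 12 and 18 UTC and uses a hypothetical bucket key layout; neither detail comes from these slides or from the neon4cast package. A cron job could call `driver_keys()` for each new day and fetch only the keys it does not already hold.

```python
from datetime import date, datetime, timedelta, timezone

# ASSUMPTION: the four daily cycles start at 00, 06, 12 and 18 UTC.
CYCLE_HOURS = (0, 6, 12, 18)
# From the slide: each cycle is a 30-day forecast.
HORIZON_DAYS = 30


def cycles_for_day(day):
    """Return the UTC start time of each forecast cycle issued on `day`."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return [midnight + timedelta(hours=h) for h in CYCLE_HOURS]


def horizon_end(cycle):
    """Last valid time covered by the forecast issued at `cycle`."""
    return cycle + timedelta(days=HORIZON_DAYS)


def driver_keys(day, sites):
    """List the object keys a scheduled job would mirror for one day.

    HYPOTHETICAL key layout: drivers/noaa/<site>/<YYYY-MM-DD>/<HH>.nc
    """
    return [
        f"drivers/noaa/{site}/{cycle:%Y-%m-%d}/{cycle:%H}.nc"
        for cycle in cycles_for_day(day)
        for site in sites
    ]


if __name__ == "__main__":
    # Example NEON site codes; any list of sites works.
    for key in driver_keys(date(2021, 3, 1), ["BART", "HARV"]):
        print(key)
```

With 80+ sites and four cycles a day, the key count grows quickly, which is one reason to mirror from object storage on a schedule rather than fetch on demand.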

12 of 19

neonstore

  • Provenance: download once, store & trace raw files
  • Performance: stack into columnar relational database
  • cron schedule
  • NEON Data:
    • Highly atomized: product-site-month-table files
    • Renames files without changing data
    • Corrections change data; old data no longer accessible
    • ...
  • neonstore R package
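The provenance idea can be sketched with content hashing: identify each raw file by the SHA-256 of its bytes, so a file that NEON merely renames is recognized as already downloaded, while a correction (new bytes) is stored next to the old version instead of replacing it. This is an illustrative Python sketch, not neonstore's implementation; the table names are hypothetical, and SQLite (a row store from the standard library) stands in for a columnar database.

```python
import hashlib
import sqlite3


def open_registry(path=":memory:"):
    """Open (or create) a registry of raw files keyed by content hash."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        -- One row per distinct file content; bytes are kept for provenance.
        CREATE TABLE IF NOT EXISTS files (
            sha256 TEXT PRIMARY KEY,
            data   BLOB NOT NULL
        );
        -- Every file name ever seen for a given content hash.
        CREATE TABLE IF NOT EXISTS names (
            name   TEXT NOT NULL,
            sha256 TEXT NOT NULL REFERENCES files (sha256),
            PRIMARY KEY (name, sha256)
        );
        """
    )
    return db


def register(db, name, data):
    """Record a downloaded file; return True only if its content is new."""
    digest = hashlib.sha256(data).hexdigest()
    with db:  # commit both inserts together
        is_new = db.execute(
            "INSERT OR IGNORE INTO files (sha256, data) VALUES (?, ?)",
            (digest, data),
        ).rowcount == 1
        db.execute(
            "INSERT OR IGNORE INTO names (name, sha256) VALUES (?, ?)",
            (name, digest),
        )
    return is_new


if __name__ == "__main__":
    db = open_registry()
    # HYPOTHETICAL file names and contents.
    print(register(db, "ticks_2021-03_v1.csv", b"plot,count\nA,4\n"))  # True: first download
    print(register(db, "ticks_2021-03_v2.csv", b"plot,count\nA,4\n"))  # False: rename only
    print(register(db, "ticks_2021-03_v3.csv", b"plot,count\nA,5\n"))  # True: correction, old bytes kept
```

Keeping the `names` table separate from `files` means every upstream file name stays traceable to the exact bytes it pointed at.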

14 of 19

targets

  • cron schedule
  • 5 challenges (beetles, ticks, phenology, terrestrial, aquatic)
  • Forecast target ≠ raw data
    • Subset (site, period, …)
    • Aggregate (average across sensors, traps, etc.)
    • Additional processing (taxonomy)
  • Continuously versioned
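The subset and aggregate steps that turn raw data into a forecast target can be sketched in plain Python. The input shape `(site, day, sensor_id, value)` is a hypothetical simplification of NEON sensor tables, and the mean across sensors is just one possible aggregation; the other steps on the slide (taxonomic processing, versioning) are not shown.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def make_targets(readings, sites, start, end):
    """Subset raw readings, then average across sensors per site and day.

    `readings` is an iterable of (site, day, sensor_id, value) tuples,
    a HYPOTHETICAL simplified shape for raw sensor data.
    """
    groups = defaultdict(list)
    for site, day, sensor_id, value in readings:
        # Subset: keep requested sites inside [start, end], skip missing values.
        if site in sites and start <= day <= end and value is not None:
            groups[(site, day)].append(value)
    # Aggregate: one target value per (site, day), the mean across sensors.
    return {key: mean(values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    raw = [
        ("BART", date(2021, 3, 1), "s1", 1.0),
        ("BART", date(2021, 3, 1), "s2", 3.0),
        ("HARV", date(2021, 3, 1), "s1", 5.0),  # dropped: site not requested
        ("BART", date(2021, 4, 1), "s1", 9.0),  # dropped: outside the period
    ]
    print(make_targets(raw, {"BART"}, date(2021, 3, 1), date(2021, 3, 31)))
```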

16 of 19

forecasts

  • Standardized formats (CSV + netCDF) & EML metadata

  • Null forecasts
  • Submitted forecasts
    • submissions bucket → validator → forecasts

  • cron schedule (null forecasts + validator)
  • EFIstandards, EML packages
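The submissions → validator → forecasts step can be sketched as a check that runs before a file is promoted. The required column names below are hypothetical placeholders, not the EFI forecast standard's actual columns, and the bucket names simply mirror the slide; a file that fails validation is assumed to stay in the submissions bucket.

```python
import csv
import io

# HYPOTHETICAL column set; the EFI forecast standard defines the real one.
REQUIRED_COLUMNS = {"time", "site_id", "ensemble", "variable", "predicted"}


def validate(text, required=REQUIRED_COLUMNS):
    """Return a list of problems in a submitted CSV forecast (empty if valid)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = sorted(required - set(reader.fieldnames or []))
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    # Assuming one record per line, data rows start on line 2.
    for lineno, row in enumerate(reader, start=2):
        try:
            float(row["predicted"])
        except (TypeError, ValueError):  # None for short rows, text otherwise
            problems.append(
                f"line {lineno}: non-numeric predicted value {row['predicted']!r}"
            )
    return problems


def destination(text):
    """Bucket a submission ends up in after validation (names assumed)."""
    return "forecasts" if not validate(text) else "submissions"
```

Returning a list of problems rather than a bare pass/fail makes it easy to send the submitter a useful message.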

17 of 19

scores

  • CRPS (continuous ranked probability score) for every forecast
  • Visualization of scores + forecasts

  • cron schedule
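For an ensemble forecast, CRPS can be estimated with the identity CRPS(F, y) = E|X − y| − ½ E|X − X′|, where X and X′ are independent draws from the forecast distribution F and y is the observation. The Python sketch below applies it to the empirical distribution of the ensemble members; it is a generic estimator, not necessarily the exact scoring code used in the challenge. Lower is better, and 0 means a perfect forecast.

```python
def crps_ensemble(members, obs):
    """CRPS of the empirical distribution of `members` against observation `obs`."""
    m = len(members)
    if m == 0:
        raise ValueError("need at least one ensemble member")
    # E|X - y|: mean distance between a forecast draw and the observation.
    term1 = sum(abs(x - obs) for x in members) / m
    # 0.5 * E|X - X'|: half the mean distance over all pairs of draws.
    term2 = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return term1 - term2
```

A single-member ensemble reduces CRPS to absolute error, e.g. `crps_ensemble([2.0], 5.0)` is 3.0. The pairwise term is O(m²); for large ensembles a sort-based O(m log m) formulation gives the same value.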

18 of 19

19 of 19

Thanks

  • Quinn Thomas
  • Mike Dietze
  • Claire Lunch
  • Christine Laney
  • Tyson Swetnam
  • EFI Cyberinfrastructure Working Group
  • NEON, CyVerse, XSEDE
  • Jorrit Poelen
  • Matt Jones