epiprocess

https://cmu-delphi.github.io/epiprocess

A package for epidemic signal processing

Ryan Tibshirani, CMU Delphi group

Broader context

Goal: build interoperable, community-driven packages for epi tracking & forecasting

Fetch data (e.g., Epidata/COVIDcast and epidatr, epidatpy)

  • Scrape websites
  • Fetch GitHub files
  • Make API calls
  • R/Python packages
  • However you want

Standardize, clean, & process data: epiprocess (today’s focus)

Build & evaluate predictive models: epipredict

epiprocess

Currently an R package (Python coming?). Resources: GitHub repo and documentation site (with package index, getting started guide, and several vignettes)

Two main data formats:

epi_df: snapshot of a data set (as of a particular data version)

epi_archive: historical archive of a data set (full version history)

  • Minimal columns: geo_value, time_value, version (archive only)
  • Minimal metadata: geo_type, time_type, as_of (df only)
  • Backend: epi_df is a standard data frame, whereas epi_archive uses the data.table package

This is an epi_df:

geo_value  time_value  percent_cli  age_group
PA         2020-03-01  1.0563       0-4
PA         2020-03-01  1.2781       5-12
...

This is an epi_df:

geo_value  time_value  percent_cli:0-4  percent_cli:5-12  ...
PA         2020-03-01  1.0563           1.2781
PA         2020-03-01  1.1145           1.3548
...

This is an epi_archive:

geo_value  time_value  percent_cli  age_group  version
PA         2020-03-01  0.9924       0-4        2020-03-03
PA         2020-03-01  1.0272       0-4        2020-03-04
...
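To make the relationship between the two formats concrete, here is a minimal sketch in plain Python of deriving a snapshot from the archive rows shown above. It is a conceptual illustration only, not the epiprocess API: the tuple layout and the helper name `snapshot_as_of` are assumptions made for this example.

```python
from datetime import date

# Archive rows taken from the epi_archive example above, laid out as
# (geo_value, time_value, age_group, version, percent_cli).
# This tuple layout is an assumption made for this sketch.
archive = [
    ("PA", date(2020, 3, 1), "0-4", date(2020, 3, 3), 0.9924),
    ("PA", date(2020, 3, 1), "0-4", date(2020, 3, 4), 1.0272),
]


def snapshot_as_of(rows, as_of):
    """Keep, for each (geo, time, age group), the value from the most
    recent version that is not later than `as_of`."""
    latest = {}
    # Process rows in version order so later versions overwrite earlier ones.
    for geo, time_value, age, version, value in sorted(rows, key=lambda r: r[3]):
        if version <= as_of:
            latest[(geo, time_value, age)] = value
    return latest


key = ("PA", date(2020, 3, 1), "0-4")
snap_early = snapshot_as_of(archive, date(2020, 3, 3))
snap_late = snapshot_as_of(archive, date(2020, 3, 4))
print(snap_early[key], snap_late[key])
```

The same observation (PA, 2020-03-01, ages 0-4) takes a different value depending on the version requested, which is the information an archive keeps and a single snapshot drops.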

epi_df functionality

Existing functionality:

  • epi_slide(): slide a computation over signals in an epi_df to create new signals
    • Example: 7-day trailing average, though the computation can be completely arbitrary
  • epi_cor(): compute lagged correlations between signals in an epi_df object
    • Example: lag one signal back 20 days and correlate it with another signal, per geo value & age group
  • growth_rate(): estimate growth rate of a signal, using various methodologies
  • detect_outlr(): detect outliers in a signal, using built-in or custom methodologies
  • Time aggregation and missing value imputation over time: powered by the tsibble package
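The two examples in the list above (a trailing average and a lagged correlation) can be sketched in plain Python for a single series. This is a conceptual illustration, not how epi_slide() or epi_cor() are implemented or called: the function names `trailing_mean` and `lagged_correlation` are hypothetical, and the real functions would also handle grouping by geo value and other keys.

```python
from math import sqrt
from statistics import mean


def trailing_mean(values, window=7):
    """Trailing average: entry i averages the last `window` values
    ending at i (fewer at the start of the series)."""
    return [
        mean(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


def lagged_correlation(x, y, lag):
    """Pearson correlation between x lagged back by `lag` steps and y,
    i.e. pairs x[t - lag] with y[t]."""
    xs = x[: len(x) - lag]
    ys = y[lag:]
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / sqrt(var_x * var_y)


# Made-up numbers, for illustration only.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
smoothed = trailing_mean(signal, window=7)

x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 7.0]
y = [0.0, 0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 6.0]  # y[t] equals x[t - 2] for t >= 2
r = lagged_correlation(x, y, lag=2)
print(smoothed[-1], r)
```

Swapping `mean` for any other function of the window is what makes the sliding computation arbitrary; the trailing average is just one choice.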

In development:

  • Geo aggregation and missing value imputation over space: this will be its own package, called gtsibble

epi_archive functionality

Existing functionality:

  • epix_as_of(): derive a snapshot (in epi_df format) from a data archive as of a given version
  • epix_merge(): join two data archives together, doing missing value imputation via LOCF (last observation carried forward)
  • epix_slide(): slide a computation over signals in an epi_archive in a version-aware fashion. For the computation at reference time t, it uses only data that was available as of time t
  • epix_fill_through_version(): extrapolate missing versions with LOCF or NAs
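The version-aware idea can be sketched in plain Python for a single location. This is a conceptual illustration under simplifying assumptions (integer days for both time values and versions, made-up numbers, hypothetical helper names), not the epix_slide() interface.

```python
from statistics import mean

# Toy archive for one location, as (time_value, version, value) with
# integer days. Day 1 is first reported as 10.0, then revised to 12.0
# in version 3. All numbers are made up for illustration.
archive = [
    (1, 1, 10.0),
    (2, 2, 20.0),
    (1, 3, 12.0),
    (3, 3, 30.0),
]


def snapshot_as_of(rows, version):
    """Latest known value per time_value among versions <= `version`."""
    latest = {}
    for time_value, ver, value in sorted(rows, key=lambda r: r[1]):
        if ver <= version:
            latest[time_value] = value
    return latest


def version_aware_slide(rows, ref_times, window, f):
    """For each reference time t, apply f to the trailing window of
    values as they were known as of version t."""
    out = {}
    for t in ref_times:
        snap = snapshot_as_of(rows, t)
        vals = [snap[s] for s in sorted(snap) if t - window < s <= t]
        if vals:
            out[t] = f(vals)
    return out


aware = version_aware_slide(archive, ref_times=[2, 3], window=3, f=mean)

# For contrast: the same window at t = 2 computed from the latest data,
# which includes the revision that was not yet available at t = 2.
latest_snap = snapshot_as_of(archive, 3)
hindsight_at_2 = mean([latest_snap[s] for s in sorted(latest_snap) if s <= 2])
print(aware[2], hindsight_at_2)
```

At reference time 2 the version-aware result uses the unrevised value for day 1, while the hindsight computation uses the later revision; the gap between the two is what a version-unaware backtest would hide.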