Software ecosystem concepts for federated genomic analysis
Vince Carey PhD
Bioc Europe 2019
100,000 users (downloads)
local clusters
laptops
1000+ packages
~50GB of software/reference data
~daily CI/CD; 6-month release cycle
CRAN + Bioc git + system req.
~1000 developers
bugfixes in release
enhancing devel
1x commons -- commercial cloud
Can it all fit together?
Road map of the talk
Why consider a federated approach?
from RL Grossman PMID 30691868
What is a data/analysis commons?
Components of a working data/analysis commons
Aspects of the AnVIL schematic
Under the hood -- Morgan, Turaga, Stubbs
Summary
What is the "data commons" aspect?
Sean Davis' BigRNA: towards a data service for RNA-seq, based in NCBI SRA
BigRNA + HSDS + rhdf5client + DelayedArray =
Here the HDF Scalable Data Service (HSDS) provides the RESTful back end
Upshots
Example: find RNA-seq studies in which a selected gene exhibits unusually large variation across samples
defaults: order by MAD for selected gene, filter top 10% of studies
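A minimal sketch of the ranking described above, assuming expression values are already in memory as plain Python lists. It does not use the BigRNA or HSDS interfaces; the function names, the study identifiers, and the toy values are all hypothetical, and the MAD here is the unscaled median absolute deviation.

```python
import math
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a sequence of numbers."""
    values = list(values)
    center = median(values)
    return median([abs(v - center) for v in values])


def top_variable_studies(expr_by_study, gene, top_fraction=0.10):
    """Order studies by the MAD of `gene` across samples, keep the top fraction.

    expr_by_study maps study id -> {gene symbol -> list of per-sample values}.
    This layout is an assumption made for illustration only.
    """
    scores = []
    for study, genes in expr_by_study.items():
        samples = genes.get(gene)
        # Skip studies that lack the gene or have too few samples to compare.
        if samples and len(samples) > 1:
            scores.append((study, mad(samples)))
    # Largest variation first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    # Always keep at least one study when any were scored.
    keep = max(1, math.ceil(len(scores) * top_fraction))
    return scores[:keep]


# Hypothetical toy data: three studies, four samples each.
toy = {
    "study_A": {"GAPDH": [10.0, 10.1, 9.9, 10.0]},
    "study_B": {"GAPDH": [2.0, 9.0, 15.0, 30.0]},
    "study_C": {"GAPDH": [5.0, 5.5, 4.5, 5.2]},
}

if __name__ == "__main__":
    for study, score in top_variable_studies(toy, "GAPDH"):
        print(study, score)
```

With a remote back end, the per-study values would be fetched lazily rather than held in a dictionary; the ordering and filtering steps would stay the same.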
Oddity: list studies in which a housekeeping gene exhibits unusually large variation across samples
Upshots
The associated ecosystem slice
Proposal
A definition of "analysis ecosystem"
© Carey et al., 2029
Caveats
Aspects of the AnVIL schematic
Conclusions
Extra slides follow
if 181,000 are not enough …
Grossman et al. 2016, PMID 29033693
OSDC has been running since 2009
OSDC schema