Open Data in Biology
Tim Hubbard, @timjph
First International Open Economics Workshop
17th December 2012
Biology is a grand project
Data is organised towards this goal
Data, Databases & Bioinformatics
Researchers
Data
Resources
Repositories
Global Infrastructure
Curation
Website
Data mining
APIs
Downloads
Researchers
Submission
Policies
Systems
Sustainable funding
Reuse
Discoverability
Ease of use (+access)
Scientific Record
Complete sets of components
Human genome race: won by public project, open access for all
Scale
Scale of data
Data infrastructure for biology
Sanger implementation of Data Sharing
Data sharing issues for institutes
Ensuring data submission
New policy: sanctions for non-compliance
Data Sharing
Traditional: (Honest Broker)
Data set A
Researcher
“Run X on A & B”
Data set B
Results
“Request A & B data set”
Algorithm X
Data set combination and anonymisation process
Honest Broker Model:
Honest Broker
Anonymised data set
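The combination and anonymisation step in the Honest Broker model can be sketched as below. This is a minimal illustration, not the process used by any particular institute: the field names (`patient_id`, `name`), the salted-hash pseudonym, and the function name are all assumptions made for the example. Replacing identifiers with pseudonyms is only one part of anonymisation; a real broker would also need to consider re-identification from the remaining fields.

```python
import hashlib
import secrets

# Hypothetical field names that identify a person directly.
DIRECT_IDENTIFIERS = {"patient_id", "name"}


def combine_and_anonymise(data_set_a, data_set_b, key="patient_id"):
    """Join two lists of records on `key`, then strip direct identifiers.

    The salt stays with the broker, so the researcher cannot recompute
    pseudonyms from known identifiers.
    """
    salt = secrets.token_hex(16)
    b_by_key = {row[key]: row for row in data_set_b}
    released = []
    for row_a in data_set_a:
        row_b = b_by_key.get(row_a[key])
        if row_b is None:
            continue  # keep only individuals present in both data sets
        merged = {**row_a, **row_b}
        digest = hashlib.sha256((salt + str(merged[key])).encode()).hexdigest()
        record = {k: v for k, v in merged.items() if k not in DIRECT_IDENTIFIERS}
        record["pseudonym"] = digest[:12]
        released.append(record)
    return released


# Toy records standing in for Data set A and Data set B.
data_set_a = [
    {"patient_id": 1, "name": "Person One", "genotype": "AA"},
    {"patient_id": 2, "name": "Person Two", "genotype": "AG"},
]
data_set_b = [{"patient_id": 1, "diagnosis": "condition-1"}]

anonymised = combine_and_anonymise(data_set_a, data_set_b)
```

In this model the researcher still receives record-level data, which is the point of contrast with the proposed alternative.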
Proposed: (SVM)
Data set A
Researcher
“Run X on A & B”
Data set B
Results
“Run VM on A & B”
Summary data only via output API (no raw data)
Algorithm X
Secure Virtual machine (SVM):
Honest Broker (with local cloud)
Download VM Template
Secure Virtual Machine (SVM) Template, with APIs
Secure Virtual Machine (SVM) running Algorithm X, with APIs
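The idea behind the proposed SVM model, where the algorithm travels to the data and only summary data leaves through an output API, can be sketched as follows. Everything here is a hypothetical illustration: the class name, the `run` method, and the rule that the output API accepts only a flat mapping of names to numbers are assumptions, and plain Python offers no real isolation, which an actual secure virtual machine would have to provide.

```python
from statistics import mean


class SecureVirtualMachine:
    """Toy stand-in for an SVM hosted by the honest broker."""

    def __init__(self, data_set_a, data_set_b):
        # Raw data stays inside the machine.
        self._a = data_set_a
        self._b = data_set_b

    def run(self, algorithm):
        """Run a researcher-supplied algorithm and filter what it returns."""
        result = algorithm(self._a, self._b)
        return self._output_api(result)

    @staticmethod
    def _output_api(result):
        # Assumed policy: only a flat dict of numeric summaries may leave.
        if not isinstance(result, dict):
            raise ValueError("output must be a dict of summary values")
        for name, value in result.items():
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{name!r} is not a numeric summary")
        return dict(result)


def algorithm_x(a, b):
    """Example 'Algorithm X': a count and mean over both data sets."""
    values = [row["value"] for row in a + b]
    return {"n": len(values), "mean": mean(values)}


svm = SecureVirtualMachine(
    [{"value": 1.0}, {"value": 3.0}],  # Data set A
    [{"value": 5.0}],                  # Data set B
)
summary = svm.run(algorithm_x)
```

An algorithm that tries to return the records themselves is rejected by the output check, so the researcher sees results but never the underlying rows.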
Changing sources of biological data
Healthcare Professional
Component 4
Individual query analysis
Component 3
Additional clinical annotation
Component 2
Genotype and Phenotype relationship capture
Component 1
Human sequence data repositories
Component 5
Electronic Health Record & Personal Genome Sequence
Data from Patients (NHS)
Data from Collections (Research Institutes)
E-Health, Genomic Medicine & Linked Data
Healthcare Professional
Component 4
Individual query analysis
Component 3
Additional clinical annotation
Component 2
Genotype and Phenotype relationship capture
Component 1
Human sequence data repositories
Component 5
Electronic Health Record & Personal Genome Sequence
Data from Patients (NHS)
CPRD
Data from Collections (Research Institutes)
DH/NHS
DWP
HMRC
ONS
DfE, BIS
DJ, IC
ESRC
WT
MRC
Openness Privacy
Acknowledgements
Discussions with many at Sanger Institute, EBI, Wellcome Trust, NCBI, NHGRI, europePMC, Human Genome Strategy Group, Administrative Data Taskforce