Datasets
Sebastian
@s_urchs (slack: @surchs)
Before we start:
If you ask yourself...
how
then this is for you!
How do I choose a good dataset?
Ask yourself:
Copied liberally from Chris Gorgolewski’s slides (@chrisgorgo)
Ease of access (from hard to easy)
Signed Data Usage Agreement
Access through managed database
“Just get it”
Direct download
“It’s right there”
How raw is the dataset?
Organized (more control)
Preprocessed
Derivatives (less work)
How useful is the dataset?
Data Quality
Metadata Quality
Data Cost
How do I find open datasets?
FCP-INDI
OpenNeuro
Canadian Open Neuroscience Portal
openMorph
How do I get the data?
“I just want the data”
“I’m just testing things”
“I want to know what happened to my data”
Nilearn example
Nilearn - pick a dataset
https://nilearn.github.io/modules/reference.html#module-nilearn.datasets
Pick the ADHD 200 dataset
Nilearn - get the data
Nilearn - access the data
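A minimal sketch of the fetch and access steps, assuming a recent nilearn release in which `nilearn.datasets.fetch_adhd` returns a Bunch whose `func` and `confounds` attributes are lists of local file paths in the same subject order. The helper names `fetch_adhd_subset` and `pair_runs` are hypothetical, not part of nilearn.

```python
def fetch_adhd_subset(n_subjects=2, data_dir=None):
    """Download (or reuse a cached copy of) part of the ADHD 200 resting-state sample."""
    # Imported here so pair_runs() can be used without nilearn installed.
    from nilearn import datasets

    # Files are cached in data_dir (by default ~/nilearn_data), so calling this
    # again does not download the same subjects twice.
    return datasets.fetch_adhd(n_subjects=n_subjects, data_dir=data_dir)


def pair_runs(func_files, confound_files):
    """Pair each functional image with its confounds file.

    Assumes both lists are in the same subject order (hypothetical helper).
    """
    if len(func_files) != len(confound_files):
        raise ValueError("expected one confounds file per functional image")
    return list(zip(func_files, confound_files))


# Example usage (downloads data on the first run):
#   adhd = fetch_adhd_subset(n_subjects=2)
#   for func_path, confounds_path in pair_runs(adhd.func, adhd.confounds):
#       print(func_path, confounds_path)
#   print(adhd.phenotypic)  # per-subject phenotypic information
```

Keeping the nilearn import inside the fetch function means the pairing logic can be reused on any list of paths, not only on what nilearn returns.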
Amazon S3 example
Amazon S3 - pick a dataset
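A minimal sketch of browsing a public S3 bucket without AWS credentials, assuming boto3 is installed. The bucket name `fcp-indi` and the `data/Projects/` prefix are assumptions about how FCP-INDI organizes its data, and `make_anonymous_s3_client` and `list_prefixes` are hypothetical helper names.

```python
def make_anonymous_s3_client(region_name="us-east-1"):
    """Return an S3 client that sends unsigned requests (public buckets only)."""
    # Imported here so list_prefixes() can be used without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client(
        "s3",
        region_name=region_name,
        config=Config(signature_version=UNSIGNED),
    )


def list_prefixes(client, bucket, prefix=""):
    """List the 'folders' directly below `prefix` in a bucket.

    S3 has no real folders: with Delimiter="/", list_objects_v2 groups keys
    by their next path component and returns them as CommonPrefixes.
    """
    folders = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for common in page.get("CommonPrefixes", []):
            folders.append(common["Prefix"])
    return folders


# Example usage (needs network access; bucket and prefix are assumptions):
#   s3 = make_anonymous_s3_client()
#   for folder in list_prefixes(s3, "fcp-indi", "data/Projects/"):
#       print(folder)
```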
Amazon S3 - get the data
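A minimal sketch of downloading every file under one prefix, assuming a boto3 S3 client configured for unsigned (anonymous) requests. The bucket, the `<some_project>` placeholder and the helper names `list_keys` and `download_keys` are hypothetical.

```python
import os


def list_keys(client, bucket, prefix, suffix=""):
    """Yield every object key under `prefix` whose name ends with `suffix`.

    list_objects_v2 returns at most 1000 keys per response, so a paginator
    is used to walk through all pages.
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Skip "folder" placeholder keys and files with the wrong suffix.
            if not key.endswith("/") and key.endswith(suffix):
                yield key


def download_keys(client, bucket, keys, dest_dir):
    """Download each key into dest_dir, mirroring the bucket's folder layout."""
    local_paths = []
    for key in keys:
        local_path = os.path.join(dest_dir, *key.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        client.download_file(bucket, key, local_path)
        local_paths.append(local_path)
    return local_paths


# Example usage (needs boto3 and network access; the prefix is hypothetical):
#   import boto3
#   from botocore import UNSIGNED
#   from botocore.config import Config
#   s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
#   keys = list_keys(s3, "fcp-indi", "data/Projects/<some_project>/", ".nii.gz")
#   download_keys(s3, "fcp-indi", keys, "fcp_indi_data")
```

Filtering by suffix (for example `.nii.gz`) keeps the download small when you only need images, and printing the output of `list_keys` first is a cheap way to check that the prefix points at the data you expect.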
How do I work with the data?
I want more data
More databases
Repository for data associated with publications
Digital object identifier
Any license
Hosted by CERN (cool)
Repository for data
Digital object identifier
Creative Commons license
Has commercial side
https://datasetsearch.research.google.com/
Dataset search engine
Doesn’t store any data itself
Lets you search other databases
Preprocessed data
Longitudinal / reproducibility data
Derivative data
Repository of statistical maps of completed studies
Aggregated activation data maps with keyword search
Neurosynth-genes provides gene expression data from the Allen Institute for Brain Science
Good people to talk to
Great resource for asking questions and getting feedback
Cool data that take longer to get
> 100,000 individuals
Deep metadata
Genetics (prospective whole genome sequencing)
Extensive imaging data
Medical records
Human brain gene expression maps
Histological and developmental atlases
Extensive mouse data
https://db.humanconnectome.org/
Very high resolution data
Publicly available data for 1200 healthy individuals
Long, repeated imaging sessions (task and resting state)
Deep metadata
Share that brain
Katherine Karlsgodt, 2019