AnVIL Dataset Onboarding Application
A form to help our AnVIL committee determine readiness and operations related to onboarding a new dataset.

This form is currently under development. If you have any feedback about potential improvements, please email
Email address *
What is your name and affiliation?
Your answer
What is the name of your dataset or consortium (e.g. eMERGE, 1000 Genomes)? Include version number.
Your answer
How was this dataset or consortium funded?
Please list the institutions that are part of this dataset or consortium.
Your answer
We require a contact who will coordinate onboarding of this dataset into the AnVIL, which includes activities like depositing files into cloud buckets, determining data use limitations, and ensuring phenotypic data adheres to an agreed-upon model. Please provide the name, email, institution, and affiliation of this contact with the dataset organization or consortium.
Your answer
Do program officers affiliated with this dataset organization or consortium approve onboarding into the AnVIL?
Please list the name(s) of the Program Officer(s) affiliated with this dataset organization or consortium
Your answer
Has this contact been approved by the dataset organization or consortium to represent them?
What is the scientific goal of this dataset or consortium?
Your answer
What makes this dataset impactful to the research community? Please include references to any relevant advancements, for example in defining disease etiology or advancing technology/tool development.
Your answer
How can your dataset benefit from being hosted by the AnVIL?
Your answer
Does this dataset contain data generated from human-derived samples?
Was this data generated in an ethical manner, with IRB approvals obtained as needed?
Are the data use limitations on the dataset defined? (e.g. GRU, HMB)
Has this dataset (and all associated cohorts expected to be deposited in the AnVIL) been registered in dbGaP?
How many cohorts are included in this dataset?
Your answer
Does your dataset organization or consortium have a data sharing agreement in place that allows members to access restricted data outside of dbGaP data access requests?
Is this dataset currently available to the public via other sources?
What types of data files are you interested in hosting in the AnVIL?
Will your dataset have genotypic and phenotypic data available?
What type of analysis was performed to generate the data?
What sequencing metrics do you have available for your genotypic data?
Your answer
Was your genotypic data aligned using a functionally equivalent pipeline?
What genome build was your genotypic data aligned to?
What is the total size (in TB) of the genotypic files you would like hosted in the AnVIL? Provide estimate per file type and number of files.
Your answer
What is the total size (in TB) of the phenotypic files you would like hosted in the AnVIL? Provide estimate per file type and number of files.
Your answer
What data model do you currently use to organize your data?
Your answer
Are there any analysis tools or apps that would be useful for your consortium to be able to use within the AnVIL?
Your answer
This form was created inside of Broad Institute of MIT and Harvard.