AnVIL Dataset Onboarding Application
A form to help our AnVIL committee determine readiness and operations relating to onboarding of a new dataset

This form is currently under development. If you have any feedback about potential improvements, please email
Email *
1) What is your name / professional affiliation?
2) What is the name of your dataset or consortium (e.g. eMERGE, 1000 Genomes)? Include version number if available.
3) How was this dataset or consortium funded?
4) Please list the institutions that are part of this dataset.
5) We require a contact who will coordinate onboarding of this dataset into the AnVIL, which includes activities like depositing files into cloud buckets, determining data use limitations, and ensuring phenotypic data adheres to an agreed upon model. Please list the institutions that are contributing to this dataset (along with contact information).
6) If this dataset is NIH funded, do program officers affiliated with this dataset organization or consortium approve onboarding into the AnVIL?
7) If this dataset is NIH funded, please list the name(s) of the Program Officer(s) affiliated with this dataset organization or consortium
8) Are you an authorized representative capable of making decisions on behalf of the dataset?
9) What is the scientific goal of this dataset or consortium?
10) What makes this dataset impactful to the research community? Please include references to any relevant advancements, for example in defining disease etiology or advancing technology/tool development.
11) How can your dataset benefit from being hosted by the AnVIL?
12) Does this dataset contain data generated from human-derived samples?
13) Was this data generated in an ethical manner, with IRB approvals obtained as needed?
14) Are the data use limitations on the dataset defined (e.g. GRU, HMB)?
15) Does your dataset contain any Personally Identifiable Information (PII) or Personal Health Information (PHI)? *
Examples of PII and PHI include: names, locations smaller than a state, dates more specific than a year, telephone numbers, vehicle identification numbers, license plate numbers, fax numbers, serial numbers, email addresses, URLs, SSNs, IP addresses, medical record numbers, biometric identifiers, beneficiary numbers, photographs, account numbers, or any other unique identifier. For additional guidance, see:
16) If the dataset does contain PII/PHI as described above, has it been de-identified? *
17) Has this dataset (and all associated cohorts expected to be deposited in the AnVIL) been registered in dbGaP?
18) How many cohorts are included in this dataset? (A cohort here is defined as "An organization of data that corresponds to a single IRB-approved study protocol")
19) Does your dataset organization or consortium have a data sharing agreement in place that allows members to access restricted data outside of dbGaP data access requests?
20) Is this dataset currently available to the public via other sources?
21) What types of data files are you interested in hosting in the AnVIL?
22) Will your dataset have genomic and phenotypic data available?
23) What type of analysis was performed to generate data?
24) What sequencing metrics do you have available for your genomic data?
25) Was your genomic data aligned using a functionally equivalent pipeline? (See more info here:
26) What genome build was your genomic data aligned to?
27) What is the total size (in TB) of the genomic files you would like hosted in the AnVIL? Provide estimate per file type and number of files.
28) What is the total size (in TB) of the phenotypic files you would like hosted in the AnVIL? Provide estimate per file type and number of files.
29) What data model do you currently use to organize your data (e.g. OMOP, dbGaP, FHIR, i2b2)?
30) Are there any analysis tools or apps that would be useful for your consortium to be able to use within the AnVIL?
This form was created inside of Broad Institute of MIT and Harvard.