A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Data Source | URL | Description | Owner | Permission Level* | Priority Ranking** | Data Origination Source† | Primary Data Elements | Data Standards | Ease of Extraction‡ | Frequency of Updates | Data Quality Estimate‡‡ | Linkage Issues | Other Notes | ||||||||||||
2 | PRIMARY SOURCE OF PATIENT DATA | |||||||||||||||||||||||||
3 | Carolina Data Warehouse for Health (CDWH) | https://tracs.unc.edu/index.php/services/biomedical-informatics/cdw-h | Repository for clinical and administrative data from UNC Health Care System’s Electronic Health Record system | UNC NC TraCS | R§ | 1 | UNC Health Care System | Patient-level data across a variety of clinical domains, containing the entirety of the patient's electronic health record. Example domains include (but are not limited to): demographics, encounter details, diagnoses, procedures, medications, lab results, vital signs, social history, medical history. | Data is coded according to best practices laid out by the Office of the National Coordinator for Health IT. Examples include ICD-9/10, CPT, HCPCS, LOINC, RxNorm, and OMB standards. | 1 | CDW-H is updated daily with the previous day's EHR data. | 1 | ||||||||||||||
4 | MEDICATION AND CLINICAL OUTCOMES DATA SOURCES | |||||||||||||||||||||||||
5 | NIH Clinical Trials Data | http://www.clinicaltrials.gov | Data on existing and investigational drugs, including approved and investigational treatments, adverse event profiles, pharmacokinetics, and pharmacodynamics | US DHHS, NIH and NLM | O | 1 | US FDA Registration, IND application, Protocol, CSR and associated SAS analysis tables | Study design (incl. treatment/intervention & trial duration), participant population (demographics, disease state) and flow/disposition, baseline characteristics, primary & secondary endpoints, statistical analyses, adverse events | US FDA GCP Guidelines and Reporting Standards; NLM; MeSH terminology | 1 (data are extracted by search term(s); a given query may require multiple searches in order to identify all relevant trials) | Database is updated continuously; investigators are required to update data for individual studies within 30 days (recruitment/completion data) or 12 months (other data) | 1 | Linkage will depend on query; can link by participant characteristics (e.g., females aged ≥18 years), disease state, treatment (incl. dose formulation and dosage(s) for medications), outcome measure (e.g., ED visits, rescue medication), etc.; can link with variables of varying granularity | Can download data as xml, csv, tsv, or plain text; xml provides richer (more complete) dataset; a cursory test of the quality of the available data (recent CSR/SAS tables vs. clinicaltrials.gov data) showed highly favorable results, with only one obvious error (study completion date, 1 month difference in reported study completion date) | ||||||||||||
6 | CTTI AACT Clinical Trials Data | https://www.ctti-clinicaltrials.org/aact-database | User-friendly data on clinical trials | CTTI (Clinical Trials Transformation Initiative) | O | 2 | clinicaltrials.gov (aggregated & restructured as a relational database) | Same as clinicaltrials.gov | AACT data elements mapped to NLM definitions, MeSH terminology | 1 (but extracted data are largely provided as distinct txt files based on pre-defined categories) | Bi-annual, March and September (although as of 10/4/2016, most recent data are from clinicaltrials.gov time-stamped 3/27/2016) | 1 (except that data are not up to date and variables have been simplified) | Linkages will depend on query; less granular linkages than with clinicaltrials.gov (e.g., asthma vs. asthma of duration >3 months not responsive to treatment X) | Can download data as oracle dmp, pipe delimited text output, or SAS CPORT transport; pipe delimited text download is provided as categorized, distinct txt files; comprehensive data dictionary is available | ||||||||||||
7 | DrugBank | https://www.drugbank.ca/ | Bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information | Wishart Research Group | O | 1 | numerous sources, including US FDA, PubChem, KEGG, ChEBI, PubMed, PharmGKB, etc. | drug/chemical, drug target, protein data (over 200 fields) | DrugBank ID | |||||||||||||||||
8 | SIDER | http://sideeffects.embl.de/ | Drug - AE interactions | Max Planck, Biobytes, Novo Nordisk, European Molecular Biology Laboratory | O | 2 | package inserts, public documents | drug vs placebo AEs (frequency, name, organ system) | MedDRA | |||||||||||||||||
9 | FAERS | https://open.fda.gov/data/faers/ | voluntary reporting and surveillance system for drug-related AEs and medication errors | US FDA | O | 3 | voluntary reporting system (patients, providers, lawyers, etc.) | AEs, text descriptions | ICH E2B, MedDRA | |||||||||||||||||
10 | NORTH CAROLINA–SPECIFIC ENVIRONMENTAL EXPOSURE DATA SOURCES | |||||||||||||||||||||||||
11 | NC Air Quality Data | https://deq.nc.gov/about/divisions/air-quality/air-quality-data | Geocoded North Carolina monitoring data on pollen, nitrogen dioxide, sulfur dioxide, ozone, particulates | NC Department of Environmental Quality (DEQ), Division of Air Quality | O | 1 | North Carolina Sensors | Varied: http://deq.nc.gov/about/divisions/air-quality/air-quality-data/pollen-monitor-information | https://deq.nc.gov/about/divisions/air-quality/air-quality-monitoring/current-monitoring-data-pollutant/meteorological-and-pollutant-parameters | Manual via GIS and interactive maps. | Varied: http://deq.nc.gov/about/divisions/air-quality/air-quality-data/pollen-monitor-information | 2 Sensor sampler errors are reviewed and removed if technical issues. | May have to identify a patient subject, identify their location, and then manually extract information for that area. If looking for trend information across several patients, this could be tedious. | |||||||||||||
12 | NC Water Quality Data | http://deq.nc.gov/about/divisions/water-resources | Geocoded North Carolina monitoring data on water quality, algal blooms, water supply watershed | NC DEQ, Division of Water Resources | O | 1 | North Carolina Sensors | Water assessment, monitoring, specific incident, and toxicity in the form of reports: http://deq.nc.gov/about/divisions/water-resources/water-resources-data/water-sciences-home-page/reports-publications-data | https://deq.nc.gov/about/divisions/air-quality/air-quality-monitoring/current-monitoring-data-pollutant/meteorological-and-pollutant-parameters | Manual reading of reports. | Approximately every five years (during report years, data sampled twice per month from May - Sept). The Random Ambient Monitoring System is run every 2 years in various areas sampled once per month. | 2 Follows EPA guidelines. | May have to identify a patient subject, identify their location, and then manually extract information for that area. If looking for trend information across several patients, this could be tedious. | |||||||||||||
13 | NC Waste Management Data | https://deq.nc.gov/about/divisions/waste-management/waste-management-rules-data | Geocoded data on North Carolina unregulated and regulated landfills, registered underground storage tanks, underground storage tank incidents, dry cleaning solvent sites, hazardous waste sites, and inactive hazardous waste sites | NC DEQ, Division of Waste Management | O | 1 | North Carolina Monitoring | Varied: http://deq.nc.gov/about/divisions/waste-management/waste-management-rules-data/waste-management-gis-maps/tsd-map-viewer | https://deq.nc.gov/about/divisions/air-quality/air-quality-monitoring/current-monitoring-data-pollutant/meteorological-and-pollutant-parameters | Manual via GIS and interactive maps. | Approximately annual. | 2 Best effort, no warranty. | May have to identify a patient subject, identify their location, and then manually extract information for that area. If looking for trend information across several patients, this could be tedious. | |||||||||||||
14 | NC DOT Roadway Data | https://ncdot.maps.arcgis.com/home/group.html?id=d178173a2c8247638d6cc58f31329472 | Geocoded roadway data for NC | NC DOT | O | 1 (may not be needed any longer, but likely useful to expand GT offerings) | NC DOT map data | Type of road, name, functional features, bridges | ArcGIS | Updated daily | ||||||||||||||||
15 | NATIONAL/INTERNATIONAL ENVIRONMENTAL EXPOSURE DATA SOURCES | |||||||||||||||||||||||||
16 | National Allergen Exposure Data | http://www.aaaai.org/global/nab-pollen-counts/south-atlantic-region | Geocoded monitoring data on exposures to pollen and mold | National Allergy Bureau | O/R (different levels of access) | 2 | Select sites run by states. | Low, Moderate, High, Very High rating for grass, trees, weeds, and sometimes mold. Sometimes raw mold/pollen count data available. Sometimes mold/pollen count provided by species. | None. | 3 Manual extraction required, and data provided is non-uniform across states and sensors. | Most days between Feb and Oct or Nov (sometimes gaps, sometimes weekends not provided). | Data appears to be manually reviewed prior to posting. | Not clear what the coverage is for a given city's pollen data. | |||||||||||||
17 | National Air Quality System Pollution Data | http://www.epa.gov/AQS | Geocoded monitoring data on levels of carbon monoxide, lead, sulfur dioxide, nitrogen oxide, and particulates | US EPA | O/R (different levels of access) | 2 | US EPA Sensors | Over 1000 listed here: https://aqs.epa.gov/aqsweb/documents/codetables/parameters.html , also historical Air Quality Index | Follows EPA Pollutant Standards: https://aqs.epa.gov/aqsweb/documents/codetables/pollutant_standards.html | 1 Yes: web page, REST API (note: AirNow API available for real-time data) | Varies: https://aqs.epa.gov/aqsweb/documents/codetables/collection_frequencies.html | 2 Varies: https://www3.epa.gov/ttn/amtic/qacert.html | ||||||||||||||
18 | Open, Real-time Air Quality Data | https://openaq.org/ | Our community aggregates and shares open air quality data from around the world. We believe open access to air quality data empowers the public to fight air inequality. | developmentSEED, AWS, AGU, Earth journalism network, a project of internews, keen io, nih, the open science prize, thriving earth exchange, wellcome trust, howard hughes medical institute, echoing green | O | 2 | Aggregated physical air quality data from public data sources provided by government, research-grade and other sources. | 30,239,074 air quality measurements from 4,334 locations in 33 countries. Data are aggregated from 47 government level and research-grade sources. | Its own standard. | 1 REST API: https://docs.openaq.org/ | Varies: per individual source. | 2: Varies: per individual source. | ||||||||||||||
19 | National Water Quality System: STOrage and RETrieval and Water Quality eXchange (STORET) | https://www.epa.gov/waterdata/storage-and-retrieval-and-water-quality-exchange | Geocoded monitoring data on water quality | US EPA | O | 3 | Monitoring by individual state sensors aggregated into a single EPA system. | Dashboards by region, state, tribes, state profile, %, top ten substances. Including metals, chemicals, water/air temps, pH, DO, | Water Quality Exchange (WQX) uses the technology, standards and protocols of the National Environmental Information Exchange Network, or Exchange Network, to provide a means for data partners to share water quality monitoring data to the STORET Data Warehouse. | 1 Web page, or ftp. | The STORET Data Warehouse is refreshed with new data submitted via WQX weekly. The refresh process is routinely started every week on Wednesday evening, so that data is available the following Thursday. | 2 Ties back to states, which typically follow EPA guidelines. | ||||||||||||||
20 | SOCIOENVIRONMENTAL EXPOSURE DATA SOURCES | |||||||||||||||||||||||||
21 | American Community Survey | https://www.census.gov/programs-surveys/acs/ | Demographic, social, economic, and housing data, available at multiple geographic levels (US Census–tract level, state, nation) | US Census Bureau | O/R (different levels of access) | 1 (suggest downgrading to 2 or 3, CDWH data is of much higher quality) | Annual (ongoing) nationwide survey of roughly 3.5 million randomly selected US postal addresses; conducted by the US Census Bureau (ACS supplements the 10-year US Census) | US Census Bureau–type data elements (general demographic/employment/income/housing) | US Census Bureau | 2 (site is difficult to navigate; data are largely available as csv files only containing tons of [most useless, non-intuitive] variables; some data available only as visual maps) | Annual | 2 (the survey is conducted by the US Census Bureau, so as far as surveys go, the data are high quality, but the data are of questionable utility) | Data are geocoded by US Census tract but will have to be re-geocoded (at some level) for linkage to other data sources; the re-geocoding will depend on the query and linkage dataset | General demographic data, general employment data, general income data, general housing data; data on all persons who reside in a single address; sample forms: https://www.census.gov/programs-surveys/acs/about/forms-and-instructions.html; Several types of datasets are theoretically available, but many links point to the same set of csv or (useless) txt files; Data dictionary available; Restricted-use microdata: "Qualified researchers with approved projects can work under strictly controlled secure Census Bureau facilities administered by the Center for Economic Studies". | ||||||||||||
22 | FBI Uniform Crime Reporting System | https://ucr.fbi.gov/ucr | Geocoded data on reported crimes, available at multiple geographic levels (city, county, state, nation) | US FBI, Department of Justice | O/R (different levels of access) | 3 (suggest upgrading to 2) | Voluntarily reported crime data contributed by participating local, county, state, tribal, and federal law enforcement agencies | Data on violent crime (murder, non-negligent manslaughter, forcible rape, robbery, aggravated assault) and property crime (burglary, larceny-theft, motor vehicle theft) | US Department of Justice | 1 (site is extremely easy to navigate; data are available as tables or csv files and are very easy to interpret; downside is that the data are somewhat out of date [2012]) | Annual, although the data are not released every year (the most recent available data are from 2012); participating agencies provide monthly reports | 1 (voluntary system, but participation is high across the country and across jurisdictions; reporting variables are defined by legal statutes) | Data are geocoded by city (10,000+), county (25,000+), state, and nation; linkage at the city and county level are limited by missing data on smaller jurisdictions | City data only available for jurisdictions of 10,000+, and county data available only for jurisdictions of 25,000+; can request additional data (provided as ASCII files) on additional crime statistics, including arson, hate crimes, and arrests; Data dictionary available; US Bureau of Justice also provides survey data on crime through the National Crime Victimization Survey (http://www.bjs.gov/index.cfm?ty=dcdetail&iid=245) | ||||||||||||
23 | BIOCHEMICAL AND MOLECULAR DATA SOURCES | |||||||||||||||||||||||||
24 | ToxCast | http://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data | High-throughput toxicity data on thousands of chemicals, including mechanism of action and signaling pathways | US EPA | O | 1 (suggest downgrading to 3) | Chemical and high-throughput toxicity data from the US EPA's Toxicology in the 21st Century (Tox21) Federal Collaboration | Data on chemicals, high-throughput assays, concentration-response curves | DSSToX for chemicals, BioAssay Ontology (BAO) for assays (all data/assays/processes are standardized according to ToxCast documented standards) | 3 (site is extremely easy to navigate, but databases are extremely large and require at least a bit of expertise on databases; datasets are also quite large and data interpretation entails a strong background in basic science/pharmacology [down to the level of reagents, etc.]) | Appears to be annual (animal toxicity data are from 2014, but all other data are from 2015 or 2016) | 1 (highly stringent QA/QI process in place for all assay protocols, data types, and data reporting) | Linkage via chemical for DSSTox data; linkage via chemical, organism, tissue for INVITRO/ASSAY data (and subsequent secondary analysis to understand assay type/target and effect direction) | Databases/datasets include: ToxCast/Tox21 Chemicals (DSST) (useful), ToxCast/Tox21 High-throughput Assays (INVITRO, ASSAY) (in vitro assays, effect direction on endpoint useful), ToxCast/Tox21 Summary Data, ToxCast/Tox21 Concentration-Response Curves (visual displays [I think] from in vitro assay data, so not very useful), Animal Toxicity Data (dose levels only, so not useful), and a variety of other (mostly irrelevant) databases/datasets; Data dictionaries and lots of documentation available; Learning curve is steep and data interpretation entails a strong background in pharmacology and basic science (i.e., "wet lab" work), down to the level of reagent(s), plate reader, staining agent(s), cell line(s), etc. | ||||||||||||
25 | Chemotext | http://chemotext.mml.unc.edu | Integrated graph database of drugs, protein targets, and diseases; contains observed and predicted assertions linking drugs, targets, and diseases | UNC School of Pharmacy (co-PI Tropsha) | O | 2 | All PubMed abstracts | Auto-curated literature-based relationships between diseases, chemicals, genes/proteins | MeSH terminology | 2, direct queries using Neo4j query languages available once given permission. Batch queries not directly supported, but feasible | Approximately once a year | 1 (Entities are high, relationships are based on statistical correlations) | Matching of naming conventions with other data systems | |||||||||||||
26 | Comparative Toxicogenomics Database (CTD) | http://ctdbase.org/ | Manually curated information about chemical-gene/protein interactions, chemical-disease, and gene-disease relationships integrated with functional and pathway data; correlate exposures with human health outcomes, to identify underlying potential molecular mechanisms, and to improve understanding about the exposome | MDI Biological Laboratory and NCSU | O | 2 | Human curation of PubMed articles | Manually curated literature based on relationships between diseases, chemicals, genes/proteins. Includes Pathways, Organism, and Gene Ontology associations | Chemical: MeSH; Diseases: MeSH, OMIM; Genes: NCBI Gene Id; Organism: NCBI id; Pathways: KEGG, Reactome | 1, REST-like query and batch query provided | Monthly | 1 (High; results of biocreative workshop give entity extraction in 90-100% accuracy and relationships in 80-90% accuracy) | Matching of naming conventions with other data systems | Overview: http://ehp.niehs.nih.gov/EHP174/ | ||||||||||||
27 | KEGG | http://www.genome.jp/kegg/pathway.html | Manually curated, integrated database consisting of sixteen separate databases categorized into systems information, genomic information, molecule/chemical information, and disease information | Kanehisa Laboratories | O | 2 | Human curation of biological pathway maps published in the scientific literature and laboratory notebooks | Manually curated literature based on relationships between molecules/chemicals, genes, biological systems, and diseases | KEGG Orthology (KO) groups (KEGG object identifiers) | |||||||||||||||||
28 | Reactome | http://www.reactome.org/ | Manually curated and peer-reviewed pathway database; aim is to translate the visual representations of biological pathways into a digital, computationally accessible format | Reactome | O | 2 | Human curation of biological pathway maps published in the scientific literature and laboratory notebooks | Manually curated literature based on proteins, complexes, reactions, pathways | Reactome, KEGG, NCBI, UniProt, ChEBI, PubMed, etc. | |||||||||||||||||
29 | ONTOLOGIES | |||||||||||||||||||||||||
30 | Exposure Ontology (ExO) | http://ctdbase.org/downloads/#exposures | Developed by CTD group; provides exposure context for CTD data. | NCSU (Carolyn Mattingly) | O | OWL | ||||||||||||||||||||
31 | Environmental conditions, treatments and exposures ontology (ECTO) | https://github.com/cmungall/environmental-conditions | Modular environmental conditions ontology. The purpose of this ontology is to create compositional classes that assemble existing OBO ontologies such as ExO, CHEBI and ENVO to make ready-made precomposed classes for use in describing: experimental treatments of plants and model organisms (e.g. modification of diet, lighting levels, temperature) exposures of humans or any other organisms to stressors through a variety of routes, for purposes of public health, environmental monitoring etc stimuli, natural and experimental any kind of environmental condition or change in condition that can be experienced by an organism or population of organisms on earth The scope is very general and can include for example plant treatment regimens, as well as human clinical exposures (although these may better be handled by a more specialized ontology) | Chris Mungall/LBNL | O | OWL | Uses ExO as an upper ontology, and provides template for composition with other domain-specific ontologies for chemical entities, anatomy, phenotype, etc. | |||||||||||||||||||
32 | Human Phenotype Ontology | http://human-phenotype-ontology.github.io | The Human Phenotype Ontology (HPO) aims to provide a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as atrial septal defect. The HPO is currently being developed using the medical literature, Orphanet, DECIPHER, and OMIM. HPO currently contains approximately 11,000 terms (still growing) and over 115,000 annotations to hereditary diseases. The HPO also provides a large set of HPO annotations to approximately 4000 common diseases. | Peter Robinson (Jackson Lab) & Monarch Initiative | O | |||||||||||||||||||||
33 | ||||||||||||||||||||||||||
34 | *Permission Levels: O = open access; R = restricted access (permissions and approvals required). | |||||||||||||||||||||||||
35 | **Priority Ranking: 1 = essential; 2 = extremely important; 3 = important. | |||||||||||||||||||||||||
36 | †For geocoded data, list sensor location(s). | |||||||||||||||||||||||||
37 | ‡1 = easy; 2 = acceptable; 3 = challenging. | |||||||||||||||||||||||||
38 | ‡‡1 = high; 2 = adequate; 3 = poor. | |||||||||||||||||||||||||
39 | §CDWH EHR data are not open access; however, with requisite permissions and approvals, the data can be harmonized into an open-source data sharing and management platform such as iRODS or i2b2/Shrine. We have experience with this, e.g., the Carolinas Collaborative and Mid-South PCORI-funded Clinical Data Research Sharing Networks (CDRNs). | |||||||||||||||||||||||||
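
Several Linkage Issues entries above note that using the geocoded environmental sources means identifying a patient's location and then manually extracting data for that area. A minimal sketch of how that step might be automated follows, assuming monitor coordinates have already been exported into a list. The monitor names and coordinates are hypothetical placeholders, not real sensor locations from any source in the table, and `haversine_km` and `nearest_monitor` are illustrative helpers rather than part of any listed system.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_monitor(patient_lat, patient_lon, monitors):
    """Return (monitor, distance_km) for the monitor closest to the patient."""
    best = min(
        monitors,
        key=lambda m: haversine_km(patient_lat, patient_lon, m["lat"], m["lon"]),
    )
    dist = haversine_km(patient_lat, patient_lon, best["lat"], best["lon"])
    return best, dist


# Hypothetical monitor list (approximate city-centre coordinates, for illustration only).
MONITORS = [
    {"name": "Raleigh", "lat": 35.7796, "lon": -78.6382},
    {"name": "Charlotte", "lat": 35.2271, "lon": -80.8431},
    {"name": "Asheville", "lat": 35.5951, "lon": -82.5515},
]

# Hypothetical geocoded patient location (approximately Chapel Hill, NC).
monitor, distance_km = nearest_monitor(35.9132, -79.0558, MONITORS)
print(monitor["name"], round(distance_km, 1))
```

Nearest-monitor assignment is only one possible linkage rule; a distance cutoff or area-based matching (for example, by US Census tract) may be more appropriate depending on the query and on how much area a given sensor is taken to cover.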