VRS AnVIL: Connecting VCF to Clinical Evidence
Brian Walsh
Quinn Wai Wong
Introduction
https://github.com/ga4gh/va-spec
https://github.com/ga4gh/vrs
https://github.com/cancervariants/metakb
[Figure: architecture diagram. Node labels: Subject VCF/VUS; Genotype and Phenotype Search tools; VRS AnVIL Toolkit; Precomputed indices; Genomic Knowledge; Evidence Mapper; meta_kb; Cohort Building; Downstream Workflows. Edge labels: "is a", "normalizes", "hits".]
Presentation Outline
VRS AnVIL Toolkit as a solution
Existing tools to annotate VCFs with evidence
Proof of concept with 1000 Genomes
Usage, discussion, & future work
Existing Tools: Annotating VCFs with Evidence
The GA4GH Variant Representation Specification (VRS) has helped standardize the exchange of variant data (vrs-python)
VRS enables fixed-length ID creation from a given genomic expression by means of a VRS object (vrs-python)
VRS ID = [Prefix] ga4gh:VA. + [Digest] rRPCnh0XXjuePRGWerw6PhVXFYjhchwP (digest inputs: [Location] + [State])
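The fixed-length ID can be illustrated with the GA4GH truncated digest (sha512t24u): SHA-512, keep the first 24 bytes, base64url-encode to 32 characters. The following is a minimal sketch using only the standard library. The `make_identifier` helper and the placeholder payload are assumptions for illustration; the payload is not the real VRS serialization of any allele, so the printed ID will not match the one shown above. In practice vrs-python handles serialization and identification.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """GA4GH truncated digest: SHA-512, first 24 bytes, base64url-encoded."""
    digest = hashlib.sha512(blob).digest()
    # 24 bytes encode to exactly 32 base64url characters, with no padding.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def make_identifier(serialized_object: bytes, prefix: str = "ga4gh:VA.") -> str:
    """Hypothetical helper: join the type prefix and the digest into one ID."""
    return prefix + sha512t24u(serialized_object)


# Placeholder payload (assumption): stands in for a serialized allele.
example_blob = b'{"location":"<placeholder>","state":"<placeholder>"}'
example_id = make_identifier(example_blob)
print(example_id, len(example_id))
```

Whatever the size of the input, the result has the same length (9-character prefix plus 32-character digest), which is what makes the ID usable as a compact join key.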
The VICC MetaKB is a harmonized data warehouse for clinical variant interpretations (MetaKB)
Solution: VRS AnVIL Toolkit
vrs_anvil_toolkit provides a configurable CLI to retrieve clinical data from a VCF file
vrs_anvil_toolkit combines variant translation and clinical interpretations into a single data collection workflow
nohup vrs_bulk annotate --scatter &
vrs_bulk ps
vrs_bulk annotate
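Conceptually, the combined workflow is a two-step join: translate each VCF record to a VRS ID, then look that ID up in an evidence index. The sketch below is not the toolkit's API. `VcfRecord`, `TRANSLATIONS`, `EVIDENCE`, and `annotate` are hypothetical names, and the two dictionaries stand in for vrs-python translation and a MetaKB query. The first record and its identifiers come from the 1000 Genomes example in this deck; the second record is made up to show a miss.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def gnomad_expression(rec: VcfRecord) -> str:
    """Build a gnomAD-style 'chrom-pos-ref-alt' expression from a VCF record."""
    chrom = rec.chrom[3:] if rec.chrom.startswith("chr") else rec.chrom
    return f"{chrom}-{rec.pos}-{rec.ref}-{rec.alt}"


# Stand-in for variant translation (assumption: precomputed lookup).
TRANSLATIONS: Dict[str, str] = {
    "19-43551574-T-C": "ga4gh:VA.SPP7r7F_Wb3XbNY8Fawk91yt1U03eIVV",
}

# Stand-in for a clinical evidence index keyed by VRS ID.
EVIDENCE: Dict[str, List[str]] = {
    "ga4gh:VA.SPP7r7F_Wb3XbNY8Fawk91yt1U03eIVV": ["civic.eid:673"],
}


def annotate(records: Iterable[VcfRecord]) -> List[dict]:
    """Attach a VRS ID and any matching study IDs to each record."""
    results = []
    for rec in records:
        expression = gnomad_expression(rec)
        vrs_id: Optional[str] = TRANSLATIONS.get(expression)
        studies = EVIDENCE.get(vrs_id, []) if vrs_id else []
        results.append(
            {"expression": expression, "vrs_id": vrs_id, "studies": studies}
        )
    return results


records = [
    VcfRecord("chr19", 43551574, "T", "C"),
    VcfRecord("chr1", 12345, "A", "G"),  # made-up record with no evidence
]
annotated = annotate(records)
for row in annotated:
    print(row)
```

Keeping translation and evidence lookup as separate steps means either one can be swapped out or cached independently.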
Proof of Concept: 1000 Genomes
About the 1000 Genomes Dataset
Using 3202 samples from the public 1000 Genomes Project data on Terra, we can consolidate cohort-level stats and evidence.
Patients had successful variant ID matches to the CIViC knowledgebase but not to the Molecular Oncology Almanac.
The spread varies between the germline- and somatic-labelled MetaKB variants.
Even after cohort-level aggregation, MetaKB evidence remains accessible in the processed results.
Most common variant: 2952/3202 = 92.2%
(19-43551574-T-C) (ga4gh:VA.SPP7r7F_Wb3XbNY8Fawk91yt1U03eIVV) (civic.eid:673)
The XRCC1 R399Q variant was correlated with increased response to platinum-based neoadjuvant chemotherapy in patients with cervical cancer. Tumor samples from 36 patients with Stage IB or IIA bulky (greater than 4 cm in size) cervical carcinomas were used in this study.
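The cohort-level figure above is a carrier count divided by the cohort size. A minimal sketch of that aggregation follows, assuming per-sample sets of variant IDs as input. The `toy_cohort` data and the function names are made up for illustration; only the 2952 and 3202 counts come from the slide.

```python
from collections import Counter
from typing import Dict, Set


def carrier_counts(sample_variants: Dict[str, Set[str]]) -> Counter:
    """Count how many samples carry each variant (one vote per sample)."""
    counts: Counter = Counter()
    for variants in sample_variants.values():
        counts.update(variants)
    return counts


def percent(carriers: int, cohort_size: int) -> float:
    """Carrier percentage, rounded to one decimal place."""
    return round(100 * carriers / cohort_size, 1)


# Toy cohort (assumption): three samples with placeholder variant IDs.
toy_cohort = {
    "sample_a": {"v1", "v2"},
    "sample_b": {"v1"},
    "sample_c": set(),
}
counts = carrier_counts(toy_cohort)
top_variant, top_count = counts.most_common(1)[0]
print(top_variant, percent(top_count, len(toy_cohort)))

# The slide's figure: 2952 carriers out of 3202 samples.
print(percent(2952, 3202))
```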
Therapeutic Evidence for Top 2 VRS IDs
vrs_id | study_id | type | strength | direction | predicate | therapeutic | tumor_type |
ga4gh:VA.SPP7r7F_Wb3XbNY8Fawk91yt1U03eIVV | civic.eid:673 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsSensitivityTo | Carboplatin | Cervical Cancer |
ga4gh:VA.SPP7r7F_Wb3XbNY8Fawk91yt1U03eIVV | civic.eid:673 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsSensitivityTo | Cisplatin | Cervical Cancer |
ga4gh:VA.ZZIGEC0okanDOaqbTEXEWuXNZTrz5qYz | civic.eid:1995 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsResistanceTo | Erlotinib | Lung Non-small Cell Carcinoma |
ga4gh:VA.ZZIGEC0okanDOaqbTEXEWuXNZTrz5qYz | civic.eid:1995 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsResistanceTo | Gefitinib | Lung Non-small Cell Carcinoma |
ga4gh:VA.ZZIGEC0okanDOaqbTEXEWuXNZTrz5qYz | civic.eid:2895 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsResistanceTo | Cisplatin | Esophageal Cancer |
ga4gh:VA.ZZIGEC0okanDOaqbTEXEWuXNZTrz5qYz | civic.eid:2895 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsResistanceTo | Fluorouracil | Esophageal Cancer |
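Because each therapy gets its own row, the table is easier to scan when grouped by variant and study. The sketch below parses the pipe-delimited rows above and collects the therapeutics per (vrs_id, study_id, predicate) key. The helper names are hypothetical, and the parsing assumes the simple "cell | cell |" layout shown here rather than any export format of the toolkit.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TABLE = """\
vrs_id | study_id | type | strength | direction | predicate | therapeutic | tumor_type |
ga4gh:VA.SPP7r7F_Wb3XbNY8Fawk91yt1U03eIVV | civic.eid:673 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsSensitivityTo | Carboplatin | Cervical Cancer |
ga4gh:VA.SPP7r7F_Wb3XbNY8Fawk91yt1U03eIVV | civic.eid:673 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsSensitivityTo | Cisplatin | Cervical Cancer |
ga4gh:VA.ZZIGEC0okanDOaqbTEXEWuXNZTrz5qYz | civic.eid:1995 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsResistanceTo | Erlotinib | Lung Non-small Cell Carcinoma |
ga4gh:VA.ZZIGEC0okanDOaqbTEXEWuXNZTrz5qYz | civic.eid:1995 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsResistanceTo | Gefitinib | Lung Non-small Cell Carcinoma |
ga4gh:VA.ZZIGEC0okanDOaqbTEXEWuXNZTrz5qYz | civic.eid:2895 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsResistanceTo | Cisplatin | Esophageal Cancer |
ga4gh:VA.ZZIGEC0okanDOaqbTEXEWuXNZTrz5qYz | civic.eid:2895 | VariantTherapeuticResponseStudy | clinical cohort evidence | supports | predictsResistanceTo | Fluorouracil | Esophageal Cancer |
"""


def split_cells(line: str) -> List[str]:
    """Split one pipe-delimited line, dropping the trailing empty cell."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_pipe_table(text: str) -> List[Dict[str, str]]:
    """Turn the header line plus data lines into a list of row dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = split_cells(lines[0])
    return [dict(zip(header, split_cells(line))) for line in lines[1:]]


def group_therapeutics(rows: List[Dict[str, str]]) -> Dict[Tuple[str, str, str], List[str]]:
    """Collect therapeutics per (vrs_id, study_id, predicate)."""
    grouped: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for row in rows:
        key = (row["vrs_id"], row["study_id"], row["predicate"])
        grouped[key].append(row["therapeutic"])
    return dict(grouped)


rows = parse_pipe_table(TABLE)
grouped = group_therapeutics(rows)
for (vrs_id, study_id, predicate), drugs in grouped.items():
    print(vrs_id, study_id, predicate, ", ".join(drugs))
```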
Usage, Discussion, & Future Work
Usage
vrs-anvil (private)
pip install vrs_anvil_toolkit
Depending on your use case, it might be helpful to use vrs-python and MetaKB individually.
There seems to be a performance difference between Terra and other platforms
Throughput (vrs_bulk annotate)
Pytest (test_gnomad)
We want to continue building out the toolkit's functionality to support GREGoR and other datasets
Acknowledgements
Wagner Lab: Kori Kuzma
Ellrott Lab: Brian Walsh, Kyle Ellrott